I'm using Fluentd as my log shipper in Kubernetes, with a plugin that extracts metadata using a regexp. The plugin currently uses the following regexp:
'var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$'
It parses records structured like so:
kubernetes.soluto.var.log.containers.my-nice-api-67459fc4f6-g9vk7_namespace-name_container-name-1e1eeab6b6ce257cf6a7a03057159f3b0873dcd5c0cc713cd8c43ed66c5b6b03.log
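To make the starting point concrete, here is a minimal Ruby sketch of what the current pattern captures from the sample record. Fluentd plugins are written in Ruby and the pattern uses Ruby's named-group syntax, but the constant and method names below are illustrative only and are not part of the plugin.

```ruby
# The plugin's current pattern, copied from above (Ruby regexp syntax).
CURRENT_PATTERN = /var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$/

# The sample record from the question.
SAMPLE_TAG = "kubernetes.soluto.var.log.containers.my-nice-api-67459fc4f6-g9vk7_namespace-name_container-name-1e1eeab6b6ce257cf6a7a03057159f3b0873dcd5c0cc713cd8c43ed66c5b6b03.log"

# Hypothetical helper: returns the named captures as a Hash,
# or nil when the tag does not match.
def current_captures(tag)
  m = CURRENT_PATTERN.match(tag)
  m && m.named_captures
end

captures = current_captures(SAMPLE_TAG)
puts captures["pod_name"]        # my-nice-api-67459fc4f6-g9vk7
puts captures["namespace"]       # namespace-name
puts captures["container_name"]  # container-name
```

The pod_name group comes back as a single value, with the deployment-derived name and the two hash suffixes still joined together.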
I'm trying to alter the regexp so that it splits the pod_name value into two parts:
1. The pod name derived from the deployment name, or given explicitly in the .yaml configuration.
2. The hash appended to the pod name (the last two hyphen-delimited parts, 67459fc4f6-g9vk7 in the record above).
I could have used the hyphen character as a delimiter and been done with it, but since the pod name itself may contain hyphens, that is not possible.
I therefore need a pattern whose capture group contains the pod name without its last two hyphen-delimited parts.
I've constructed this regexp and played around with it, but it does not behave as expected. Would love any help with this.
If the group named "pod_name" should just contain the pod name derived from the deployment name, try this:
kubernetes\.var\.log\.containers\.(?:(?<pod_name>[a-z0-9]+(?:-[a-z0-9]+)*))-(?<=-)[a-z0-9]+-(?:(?<=-)[a-z0-9]+(?=_))_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log
If the group named "pod_name" should contain both parts, i.e. the pod name derived from the deployment name and the hash appended to it, try this:
kubernetes\.var\.log\.containers\.(?:(?<pod_name>[a-z0-9]+(?:-[a-z0-9]+)*-(?<=-)[a-z0-9]+-(?:(?<=-)[a-z0-9]+(?=_))))_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log
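If both parts are wanted as separate groups, one possible variant is sketched below in Ruby. It rests on a few assumptions that are not in the patterns above. The group name pod_hash, the constant names, and the helper method are all hypothetical. The prefix is shortened to var\.log\.containers\. as in the plugin's original pattern, because the sample record starts with kubernetes.soluto. and would not match a literal kubernetes\.var prefix. The lookarounds are left out, since the literal - and _ next to them already enforce the same positions. The sketch also assumes every pod name ends in two hyphen-delimited suffixes, as in the sample.

```ruby
# Hypothetical variant: pod_name holds the deployment-derived name,
# pod_hash (an illustrative group name) holds the two trailing suffixes.
SPLIT_PATTERN = /var\.log\.containers\.(?<pod_name>[a-z0-9]+(?:-[a-z0-9]+)*)-(?<pod_hash>[a-z0-9]+-[a-z0-9]+)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$/

# The sample record from the question.
EXAMPLE_TAG = "kubernetes.soluto.var.log.containers.my-nice-api-67459fc4f6-g9vk7_namespace-name_container-name-1e1eeab6b6ce257cf6a7a03057159f3b0873dcd5c0cc713cd8c43ed66c5b6b03.log"

# Hypothetical helper: returns the named captures as a Hash,
# or nil when the tag does not match.
def split_captures(tag)
  m = SPLIT_PATTERN.match(tag)
  m && m.named_captures
end

parts = split_captures(EXAMPLE_TAG)
puts parts["pod_name"]   # my-nice-api
puts parts["pod_hash"]   # 67459fc4f6-g9vk7
puts parts["namespace"]  # namespace-name
```

The split works because the pod_name group is greedy and then backtracks one hyphen-delimited part at a time until exactly two parts remain before the underscore. A pod name with fewer than three hyphen-delimited parts would not match this variant at all.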