You can monitor a Groundplex deployed in Kubernetes (K8s) with Datadog, or any observability tool that supports OpenTelemetry (OTEL). This requires deploying an OTEL collector to harvest the metrics and logs that Groundplex nodes generate. The OTEL collection agents should run alongside the Groundplex nodes, which you can achieve by deploying the OTEL collector as a K8s DaemonSet. For more details, refer to the OTEL documentation: Important Components for Kubernetes.
The GitHub open-telemetry-collector-contrib repository provides resources for deploying an OTEL collector in Kubernetes. This page shows how to deploy the collector as a DaemonSet and view log output in Datadog, using resources from that repository.
Prerequisites:
- A working K8s cluster
- The configmap.yaml, service.yaml, roles.yaml, serviceaccount.yaml, and daemonset.yaml files from the open-telemetry-collector-contrib repository (available in an attached ZIP file for your convenience)
- A prepared SnapLogic Groundplex Kubernetes configuration
- Your Datadog API key
- A few pipelines running on the Groundplex to produce the events to log
Get started
We’ve provided a ZIP file containing the pre-configured YAML files required by Kubernetes. To add your Datadog API key, you only need to edit one of the files.
1. Download and extract the ZIP file.
2. Open configmap.yaml.
3. On line 60, replace <DD api key> with your Datadog API key. For example: 60f0**************************1c
View the comments and configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      # The hostmetrics receiver is required to get correct infrastructure metrics in Datadog.
      hostmetrics:
        collection_interval: 10s
        scrapers:
          paging:
            metrics:
              system.paging.utilization:
                enabled: true
          cpu:
            metrics:
              system.cpu.utilization:
                enabled: true
          disk:
          filesystem:
            metrics:
              system.filesystem.utilization:
                enabled: true
          load:
          memory:
          network:
          processes:
      filelog:
        include_file_path: true
        poll_interval: 500ms
        include:
          # This will ensure that logs from the following path are collected.
          - /var/log/**/*otel-collector*/*.log
      # # Uncomment this block below to get access to system metrics regarding
      # # the OpenTelemetry Collector and its environment, such as spans or metrics
      # # processed, running and sent, queue sizes, uptime, k8s information
      # # and much more.
      #
      # # The prometheus receiver scrapes essential metrics regarding the OpenTelemetry Collector.
      # prometheus:
      #   config:
      #     scrape_configs:
      #     - job_name: 'otelcol'
      #       scrape_interval: 10s
      #       static_configs:
      #       - targets: ['0.0.0.0:8888']
    exporters:
      logging:
      datadog:
        api:
          key: <DD api key>
    processors:
      resourcedetection:
        # ensures host.name and other important resource tags
        # get picked up
        detectors: [system, env, docker]
        timeout: 5s
        override: false
      # adds various tags related to k8s
      k8sattributes:
        passthrough: false
        auth_type: "serviceAccount"
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              # If neither of those work, use the request's connection to get the pod IP.
              - from: connection
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.node.name
            - k8s.namespace.name
            - k8s.pod.start_time
            - k8s.replicaset.name
            - k8s.replicaset.uid
            - k8s.daemonset.name
            - k8s.daemonset.uid
            - k8s.job.name
            - k8s.job.uid
            - k8s.cronjob.name
            - k8s.statefulset.name
            - k8s.statefulset.uid
            - container.image.name
            - container.image.tag
            - container.id
            - k8s.container.name
            - container.image.name
            - container.image.tag
            - container.id
          labels:
            - tag_name: kube_app_name
              key: app.kubernetes.io/name
              from: pod
            - tag_name: kube_app_instance
              key: app.kubernetes.io/instance
              from: pod
            - tag_name: kube_app_version
              key: app.kubernetes.io/version
              from: pod
            - tag_name: kube_app_component
              key: app.kubernetes.io/component
              from: pod
            - tag_name: kube_app_part_of
              key: app.kubernetes.io/part-of
              from: pod
            - tag_name: kube_app_managed_by
              key: app.kubernetes.io/managed-by
              from: pod
      batch:
        # Datadog APM Intake limit is 3.2MB. Let's make sure the batches do not
        # go over that.
        send_batch_max_size: 1000
        send_batch_size: 100
        timeout: 10s
    service:
      # This will make the collector output logs in JSON format
      telemetry:
        logs:
          encoding: "json"
          initial_fields:
            # Add the service field to every log line. It can be used for filtering in Datadog.
            - service: "otel-collector"
      pipelines:
        metrics:
          receivers: [hostmetrics, otlp]
          processors: [resourcedetection, k8sattributes, batch]
          exporters: [logging, datadog]
        traces:
          receivers: [otlp]
          processors: [resourcedetection, k8sattributes, batch]
          exporters: [logging, datadog]
        logs:
          receivers: [filelog, otlp]
          processors: [batch]
          exporters: [logging, datadog]
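As an optional sanity check before deploying (not part of the original steps; it assumes you run it from the folder containing the extracted files), you can confirm that the placeholder key is gone and that the manifest still parses:

grep -n "<DD api key>" configmap.yaml              # should print nothing once your key is in place
kubectl apply --dry-run=client -f configmap.yaml   # validates the manifest without creating anything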
Kubernetes requires a service definition, role, and service account for the otel-collector. These are configured in the service.yaml, roles.yaml, and serviceaccount.yaml manifests. Refer to the Kubernetes documentation on services.
The service.yaml file
View the service.yaml file extracted from the ZIP file. The following entries configure the otel-collector service. The example defines both gRPC and HTTP ports for your reference.
Datadog does not use the HTTP port. If you are not using Datadog, check your tool's documentation for its requirements.
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  ports:
    - name: grpc-otlp
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: http-otlp
      port: 4318
      protocol: TCP
      targetPort: 4318
  selector:
    app.kubernetes.io/name: otel-collector
  type: ClusterIP
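Once the service has been applied (see Deploy the OTEL collector below), an optional way to confirm that it resolves by name from inside the cluster is to run a throwaway pod. The pod name and busybox image here are illustrative choices, not part of the provided manifests:

kubectl run otel-dns-test --rm -it --restart=Never --image=busybox -- nslookup otel-collector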
The roles.yaml file
Open the roles.yaml file and view the otel-collector-role, its rules, and its binding, as shown below.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-role
rules:
  - apiGroups:
      - ''
    resources:
      - 'pods'
      - 'namespaces'
    verbs:
      - 'get'
      - 'watch'
      - 'list'
  - apiGroups:
      - 'apps'
    resources:
      - 'replicasets'
    verbs:
      - 'get'
      - 'list'
      - 'watch'
  - apiGroups:
      - 'extensions'
    resources:
      - 'replicasets'
    verbs:
      - 'get'
      - 'list'
      - 'watch'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector-account
    namespace: default
roleRef:
  kind: ClusterRole
  name: otel-collector-role
  apiGroup: rbac.authorization.k8s.io
The serviceaccount.yaml file
Open the serviceaccount.yaml file and view the service account definition:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector-account
  namespace: default
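After roles.yaml and serviceaccount.yaml are applied (see Deploy the OTEL collector below), you can optionally verify that the binding grants the collector account the pod access it needs. The default namespace here matches the manifests above:

kubectl auth can-i list pods --as=system:serviceaccount:default:otel-collector-account   # expected answer: yes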
The daemonset.yaml file
View the daemonset.yaml file and note the otel/opentelemetry-collector-contrib image version on line 26:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: otel-collector
        app: opentelemetry
        component: otel-collector
    spec:
      serviceAccountName: otel-collector-account
      containers:
        - name: collector
          command:
            - "/otelcol-contrib"
            - "--config=/conf/otel-agent-config.yaml"
          image: otel/opentelemetry-collector-contrib:0.101.0
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 200m
              memory: 400Mi
          ports:
            - containerPort: 4318 # default port for OpenTelemetry HTTP receiver.
              hostPort: 4318
            - containerPort: 4317 # default port for OpenTelemetry gRPC receiver.
              hostPort: 4317
            - containerPort: 8888 # Default endpoint for querying metrics.
          volumeMounts:
            - name: otel-agent-config-vol
              mountPath: /conf
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            # The k8s.pod.ip is used to associate pods with k8sattributes.
            # It is useful to have in the Collector pod because receiver metrics can also
            # benefit from the tags.
            - name: OTEL_GRPC_URL
              value: "k8s.pod.ip=$(POD_IP):4317"
      volumes:
        - name: otlpgen
          hostPath:
            path: /otlpgen
        - name: otel-agent-config-vol
          configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
        # Mount nodes log file location.
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
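The image tag pins the collector release used by the DaemonSet. If you want to double-check which version you are about to deploy, or bump it deliberately, a quick look at the manifest is enough (0.101.0 is simply the version shipped in the example file):

grep -n "image:" daemonset.yaml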
Deploy the OTEL collector
1. Save the pre-configured Kubernetes files in the /etc/kubernetes/manifests/ folder of your K8s installation.
2. Execute the following commands in the /etc/kubernetes/manifests/ folder:
kubectl apply -f configmap.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f roles.yaml
kubectl apply -f service.yaml
kubectl apply -f daemonset.yaml
After Kubernetes applies the files, the OTEL collector starts up. Use the kubectl get all command to confirm. The relevant lines are highlighted in the following screenshot.
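If you prefer a more targeted check than kubectl get all, the following optional commands (they assume the default namespace used by the provided manifests) confirm that the DaemonSet has rolled out and that the collector is producing logs:

kubectl rollout status daemonset/otel-agent
kubectl get pods -l component=otel-collector -o wide
kubectl logs daemonset/otel-agent --tail=20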
Deploy SnapLogic Groundplex nodes to Kubernetes
After deploying a Groundplex as described in Install a Groundplex on Kubernetes, add the information to connect the otel-collector to the Groundplex nodes. You can use the name of the K8s service or the CLUSTER-IP:
- The name won't change on re-deployment, but the IP address can. Our example uses the name for this reason.
- The CLUSTER-IP is preferable if you plan to change the name of the service.
Find these values by executing the kubectl get services command to view the services:
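For example, the output might look like the following (illustrative only; the CLUSTER-IP and AGE values will differ in your cluster):

kubectl get services
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
kubernetes       ClusterIP   10.96.0.1      <none>        443/TCP             12d
otel-collector   ClusterIP   10.102.52.94   <none>        4317/TCP,4318/TCP   5m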
Open deployment.yaml and deployment-feed.yaml from the Groundplex installation helm_chart/templates directory. Add the following entries to the env: list in the spec: containers: section:

  - name: OTEL_GRPC_URL
    value: http://otel-collector:4317
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "k8s.pod.ip=$(POD_IP)"
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
Save the files and make sure that all other installation steps are finished.
Execute the command to start the Groundplex nodes:
helm install <snaplogic_name> <name of helm chart folder>
Wait until the nodes spin up and are visible in SnapLogic Monitor.
Execute some pipelines on the Groundplex.
In Datadog, check the logs. You should see log entries related to asset executions. Log entries contain nodeLabel info, which is helpful for understanding which node executed the asset.
Check the values on the Metrics page:
You can use widgets to create a dashboard that captures the most important metrics:
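If no Groundplex logs or metrics show up in Datadog, a quick troubleshooting step (a sketch only; replace the placeholder with a real pod name from kubectl get pods) is to confirm that the OTEL environment variables actually reached the running Groundplex pods:

kubectl exec <groundplex-pod-name> -- env | grep -E 'OTEL|POD_IP|HOST_IP'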