Deploy OTEL as a DaemonSet on Kubernetes to log Groundplex metrics and alerts

You can monitor a Groundplex deployed in Kubernetes (K8s) with Datadog or any observability tool that supports OpenTelemetry (OTEL). This requires deploying an OTEL collector to harvest the metrics and logs that Groundplex nodes generate. The collector should run on every cluster node that hosts Groundplex pods, which you can achieve by deploying it as a K8s DaemonSet. For more details, refer to the OTEL documentation: Important Components for Kubernetes.

The GitHub open-telemetry-collector-contrib repository provides resources for deploying an OTEL collector in Kubernetes. This page shows how to deploy the collector as a DaemonSet and view log output in Datadog, using resources from that repository.

Prerequisites:

  • A working K8s cluster

  • The configmap.yaml, service.yaml, roles.yaml, serviceaccount.yaml, daemonset.yaml, and opentelemetry.yaml files from the open-telemetry-collector-contrib repository (provided in the attached ZIP file for your convenience)

  • A prepared SnapLogic Groundplex Kubernetes configuration

  • Your Datadog API key

  • A few pipelines running on the Groundplex to produce the events to log

Get started

We’ve provided a ZIP file containing the pre-configured YAML files required by Kubernetes. To add your Datadog API key, you only need to edit one of the files.

  1. Download and extract the file.

  2. Open configmap.yaml.

  3. On line 60, replace <DD api key> with your Datadog API key. For example: 60f0**************************1c

  4. View the comments and configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      # The hostmetrics receiver is required to get correct infrastructure metrics in Datadog.
      hostmetrics:
        collection_interval: 10s
        scrapers:
          paging:
            metrics:
              system.paging.utilization:
                enabled: true
          cpu:
            metrics:
              system.cpu.utilization:
                enabled: true
          disk:
          filesystem:
            metrics:
              system.filesystem.utilization:
                enabled: true
          load:
          memory:
          network:
          processes:
      filelog:
        include_file_path: true
        poll_interval: 500ms
        include:
          # This will ensure that logs from the following path are collected.
          - /var/log/**/*otel-collector*/*.log
      # # Uncomment this block below to get access to system metrics regarding
      # # the OpenTelemetry Collector and its environment, such as spans or metrics
      # # processed, running and sent, queue sizes, uptime, k8s information
      # # and much more.
      #
      # # The prometheus receiver scrapes essential metrics regarding the OpenTelemetry Collector.
      # prometheus:
      #   config:
      #     scrape_configs:
      #       - job_name: 'otelcol'
      #         scrape_interval: 10s
      #         static_configs:
      #           - targets: ['0.0.0.0:8888']
    exporters:
      logging:
      datadog:
        api:
          key: <DD api key>
    processors:
      resourcedetection:
        # ensures host.name and other important resource tags
        # get picked up
        detectors: [system, env, docker]
        timeout: 5s
        override: false
      # adds various tags related to k8s
      k8sattributes:
        passthrough: false
        auth_type: "serviceAccount"
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              # If neither of those work, use the request's connection to get the pod IP.
              - from: connection
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.node.name
            - k8s.namespace.name
            - k8s.pod.start_time
            - k8s.replicaset.name
            - k8s.replicaset.uid
            - k8s.daemonset.name
            - k8s.daemonset.uid
            - k8s.job.name
            - k8s.job.uid
            - k8s.cronjob.name
            - k8s.statefulset.name
            - k8s.statefulset.uid
            - container.image.name
            - container.image.tag
            - container.id
            - k8s.container.name
            - container.image.name
            - container.image.tag
            - container.id
          labels:
            - tag_name: kube_app_name
              key: app.kubernetes.io/name
              from: pod
            - tag_name: kube_app_instance
              key: app.kubernetes.io/instance
              from: pod
            - tag_name: kube_app_version
              key: app.kubernetes.io/version
              from: pod
            - tag_name: kube_app_component
              key: app.kubernetes.io/component
              from: pod
            - tag_name: kube_app_part_of
              key: app.kubernetes.io/part-of
              from: pod
            - tag_name: kube_app_managed_by
              key: app.kubernetes.io/managed-by
              from: pod
      batch:
        # Datadog APM Intake limit is 3.2MB. Let's make sure the batches do not
        # go over that.
        send_batch_max_size: 1000
        send_batch_size: 100
        timeout: 10s
    service:
      # This will make the collector output logs in JSON format
      telemetry:
        logs:
          encoding: "json"
          initial_fields:
            # Add the service field to every log line. It can be used for filtering in Datadog.
            - service: "otel-collector"
      pipelines:
        metrics:
          receivers: [hostmetrics, otlp]
          processors: [resourcedetection, k8sattributes, batch]
          exporters: [logging, datadog]
        traces:
          receivers: [otlp]
          processors: [resourcedetection, k8sattributes, batch]
          exporters: [logging, datadog]
        logs:
          receivers: [filelog, otlp]
          processors: [batch]
          exporters: [logging, datadog]

Kubernetes requires a service definition, role, and service account for the otel-collector. These are configured in the service.yaml, roles.yaml, and serviceaccount.yaml manifests. Refer to the Kubernetes documentation on services.

The service.yaml file

View the service.yaml file extracted from the ZIP file. The following entries configure the otel-collector service. The example defines both gRPC and HTTP ports for your reference.

Datadog does not use the HTTP port. If you are not using Datadog, check your tool documentation for its requirements.

apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  ports:
    - name: grpc-otlp
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: http-otlp
      port: 4318
      protocol: TCP
      targetPort: 4318
  selector:
    app.kubernetes.io/name: otel-collector
  type: ClusterIP

The roles.yaml file

Open the roles.yaml file and view the otel-collector-role, its rules, and its binding, as shown below.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-role
rules:
  - apiGroups:
      - ''
    resources:
      - 'pods'
      - 'namespaces'
    verbs:
      - 'get'
      - 'watch'
      - 'list'
  - apiGroups:
      - 'apps'
    resources:
      - 'replicasets'
    verbs:
      - 'get'
      - 'list'
      - 'watch'
  - apiGroups:
      - 'extensions'
    resources:
      - 'replicasets'
    verbs:
      - 'get'
      - 'list'
      - 'watch'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector-account
    namespace: default
roleRef:
  kind: ClusterRole
  name: otel-collector-role
  apiGroup: rbac.authorization.k8s.io

The serviceaccount.yaml file

Open the serviceaccount.yaml file and view the service account definition:
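The exact contents are in the ZIP file. For reference, a service account manifest for this setup is typically as minimal as the sketch below; the account name must match the otel-collector-account subject in the ClusterRoleBinding above, and the namespace must match the one used in the binding (default in this example).

apiVersion: v1
kind: ServiceAccount
metadata:
  # Must match the ServiceAccount subject in roles.yaml.
  name: otel-collector-account
  namespace: default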

The daemonset.yaml file

View the daemonset.yaml file and note the otel-collector-contrib image version specified on line 26:
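The extracted daemonset.yaml is the authoritative version. For orientation, the sketch below shows the general shape of such a manifest, assuming the otel/opentelemetry-collector-contrib image, the otel-collector-account service account, and the otel-agent-conf ConfigMap defined earlier; the image tag, resource limits, and volume mounts in the real file may differ.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  labels:
    app.kubernetes.io/name: otel-collector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: otel-collector
  template:
    metadata:
      labels:
        # Matches the selector in service.yaml so the Service routes to these pods.
        app.kubernetes.io/name: otel-collector
    spec:
      serviceAccountName: otel-collector-account
      containers:
        - name: otel-collector
          # Illustrative tag; use the version pinned in the provided daemonset.yaml.
          image: otel/opentelemetry-collector-contrib:0.71.0
          args: ["--config=/conf/otel-agent-config.yaml"]
          ports:
            - containerPort: 4317   # OTLP gRPC
            - containerPort: 4318   # OTLP HTTP
          volumeMounts:
            - name: otel-agent-config-vol
              mountPath: /conf
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: otel-agent-config-vol
          configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
        # The filelog receiver in the ConfigMap reads logs under /var/log on the host.
        - name: varlog
          hostPath:
            path: /var/log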

Deploy the OTEL collector

  1. Save the pre-configured Kubernetes files in the /etc/kubernetes/manifests/ folder of your K8s installation.

  2. Execute the following commands in the /etc/kubernetes/manifests/ folder:
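The commands apply each manifest with kubectl; assuming the file names from the ZIP, they look like this (kubectl apply -f . applies every manifest in the folder at once):

kubectl apply -f configmap.yaml
kubectl apply -f service.yaml
kubectl apply -f roles.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f daemonset.yaml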

After Kubernetes applies the files, the OTEL collector starts up. Use the kubectl get all command to confirm. The relevant lines are highlighted in the following screenshot.
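For example, the following commands confirm that the DaemonSet is running and list its pods; the otel-collector name and label come from the manifests above:

kubectl get all
kubectl get daemonset otel-collector
kubectl get pods -l app.kubernetes.io/name=otel-collector -o wide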

Deploy SnapLogic Groundplex nodes to Kubernetes

After deploying a Groundplex as described in Install a Groundplex on Kubernetes, add the information to connect the otel-collector to the Groundplex nodes. You can use the name of the K8s service or the CLUSTER-IP:

  • The name won’t change on re-deployment, but the IP address can. Our example uses the name for this reason.

  • CLUSTER-IP is preferable if you plan to change the name of the service.

Find these values by listing the Kubernetes services, which shows each service's name and CLUSTER-IP:
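For example, using the service name defined in service.yaml:

kubectl get services
# or, for just the collector service:
kubectl get service otel-collector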

  1. Open deployment.yaml and deployment-feed.yaml from the Groundplex installation helm_chart/templates directory. In the spec: containers: section of each file, add the settings that point the Groundplex containers at the otel-collector service (a hedged example follows these steps).

  2. Save the files and make sure that all other installation steps are complete.

  3. Execute the command to start the Groundplex nodes (see the example command after these steps):
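The snippet for step 1 is not reproduced here. A minimal sketch, assuming the Groundplex container honors the standard OpenTelemetry environment variable OTEL_EXPORTER_OTLP_ENDPOINT and that the collector service keeps the name and gRPC port defined in service.yaml, is shown below; the container name is illustrative, and you should check the SnapLogic Groundplex documentation for the exact settings your release expects.

spec:
  containers:
    - name: snaplogic-groundplex   # the existing Groundplex container entry; name is illustrative
      env:
        # Send OTLP data to the collector Service by name; 4317 is the grpc-otlp port in service.yaml.
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"

For step 3, the exact command depends on how you installed the Groundplex chart; with Helm it is typically an upgrade of the existing release from the chart directory, for example (the release name is a placeholder):

helm upgrade --install <release-name> ./helm_chart

After the Groundplex nodes are running, verify the telemetry: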

  1. Wait until the nodes spin up and are visible in SnapLogic Monitor.

  2. Execute some pipelines on the Groundplex.

  3. In Datadog, check the logs. You should see log entries related to asset executions. Log entries contain nodeLabel information, which helps you identify which node executed the asset:

    Check the values on the Metrics page:

    You can use widgets to create a dashboard that captures the most important metrics:





