Third-party Observability Tools Integration (Beta)

Overview

This feature is a private beta. Contact your CSM to set up your organization with SnapLogic third-party Observability Tools Integration.

IT organizations often leverage centralized tools to monitor their systems. Typically, these centralized tools enable IT groups to do the following:

  • Avoid relying on a system to monitor itself.

  • Reduce Total Cost of Ownership (TCO) by managing the monitoring, alerts, and notification system in one place.

  • Retain historical data for audits.

The SnapLogic® Third-party Observability feature offers the capability to integrate pipeline runtime logs with your third-party monitoring tools. This feature enables your IT organization to track, troubleshoot, analyze, and optimize your integrations in production environments.

The SnapLogic platform uses OpenTelemetry to support telemetry data integration with third-party Observability tools. This beta release implements the service that enables you to monitor your pipeline execution runtime logs in Datadog and New Relic.

Supported Third-Party Observability Tools

  • Datadog

  • New Relic

Support for additional third-party monitoring tools, such as Azure Monitor, Grafana, Kafka, Prometheus, and Splunk, is on the product roadmap.

Prerequisites

  • Must have Groundplexes.

  • Must enable the OpenTelemetry feature flag for your Org.

Support for Cloudplexes is on the product roadmap.

Implement OTEL Services

OpenTelemetry is an open-source, vendor-agnostic observability framework and toolkit designed to create and manage telemetry data. For the SnapLogic application, you can capture metrics and logs.

How OpenTelemetry Works

The OpenTelemetry Collector receives, processes, and exports OpenTelemetry data. When you deploy the OpenTelemetry service, the Collector retrieves the data from the JCC node logs and routes it to your third-party monitoring tool using the OpenTelemetry Protocol (OTLP).

The following three components comprise the OpenTelemetry Collector:

  • Receiver - Defines how the logs are sent to the OpenTelemetry Collector.

  • Processor - Defines how the Collector processes the logs before export. In most cases, batch processing is preferred.

  • Exporter - Specifies the third-party tool that receives the data (see the configuration sketch after this list).
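The skeleton below illustrates how these three components map to the sections of a Collector configuration file. It is a minimal sketch, not a complete configuration; the exporter name is a placeholder, and complete examples appear in the tool-specific sections below.

receivers:          # how logs enter the Collector (OTLP over gRPC here)
  otlp:
    protocols:
      grpc:

processors:         # how logs are processed before export (batching here)
  batch:

exporters:          # the third-party destination (placeholder name)
  datadog:

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]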

The following diagram shows the architecture of the OpenTelemetry service implemented in the SnapLogic platform.

SnapLogic OpenTelemetry architecture

This document describes only the retrieval of logs sent in the OpenTelemetry format; it does not cover the original log format from the Groundplex JCC node. Support for capturing log data directly from the Groundplex JCC node is on the product roadmap.

Workflow

  1. Enable the feature flag for your Org.

  2. Set up the OpenTelemetry Service.

  3. Run your pipelines.

  4. Observe pipeline runtime data in the monitoring tool.

Enable Feature Flag for the OpenTelemetry Service

Work with your CSM to enable the following feature flag for your target Orgs/Environments:

"com.snaplogic.cc.log.RuntimeLogWriter.OPEN_TELEMETRY_LOGGER_ENABLED": true

Datadog Monitoring Tools Support Workflow

Step 1: Install the OpenTelemetry package.

Step 2: Configure the OpenTelemetry service.

Step 3: Deploy the OpenTelemetry service.

Step 1: Install the OpenTelemetry Package

  1. Download the OpenTelemetry Collector Contrib package.

  2. Save the package on the JCC node machine that hosts the Groundplex.

Step 2: Configure the OpenTelemetry Service

  1. Create the YAML configuration file. You can use the example template shown after these steps.

  2. Specify the gRPC URL in the YAML template. This value is set as an environment variable on your host machine.

  3. For batch-mode processing of the logs, define the following values for the processors:

    • send_batch_max_size: 100

    • send_batch_size: 10

    • timeout: 10s

  4. For exporters, add the values for your Datadog API.

The following is an example of a YAML configuration file:
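This is a minimal sketch rather than a verified template: it assumes the Datadog exporter that ships with the OpenTelemetry Collector Contrib distribution, and the endpoint, site, and DD_API_KEY environment variable are placeholders to adapt to your environment.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 10s

exporters:
  datadog:
    api:
      site: datadoghq.com
      key: ${env:DD_API_KEY}

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]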

Step 3: Deploy the OpenTelemetry Service

  1. Deploy the service from the JCC node where you installed the OpenTelemetry package. You can deploy on any Groundplex running one of the following:

    • Linux

    • Microsoft Windows

    • Docker container

  2. (Optional) To run the service on a separate Docker container, run the following Docker commands (where $DD_API_KEY is the Datadog API key environment variable).
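    The original commands are not reproduced here; the following is a representative sketch that uses the official otel/opentelemetry-collector-contrib image and assumes your configuration file is named otel-config.yaml in the current directory.

    # Pass the Datadog API key from the host environment and mount the config
    docker run -d --name otelcol \
      -e DD_API_KEY \
      -p 4317:4317 -p 4318:4318 \
      -v "$(pwd)/otel-config.yaml":/etc/otelcol-contrib/config.yaml \
      otel/opentelemetry-collector-contrib:latest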

  3. Test the OpenTelemetry service by running some pipelines on the target Groundplexes, then check your monitoring tool for the runtime data.

Access Additional Metrics

As of the November release, the OpenTelemetry service can also collect metrics. You can transmit these metrics to your monitoring tool by uncommenting the following lines in the YAML template:

# Data sources: metrics
filter:
  metrics:
    include:
      match_type:
      metric_names:

To choose which metrics to use for reporting, filter by regex, as shown in the YAML setting below:
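A sketch of such a filter, assuming the filter processor from the Collector Contrib distribution; the two regular expressions are illustrative and select only the Java heap and CPU metrics listed in the Metrics Reference:

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - plexnode\.java\.heap\..*
          - plexnode\.cpu\..*

Remember to add filter to the processors list of the metrics pipeline in the service section so that the filter takes effect.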

Refer to the Metrics Reference for definitions of each metric.

Monitor Logs in Datadog

You should start seeing logs after the OpenTelemetry Collector starts. When you run a pipeline, the details are captured in the pipeline runtime logs.

Notice that some log files are generated with the runtime.

In the Datadog UI, you can observe the log files immediately.

Click a log file to open a details pane with real-time information.

New Relic Platform Observability Support Workflow

The SnapLogic application supports integrations with the New Relic platform. You can stream pipeline runtime logs to a New Relic endpoint to track execution status and details. After installing the OTEL Collector, you can use a YAML file to configure the service.

All pipeline runtime logs are observable in New Relic with a replication lag of 100 ms or less. All execution logs have a status of INFO.

Step 1: Install the OpenTelemetry Package

  1. Download the OpenTelemetry Collector Contrib package.

  2. Save the package on the JCC node machine that hosts the Groundplex.

Step 2: Prepare the YAML File

Download the YAML file, shown below.

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:
  opencensus:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:

processors:
  batch:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 10s

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      "api-key": <NEW RELIC LICENSE API KEY>
  logging:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [logging]
    metrics:
      receivers: [otlp, opencensus]
      processors: [batch]
      exporters: [otlp, logging]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, logging]
  extensions: [health_check, pprof, zpages]

Set up the Connection to New Relic

  1. Set up the OTEL Collector, using the steps in Step 1: Install the OpenTelemetry package.
    If you already have the OTEL Collector set up, go to the next step.

  2. If you do not already have a New Relic account, create one in the New Relic application.

  3. Copy the API key from the Administration > API keys page in the New Relic UI.

  4. Save the YAML file to the following location: /etc/otelcol

  5. Add the API key to the YAML file under headers, as shown below:

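    The key goes in the exporters section of the YAML file from Step 2, replacing the placeholder:

    exporters:
      otlp:
        endpoint: https://otlp.nr-data.net:4317
        headers:
          "api-key": <NEW RELIC LICENSE API KEY>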

  6. Apply the changes to the YAML file and restart the OpenTelemetry service.
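How you restart the service depends on how you deployed the Collector. A sketch assuming the Collector runs as a systemd service on Linux (the service name may be otelcol or otelcol-contrib, depending on the package you installed):

# restart the Collector so it picks up the updated YAML file
sudo systemctl restart otelcol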

Monitor Runtime Logs in New Relic

Run some tasks or pipelines on the target Snaplex. You can then observe a summary of the pipeline execution logs in the New Relic UI, as shown in the example image:

Click each log to open a view of the Log Details, as shown below:

Metrics Reference

Java Heap Metrics

  • plexnode.java.heap.used.bytes - Number of bytes used of the heap.

  • plexnode.java.heap.total.bytes - Number of bytes allocated for the heap.

  • plexnode.java.heap.used.pct - Percentage of the heap being used (used / total * 100%).

  • plexnode.java.heap.max.bytes - The maximum heap size in bytes.

  • plexnode.java.heap.used.of.max.pct - Percentage of the maximum heap being used (used / max * 100%). This is the same value that the product dashboard reflects.

Java Non-Heap Metrics

  • plexnode.java.nonheap.used.bytes - Number of bytes used of the off-heap memory.

  • plexnode.java.nonheap.total.bytes - Number of bytes allocated for the off-heap memory.

  • plexnode.java.nonheap.used.pct - Percentage of the off-heap memory being used (used / total * 100%).

  • plexnode.java.nonheap.max.bytes - Number of bytes the JVM is allowed to allocate off-heap.

  • plexnode.java.nonheap.used.of.max.pct - Percentage of the maximum off-heap memory being used (used / max * 100%).

CPU Usage Metrics

  • plexnode.cpu.vcpus.count - Number of vCPUs available on the machine.

  • plexnode.cpu.load.1min.average - Last-minute load average on the machine.

  • plexnode.cpu.load.pct - System CPU utilization as a percentage of total.

  • plexnode.cpu.process.load.pct - JCC process CPU utilization as a percentage of total.

Disk Usage Metrics

  • plexnode.disk.total.bytes - Total number of bytes on the disk.

  • plexnode.disk.used.bytes - Number of bytes used on the disk.

  • plexnode.disk.used.pct - Percentage of the disk used.

  • plexnode.disk.usable.bytes - Number of bytes available on the disk.

  • plexnode.disk.usable.pct - Percentage of the disk available.

Memory Usage Metrics

  • plexnode.mem.physical.total.bytes - Total amount of memory in bytes on the machine.

  • plexnode.mem.physical.free.bytes - Amount of memory in bytes available for allocation on the machine.

  • plexnode.mem.physical.used.bytes - Amount of memory in bytes allocated on the machine.

  • plexnode.mem.physical.free.pct - Percentage of memory available for allocation on the machine.

  • plexnode.mem.physical.used.pct - Percentage of memory allocated on the machine.

  • plexnode.mem.swap.total.bytes - Number of bytes allocated for swap memory on the machine.

  • plexnode.mem.swap.free.bytes - Number of bytes available in swap memory on the machine.

  • plexnode.mem.swap.used.bytes - Number of bytes used in swap memory on the machine, or -1 if swap is not enabled.

  • plexnode.mem.swap.free.pct - Percentage of total swap memory available on the machine, or -1 if swap is not enabled.

  • plexnode.mem.swap.used.pct - Percentage of total swap memory allocated on the machine, or -1 if swap is not enabled.

  • plexnode.mem.virtual.committed.bytes - Amount of virtual memory in bytes guaranteed to be available to the running process.

File Utilization Metrics

  • plexnode.file.descriptor.used.count - Number of used file descriptors in the system.

  • plexnode.file.descriptor.free.count - Number of free file descriptors in the system.

  • plexnode.file.descriptor.max - Number of available (configured) file descriptors in the system.

  • plexnode.file.descriptor.used.pct - Percentage of used file descriptors in the system.

  • plexnode.file.descriptor.free.pct - Percentage of free file descriptors in the system.

Thread Utilization Metrics

  • plexnode.thread.jvm.count - Current number of live threads within the JVM, including both daemon and non-daemon threads.

Slots Utilization Metrics

  • plexnode.slots.leased - Number of slots currently leased. NOTE: If slots are leased and released between scrapes, the change is not reflected in the values.

  • plexnode.slots.max - Number of slots available (configured) on the node.

  • plexnode.slots.leased.meter - Number of slots currently leased for the pipeline. NOTE: If slots are leased and released between scrapes, the change is not reflected in the values.

Network IO Metrics

  • plexnode.net.received.bytes - Total number of bytes received through the interface.

  • plexnode.net.sent.bytes - Total number of bytes sent through the interface.

  • plexnode.net.received.packets - Total number of packets received through the interface.

  • plexnode.net.sent.packets - Total number of packets sent through the interface.

  • plexnode.net.in.errors - Total number of input errors on the interface.

  • plexnode.net.out.error - Total number of output errors on the interface.

  • plexnode.net.in.drops - Incoming/received dropped packets per interface. On Microsoft Windows, returns discarded incoming packets.

Pipeline Activity Metrics

  • plexnode.pipelines.intiated.meter - The pipeline is being initiated and prepared/started for execution.

  • plexnode.pipelines.finished.meter - The pipeline finished its execution.

  • plexnode.pipelines.requested.meter - The pipeline has been requested to get ready but has not actually started. This happens in some UI/Designer flows, for example, when opening a configuration dialog for a Snap.

  • plexnode.pipelines.active.total - The number of pipelines currently active.

Feedmaster Broker Metrics

  • plexnode.feedmaster.destination.enqueued - Number of messages the producer has written into the queue (applies to Ultra Pipelines).

  • plexnode.feedmaster.destination.dequeued - Number of messages the consumer has read out of the queue (applies to Ultra Pipelines).

Snaplex State Metrics

  • plexnode.state.leader - 1 if the node considers itself a leader, otherwise 0.

  • plexnode.state.neighbors.active - Number of neighbors visible to the node, including itself. A node is considered visible (active) from the standpoint of another node if its heartbeat is successful and its state is either RUNNING or COOLDOWN.