Third-party Observability Tools Integration (Public Preview)
Overview
IT organizations often leverage centralized tools to monitor their systems. Typically, these centralized tools enable IT groups to do the following:
Avoid relying on a system to monitor itself.
Reduce Total Cost of Ownership (TCO) by managing monitoring, alerts, and notifications in one place.
Retain historical data for audits.
The SnapLogic® Third-party Observability (Public Preview) feature offers the capability to integrate pipeline runtime logs with your third-party monitoring tools. This feature enables your IT organization to track, troubleshoot, analyze, and optimize your integrations in production environments.
The SnapLogic platform uses OpenTelemetry (OTEL) to support telemetry data integration with third-party observability tools. This public preview release implements the service that enables you to monitor your pipeline execution runtime logs in Datadog and New Relic.
This feature requires a subscription. Contact your CSM to turn on the OpenTelemetry feature.
Certified Third-Party Observability Tools
Datadog
New Relic
This integration solution is designed to be compatible with any vendor or open-source Observability tool that supports the OpenTelemetry Collector-based approach for the collection, processing, and export of telemetry data. Due to the extensive variety of tools available in the observability space, we currently do not plan to certify every tool on the market.
Prerequisites
Groundplexes are installed and set up.
The OpenTelemetry feature subscription must be enabled for your Org/Environment.
Support for Cloudplexes is on the product roadmap.
Implement OTEL Services
OpenTelemetry is an open-source, vendor-agnostic observability framework and toolkit designed to create and manage telemetry data. For the SnapLogic application, you can capture metrics and logs. OTEL reporting sends messages of the Error severity level to your connected third-party monitoring tools. Messages of all severity levels are retained in the logs.
How OpenTelemetry Works
The OpenTelemetry Collector receives, processes, and exports OpenTelemetry data. When you deploy the OpenTelemetry service, the Collector retrieves the data from the JCC node logs and routes it to your third-party monitoring tool using the OpenTelemetry Protocol (OTLP).
The following three components comprise the OpenTelemetry Collector:
Receiver - Defines how the logs are sent to the OpenTelemetry Collector.
Processor - Defines how the log data is processed before export. In most cases, batch mode is preferred.
Exporter - Specifies the third-party tool.
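For orientation, these three components are wired together in a service pipeline in the Collector configuration file. The following minimal sketch shows that shape; it uses the generic logging exporter as a stand-in for your monitoring tool's exporter:

receivers:           # how telemetry enters the Collector (OTLP over gRPC here)
  otlp:
    protocols:
      grpc:
processors:          # how telemetry is handled before export (batching here)
  batch:
exporters:           # where telemetry is delivered (replace with your tool's exporter)
  logging:
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]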
The diagram below shows the architecture of the OpenTelemetry service implemented in the SnapLogic platform.
This document describes only the retrieval of logs sent in the OpenTelemetry format, not logs in the original format from the Groundplex JCC node. Support for capturing log data directly from the Groundplex JCC node is on the product roadmap.
Workflow
Set up the OpenTelemetry Service.
Run your pipelines.
Observe pipeline runtime data in the monitoring tool.
Datadog Monitoring Tools Support Workflow
This page describes how to perform those steps to monitor Groundplexes deployed on Windows, Linux, or Docker. Refer to Deploy OTEL as a Daemonset for Groundplexes running on Kubernetes.
Install the OpenTelemetry Package
Download the OpenTelemetry Collector Contrib package.
Save the package on the machine that hosts the Groundplex JCC node.
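For example, on a Linux Groundplex host, downloading and unpacking the package might look like the following (the release version, platform, and target directory are placeholders; choose the build that matches your host):

# Download an OpenTelemetry Collector Contrib release (version and platform shown are examples)
curl -LO https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.96.0/otelcol-contrib_0.96.0_linux_amd64.tar.gz
# Unpack it on the machine that hosts the Groundplex JCC node
mkdir -p otelcol-contrib
tar -xzf otelcol-contrib_0.96.0_linux_amd64.tar.gz -C otelcol-contrib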
Configure the OpenTelemetry Services
Create the YAML configuration file.
You can use the following template:
Specify the gRPC URL in the YAML template. This value is set as an environment variable on your host machine.
For batch-mode processing of the logs, define the following values for the processors:
send_batch_max_size: 100
send_batch_size: 10
timeout: 10s
For exporters, add the values for your Datadog API:
site: "http://datadoghq.com"
key: ${env:DD_API_KEY}
The following is an example of a YAML configuration file:
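The exact file is environment-specific; a minimal sketch, assuming the Collector Contrib OTLP receiver and Datadog exporter and using the values listed above, might look like this (adjust the endpoint, site, and key for your account):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317      # gRPC URL; match the environment variable on your host machine
processors:
  batch:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 10s
exporters:
  datadog:
    api:
      site: datadoghq.com           # Datadog site; use the value that matches your Datadog account
      key: ${env:DD_API_KEY}        # Datadog API key environment variable
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]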
Deploy the OpenTelemetry Service
From the JCC node where you saved the OpenTelemetry package, you can deploy the service on Groundplexes running any of the following:
Linux
Microsoft Windows
Docker container
(Optional) To run the service in a separate Docker container, run the following Docker commands (where $DD_API_KEY is the Datadog API key environment variable).
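A hedged sketch using the public otel/opentelemetry-collector-contrib image might look like the following (the container name, configuration file name, and published ports are placeholders):

# Run the Collector in its own container, passing the Datadog API key and the YAML configuration
docker run -d --name otelcol \
  -e DD_API_KEY=$DD_API_KEY \
  -v $(pwd)/otel-config.yaml:/etc/otelcol-contrib/config.yaml \
  -p 4317:4317 -p 4318:4318 \
  otel/opentelemetry-collector-contrib:latest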
Test the OpenTelemetry Service by running some pipelines on the target Groundplexes, then check your monitoring tool for the runtime data.
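To confirm the Collector itself is up before checking the monitoring tool, you can query the health_check extension (if it is enabled in your configuration) and tail the Collector logs; the service and container names below are assumptions:

curl -s http://localhost:13133/         # health_check extension (default port 13133)
journalctl -u otelcol-contrib -f        # if the Collector runs as a systemd service
docker logs -f otelcol                  # if the Collector runs in the Docker container above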
Access Additional Metrics
Starting with the November release, metrics collection is available in the OpenTelemetry service. These metrics can be transmitted to your monitoring tool by uncommenting the following lines in the YAML template:
# Data sources: metrics
filter:
  metrics:
    include:
      match_type:
      metric_names:
To choose which metrics to use for reporting, filter by regex, as shown in the YAML setting below:
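For example, to report only a subset of metrics you might give the filter processor a regular-expression match; the pattern below is illustrative, so substitute the names of the metrics you want from the Metrics Reference:

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - .*heap.*          # illustrative pattern; replace with the metrics you want to report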
Refer to the Metrics Reference for definitions of each metric.
Monitor Logs in Datadog
You should start seeing logs after the OpenTelemetry Collector is started. When you run a pipeline, the details are captured in the pipeline runtimes.
Notice that some log files are generated with the runtime.
In the Datadog UI, you can observe the log files immediately.
Click a log file to open a details pane with real-time information.
New Relic Platform Observability Support Workflow
The SnapLogic application supports integration with the New Relic platform. You can stream pipeline runtime logs to a New Relic endpoint to track execution status and details. After implementing the OTEL Collector, you can use a YAML file to configure the service.
All pipeline runtime logs are observable in New Relic with a replication lag of 100 ms or less.
Step 1: Install the OpenTelemetry Package
Download the OpenTelemetry Collector Contrib package.
Save the package on the machine that hosts the Groundplex JCC node.
Step 2: Prepare the YAML File
Download the YAML file, shown below.
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:
  opencensus:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:

processors:
  batch:
    send_batch_max_size: 100
    send_batch_size: 10
    timeout: 10s

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      "api-key": <NEW RELIC LICENSE API KEY>
  logging:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [logging]
    metrics:
      receivers: [otlp, opencensus]
      processors: [batch]
      exporters: [otlp, logging]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, logging]
  extensions: [health_check, pprof, zpages]
Set up the Connection to New Relic
Set up the OTEL Collector using the steps in Step 1: Install the OpenTelemetry Package. If you already have the OTEL Collector set up, go to the next step.
Either create an account in the New Relic application, or if you already have an account, continue with the next step.
Copy the API key from the Administration > API keys page in the New Relic UI.
Save the YAML file to the following location: /etc/otelcol
Add the API key to the YAML file under headers.
Apply the changes to the YAML file and restart the OpenTelemetry service, as shown in the sketch after this list.
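For example, on a Linux host where the Collector runs as a systemd service, applying the change might look like this (the file name and the service name otelcol-contrib are assumptions; adjust them to match your installation):

# Copy the edited configuration to the location the Collector reads
sudo cp otel-config.yaml /etc/otelcol/config.yaml
# Restart the service so the new api-key header takes effect
sudo systemctl restart otelcol-contrib
sudo systemctl status otelcol-contrib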
Monitor Runtime Logs in New Relic
Run some tasks or pipelines on the target Snaplex. You can then observe a summary of the pipeline execution logs in the New Relic UI, as shown in the example image:
Click each log to open a view of the Log Details, as shown below:
Metrics Reference
Metric Name | Description |
---|---|
Java Heap Metrics | |
| Number of bytes used in the heap. |
| Number of bytes allocated for the heap. |
| Percentage of the heap being used ( |
| The maximum heap size. |
| Percentage of the max heap being used ( |
Java Non-Heap Metrics | |
| Number of bytes used in the off-heap space. |
| Number of bytes allocated for the off-heap. |
| Percentage of the off-heap space being used ( |
| The maximum number of bytes the JVM is allowed to allocate for off-heap space. |
| Percentage of the max off-heap being used ( |
CPU Usage Metrics | |
| Number of vCPUs available on the machine. |
| Last minute load average on the machine. |
| System CPU utilization as % of the total. |
| JCC Process CPU utilization as % of the total. |
Disk Usage Metrics | |
| The total size of the disk, in bytes. |
| Number of bytes used on the disk. |
| Percentage of the disk space used. |
| Number of bytes available on the disk. |
| Percentage of the disk space available. |
Memory Usage Metrics | |
| The overall amount of memory in bytes available on the machine. |
| The amount of memory in bytes available for allocation on the machine. |
| The amount of memory in bytes allocated on the machine. |
| Percentage of memory available for allocation on the machine. |
| Percentage of memory allocated on the machine. |
| The amount of bytes allocated for swap memory on the machine. |
| The amount of bytes available in the swap memory on the machine. |
| The amount of bytes being allocated in the swap memory on the machine, -1 if the swap is not enabled. |
| Percentage of total swap memory available on the machine, or -1 if swap is not enabled. |
| Percentage of total swap memory allocated on the machine, or -1 if swap is not enabled. |
| The amount of virtual memory that is guaranteed to be available to the running process in bytes. |
File Utilization Metrics | |
| Number of used file descriptors in the system. |
| Number of free file descriptors in the system. |
| Number of available (configured) file descriptors in the system. |
| Percentage of used file descriptors in the system. |
| Percentage of free file descriptors in the system. |
Thread Utilization Metrics | |
| The current number of live threads including both daemon and non-daemon threads within JVM. |
Slots Utilization Metrics | |
| Number of slots currently leased. Note: if slots are leased and released between scrapes, the change is not reflected in the values. |
| Number of slots available (configured) on the node. |
| Number of slots currently leased for the pipeline. Note: if slots are leased and released between scrapes, the change is not reflected in the values. |
Network IO Metrics | |
| Total number of bytes received through the interface. |
| Total number of bytes sent through the interface. |
| Total number of packets received through the interface. |
| Total number of packets sent through the interface. |
| Total number of input errors on the interface. |
| Total number of output errors on the interface. |
| Incoming/received dropped packets per interface. On Microsoft Windows, returns discarded incoming packets. |
Pipeline Activity Metrics | |
| The pipeline is being initialized and prepared for execution. |
| The pipeline finished its execution. |
| The pipeline has been requested to prepare but has not actually started. This happens in some UI/Designer flows, for example, when opening a configuration dialog for a Snap. |
| The number of active pipelines at the moment. |
Feedmaster Broker Metrics | |
| The number of messages the consumer has read from the queue (applies to Ultra Pipelines). |
| The number of messages the producer has written to the queue (applies to Ultra Pipelines). |
Snaplex State Metrics | |
| 1 if the node considers itself a leader, otherwise 0. |
| The number of neighbors visible to the node, including itself. The node is considered visible, and therefore active from the standpoint of the other node, if the heartbeat is successful and the state is either |