
Overview

There are several factors to consider when deploying a self-managed Snaplex (Groundplex). Most of these considerations stem from IT requirements in your computing environment, but some also depend on the types of Pipelines you plan to run in production.

Info

The term Groundplex is used throughout this document to refer specifically to your self-managed Snaplex.

Computing Requirements

The Groundplex must conform to the following minimum specifications:

Nodes (Min/Rec)

Minimum: 1

Recommended: 2 or more nodes

SnapLogic Project and Enterprise platform package nodes can be configured in the following sizes:

  • Medium: 2 vCPU and 8 GB RAM

  • Large: 4 vCPU and 16 GB RAM

  • X-Large: 8 vCPU and 32 GB RAM

  • 2 X-Large: 16 vCPU and 64 GB RAM

We recommend two nodes for high availability. For requirements about clustering nodes, refer to Node Cluster.

All nodes within a Snaplex must be of the same size.

RAM (Min)

Minimum: 8 GB

Depending on the size, number, and design of Pipelines, more RAM is required to maintain an acceptable level of performance.

CPU (Min)

Minimum: 2 cores

All Snaps execute in parallel in their own threads: the more cores that are available to the Snaplex, the better the system performs.

Disk (Min/Rec)

Minimum: 80 GB

Recommended: 100 GB

  • The recommended disk space applies to both total and available space, assuming that the Groundplex nodes are not running other software.

  • The Snaplex installation has two directories: SL_ROOT and java.io.tmpdir. Both require their own filesystems.

  • Local disk space is required for logging and for any Snap that uses the local disk for temporary storage (for example, Sort and Join Snaps). For details,

see 
in anyway restrict
  • have restrictions on the disk size of your Groundplex nodes.

Info

Memory (RAM) is used by the Pipelines to execute. Some Snaps, such as Sort Snaps, which accumulate many documents, consume more memory; the amount of memory used is influenced by the volume and size of the documents being processed. For an optimum sizing analysis based on your requirements, contact your SnapLogic Sales Engineer.

Supported Operating Systems

Groundplexes support the following operating systems:

  • CentOS (or Red Hat) Linux 6.3 or newer.

  • Debian and Ubuntu Linux.

  • Windows Server 2008 64 bit, Windows Server 2012, and Windows Server 2016/2019/2022 with a minimum of 8 GB RAM.

You can also deploy a Groundplex:

  • On a Docker container

  • In a Kubernetes Environment

For improved security, the Groundplex machine timestamp is verified to check if it is synchronized with the timestamp on the SnapLogic Cloud. Running a time service on the Groundplex node ensures that the timestamp is always kept synchronized.

  • For Linux,

...

...

...

A large clock skew can also affect communication between the FeedMaster and the JCC nodes. The Date.now() expression language function might return different values on different Snaplex nodes, and internal log messages might have skewed timestamps, making it more difficult to debug issues.

Note

We apply security updates to the Snaplex via auto-update, except for the base monitor process. To update the Monitor, install the latest RPM or Docker image directly.

CPU Architecture

We support x86 architecture and do not support ARM.

Network Guidelines and Requirements

The following network guidelines and requirements apply to Groundplex deployments:

  • Network throughput

  • Network firewall

Network Throughput Guidelines

Groundplexes require connectivity to the SnapLogic Integration Cloud when running, and also connectivity to the cloud applications used in your Tasks and Pipelines.

To optimize performance, we recommend the following network throughput guidelines:

Guideline | Minimum | Recommended
Network In | 10 MB/second | 15 MB/second+
Network Out | 5 MB/second | 10 MB/second+

Network Firewall Requirements

To communicate with the SnapLogic Control Plane, Groundplexes use a combination of HTTP/HTTPS requests and WebSockets communication over the TLS (SSL) tunnel. For this combination to operate effectively, you must configure the firewall to allow the following network communication requirements:

Component | Required | Consequence
HTTP outbound Port 443 | Yes | Does not function
HTTP HEAD | Desired | Without HEAD support, a full GET request requires more time and bandwidth
Compression | Desired | Slower data transfer
WebSockets (WSS protocol) | Yes | Does not function
Snaps using HTTP client without proxy support | Desired | Needs to use Snaps that support proxy settings
JCC node: 8090 (HTTP), 8081 (HTTPS); FeedMaster: 8090 (HTTP), 8084 (HTTPS), 8089 (Message queue) | Yes | Unable to reach Snaplex neighbor - https://hostname:8081. Needs to be available for communication between the nodes in a Snaplex.

  • The nodes of a Snaplex need to communicate among themselves, so it is important that each node can resolve the host names of the others. This requirement is crucial when you make local calls into the Snaplex nodes to execute Pipelines instead of initiating them through the SnapLogic Platform. The Pipelines are load-balanced by SnapLogic with Tasks passed to the target node.

  • Communication between the customer-managed Groundplex and the SnapLogic-managed S3 bucket is over HTTPS, with TLS enforced by default. The AWS-provided S3 URL also uses an HTTPS connection, with TLS enforced by default. If direct access from the Groundplex to the SnapLogic AWS S3 bucket is blocked, then the connection to the AWS S3 bucket falls back to a connection through the SnapLogic Control Plane that still uses TLS 1.2.

  • To successfully implement the Zero Trust policy in any environment, use the following S3 URLs.

Learn more about Snaplex network setup.

Network Guidelines for Snap Usage

In the SnapLogic Platform, the Snaps communicate to and from the application endpoints. The protocols and ports required for this communication are mostly determined by the endpoints themselves, and not by SnapLogic. Cloud and SaaS applications commonly communicate using HTTPS, although older applications and non-cloud or SaaS applications might have their own requirements.

For example, the following table shows some of these requirements:

Application | Protocol | Default Port
Netezza | TCP | 5480
Oracle | TCP | 1521
Redshift | TCP | 5439
Salesforce | HTTPS | 443

Each of these application connections might allow the use of a proxy for the network connection, but that is a configuration option of the application's connection, not one applied by SnapLogic.

FeedMaster Node Ports

For Ultra Pipelines, the FeedMaster node listens on the following two ports:

  • 8084: The FeedMaster's HTTPS port. Requests for the Pipelines are sent here in addition to some internal requests from the other Groundplex nodes.

  • 8089: The FeedMaster's embedded ActiveMQ broker SSL port. The other Groundplex nodes connect to this port to send and receive messages.

The machine hosting the FeedMaster nodes needs to have those ports open on the local firewall, and the other Groundplex nodes need to allow outbound requests to the FeedMaster nodes on those ports.
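One way to confirm that these ports are open is a plain TCP connection test run from another Groundplex node. The following Python sketch is illustrative only: the function names and the host name feedmaster.example.com are hypothetical, and a successful TCP connection shows only that the port is reachable, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Ports named in this section for the FeedMaster node.
FEEDMASTER_PORTS = {8084: "HTTPS", 8089: "ActiveMQ broker (SSL)"}


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_feedmaster(host: str) -> Dict[int, bool]:
    """Check every FeedMaster port on the given host."""
    return {port: port_is_reachable(host, port) for port in FEEDMASTER_PORTS}


if __name__ == "__main__":
    # Hypothetical host name; replace it with your FeedMaster node.
    for port, ok in check_feedmaster("feedmaster.example.com").items():
        print(f"{port} ({FEEDMASTER_PORTS[port]}): {'open' if ok else 'blocked'}")
```

The same helper can be pointed at the other ports that the nodes use to reach each other.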

Snaplex Network Binding

By default, a Snaplex starts and binds to the localhost network interface on port 8090. Clients can connect to the JCC node only if they run on the same machine. This default was chosen because the Snaplex normally does not receive inbound requests. Instead, it uses an outbound WebSocket connection to receive its requests from the SnapLogic Cloud services. If requests need to be sent to the Snaplex from the customer network, configure the Snaplex to listen on its network interfaces. This is required when a feed URL Pipeline execution request points directly at the Snaplex host instead of at the Cloud URL. To do this, set the following property (the default is 127.0.0.1):

...

Code Block
 jcc.jetty_host = 0.0.0.0 

...

If you need to configure the hostname used by the Snaplex to be a different value than the machine name (for example newname.mydomain.com), add:

...

Code Block
jcc.jvm_options = -DSL_INTERNAL_HOST_NAME=newname.mydomain.com -DSL_EXTERNAL_HOST_NAME=newname.mydomain.com

Set this property in etc/global.properties by adding it to the Global properties table on the Node Properties tab of the Update Snaplex dialog.


Groundplex Name and Associated Nodes

Every Snaplex requires a name, for example, ground-dev or ground-prod. In the SnapLogic Designer, you can choose the Snaplex where Pipelines are executed.

Your nodes are associated with a Groundplex through the Environment variable: for example, dev or prod. When you configure the nodes for your Groundplex, you must set jcc.environment to the Environment value that you provided in the Create Snaplex dialog. You can change this variable in the Update Snaplex dialog.

The host name of the system used by a Groundplex cannot have an underscore (_) in its name as per DNS standards. Avoid special characters as well.
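The host name rule can be checked before a node is installed. The sketch below is a rough illustration, not a SnapLogic tool: the function name is hypothetical, and the pattern encodes the common DNS host name convention of letters, digits, and hyphens per label, with no leading or trailing hyphen.

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the host name convention."""
    if not name or len(name) > 253:
        return False
    labels = name.rstrip(".").split(".")
    return all(_LABEL.fullmatch(label) is not None for label in labels)


if __name__ == "__main__":
    for candidate in ("ground-dev.mydomain.com", "ground_dev.mydomain.com"):
        print(candidate, is_valid_hostname(candidate))
```

Here ground-dev.mydomain.com passes, while ground_dev.mydomain.com is rejected because of the underscore.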

After the Snaplex service is started on a node, the service connects to the SnapLogic Cloud service. Runtime logs from the Snaplex are written to the following folder:

  • Linux: /opt/snaplogic/run/log

  • Windows: c:\opt\snaplogic\run\log

...

The Dashboard shows the nodes that are currently connected for each Snaplex.

Snaplex Node Configuration

Snaplex nodes are typically configured using a slpropz configuration file, located in the $SL_ROOT/etc folder. 

If you use the slpropz file as your Snaplex configuration, then:

  • After a Snaplex node is started with the slpropz configuration, subsequent configuration updates are applied automatically.

  • Changing the Snaplex properties in Manager causes each Snaplex node to download the updated slpropz and initiates a rolling restart with no downtime on Snaplex instances with more than one node.

  • Some configuration changes, such as an update to the logging properties, do not require a restart and are applied immediately.

...

If you have an older Snaplex installation and its configuration is defined in the global.properties file, then the Environment value must match the jcc.environment value in the JCC global.properties file. To migrate your Snaplex configuration to the slpropz mechanism, see Migrating Older Snaplex Nodes.

...

Always configure your Snaplex instances using the slpropz file. You do not have to edit the file manually, changes made to the Snaplex through Manager are applied automatically to all nodes in that Snaplex, and configuration issues that might prevent the Snaplex from starting are automatically reverted.

...

Understanding the Distribution of Data Processing across Snaplex Nodes

When a Pipeline or Task is executed, the work is assigned to one of the JCC nodes in the Snaplex. The distribution of work across JCC nodes depends on the number of threads in use, the amount of memory available, and the average system load. Pipelines are scheduled across the nodes in a Snaplex by a least-loaded algorithm that gives priority to memory usage. If nodes are similarly loaded, the algorithm randomizes Pipeline execution across them.
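The general idea of least-loaded scheduling with a random tie-break can be sketched as follows. This is not SnapLogic's scheduler: the NodeLoad fields, the weights that favor memory, and the tolerance that defines "similarly loaded" are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeLoad:
    """Snapshot of one JCC node's load (field names are hypothetical)."""
    name: str
    memory_used_pct: float  # 0-100
    system_load_pct: float  # 0-100


# Assumed weights: memory usage counts more than system load.
MEMORY_WEIGHT = 0.7
LOAD_WEIGHT = 0.3


def load_score(node: NodeLoad) -> float:
    """Lower is better; memory usage dominates the score."""
    return MEMORY_WEIGHT * node.memory_used_pct + LOAD_WEIGHT * node.system_load_pct


def pick_node(nodes: List[NodeLoad], tolerance: float = 5.0,
              rng: Optional[random.Random] = None) -> NodeLoad:
    """Pick the least-loaded node, choosing randomly among near-ties."""
    if not nodes:
        raise ValueError("no nodes available")
    rng = rng or random.Random()
    best = min(load_score(n) for n in nodes)
    # Nodes within `tolerance` of the best score are treated as similarly loaded.
    candidates = [n for n in nodes if load_score(n) - best <= tolerance]
    return rng.choice(candidates)
```

With nodes at 20%, 21%, and 80% memory usage and equal system load, the sketch alternates between the first two and never selects the third.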

Info

To ensure that requests are being shared across JCC nodes, we recommend that you set up a load balancer to distribute the work across JCC nodes in the Snaplex.

Node Cluster

Starting multiple nodes with the JCC service pointing to the same Snaplex configuration automatically forms a cluster of nodes if you follow these requirements for nodes in a Snaplex:

  • The nodes need to communicate with each other on the following ports: 8081, 8084, and 8090.

  • The nodes should have a reliable, low-latency network connection between them.

  • The nodes should be homogeneous in that they should have the same CPU and memory configurations, as well as access to the same network endpoints.

Temporary Folder

This section explains which data traversing the SnapLogic platform is encrypted and which is unencrypted. The temporary folder stores unencrypted data.

Encrypted data:

  • Transit data – Is always encrypted, assuming the endpoint supports encryption.

  • Preview and Account data – Is always encrypted.

Unencrypted data:

  • Snaplex – Data processing on Groundplex, Cloudplex, and eXtremeplex nodes occurs principally in-memory as streaming, which is unencrypted.

  • Larger dataset – When datasets exceed the available compute memory, some Snaps that process multiple documents, such as Sort and Join, write Pipeline data to the local disk unencrypted during Pipeline execution to help optimize performance.

These temporary files are deleted when the Snap/Pipeline execution completes. You can update your Snaplex to point to a different temporary location in the Global properties table of the Node Properties tab in the Update Snaplex dialog:

...

Code Block
jcc.jvm_options = -Djava.io.tmpdir=/new/tmp/folder
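Because Snaps write to this folder while Pipelines run, it can be useful to confirm that the filesystem behind it has room to spare. The following Python sketch only reports free space for a directory; the function names and the 10 GiB threshold in the example are assumptions, not SnapLogic requirements.

```python
import shutil
import tempfile

GIB = 1024 ** 3


def free_space_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if at least `required_gib` GiB are free at `path`."""
    return free_space_gib(path) >= required_gib


if __name__ == "__main__":
    # Uses the system temporary directory as a stand-in; pass your own
    # temporary folder instead. The 10 GiB threshold is an arbitrary example.
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_space_gib(tmp):.1f} GiB free, "
          f"enough for 10 GiB: {has_free_space(tmp, 10)}")
```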

The following Snaps write to temporary files on your local disk:

  • Anaplan: Upload, Write

  • Binary: Sort, Join

  • Box: Read, Write

  • Confluent Kafka: All Snaps that use either Kerberos or SSL accounts

  • Database: When using local disk staging for read-type Snaps

  • Email: Sender

  • Hadoop: Read, Write (Parquet and ORC formats)

  • JMS: When the user provides a JAR file

  • Salesforce: Bulk Query, Snaps that process CSV data

  • Script: PySpark

  • Snowflake: When using internal staging

  • Teradata: TPT FastExport

  • Transform: Aggregate, Avro Parser, Excel Parser, Join, Unique, Sort

  • Vertica: Bulk Load

  • Workday Prism Analytics: Bulk Load

  • Data Science (Machine Learning) Snaps: Profile, AutoML, Sample, Shuffle, Deduplicate, Match

All JCC nodes should be the same size. All FeedMaster nodes should be the same size for load balancing. Worker and FeedMaster nodes can be of different sizes.

Node Diagnostics

Snaplex Diagnostics helps you verify your Snaplex host environment and troubleshoot any issues. Each Snaplex node is a JCC instance running on a host, and the node diagnostic test highlights the hardware and thread limit requirements. It checks for minimum hardware requirements such as RAM and disk storage. The details of the test are listed in the table below. For more information on how to view the node details panel, refer to https://docs.snaplogic.com/monitor/node-details-panel.html

Diagnostic test | Recommended value | Examples of current value displayed in the diagnostic test
Nodes have insufficient swap space | If the maximum value is not present, the system displays the value of 50% of the RAM configured or 8 GB, whichever is greater. | If the value does not meet the recommended value, the current value is displayed in red. Example: 1 GiB
Max Slots | If there is no minimum value, then the recommended value is calculated as follows: RAM configured or 8GB) * 2000 (max value) rounded to the nearest 500. | Example: If the maximum value is 3840, the current value displayed is 4000
Thread limit | Minimum value = 65000 | Displays the thread limit in red if the value is below the recommended value. Example: 4000
Max file descriptors | If there is no maximum value and the minimum value is 65000, then the recommended value should be 65000. | Example: 65535
Max jvm heap | The minimum and the recommended value are calculated as follows: RAM configured * 0.85. Minimum value = 12 GiB. Recommended value = 12 GiB | Example: 12.44 GiB
RAM configured | Minimum value = 4 GiB. Recommended value = 4 GiB | Example: 2.73%
RAM available | More than 15 minute period per day where memory utilization is > 75% or average memory utilization is > 60% | Example: 2.78%
Disk storage configured | Minimum value: 40 GiB. Recommended value: 100 GiB | Example: 39.98 GiB
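Two of the rules in the table are simple arithmetic on the configured RAM and can be worked through by hand or in a few lines of code. The sketch below is illustrative only: the function names are hypothetical, it treats GB and GiB as the same unit, and it does not reproduce the diagnostic test itself.

```python
def recommended_swap_gib(ram_gib: float) -> float:
    """Swap rule from the table: 50% of configured RAM or 8, whichever is greater."""
    return max(0.5 * ram_gib, 8.0)


def recommended_max_heap_gib(ram_gib: float) -> float:
    """Max JVM heap rule from the table: configured RAM * 0.85."""
    return 0.85 * ram_gib


if __name__ == "__main__":
    for ram in (8, 16, 32, 64):
        print(f"RAM {ram} GiB: swap {recommended_swap_gib(ram):.1f} GiB, "
              f"max heap {recommended_max_heap_gib(ram):.2f} GiB")
```

For a node with 32 GiB of RAM, this gives 16 GiB of swap and a maximum heap of 27.2 GiB.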

JCC Node Communication Requirements

Each JCC node publishes its IP addresses to the control plane. DNS is not required for communication between nodes. We recommend setting up all the nodes inside a Snaplex in the same network and data center. Communication between JCC nodes in the same Snaplex is required for the following reasons:

  • The Pipeline Execute Snap communicates directly with neighboring JCC nodes in a Snaplex to start child Pipeline executions and send documents between parent and child Pipelines.

  • The data displayed in Data Preview is written to and read from neighboring JCC nodes in the Snaplex.

  • The requests and responses made in Ultra Pipelines are exchanged between a FeedMaster node and all JCC nodes in a Snaplex.

  • A Ground Triggered Task (invoked from a Groundplex) can be executed on a neighboring JCC node because of load-balancing, in which case, the Pipeline prepare request, and the bodies of the request and response, are transferred between nodes.

Any extra latency or network hops between neighboring JCC nodes can introduce performance and reliability problems.
