Multi File Reader

Overview

Use this Read-type Snap to read binary data from sources such as SLDB, HTTP, S3, SFTP, and HDFS, and to produce a binary data stream at the output. Unlike the File Reader Snap, this Snap can read more than one file in the given directory and, recursively, in its subdirectories.

Important

We plan to introduce additional S3 features exclusively in the Amazon S3 Snaps; Binary Snaps with S3 support will not receive these updates. We therefore recommend that you use the Amazon S3 Snap Pack for all S3 operations in your pipelines. Binary Snaps are retained as is to maintain backward compatibility, but we will no longer provide S3 support for them.

Learn more: Migration from Binary Snaps to Amazon S3 Snaps.


Snap Type

Multi File Reader is a READ-type Snap.

Prerequisites

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature lets an EC2 Groundplex access S3 files without an Access-key ID and Secret key in the AWS S3 account in the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, set the following Global property (key-value parameter) and restart the JCC:
jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

This feature is supported in the EC2-type Groundplex only. Learn more.  


Connect to FTP server

To connect to an FTP server that must reuse the session for data transfer over the TLS protocol, add the following property as a JVM option under the Global properties of the Node Properties tab:

-DFTPS_SSL_TLS_PROTOCOL=TLSV1.2 (or -DFTPS_SSL_TLS_PROTOCOL=TLSV1.3)

Support for Ultra Pipelines

Works in Ultra Pipelines

Limitations

  • For most file protocols, the Snap behaves the same way in both Snaplex and Groundplex. However, the HDFS protocol works only in a Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.
  • Do not use sldb as a file system or storage. File Assets are intended only for specialized files that a pipeline uses to reference specific data, such as accounts, expressions, or JAR files. Use a cloud storage provider to store production data. Do not use File Assets as a file source or as a destination in production pipelines. When you configure the Multi File Reader, set the file path to a cloud provider or external file system.

Known Issues

  • This Snap Pack no longer natively supports RSA-SHA1 authentication with the Secure File Transfer Protocol (SFTP). To enable support for RSA-SHA1 authentication, set the following property from the Node Properties section of the Snaplex UI:

-Djsch.server_host_key=ssh-rsa -Djsch.client_pubkey=ssh-rsa

With the 4.33 GA release of the Binary Snap Pack, support for some algorithms for SFTP connection negotiation is removed for improved security and because we’ve updated the library used to connect to SFTP sources. If you want to revert to the previous settings, you can set the following jcc.jvm_options from the Node Properties section of the Snaplex UI. To update Cloudplexes, contact SnapLogic Support.

-Djsch.kex=ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
-Djsch.server_host_key=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
-Djsch.client_pubkey=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
-Djsch.cipher=aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
-Djsch.check_ciphers=aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
-Djsch.check_kexes=diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
-Djsch.check_signatures=ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521


Snap Views

Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description
Input | Document | Min: 0, Max: 1 | N/A | N/A
Output | Binary | Min: 1, Max: 1 | File Writer, CSV Parser, JSON Parser, XML Parser | Binary data read from the source specified in the Selected files property.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
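
The three options can be pictured with a minimal Python sketch. The function `process`, the `handler` callback, and the policy strings "stop", "discard", and "route" are hypothetical names chosen for illustration; they mirror the list above and are not SnapLogic's implementation.

```python
from typing import Callable, Iterable, List, Tuple

POLICIES = {"stop", "discard", "route"}  # hypothetical names for the three options


def process(records: Iterable[str],
            handler: Callable[[str], str],
            on_error: str = "stop") -> Tuple[List[str], List[dict]]:
    """Apply `handler` to each record under one of three error policies."""
    if on_error not in POLICIES:
        raise ValueError(f"unknown policy: {on_error}")
    output: List[str] = []
    error_view: List[dict] = []
    for record in records:
        try:
            output.append(handler(record))
        except ValueError as exc:
            if on_error == "stop":
                # Stop Pipeline Execution: the error ends the whole run.
                raise
            if on_error == "route":
                # Route Error Data to Error View: keep the failing record.
                error_view.append({"record": record, "error": str(exc)})
            # "discard": drop the failing record and continue.
    return output, error_view
```

With "route", the failing record is preserved alongside its error message, which is the only one of the three policies in this sketch that loses no data.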

Account

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports a Basic auth account, an AWS S3 auth account, SSH Auth account, SMB account, or no account. See Configuring Binary Accounts for information on setting up accounts that work with this Snap. Account types supported by each protocol are as follows:

Protocol | Account types
sldb | no account
s3 | AWS S3
ftp | Basic Auth
sftp | Basic Auth, SSH Auth
ftps | Basic Auth
hdfs | no account
http | no account
https | no account
smb | SMB
wasb | Azure Storage
wasbs | Azure Storage
gs | Google Storage

The FTPS file protocol works only in explicit mode. The implicit mode is not supported.

Required settings for account types are as follows:

Account type | Settings
Basic Auth | Username, Password
AWS S3 | Access-key ID, Secret key
SSH Auth | Username, Private key, Key Passphrase
SMB | Domain, Username, Password
Azure Storage | Account name, Primary access key
Google Storage | Approval prompt, Application scope, Auto-refresh token (Read-only properties are Access token, Refresh token, Access token expiration, OAuth2 Endpoint, OAuth2 token, and Access type.)
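
As a quick cross-check of the two tables above, the following Python sketch encodes them as dictionaries and reports mismatches between a protocol, an account type, and the settings supplied. The function name `check_account` and the treatment of "no account" as `None` are assumptions made for this illustration; the sketch also assumes that omitting the account is always allowed, since credentials can be supplied in other ways for some protocols.

```python
from typing import Dict, List, Optional

# Account types accepted by each protocol (from the first table).
ACCOUNT_TYPES_BY_PROTOCOL: Dict[str, List[str]] = {
    "sldb": [], "hdfs": [], "http": [], "https": [],
    "s3": ["AWS S3"],
    "ftp": ["Basic Auth"],
    "sftp": ["Basic Auth", "SSH Auth"],
    "ftps": ["Basic Auth"],
    "smb": ["SMB"],
    "wasb": ["Azure Storage"],
    "wasbs": ["Azure Storage"],
    "gs": ["Google Storage"],
}

# Required settings for each account type (from the second table).
REQUIRED_SETTINGS: Dict[str, List[str]] = {
    "Basic Auth": ["Username", "Password"],
    "AWS S3": ["Access-key ID", "Secret key"],
    "SSH Auth": ["Username", "Private key", "Key Passphrase"],
    "SMB": ["Domain", "Username", "Password"],
    "Azure Storage": ["Account name", "Primary access key"],
    "Google Storage": ["Approval prompt", "Application scope", "Auto-refresh token"],
}


def check_account(protocol: str, account_type: Optional[str],
                  settings: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    if protocol not in ACCOUNT_TYPES_BY_PROTOCOL:
        return [f"unsupported protocol: {protocol}"]
    if account_type is None:
        return []  # assumption: running without an account is not flagged
    problems = []
    if account_type not in ACCOUNT_TYPES_BY_PROTOCOL[protocol]:
        problems.append(f"{account_type} is not listed for {protocol}")
    for name in REQUIRED_SETTINGS.get(account_type, []):
        if name not in settings:
            problems.append(f"missing setting: {name}")
    return problems
```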

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.


Field Name | Field Type | Description

Label*
Field type: String
Default value: Multi File Reader
Example: Multi File Reader

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.


Selected Files

Use this field set to define data sources.

All selected files must be under the same protocol.

Folder/File
Field type: String/Expression
Default value: [None]
Example:

  • s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>
    For region names and their details, see AWS Regions and Endpoints.
  • sftp://ftp.snaplogic.com:22/dir/filename
  • smb://smb.Snaplogic.com:445/test_files/csv/input.csv

Specify the URL for the data source, which can be a directory or a file. It should begin with a file protocol. The supported file protocols are:

  • http:
  • https:
  • s3:
  • sftp:
  • ftp:
  • ftps:
  • hdfs:
  • sldb:
  • smb:
  • wasb:
  • wasbs:
  • gs:

This Snap supports S3 Virtual Private Cloud (VPC) endpoints. For example, s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com


The File property should have the syntax: [protocol]://[host][:port]/[path]

  • _filename (Define a key/value pair with the key "filename" as a pipeline parameter.)
  • If a Pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the shared project, enter "shared/asset.json" or "sldb:///shared/asset.json".
  • If a Pipeline is created in the shared project and you want to read the "asset.json" file from the shared project, enter "asset.json" or "sldb:///asset.json".


The separator "://" divides the file protocol from the rest of the URL. The hostname and the port number must appear between "://" and the next "/". If the port number is omitted, the default port for the protocol is used. The sldb and s3 protocols omit the hostname and port number.

  • Ensure that the file name, folder name, or file path does not contain the '?' character; it is not fully supported, and the Snap might fail when it is present.
  • The File property must be an absolute path for all protocols except sldb. For sldb files, the Snap can access only files in the same project directory or the shared project directory; it cannot access files in other projects.
  • For the sldb, http, and https protocols, enter the URL of a regular file. Folders are not supported for these protocols.
    If this property is a regular file, the Wildcard and Include subfolders properties are ignored.
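
The [protocol]://[host][:port]/[path] syntax can be illustrated with a short Python sketch built on the standard `urllib.parse.urlsplit` function. The helper name `split_file_url` and the returned dictionary keys are hypothetical; the sketch only shows how the parts separate and does not reproduce how the Snap itself parses URLs. It handles absolute URLs only, so relative sldb inputs such as "shared/asset.json" are out of scope here.

```python
from urllib.parse import urlsplit

# Protocols listed as supported for the Folder/File property.
SUPPORTED_PROTOCOLS = {
    "http", "https", "s3", "sftp", "ftp", "ftps",
    "hdfs", "sldb", "smb", "wasb", "wasbs", "gs",
}


def split_file_url(url: str) -> dict:
    """Split an absolute file URL into protocol, host, port, and path."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported or missing protocol in: {url}")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # None when the host is omitted (sldb:///...)
        "port": parts.port,      # None means "use the protocol's default port"
        "path": parts.path,
    }
```

For example, `split_file_url("sldb:///shared/asset.json")` yields no host and no port, which matches the note that the sldb protocol omits both. Note that `urlsplit` lowercases the hostname it returns.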


In the SnapLogic 4.3.2 release, WASB (Windows Azure Storage Blob) or WASBS protocol (wasb:/// or wasbs:///) support has been added to the Binary Snaps.


In the WASB and WASBS file URL, the top directory should be the name of the 'Azure Storage container'.

  • If an account is not used within the Snap, to read a folder, use: s3://yourAccessKeyID:yourSecretKey@s3/yourBucketName/folder1/folder2/
  • If an account is not used within the Snap, to read a single file, use:
    s3://yourAccessKeyID:yourSecretKey@s3/yourBucketName/folder1/rawData.csv
  • If the Snap is executed in a Windows Groundplex and needs to access the D: drive, use file:///D:/testFolder/
  • To read a file in the 'testDir' folder in the 'Snaplogic' container, use wasb:///Snaplogic/testDir/sample.csv
  • If the bucket name is 'testBucket', use gs:///testBucket/testDir/
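
For the wasb, wasbs, and gs examples above, the first path segment names the Azure Storage container or the bucket. A minimal Python sketch of that split follows; the helper name `split_top_directory` is hypothetical, and the code is only an illustration of the URL layout.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_top_directory(url: str) -> Tuple[str, str]:
    """Return (container_or_bucket, remaining_path) for wasb, wasbs, or gs URLs."""
    parts = urlsplit(url)
    if parts.scheme not in {"wasb", "wasbs", "gs"}:
        raise ValueError(f"expected a wasb, wasbs, or gs URL: {url}")
    # The path starts with "/", so strip it before taking the first segment.
    top, _, rest = parts.path.lstrip("/").partition("/")
    if not top:
        raise ValueError(f"no container or bucket name in: {url}")
    return top, rest
```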

Wildcard
Field type: String/Expression
Default value: [None]
Example:

  • *.*
  • *.csv
  • *.json
  • *.??? (matches all files with three-character extensions)

Specify the wildcard pattern if the URL in the Folder/File property is for a directory. All files matching the wildcard pattern are selected. This property is not supported for the sldb, http, and https protocols. The asterisk ("*", also called "star") and the question mark ("?") pattern characters are supported: "*" matches zero or more characters, and "?" matches exactly one character.
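
The two pattern characters can be modeled by translating a wildcard into a regular expression. The following Python sketch is an illustration under stated assumptions: the helper names are hypothetical, the match is applied to the file name only, and it is case-sensitive, which may differ from the Snap's behavior on some file systems.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard that uses only "*" and "?" into a compiled regex."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")   # zero or more characters
        elif ch == "?":
            pieces.append(".")    # exactly one character
        else:
            pieces.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(pieces), re.DOTALL)


def matches(pattern: str, file_name: str) -> bool:
    """Return True when the whole file name matches the wildcard."""
    return wildcard_to_regex(pattern).fullmatch(file_name) is not None
```

Under this model, "*.???" selects "input.csv" but not "data.json", and "*.*" does not select a name without a dot, such as "README".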

Include Subfolders
Field type: Checkbox
Default value: Not selected

Select to search subfolders for the specified Wildcard if Folder/File is set to a directory.

If you select this checkbox and the Folder/File property is a folder, all files in the subfolders matching the given wildcard pattern are selected. This checkbox is not supported for the sldb, http, and https protocols.
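
The combined effect of Wildcard and Include Subfolders can be pictured with a local-filesystem analogy in Python. The function `select_files` is hypothetical and uses the standard `os.walk` and `fnmatch.fnmatchcase` functions; note that `fnmatchcase` also accepts bracket expressions such as `[abc]`, which the Wildcard field is not documented to support. This is a sketch of the selection logic, not of how the Snap lists remote directories.

```python
import fnmatch
import os
from typing import List


def select_files(folder: str, wildcard: str,
                 include_subfolders: bool = False) -> List[str]:
    """List files under `folder` whose names match `wildcard`."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            if fnmatch.fnmatchcase(name, wildcard):
                selected.append(os.path.join(dirpath, name))
        if not include_subfolders:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames.clear()
    return sorted(selected)
```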

Number of retries
Field type: Integer/Expression
Default value: 0
Example: 3

Specify the maximum number of retry attempts the Snap makes if a network failure prevents it from reading the target file.

If the value is larger than 0, the Snap first downloads the target file to a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap starts to stream the data from the temporary file to the downstream pipeline. All temporary local files are deleted when they are no longer needed.

  • Ensure that the local drive has sufficient free disk space to store the temporary local file.

  • The retry operation is applied to each file the Snap downloads.

Minimum value: 0

Retry interval (seconds)
Field type: Integer/Expression
Default value: 1
Example: 3

Specify the minimum number of seconds the Snap waits before attempting recovery from a network failure.

Minimum value: 1
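
The retry behavior described for Number of retries and Retry interval (download to a temporary file, wait, restart from the beginning, then stream and delete) can be sketched in Python as follows. The function names, the `download` callback, and the choice of `OSError` as the network-failure signal are assumptions for illustration; the sketch also assumes one initial attempt plus up to `retries` further attempts.

```python
import os
import tempfile
import time
from typing import BinaryIO, Callable, Iterator


def fetch_to_temp_file(download: Callable[[BinaryIO], None],
                       retries: int,
                       retry_interval: float,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Download into a temporary file, restarting from scratch after a failure."""
    attempts = retries + 1  # assumption: first attempt plus `retries` retries
    for attempt in range(attempts):
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as out:
                download(out)
            return path
        except OSError:
            os.remove(path)  # discard the partial download
            if attempt == attempts - 1:
                raise
            sleep(retry_interval)
    raise AssertionError("unreachable")


def stream_and_delete(path: str, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield the temporary file in chunks, then delete it."""
    try:
        with open(path, "rb") as source:
            while chunk := source.read(chunk_size):
                yield chunk
    finally:
        os.remove(path)
```

Because each attempt writes to a fresh temporary file, downstream consumers never see a partial download; the cost is the local disk space noted above.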


Advanced Properties

Use this field set to define additional properties.

SAS URI
Field type: Dropdown list
Default value: N/A
Example: https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D

Specify the Shared Access Signature (SAS) URI of the resource you need to access. You can generate the SAS URI either from the SAS Generator Snap or from the Azure portal → Shared access signature.

The supported SAS types are:

  • Service SAS on a container
  • Service SAS on blob

  • Ensure that the URI is specified in the format described here.
  • If you provide the SAS URI in this field:
    • The Primary access key given in the account settings is overridden during authentication. If you do not provide the SAS URI, the Snap uses the Primary access key in the account settings.
    • Only this URI is used, and the Snap ignores the SAS URI settings that you have configured in the associated account.

If the SAS URI value is provided in the Snap settings, then the settings provided in the account (if any account is attached) are ignored.

Values
Field type: String/Expression

Specify the value for the property.
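
To inspect what a SAS URI carries, and to picture the precedence rule described for this field set, consider the Python sketch below. Both function names are hypothetical. `sas_parameters` uses the standard `urllib.parse` module to decode the query string for reading only; `choose_credential` restates the rule that a SAS URI given in the Snap settings takes precedence over the Primary access key in the account.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def sas_parameters(sas_uri: str) -> Dict[str, str]:
    """Return the decoded query parameters of a SAS URI (for inspection only)."""
    query = urlsplit(sas_uri).query
    # parse_qs returns a list per key; a SAS URI carries one value per key.
    return {key: values[0] for key, values in parse_qs(query).items()}


def choose_credential(snap_sas_uri: Optional[str],
                      account_primary_key: Optional[str]) -> Tuple[str, Optional[str]]:
    """Pick the credential: the Snap's SAS URI wins over the account's key."""
    if snap_sas_uri:
        return ("sas_uri", snap_sas_uri)
    return ("primary_access_key", account_primary_key)
```

In Azure's SAS format, `sr=b` marks a blob resource, `sp` lists the permissions, and `st` and `se` bound the validity period. Pass the original, still-encoded URI to the Snap; the decoded values here are only for reading.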

Snap Execution
Field type: Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.


The Pipeline validation (achieved by pressing "Retry") imposes a 5-minute timeout. If there are a large number of files to be read by the Snap as a result of Wildcard and Include subfolders settings, the Snap validation may fail due to this 5-minute timeout limit.

Output Fields for the Different Protocols

The output fields that the Multi File Reader Snap generates depend on the protocol you select. The following table lists the output fields for each protocol supported by the Snap:

Protocol | Output Fields

S3
  • content-type
  • content-length
  • last-modified: _snaptype_datetime
  • etag
  • accept-ranges
  • content-location
  • content-disposition

SLDB
  • content-type
  • date
  • x-amz-meta-md5
  • content-length
  • server
  • x-amz-server-side-encryption
  • x-amz-meta-length
  • x-amz-meta-create_time
  • last-modified: _snaptype_datetime
  • x-amz-meta-file_id
  • x-amz-meta-ttl
  • content-disposition
  • x-amz-meta-owner
  • x-amz-meta-expire_time
  • etag
  • x-amz-request-id
  • x-amz-meta-mimetype
  • x-amz-id-2
  • accept-ranges
  • content-location

WASB, SMB, SFTP, GStorage
  • content-type
  • content-location
  • content-disposition

Example

Snap Pack History




Release | Snap Pack Version | Date | Type | Updates
November 2024 | main29029 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2024 | 438patches28197 | | Latest |
  • Fixed an issue in the Binary Snap Pack that caused intermittent errors on the Groundplex nodes when using the SMB protocol. The errors occurred when multiple IPs were returned for the same address, and one of the IPs was unreachable by the SMB server.
  • Fixed an issue in the Directory Browser Snap where a mount error was displayed if the hostname contained an underscore, resulting in an empty hostname in the SMB URL.
August 2024 | main27765 | | Stable | Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
May 2024 | 437patches27146 | | Latest | Enhanced the Azure Storage Account with Managed Identity, which provides applications with an automatically managed identity for connecting to resources that support Microsoft Entra ID authentication.
May 2024 | 437patches26873 | | Latest | Fixed an issue with the File Poller Snap where the Snap applied a case-sensitive filter in the Windows operating system.
May 2024 | 437patches26592 | | Latest | Enhanced the Decompress Snap to support encrypted and unencrypted ZIP and 7z files through the new File Password Account type.
May 2024 | main26341 | | Stable | The Azure Data Lake Account has been removed from the Binary Snap Pack because Microsoft retired the Azure Data Lake Storage Gen1 protocol on February 29, 2024. We recommend replacing your existing Azure Data Lake Account with other Azure Accounts.
February 2024 | 436patches25711 | | Latest | Fixed an SMB (server message block) connectivity issue within the Binary Snap Pack, as the incorrect name provided by the SMB client did not match the Windows cluster virtual name, affecting the SPN (service principal name) connection.
February 2024 | 436patches25332 | | Latest | Fixed a null pointer exception in the Public Key Account for Binary Snap Pack when the Public Key field value is null. Now, the account displays a configuration exception for a null value.
February 2024 | 436patches25241 | | Latest | Fixed an issue where the File Writer Snap partially wrote the file for some FTPS servers. The Snap now pauses for the number of seconds specified in the global property ftpsDelayBeforeClosing before closing the output stream.
February 2024 | 436patches25161 | | Latest | Fixed an issue with the File Poller Snap that displayed an exception when an unauthorized character ':' was used in the Windows Snaplex.
February 2024 | main25112 | | Stable | Updated and certified against the current SnapLogic Platform release.
November 2023 | 435patches24525 | | Latest | Fixed an issue with the File Poller Snap that caused it to poll recursively for files in the root directory of the S3 bucket.
November 2023 | 435patches23780 | | Latest | Fixed an issue with the Binary Snaps that failed to build a data connection when connecting to the FTP server over FTPS protocol.
November 2023 | main23721 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2023 | 434patches23502 | | Latest | Fixed an issue with the File Operation Snap (SFTP protocol) where the error message did not display during a move operation, although the existing file was available in the target path and the Error if exists checkbox was selected.
August 2023 | 434patches23302 | | Latest | Fixed an issue that occurred when node properties were used to override default algorithm specifications for SFTP operations.
August 2023 | 434patches22976 | | Latest | Fixed an issue that caused account credentials to be visible in the stack trace of some failed SFTP operations.
August 2023 | 434patches22842 | | Latest | Fixed an issue that caused NTLM authentication issues when trying to access SMB servers.
August 2023 | 434patches22639 | | Latest | The JSON key field in the Binary Google Service Account supports JSON strings. You can upload the JSON key either from SLDB or dynamically pass the value using a pipeline parameter or access values from Secrets Manager.
August 2023 | main22460 | | Stable | Updated and certified against the current SnapLogic Platform release.
May 2023 | 433patches22297 | | Latest |
  • Fixed an issue with the PGP Sign Snap that caused an error when using an encryption subkey for signing.
  • Fixed an issue with the File Writer Snap where the file was unable to validate when the File action field was set to IGNORE for the WASB and WASBS protocols.
May 2023 | 433patches21913 | | Latest |
  • Dynamic ports are supported for the SMB file protocol.
  • Fixed an issue with the File Writer Snap that caused slow performance when writing large files and the Flush interval was set to a positive value.
May 2023 | 433patches21870 | | Latest | Added the PGP Sign Snap, which allows binary data to be signed using PGP.
May 2023 | 433patches21645 | | Latest | With the 4.33 GA release, support for some algorithms for SFTP connection negotiation is removed for improved security and because we’ve updated the library used to connect to SFTP sources. With Snap Pack version 433patches21645, you can modify the global properties. Refer to the Configuration Settings for Snaps documentation for details about how to revert to the previous settings.
May 2023 | 433patches21576 | | Latest | Fixed the issues with the AES Encrypt and AES Decrypt Snaps, where the Snaps previously did not include the error stack trace in the error view. The Snaps now provide detailed information in case of any issues or errors.
May 2023 | 433patches21482 | | Latest |
  • Fixed an issue that caused a String index out of range error with the SFTP protocol in the File Writer Snap when it attempted to create a top-level directory.
  • Enhanced the PGP Encrypt Snap with the Encryption key ID field, which allows you to specify the key ID for encrypting the data. If you do not specify an encryption key ID, the Snap uses the primary key in the public key (master key).
  Starting from version 433patches21482, the PGP Encrypt Snap will no longer support encryption with an expired key. To ensure continued support for encryption, we highly recommend extending the expiration of your PGP key.
May 2023 | 433patches21291 | | Latest | Fixed an issue with the Multi File Reader Snap where it failed with the error S3 object not found when the Snap found no matching file to read and the Folder/File property value did not end with a forward slash (/).
May 2023 | 433patches21179 | | Latest |
  • Fixed an issue with the File Delete Snap where the Snap failed with a 404 Not Found error when trying to delete files from an Amazon S3 bucket. This issue occurred only with the Identity and Access Management (IAM) role in an Amazon AWS S3 Account.
  • Fixed an issue where Binary Snaps could not handle region information for the Amazon S3 file protocol, which resulted in an error.
May 2023 | main21015 | | Stable | The Key passphrase field in the Private Key Account now supports expressions, allowing dynamic evaluation using pipeline parameters when the expression button is enabled.
February 2023 | 432patches20458 | | Latest | Fixed an issue where the ZipFile Read and ZipFile Write Snaps failed to display the input schema for the File or File name field when using an expression.
February 2023 | 432patches20431 | | Latest |
  • Fixed an issue where the File Writer Snap would not retry on completing the writing of the file.
  • Added a configuration warning message when using the WASBS protocol with AzCopy, as it supports only the HTTPS protocol.
February 2023 | 432patches20349 | | Latest | The JSCH library has been upgraded to version 0.2.7.
February 2023 | main19844 | | Stable | Upgraded with the latest SnapLogic Platform release.
November 2022 | 431patches18977 | | Latest | The PGP Decrypt Snap now allows you to skip the signature verification when you face an issue with the signature in the encrypted file.
November 2022 | main18944 | | Stable | Upgraded with the latest SnapLogic Platform release.
September 2022 | 430patches17933 | | Latest |
  • The File Delete Snap now passes the input document to the error view correctly.
  • The AWS S3 and S3 Dynamic accounts now support the maximum session duration of an IAM role defined in AWS.
August Patch | 430patches17292 | | Stable and Latest | Fixed an issue with the Directory Browser Snap, which failed with a null pointer exception error when connecting to the SFTP server containing a port number. The Binary Snap Pack is deployed as both the latest and stable distribution. We recommend that you use this version for your Org when using the recommended Snaplex version (main-13269 - 4.30 GA).
August 2022 | main17386 | | Stable |
  • The File Operation Snap supports moving data from a local node to an Azure blob through the AZ Copy utility.
  • The Azure Storage Account includes the Request Size (MB) field to set the buffer limit before writing to Azure storage to enhance the performance.
  • The SSH Auth Account supports dynamic values for the following fields that allow you to use Pipeline parameters.
    • Username
    • Private key
    • Key passphrase
4.29 Patch | 429patches16569 | | Latest |
4.29 Patch | 429patches15842 | | Latest |
  • Improved the tooltip for the File action field in the ZipFile Write Snap.
  • Enhanced the SSH Auth Account with the Expression enabler for the following fields that allow the use of Pipeline parameters to populate the properties.
    • Username
    • Private key
    • Key passphrase
4.29main</