Create the pipeline as shown below:
The Hadoop Directory Browser Snap
Use a Hadoop Directory Browser Snap to first check the contents of the target directory. Later in the example, this lets you confirm that the new file was added to the HDFS directory as expected. Enter the appropriate Directory URL and set the File filter to *.zip. This instructs the Snap to list all the ZIP files in the target directory.
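The File filter is a shell-style glob pattern. As a rough local illustration only, the sketch below shows what filtering a directory listing with `*.zip` amounts to. It assumes the Snap's handling of a simple pattern like `*.zip` resembles Python's `fnmatch` matching, and the file names are hypothetical stand-ins for the contents of the HDFS target directory.

```python
from fnmatch import fnmatchcase


def filter_zip_files(names, pattern="*.zip"):
    """Return only the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory listing, standing in for the HDFS target directory.
listing = ["report.zip", "data.json", "archive.zip", "notes.txt"]
print(filter_zip_files(listing))  # ['report.zip', 'archive.zip']
```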
If the Snap executes as expected, you should see the contents of your target directory, as shown below:
Generating a File for Upload
Next, choose a file to upload into the target directory. You can either select a file directly or, as in the example pipeline, use a JSON Generator Snap coupled with a JSON Formatter Snap.
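The JSON Generator Snap emits documents, and the JSON Formatter Snap serializes them into binary data that a file-writing Snap can consume. A rough Python analogue of that pairing is sketched below; the sample records are hypothetical, and the exact layout the JSON Formatter produces (indentation, array wrapping) is not assumed to match.

```python
import json


def format_documents(documents):
    """Serialize a list of documents to UTF-8 JSON bytes, similar in spirit to a JSON Formatter."""
    return json.dumps(documents, indent=2).encode("utf-8")


# Hypothetical documents, standing in for the JSON Generator's output.
docs = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
payload = format_documents(docs)
print(payload.decode("utf-8"))
```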
The HDFS ZipFile Writer Snap
Your file is now ready. Configure the HDFS ZipFile Writer Snap to upload the file as a ZIP file into the target directory in HDFS, as shown below.
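Conceptually, the HDFS ZipFile Writer Snap wraps its binary input in a ZIP archive. The sketch below does the same thing locally and in memory with Python's standard `zipfile` module; the helper `zip_bytes` and the entry name `data.json` are hypothetical, and writing the result to HDFS is not shown.

```python
import io
import zipfile


def zip_bytes(entry_name, data):
    """Return a ZIP archive (as bytes) containing a single entry with the given data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(entry_name, data)
    return buffer.getvalue()


archive_bytes = zip_bytes("data.json", b'[{"id": 1}]')
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())  # ['data.json']
```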
The Hadoop Directory Browser Snap
After the ZIP file is created, use a Copy Snap to perform two tasks: first, to check whether the new file was created as expected, and second, to read the contents of the newly created ZIP file from the target HDFS directory. To check whether the new file was created, add a Hadoop Directory Browser Snap to the pipeline.
If the ZIP file was created, you should see it in the output, as shown below:
HDFS ZipFile Reader
Once you have confirmed that the new ZIP file has been created, use the HDFS ZipFile Reader Snap to read it. If the contents of the new ZIP file are the same as the contents of the input file, the pipeline works.
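The verification step is a round trip: write the data into a ZIP archive, read it back, and compare it with the original. The sketch below performs that check locally with Python's `zipfile` module. The helpers `write_zip` and `read_zip` are hypothetical stand-ins for the HDFS ZipFile Writer and Reader Snaps, and no HDFS access is involved.

```python
import io
import zipfile


def write_zip(entry_name, data):
    """Build an in-memory ZIP archive with one entry (stands in for the ZipFile Writer)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(entry_name, data)
    return buffer.getvalue()


def read_zip(archive_bytes):
    """Map each entry name to its contents (stands in for the ZipFile Reader)."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


original = b'[{"id": 1, "name": "alpha"}]'
contents = read_zip(write_zip("data.json", original))
# The example works if the unzipped entry matches the original input byte for byte.
print(contents["data.json"] == original)  # True
```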
To read the output of the HDFS ZipFile Reader Snap, use a File Reader Snap:
If the contents of the new file are the same as the contents of the original file, the example works. You can download this pipeline from the Downloads section below.