Data In Motion — The Power of NIFI
Your boss requests a data pipeline definition that retrieves messages from Kafka, applies content transformations and securely transmits the updated message to an SFTP server.
The first thing you will probably do is take a pen and paper and draw a few squares describing the components and the processors. Once you are done with the drawing, you will think to yourself, “Why can't the drawing itself be my pipeline?” Well, it can!
NIFI is a data flow orchestrator where you draw your pipeline (in the NIFI world, your drawing paper is called a canvas), configure the squares you have drawn (in the NIFI world, your squares are called processors), click play, and your pipeline starts to work. The only feature I'm missing is copy and paste from my paper to the NIFI canvas.
The same introduction in more sophisticated words
In today’s data-driven world, seamlessly moving information between various systems is crucial. Apache NiFi steps in as a powerful and user-friendly solution for automating this data flow. Developed by the Apache Software Foundation, NiFi acts as a central hub, orchestrating the movement, processing, and distribution of data across diverse applications and platforms.
Simply put, NiFi takes the manual effort out of data pipelines. It allows you to design visual workflows that ingest data from various sources, transform it as needed, and then route it to its designated destinations. This makes data integration efficient and scalable, regardless of the data’s format or origin.
With NiFi, you can:
- Automate complex data pipelines
- Build robust data flows for real-time or batch processing
- Leverage a wide range of processors for data manipulation
- Ensure secure communication through features like HTTPS and access control
NiFi’s intuitive web interface and extensive customization options make it a popular choice for organizations of all sizes. Stay tuned as we delve deeper into the world of NiFi and explore its capabilities in detail!
Blog Goals
In this blog we will fulfill your boss's request, almost:
- We will learn how to install NIFI on a Linux box
- Configure NIFI
- Define a username & password
- Define a basic pipeline. To keep the exercise simple, instead of Kafka we will read files from a folder located on the Linux box. In the next blog I will show you how to connect to a Kafka platform using NIFI
Prerequisites and Ingredients
Below are all the prerequisites required to run this exercise:
Required prerequisites
- Linux box where we will install NIFI. Since NIFI is a Java component it can be installed on any Linux distribution and on Windows as well. In our exercise, I will use Ubuntu 22
- Download the NIFI binary from the following link: https://nifi.apache.org/download/ (in this exercise we will install NIFI version 1.26)
- Create the following folders
- Create a “nifi” folder — This will be our main folder
- Create an “input” folder under the nifi folder (/nifi/input). NIFI will fetch messages from this folder.
- Create an “ftp” folder under the nifi folder (/nifi/ftp). This folder will be used by our SFTP server, and the NIFI output will be written to it. To avoid permission issues, give full permissions to both the input and ftp folders by running the following commands from the /nifi folder
chmod 777 ftp
chmod 777 input
4. SFTP server: you can simply use the same Linux box.
5. In the input folder, create 2 files
- test1.txt, containing the text “This is before”
- test2.txt, containing the text “This is after”
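The folder and file preparation above can be sketched as a few shell commands. This is only a sketch: the NIFI_BASE variable is my own addition, and it defaults to a scratch directory so you can try the commands safely. Set NIFI_BASE=/nifi to match the exercise.

```shell
# Base directory. The exercise uses /nifi; the scratch-directory default
# is an assumption of this sketch, not part of the original steps.
NIFI_BASE="${NIFI_BASE:-$(mktemp -d)/nifi}"

# Create the input and ftp folders and open up their permissions
mkdir -p "$NIFI_BASE/input" "$NIFI_BASE/ftp"
chmod 777 "$NIFI_BASE/input" "$NIFI_BASE/ftp"

# Create the two test files that the pipeline will pick up
printf 'This is before\n' > "$NIFI_BASE/input/test1.txt"
printf 'This is after\n'  > "$NIFI_BASE/input/test2.txt"

# Show what was created
ls -l "$NIFI_BASE/input"
```

On the real box you would run it as NIFI_BASE=/nifi, with sudo if your user cannot write to the root folder.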
Let’s start working
Download and install NIFI
- Install the Java JDK on your Linux box by running: sudo apt-get install openjdk-17-jdk
To verify that Java is installed, run java --version and check that the version is printed
3. Install the unzip package by running: sudo apt install unzip
4. Change to the nifi folder by running: cd /nifi
5. Download the NIFI binary, you can download it from the NIFI website.
I am using wget command:
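wget --no-check-certificate https://dlcdn.apache.org/nifi/1.26.0/nifi-1.26.0-bin.zip (the --no-check-certificate option skips HTTPS certificate verification; leave it out if your box already trusts the certificate)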
6. Extract NIFI by running: unzip nifi-1.26.0-bin.zip
Once the extraction is done you will have a folder named after the NIFI version. You can rename it with the mv command. In this exercise I renamed the NIFI application folder to nifi-app
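If you prefer to script the download and extraction, here is a minimal sketch. The run helper and the DRY_RUN switch are my own additions (not part of NIFI): by default the script only prints the commands, and DRY_RUN=0 executes them. I am also assuming that the zip extracts to a folder named nifi-1.26.0.

```shell
NIFI_VERSION="1.26.0"
ARCHIVE="nifi-${NIFI_VERSION}-bin.zip"
URL="https://dlcdn.apache.org/nifi/${NIFI_VERSION}/${ARCHIVE}"

# run: print the command in dry-run mode (the default), execute it otherwise.
# This helper is hypothetical and exists only for this sketch.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run wget "$URL"                          # download the binary
run unzip "$ARCHIVE"                     # extract it
run mv "nifi-${NIFI_VERSION}" nifi-app   # rename the folder (assumed name)
```

Run it from the /nifi folder so that nifi-app ends up at /nifi/nifi-app, the path used in the rest of the exercise.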
Configure and Run NIFI application
- Check your Linux box IP by running: ip a
My Linux box IP is 10.100.102.13. By default, the NIFI application is configured to listen on the localhost IP (127.0.0.1).
Since we want to connect to NIFI from a different computer, we need to change the listening host IP. To change it, follow these steps
- navigate to /nifi/nifi-app/conf
- edit nifi.properties file
- locate “nifi.web.https.host” and change 127.0.0.1 to your Linux Box IP. In our exercise the IP is 10.100.102.13
- If you wish you can change the port as well under “nifi.web.https.port” , default port is 8443
3. By default, NIFI generates a random username and password the first time you start the application.
To set your own username and password, follow these steps
- navigate to /nifi/nifi-app/bin
- Set the username and password by running the following command
./nifi.sh set-single-user-credentials <username> <password> (the password must be at least 12 characters long)
In this exercise I set the username to zbeda and the password to 1qaz@WSX3edc. The username and password are now saved in the login-identity-providers.xml file, located under the /nifi/nifi-app/conf folder
4. To start the NIFI application, navigate to /nifi/nifi-app/bin and run the command
./nifi.sh start
5. To verify that NIFI is running, run the command
./nifi.sh status
If NIFI failed to start, open the nifi-app.log file in the /nifi/nifi-app/logs folder to investigate the issue.
6. Open your browser and browse to: https://<nifi-ip>:8443/nifi
7. Enter the username and password that you defined in the previous step
8. Congrats! You have logged in to NIFI
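The configuration steps above can also be done from the shell instead of a text editor. The sketch below works on a scratch copy with two sample lines, so it is safe to try. The set_prop helper is my own and not a NIFI tool; on the real box you would point it at /nifi/nifi-app/conf/nifi.properties.

```shell
# set_prop FILE KEY VALUE: rewrite the "KEY=..." line of a properties file.
# Hypothetical helper for this sketch; a FILE.bak backup is kept.
set_prop() {
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -i.bak "s|^${key_re}=.*|$2=$3|" "$1"
}

# Scratch file with the two properties discussed above
PROPS="$(mktemp)"
printf 'nifi.web.https.host=127.0.0.1\nnifi.web.https.port=8443\n' > "$PROPS"

# Listen on the Linux box IP instead of localhost
NIFI_IP="10.100.102.13"
set_prop "$PROPS" nifi.web.https.host "$NIFI_IP"
cat "$PROPS"

# Check the password length before calling set-single-user-credentials
PASSWORD='1qaz@WSX3edc'
if [ "${#PASSWORD}" -lt 12 ]; then
  echo "Password must be at least 12 characters" >&2
else
  echo "Password length OK"
fi

# Build the URL to open in the browser from the configured port
PORT=$(grep '^nifi\.web\.https\.port=' "$PROPS" | cut -d= -f2)
echo "Open: https://${NIFI_IP}:${PORT}/nifi"
```

Remember that NIFI reads nifi.properties at startup, so changes to the real file only take effect after a restart.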
Create your first pipeline
In this section we will create our pipeline.
- The pipeline will fetch files from /nifi/input directory
- If the file content includes the string “before”, it will be replaced by the string “after”; otherwise the file is passed on as is.
- Finally, the file should be moved to the SFTP server.
1. Login to the NIFI from your browser
2. Let's create our first processor.
The processor will watch the /nifi/input folder and fetch the files found in it.
- Click on the processor Icon and drag it to the canvas (drawing sheet)
- In the filter, write “getfile” and click Add
- In order to configure the processor, right click on the processor → configure & click on the properties tab
- Update “Input Directory” with the value /nifi/input; this folder contains our text files
- As part of the pipeline, we want each file to be removed from the /nifi/input directory once NIFI has picked it up. Make sure that the “Keep Source File” value is set to false
- Click Apply. There are more parameters that you can configure; try them
6. Add a processor that replaces text. This processor will look for the string before in each fetched file and replace it with the string after
- Click on the processor Icon and drag it to the canvas (drawing sheet)
- In the filter, write “replacetext” and click Add
- In order to configure the processor, right click on the processor → configure & click on the properties tab
- Update “Search Value” with the value \bbefore\b; this is a regular expression that matches the whole word before
- Update “Replacement Value” with the value after. If the string “before” is found, it will be replaced with the string “after”
- Click Apply — There are more parameters that you can configure — try them
7. We now have 2 processors, each standing on its own.
To connect the two, hover with your mouse over the center of the Getfile processor and you will see a blue arrow. Click it and drag it to the Replace text processor. Once you connect the two processors you will get a “Create Connection” window; leave the defaults and click Add.
A queue was created between the 2 processors. Every time the Getfile processor fetches a file, the file moves to the Replace text processor via the queue.
Please note that since the Getfile processor is now connected to another processor, meaning that the file can be moved “somewhere”, a red stop icon is shown instead of the yellow warning sign.
8. Let's try and fetch the files
- Right click on the Getfile processor → start ; the processor will turn Green
- In the queue, you can find 2 records
- Right click on the queue → list queue
- The queue is holding the 2 files that were fetched from the /nifi/input folder. If you click on the “eye” icon of one of the files, you can see the file content
9. Before we can start the Replace text processor, we need to give it “somewhere” to send its output. This “somewhere” will be an SFTP server.
- Click on the processor Icon and drag it to the canvas (drawing sheet)
- In the filter, write “putsftp” and click Add
- In order to configure the processor, right click on the processor → configure & click on the properties tab
- Update “Hostname” with the address of your SFTP server. In our exercise it is the same server that runs NIFI
- Update “Username” with the username of your SFTP server account.
- Update “Password” with the password of that account.
- Update “Remote Path” with the SFTP directory. In our exercise it is /nifi/ftp
- Click Apply — There are more parameters that you can configure — try them
10. Connect the Replace text processor to the PutSFTP processor as explained in the previous steps
- A “Create Connection” window will pop up; check the “success” relationship and click Add. This will forward only files that the Replace text processor was able to process.
11. We can see that the Replace text processor is still showing a warning that the failure relationship is not configured.
In the previous step we only configured the success relationship, meaning that a file moves to the PutSFTP processor only if the Replace text processor processed it successfully. To configure the failure relationship, follow these steps
- Right click on the Replace text processor → configure
- Click on the relationships tab
- On failure, check terminate. This means that if the Replace text processor fails to process the file (for example, if it is a binary or corrupted file), the file is dropped from the flow.
- The Replace text processor's status changes to stopped
12. Let's Start the Replace text processor
- Right click on the Replace text processor → start
- The messages from the previous queue (between Getfile and Replace text) were moved to the new queue
- Right click on the queue → list queue
- Click on the “eye” icon next to test1.txt. The file content was changed: the string “before” was replaced with the string “after”.
13. Final step of the pipeline:
we now need to move the files to the SFTP server. The PutSFTP processor is showing relationship warnings.
To configure the relationships, follow these steps
- Right click on the PutSFTP processor → configure
- Click on the relationships tab
- On failure, check terminate. This means that if the PutSFTP processor fails to process the file, the file is dropped from the flow.
- On reject, check terminate. This means that if the PutSFTP processor cannot write the file to the SFTP server, the file is dropped from the flow.
- On success, check terminate. This means that once the PutSFTP processor has written the file to the SFTP server, the file is removed from NIFI.
14. Let's start the PutSFTP processor
- Right click on the PutSFTP processor → start
And we are done!
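If you want a feel for what the \bbefore\b search value matches before running the pipeline, you can approximate it in the shell. This is only an approximation under my own assumptions: it uses GNU sed (the default on Ubuntu), which understands \b as a word boundary, while NIFI itself evaluates the expression with its own regex engine. The replace_before function name is made up for this sketch.

```shell
# Approximate the Replace text step: swap the whole word "before" for "after".
# Requires GNU sed for the \b word-boundary syntax.
replace_before() {
  sed 's/\bbefore\b/after/g'
}

printf 'This is before\n' | replace_before   # content of test1.txt
printf 'This is after\n'  | replace_before   # content of test2.txt, no match
printf 'beforehand\n'     | replace_before   # not a whole word, left as is
```

The third line shows why the word boundaries matter: without them, “beforehand” would turn into “afterhand”.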
Verification
- Login to the Linux Box
- Navigate to /nifi/ftp and verify that the files exist
- Verify that the content of the test1.txt file was changed: the string “before” was replaced with “after”
- Verify that no files exist under /nifi/input folder
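The same checks can be scripted. Below is a minimal sketch; the verify_pipeline function is hypothetical, and the demo runs against a scratch directory that imitates a finished run, so that it works without NIFI. On the real box you would call verify_pipeline /nifi.

```shell
# verify_pipeline BASE: succeeds when BASE/input is empty and both test files
# exist in BASE/ftp without the whole word "before" in them.
verify_pipeline() {
  base="$1"
  if [ -n "$(ls -A "$base/input")" ]; then
    echo "input folder is not empty" >&2
    return 1
  fi
  for f in test1.txt test2.txt; do
    if [ ! -f "$base/ftp/$f" ]; then
      echo "missing $base/ftp/$f" >&2
      return 1
    fi
    if grep -qw 'before' "$base/ftp/$f"; then
      echo "$f still contains 'before'" >&2
      return 1
    fi
  done
  echo "pipeline output looks good"
}

# Demo: a scratch directory that looks like the result of a successful run
DEMO="$(mktemp -d)"
mkdir -p "$DEMO/input" "$DEMO/ftp"
printf 'This is after\n' > "$DEMO/ftp/test1.txt"
printf 'This is after\n' > "$DEMO/ftp/test2.txt"
verify_pipeline "$DEMO"
```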
If you liked this blog don’t forget to clap and follow me on both Medium and Linkedin
www.linkedin.com/in/davidzbeda
TIPS, Tricks and other
NIFI Stop \ Start \ Status \ Restart
To start, stop, restart, and check the status of NIFI from the command line:
- navigate to /nifi/nifi-app/bin
- run ./nifi.sh start to start the NIFI application
- run ./nifi.sh stop to stop the NIFI application
- run ./nifi.sh restart to restart the NIFI application
- run ./nifi.sh status to show the NIFI application status
nifi.properties file
The nifi.properties is located under <nifi main folder>/conf folder. The file contains NIFI main configuration such as
- Listening IP & port
- Retention period
- Keystore location & password — this is required for opening the canvas on different server
Please note that after any configuration change you must restart NIFI by running /nifi/nifi-app/bin/nifi.sh restart
login-identity-providers.xml file
The login-identity-providers.xml is located under <nifi main folder>/conf folder. This file includes the NIFI username and password (the password is stored as a hash, not in plain text)
Logs
NIFI logs can be found under <nifi main folder>/logs/nifi-app.log
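When something goes wrong, I usually start by pulling the latest error lines out of the log. A minimal sketch follows; the show_errors function and the NIFI_LOG variable are my own names, and the default path assumes the /nifi/nifi-app layout used in this exercise.

```shell
# show_errors LOGFILE [N]: print the last N lines (default 20) that contain
# the word ERROR. Hypothetical helper for this sketch.
show_errors() {
  grep -w 'ERROR' "$1" | tail -n "${2:-20}"
}

# Default path assumes the folder layout from this exercise
LOG="${NIFI_LOG:-/nifi/nifi-app/logs/nifi-app.log}"
if [ -f "$LOG" ]; then
  show_errors "$LOG"
else
  echo "Log file not found: $LOG" >&2
fi
```

For live troubleshooting, tail -f on the same file while you start NIFI is just as useful.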
Debug — Failure relationship
In our exercise we have set the failure relationship to terminate, meaning that if the processor failed to process the file, the processor deletes the file.
Let's assume that the putSFTP failed to write the files to the SFTP server and we want to keep the files instead of deleting them
- Delete the /nifi/ftp folder, where the PutSFTP processor should write the files
- Add a new processor, such as “retryflowfile” or “putfile”, that will hold on to the file in case of a failure
- Connect the PutSFTP processor to the new processor using the failure and reject relationships
- Start the PutSFTP processor
- Since the SFTP destination is not available, the files move to the new queue