Getting Started with Apache NiFi – 5 Common Questions

This article covers some of the most common questions we get about Apache NiFi. A more complete introduction to NiFi is available in our eBook, Apache NiFi for Dummies.

What happens to any data if the system goes down or there is a power loss?

NiFi stores your data in repositories while it makes its way through your system. There are three of them: the “FlowFile Repository,” the “Content Repository,” and the “Provenance Repository.” The FlowFile Repository records the state and attributes of each FlowFile, the Content Repository holds the actual content, and the Provenance Repository is updated as each processor does its job. If NiFi stops but the operating system keeps running, writes still held in the operating system cache remain intact and reach disk, so you should be able to resume any flow where it left off. If you have a complete power loss, the data already on disk remains, but the most recent repository updates may not have been written, which may mean having to re-process some data. Alternatively, you can avoid relying on the operating system cache by configuring the repositories in your nifi.properties file to always sync to disk, but this can reduce performance considerably.
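
As a rough illustration of the sync-to-disk option described above, the Python sketch below checks which repositories have it enabled. The three always.sync property names are the ones listed in the NiFi Administration Guide as far as we recall, so verify them against your NiFi version. The helper functions and the sample text are hypothetical and stand in for a real conf/nifi.properties file.

```python
# Hypothetical helper: reports which NiFi repositories are configured to sync
# every write to disk. SAMPLE_PROPERTIES stands in for a real nifi.properties.
SAMPLE_PROPERTIES = """
# Illustrative excerpt; the values are examples, not recommendations.
nifi.flowfile.repository.always.sync=true
nifi.content.repository.always.sync=false
nifi.provenance.repository.always.sync=false
"""

# Property names as recalled from the NiFi Administration Guide (assumption:
# confirm them for your release).
SYNC_KEYS = {
    "flowfile": "nifi.flowfile.repository.always.sync",
    "content": "nifi.content.repository.always.sync",
    "provenance": "nifi.provenance.repository.always.sync",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def sync_settings(text):
    """Return {repository: bool}. This sketch treats a missing key as False."""
    props = parse_properties(text)
    return {
        repo: props.get(key, "false").lower() == "true"
        for repo, key in SYNC_KEYS.items()
    }


if __name__ == "__main__":
    for repo, enabled in sync_settings(SAMPLE_PROPERTIES).items():
        print(f"{repo}: always.sync is {'on' if enabled else 'off'}")
```

Turning a flag on trades throughput for durability, so it is usually worth enabling only for the repositories whose loss you cannot tolerate.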

I have set no prioritizers on a connection. What is the default prioritization scheme?

First, check that you really have not set any prioritizers; they are configured on the connection. With none selected, the default prioritization scheme is undefined, and it may change depending upon the processor that you use. In most circumstances, a queue with no prioritizers will order data according to the FlowFile’s Content Claim. The reasoning was that this method would provide the highest throughput and the most efficient reading of the data. Remember that you can build your own processors, which means you can set your own defaults.

Right-click a connection and select Configure

How can I make dataflows look nicer?

Apache NiFi canvases can get very complicated, and a clean, well-organized canvas makes a flow easier to understand. One way to tidy a flow is to add a bend point (elbow) to a connection by double-clicking the connection at the desired spot. Use your mouse cursor to grab the point and bend the connection as you wish. To remove a bend point, double-click it again. It is also possible to move a connection’s label so that it sits on a bend point.

Inserting a bend point improves flow readability

When is a piece of data under NiFi’s control?

Whenever content is read or ingested, a FlowFile is created, and the FlowFile carries the data through NiFi’s dataflow like a bus carrying passengers to different destinations: an office building, a restaurant, a park. The FlowFile and its content are created within a ProcessSession. Once that session has been committed, the data is considered to be under NiFi’s control.
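
The commit boundary can be pictured with a toy model. The Python sketch below is not NiFi code: NiFi’s real ProcessSession is a Java interface with far more behavior, and the ToySession class and the plain list used as a repository are hypothetical stand-ins. It only illustrates that work staged in a session is not handed over until the session is committed.

```python
class ToySession:
    """Toy stand-in for a session: stages FlowFiles until commit() is called."""

    def __init__(self, repository):
        # 'repository' is a plain list standing in for NiFi's repositories.
        self._repository = repository
        self._staged = []

    def create(self, content, **attributes):
        """Stage a new FlowFile; nothing reaches the repository yet."""
        flowfile = {"content": content, "attributes": dict(attributes)}
        self._staged.append(flowfile)
        return flowfile

    def commit(self):
        """Hand all staged FlowFiles over; from here they are 'under control'."""
        self._repository.extend(self._staged)
        self._staged = []

    def rollback(self):
        """Discard staged work, as if the ingest had never happened."""
        self._staged = []


if __name__ == "__main__":
    repository = []
    session = ToySession(repository)
    session.create(b"hello", filename="greeting.txt")
    print(len(repository))  # still empty: the session has not been committed
    session.commit()
    print(len(repository))  # the FlowFile is now in the repository
```

In this model, a crash or rollback before commit() leaves the repository untouched, so the source of the data would still be responsible for it.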

You can follow along using our Auto-Launching NiFi – Learn How Here

How do I send data from one NiFi instance to another?

One way to send data between NiFi instances is to “pull.” The sending instance brings data all the way to an output port, and the receiving instance has a remote process group pointing at that output port. The other way is to “push.” Here the sending instance brings data to a remote process group that points at the receiving NiFi instance, and the receiving instance has an input port to accept the data.
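
To keep the two arrangements straight, the Python sketch below simply records which instance hosts which component in each case. It configures nothing in NiFi, and the SiteToSiteLayout class and layout function are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteToSiteLayout:
    mode: str        # "pull" or "push"
    port_type: str   # kind of port involved
    port_host: str   # instance that hosts the port
    rpg_host: str    # instance that hosts the remote process group


def layout(mode):
    """Return which instance hosts the port and the remote process group."""
    if mode == "pull":
        # The sender exposes an output port; the receiver's remote process
        # group points at it.
        return SiteToSiteLayout("pull", "output port", "sender", "receiver")
    if mode == "push":
        # The sender's remote process group points at the receiver, which
        # exposes an input port.
        return SiteToSiteLayout("push", "input port", "receiver", "sender")
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("pull", "push"):
        item = layout(mode)
        print(f"{item.mode}: {item.port_type} on the {item.port_host}, "
              f"remote process group on the {item.rpg_host}")
```

In both cases the remote process group and the port sit on opposite instances; the only difference is which side hosts which.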

Next Steps:

Start using Apache NiFi on AWS – Try it now

Read Apache NiFi for Dummies to learn more

Learn how to use Apache NiFi Controller Services