202 – Streaming Log Data to MySQL on AWS

This course assumes you are familiar with the basics of Apache Nifi – Read more here

 

Follow along using our Auto-Launching Nifi on AWS – Learn how here

 

Streaming data into a MySQL database for later analysis and review is a common use case. Nifi is able to offer an end to end one stop solution from picking up the data to inserting it into your database. In this example we will start by tailing a log file, parsing the data, and landing it in an Amazon RDS (mySQL) database.

 

Start With the Template

There is a Nifi template that accompanies this demo. You can download it here and upload it into your own instance. Download the Template here [Click]

 

Configuring your database

The AWS RDS configuration is mostly just out of the box. Even if you are not using RDS be sure that the database is accessible on port 3306 from the Nifi node. Typically the default security group allows this but your organization may have implemented different rules. Although it isn’t necessarily the tightest security practice, MySQL’s 3306 port can be open to the world as it is password protected. A production based database should have a tighter security group, potentially only white-listing the Nifi node itself.

Key attributes to note down for the database

  • The Endpoint

  • The Port (Default: 3306)

  • Target Schema

  • Target Table

  • Fields in Target Table

 

Mysql Security Group – optionally restrict the source

 

Calculated Systems is an AWS partner and can help you setup your environment end to end. If you have questions or would like help use the chat box in the lower right or email us at info@calculatedsystems.com

 

For our demo we are using

 

 

Configuring your MySQL driver

Depending on the version of MySQL you are integrating with you may need to download a specific driver. For RDS you may need to ssh into your Nifi node and run this one command. The rest of this example assumes this is the method used to acquire the connector. If you used an alternate method be sure to use a different class path.

  • sudo yum install -y mysql-connector*

 
 

Follow along with our Quick Start Nifi on AWS

-and-

Download the template and try it yourself

 

Setting up a pooled controller service

Nifi handles connection pools to databases to help ensure they do not get overloaded by multiple processors all at once. These connection pools are represented as a Controller Service. These controller services can live in a process group. In the template for this exercise(LINK TO TEMPLATE) the Controller Service ‘DBCPConnectionPool’ is part of the ‘Send RDS(Mysql)’ process group. You can view it by right clicking on the process group and going to configure

 

 

In the controller services tab you can configure the process group(gear icon) and go to its properties. Several attributes need to be configured in order to stream data to RDS. For our example we use

  • Database Connection URL: jdbc:mysql://xxxxx.us-east-1.rds.amazonaws.com:3306/Nifi_demo

  • Database Driver Class Name: com.mysql.jdbc.Driver

  • Database Driver Location: /usr/share/java/mysql-connector-java.jar

  • Database User: admin

  • Password: XXXXX

After configuring those variable hit ‘Apply’ and then the lightning bolt next to the gear to start the controller service

 

Using the Template

The template consists of two control groups and a routing processor This is a functional break down of the steps needed to stream data into MySQL

Retrieve & Parse

For a data source we are simply using the nifi-app.log as it is included in every installation of nifi. This log file has new lines appended to it regularly and provides an easy example of a typical tailing use-case. Every time Nifi detects a change it produces a flow file. This is then split into individual lines and parsed to extract the logLevel, logTime, and logMessage.

 

 

RouteonAttribute

This determines if a file is an ‘INFO’ event or not. All log levels except info are dropped at this stage

 

Send to RDS(MySQL)

This processor group takes the previously extracted and parsed messages and lands it in RDS. The first processor, UpdateAttribute renames the relevant attributes to be on par with the target MySQL table. Next the AttributesToJSON processor formats it into an easy to read JSON message before it is converted to a SQL insert and finally landed in the database. This configuration is designed to add lines one at a time but depending on the requirements we could also batch the insert

 

 
Post to MySQL
 

Conclusion

Using Nifi is a quick, easy, and effective way to stream data into MySQL. The combination of drag and drop, real time processing, and flexibility make it ideal for data movement and event based processing.

 

Next Steps:

Download the Template

Learn more about moving data to the cloud