Over the past few years we have seen a rise in cloud-native “machine learning” models. These general-purpose models are accessible via API calls and maintained by their respective cloud providers. There is a lot of depth to explain in how the models are used and why you should try them rather than build your own. This is a crash course on the five main methods of Amazon’s real-time Comprehend service.
Detect Entities - This is the method to use when you want to detect “named entities”, that is, nouns that refer to a specific thing. It is especially useful if you want to detect places, companies, or organizations. Interestingly, it also detects “commercial items”, which are branded product names. If you feed it this sentence:
“Amazon, usually shortened to AWS, is a good cloud provider.”
It will detect two entities, Amazon and AWS, with a high degree of confidence. It does not, however, detect “cloud”, as that is not a named entity. It is important to note that Detect Entities also attempts to categorize each entity, something the other methods do not do.
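As a minimal sketch of what that call can look like from Python, the helper below wraps boto3's `detect_entities` and keeps only the confident hits. The function name `named_entities` and the 0.8 cutoff are my own illustrative choices, and `client` is assumed to be a Comprehend client created with `boto3.client("comprehend")` and valid AWS credentials.

```python
from typing import Any, List, Tuple


def named_entities(client: Any, text: str, min_score: float = 0.8) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs scoring at or above min_score.

    `client` is assumed to be a boto3 Comprehend client; the 0.8 threshold
    is an arbitrary illustrative default, not an AWS recommendation.
    """
    response = client.detect_entities(Text=text, LanguageCode="en")
    return [
        (entity["Text"], entity["Type"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]
```

Each item in the response carries a `Type` (for example `ORGANIZATION` or `COMMERCIAL_ITEM`) alongside its `Score`, which is the categorization step mentioned above.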
Key Phrases - This method attempts to detect the most important parts of a sentence but does not do any further analysis on them. This allows us to cast a broader net at the cost of categorization and deeper syntax.
Sending it the same sentence “Amazon, usually shortened to AWS, is a good cloud provider.”
correctly pulls out the two organizations found by Detect Entities but also grabs the “cloud provider” section. These snippets would most likely need further analysis, but the method is especially good for a first pass or for feeding later models and rules.
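A rough sketch of the same idea in Python, using boto3's `detect_key_phrases`. The helper name `key_phrases` is hypothetical, and `client` is again assumed to be a Comprehend client from `boto3.client("comprehend")`. Note that the response gives each phrase a score and offsets but no category.

```python
from typing import Any, List


def key_phrases(client: Any, text: str) -> List[str]:
    """Return the detected key phrases, most confident first.

    `client` is assumed to be a boto3 Comprehend client. Unlike entities,
    key phrases come back with no "Type" field, only text, score, and offsets.
    """
    response = client.detect_key_phrases(Text=text, LanguageCode="en")
    ranked = sorted(response["KeyPhrases"], key=lambda p: p["Score"], reverse=True)
    return [phrase["Text"] for phrase in ranked]
```

The returned strings could then be handed to whatever later model or rule set does the deeper analysis.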
Language - Language is probably the most straightforward of all the NLP methods Amazon offers. It does exactly what it claims to do: it tells you what language the text is in. There really isn’t much more to say about it!
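For completeness, here is a small sketch using boto3's `detect_dominant_language`. The helper name `dominant_language` is my own, and `client` is assumed to be a Comprehend client from `boto3.client("comprehend")`. Unlike the other methods, this call takes only the text, with no language code, and returns a list of candidate languages with scores.

```python
from typing import Any, Optional


def dominant_language(client: Any, text: str) -> Optional[str]:
    """Return the highest-scoring language code, or None if nothing is detected.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_dominant_language(Text=text)
    languages = response["Languages"]
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]
```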
Sentiment - This gives you the overall emotion of the text, which falls into one of four categories: Neutral, Positive, Negative, or Mixed. One shortcoming of AWS Comprehend Sentiment is that it doesn’t break down which phrases contribute to the sentiment. Other platforms can do that, and I will be writing about those later. Importantly, though, this can be used in combination with Detect Entities to determine how the text is referring to a specific brand or product. To refer to our other
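One way to sketch the Sentiment plus Detect Entities combination in Python is shown here. The helper `sentiment_for_brand` is hypothetical, `client` is assumed to be a Comprehend client from `boto3.client("comprehend")`, and the result is still the sentiment of the whole text, not of the brand mention alone.

```python
from typing import Any, Optional


def sentiment_for_brand(client: Any, text: str, brand: str) -> Optional[str]:
    """Return the text's overall sentiment label if `brand` is a detected entity.

    `client` is assumed to be a boto3 Comprehend client. The label describes
    the whole text, so this is only a rough proxy for how the brand is discussed.
    """
    entities = client.detect_entities(Text=text, LanguageCode="en")["Entities"]
    mentioned = any(entity["Text"].lower() == brand.lower() for entity in entities)
    if not mentioned:
        return None  # the brand was not detected, so the sentiment is not attributed to it
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
```

The response also includes a `SentimentScore` dictionary with a confidence for each of the four categories, which is useful when the top label is a close call.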
Syntax - Think of this method as your automated ‘mini’ English teacher. Its sole purpose is to determine which part of speech each word falls into. Much like Detect Entities, it can determine that Amazon and AWS are proper nouns, but it also adds that the word “to” is an adposition.
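A short sketch of reading those tags from Python with boto3's `detect_syntax`. The helper name `part_of_speech_tags` is illustrative, and `client` is assumed to be a Comprehend client from `boto3.client("comprehend")`. Each token in the response carries a `PartOfSpeech` entry with a short tag such as `PROPN` (proper noun) or `ADP` (adposition).

```python
from typing import Any, List, Tuple


def part_of_speech_tags(client: Any, text: str) -> List[Tuple[str, str]]:
    """Return (word, part-of-speech tag) pairs in the order the tokens appear.

    `client` is assumed to be a boto3 Comprehend client.
    """
    response = client.detect_syntax(Text=text, LanguageCode="en")
    return [
        (token["Text"], token["PartOfSpeech"]["Tag"])
        for token in response["SyntaxTokens"]
    ]
```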
Using AWS Comprehend in Practice
AWS makes its methods available through a collection of SDKs. Typically, to get started you would need to write them into your existing application or create a new Lambda function. We have been working on a way to make calling these functions possible through NiFi with our Comprehend Processor Bundle.
This allows you to pass a flowfile into the processor and get back the results. You can pass the text either as the flowfile body or as an attribute and get the results without having to write any additional code.
Sample statement sent in via an attribute, with the results output as the flowfile payload.