Introducing Text Analytics in the Azure ML Marketplace

This blog post is authored by Nagender Parimi, Software Engineer at Microsoft.

Understanding and analyzing unstructured text is an increasingly popular field and includes a wide spectrum of problems such as sentiment analysis, key phrase extraction, topic modeling/extraction, aspect extraction and more.

In that context, we are excited to announce the launch of a new Text Analytics service in the Azure ML Marketplace. In this initial release, we offer APIs for sentiment analysis and key phrase extraction of English text. No labeled or training data is needed to use the service; just bring your text data. The service is based on research and engineering that originated in Microsoft Research and that has been battle-tested and improved over the past few years by product teams such as Bing and Office. This post covers some of the details of how the new service works.

Sentiment Analysis

Let’s say you run a website to sell handicrafts. Your users submit feedback on your site, and you’d like to find out what users think of your brand, and how that changes over time as you release new products and features to your site. Sentiment analysis can help here – given a piece of text, the Azure ML Text Analytics service returns a score between 0 and 1 denoting overall sentiment in the input text. Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment.

A number of challenges make sentiment analysis an interesting problem. A simple approach is to keep a lexicon of words or phrases that impart negative or positive sentiment to a sentence, e.g. the words “bad”, “hate”, “not good” would belong to the lexicon of negative words, while “good”, “great”, “like” would belong to the lexicon of positive words. But such lexicons must be manually curated, and even then they are not always accurate. Consider the word “unpredictable”: in the context of an action movie’s plot (“the movie had an unpredictable ending”) it may denote positive sentiment, while in the context of your cellphone’s call quality it is not a good sign.
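To make the limitation concrete, here is a minimal sketch of a lexicon-based scorer. It is not how the service works: the two lexicons reuse the example words above, and the function name `lexicon_score` and the "positive hits minus negative hits" scoring rule are assumptions made for illustration.

```python
import re

# Toy lexicons taken from the examples in the text; a real lexicon would be
# far larger and hand-curated.
POSITIVE = {"good", "great", "like"}
NEGATIVE = {"bad", "hate", "not good"}


def lexicon_score(text):
    """Return (positive hits - negative hits) for the text (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    score, i = 0, 0
    while i < len(words):
        bigram = " ".join(words[i:i + 2])
        # Try the two-word phrase first so "not good" is not counted as "good".
        for phrase, step in ((bigram, 2), (words[i], 1)):
            if phrase in POSITIVE or phrase in NEGATIVE:
                score += 1 if phrase in POSITIVE else -1
                i += step
                break
        else:
            i += 1  # word is in neither lexicon
    return score


print(lexicon_score("The food was great"))                     # positive
print(lexicon_score("This is not good"))                       # negative
# Both sentences below get the same neutral score, because "unpredictable"
# is in neither lexicon and its meaning depends on context.
print(lexicon_score("The movie had an unpredictable ending"))
print(lexicon_score("The call quality is unpredictable"))
```

Adding “unpredictable” to either lexicon would not help, since it would then be scored the same way in both sentences. That is the kind of context dependence a trained model is better placed to pick up.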

The Machine Learning Approach

A more robust approach is to train models that detect sentiment. Here is how our training process worked. We started from a large dataset of text records, each already labeled with its sentiment. We first tokenized the input text into individual words and applied stemming. Next, we constructed features from these words and used them to train a classifier. Once training is complete, the classifier can be used to predict the sentiment of any new piece of text. It is important to construct meaningful features for the classifier, and our list of features includes several from state-of-the-art research:

  • N-grams denote all occurrences of n consecutive words in the input text. The precise value of n may vary across scenarios, but it’s common to pick n=2 or n=3. With n=2, for the text “the quick brown fox”, the following n-grams would be generated: [“the quick”, “quick brown”, “brown fox”].

  • Part-of-speech tagging is the process of assigning a part of speech (noun, verb, adjective and so on) to each word in the input text. We also compute features based on the presence of emoticons, punctuation and letter case (upper or lower).

  • Word embeddings are a recent development in natural language processing, where words or phrases that are semantically similar are mapped closer together, e.g. in such a mapping, the term “cat” would be mapped closer to the term “dog” than to the term “car”, since both dogs and cats are animals. Neural networks are a popular choice for constructing such a mapping. For sentiment analysis, we employ neural networks that encode the associated sentiment information as well. The outputs of the network’s layers are then used as features for the classifier.
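Two of the feature families above are simple enough to sketch in a few lines of Python. This is an illustration, not the service's implementation: the regex tokenizer, the small emoticon pattern and the feature names are all assumptions, and stemming, part-of-speech tagging and embeddings are left out because they need trained models or external libraries.

```python
import re


def tokenize(text):
    """Split text into word tokens, keeping letter case (simplistic regex tokenizer)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# A deliberately small emoticon pattern such as :) ;-) :D =( (assumption).
EMOTICONS = re.compile(r"[:;=]-?[)(DPp]")


def surface_features(text):
    """Count emoticons, punctuation marks and all-caps words (hypothetical feature set)."""
    tokens = tokenize(text)
    return {
        "emoticon_count": len(EMOTICONS.findall(text)),
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        "all_caps_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
    }


print(ngrams(tokenize("the quick brown fox"), n=2))
# ['the quick', 'quick brown', 'brown fox']
print(surface_features("I LOVE this phone!!! :)"))
# {'emoticon_count': 1, 'exclamation_count': 3, 'question_count': 0, 'all_caps_words': 1}
```

In a full pipeline, features like these would be turned into a numeric vector for each text record and passed to the classifier together with the other feature families.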


The classifier trained above performed well on our internal datasets, so we wanted to compare it against external offerings. We evaluated its performance against two external services: the Stanford NLP Sentiment Analysis engine (using its pre-trained sentiment model) and a popular commercial tool. Here are the comparative benchmarks:

  • On datasets comprising tweets, Azure ML Text Analytics was 10-20% better at distinguishing tweets with positive sentiment from those with negative sentiment. We used tweet data from Sentiment140 and CrowdScale. Here is the comparison of the three systems on area under the ROC curve:

  • On user review datasets, Azure ML Text Analytics was 10-15% better. We analyzed sentiment on a dataset of TripAdvisor reviews; here is a comparison of the results based on the F1 score:
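For readers unfamiliar with the two metrics used in these benchmarks, here is a minimal pure-Python sketch of how each can be computed. The labels and scores in the example are made up for illustration and are not benchmark data; the function names and the 0.5 decision threshold are assumptions as well.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example is scored above a randomly chosen negative one
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 1 = positive sentiment, 0 = negative sentiment.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]                   # sentiment scores in [0, 1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5 (assumption)

print(roc_auc(labels, scores))  # 0.75
print(f1_score(labels, preds))  # 0.5
```

AUC is computed from the raw scores and does not depend on a threshold, while F1 is computed from the thresholded predictions, so the two metrics can rank systems differently.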

As these results show, Azure Machine Learning Text Analytics outperforms the other offerings on both short and long forms of text for the sentiment analysis task. The service can also extract key phrases, which denote the main talking points in the text. We have created a demo website where you can start trying the service right away (no sign-up required):

So go ahead and give this new service a spin. We plan to further enhance the capabilities we offer, so if you have ideas for new features you’d like to see, do share your thoughts in the comments below.