Sentiment Analysis: a basic application using NodeXL


To make good use of time during the lockdown, and to better apply some of the techniques I learnt at the Indian School of Business, I decided to do a sentiment analysis of #budget2020 using data from Twitter. The time frame chosen is the period just after the presentation of the Union Budget in Parliament. I had extracted the data during Term 7, immediately after being introduced to the art of Social Listening. This is a very elementary treatment of a far more powerful technique and is aimed at building a basic understanding of the topic.

What is Sentiment Analysis?
Any brand, business or body that interacts with society at large can expect reactions to its actions (or words). For instance, when Nestle launches a new Maggi variant, say "Fusian Bangkok", it can expect people to like it, dislike it or remain indifferent to the launch. Measuring the market's reaction appropriately can lead to interesting insights. Nestle can always gauge the response to its new launch from its own sales reports, in terms of purchases, but there is so much more to know. It might want to know which quality of the product is delighting customers and which is alienating them. Such information can be used to modify the product offering, change the advertising narrative or take any other corrective or augmentative business decision.

In today's world, communication has essentially become digital. Businesses communicate with their users digitally, and vice versa. People speak about brands on social media channels and in targeted group conversations, or react to newsfeeds, commenting freely about how they feel. This is a potential gold mine of consumer feedback for any business. It is in this context that Sentiment Analysis comes in. It provides a quantitative means to measure primarily the following:
  • How does the market/ target audience feel about a business's actions - positive or negative, angry or sad, delighted or disappointed?
  • Why does the user feel the way she feels - what drives the sentiment?
Sentiment Analysis depends completely on Text Analytics to analyse and understand every word in user-generated text. Stop-words like articles and prepositions are generally omitted because they appear everywhere and carry little meaning, and only specific words like nouns and adjectives are analysed, as they convey definite meanings. A pre-defined lexicon or dictionary is used to look up each word in a comment (or review, etc.) and assign it a positive or negative numeric weight, so that every comment ends up with a net sentiment score.
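To make the lexicon idea concrete, here is a minimal sketch in Python. The stop-word list, the lexicon and its weights are all made up for illustration; they are not NodeXL's lists, and real lexicons are far larger.

```python
import re

# Toy stop-word list and lexicon, invented for this example only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "on", "to", "but"}
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "poor": -1, "terrible": -2}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Drop stop-words, then sum the lexicon weight of every remaining word."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    # Words missing from the lexicon contribute 0 (treated as neutral).
    return sum(LEXICON.get(t, 0) for t in tokens)

score = sentiment_score("The new variant is great but the packaging is poor")
print(score)  # great (+2) and poor (-1) give a net score of 1
```

A positive net score suggests a favourable comment and a negative one an unfavourable comment; a score of zero means the lexicon found nothing, or the weights cancelled out.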

The case
The case I have chosen is the Government of India's 2020 budget announcement, and we shall analyze the public reaction to it. The social media platform from which we shall fetch the data is Twitter and the tool for analysis is NodeXL.

Our goal will be to draw a conclusion on whether Twitter users feel positive or negative about the budget. Obviously, this is a sample study and cannot be treated as the final word on the subject.

NodeXL – Twitter Search helped me extract 4521 data points (tweets) related to the #budget2020 query. The generated network graph visualization is shown below:
NodeXL network visualization: Twitter - tweets on #budget2020
Note the red dot in the middle: it's Mrs. Nirmala Sitharaman's tweet on the above hashtag.
The above represents the network of tweets, with each node or dot representing a tweet. Detached circles towards the periphery of the network are standalone tweets. The whiskers, or lines between the dots, represent retweets, mentions or replies between the connected tweets.

The Analysis
We have used the feature “Graph Metrics – Words & Word pairs” to gauge sentiment among Twitter users regarding Budget 2020. The basis for classifying a tweet or a word as “Positive” or “Negative” is the built-in list in NodeXL specifying “positive words” and “negative words”.

The most evident output is as follows:

Words in Sentiment List#1: Positive    2947
Words in Sentiment List#2: Negative    2584

This essentially means that for "budget2020", the counts of words flagged as positive and negative are very close. A more skewed distribution would have indicated a clear tilt in sentiment one way or the other.
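To see how close the two counts are, it helps to turn them into shares of all sentiment-flagged words. The small helper below is my own illustration and not a NodeXL feature; only the two counts come from the output above.

```python
def sentiment_shares(positive, negative):
    """Return the positive and negative share of all sentiment-flagged words."""
    total = positive + negative
    return positive / total, negative / total

# Counts reported by NodeXL for #budget2020.
pos_share, neg_share = sentiment_shares(2947, 2584)
print(f"positive: {pos_share:.1%}, negative: {neg_share:.1%}")
# positive: 53.3%, negative: 46.7%
```

A split of roughly 53 to 47 is too narrow to call either way on word counts alone.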

The generated output also gives "Count" and "Salience". While "Count" is self-explanatory, "Salience" is perhaps not. The salience score is a prediction of what a human would consider to be most important in the given text.

I sorted the data set by Salience in descending order, so that the most important words sit at the top of the list. A closer examination of the words at the top shows that many are sentiment-neutral, like "india", "speech", "data" and "money". A few others, like "discrepancies" and "secretly", are salient but do not feature in NodeXL's list of positive/negative words ("trump" has been flagged Positive; odd, but that's how things work!). In this context, I read the latter two words as indicating negative sentiment. If I correct "Words in Sentiment List#2: Negative" to include the counts of these two words (2711 between them), the word statistics shared previously change:

Words in Sentiment List#1: Positive    2947
Words in Sentiment List#2: Negative    5295
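The correction amounts to adding words to the negative list and recounting. A minimal sketch of that step follows; the word counts and the starting word lists are hypothetical and chosen only to show the mechanics, they are not the real NodeXL output.

```python
from collections import Counter

# Hypothetical word counts, NOT the actual #budget2020 figures.
word_counts = Counter({
    "india": 120, "growth": 40, "relief": 25,
    "burden": 30, "discrepancies": 50, "secretly": 45,
})
positive_words = {"growth", "relief"}   # stand-in for the built-in positive list
negative_words = {"burden"}             # stand-in for the built-in negative list

def sentiment_totals(counts, positive, negative):
    """Sum the counts of words found in each sentiment list."""
    pos = sum(c for w, c in counts.items() if w in positive)
    neg = sum(c for w, c in counts.items() if w in negative)
    return pos, neg

before = sentiment_totals(word_counts, positive_words, negative_words)
# Flag two context-specific words as negative and recount.
corrected_negative = negative_words | {"discrepancies", "secretly"}
after = sentiment_totals(word_counts, positive_words, corrected_negative)
print(before, after)  # (65, 30) (65, 125)
```

Just as in the real data, flagging only two high-count words is enough to flip the balance, which is why such corrections need care.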



This correction tilts the picture clearly: Twitter users now appear to have an overall "negative sentiment" about "budget2020".

At the same time, the above analysis is prone to false interpretation, since we have selectively flagged two high-count words as negative and thereby skewed the table.

A more detailed and accurate exercise should examine more words (top down in the salience-sorted table), so that more context-specific words (for example "changa si", which implies good condition in Punjabi) are flagged as either positive or negative. That way, one would be considering more data before passing a verdict on sentiment.

I have not done such an exhaustive analysis here, since the aim is not to do a research study but to develop a basic understanding.

While labeling each word as positive or negative can be a matter of judgement, the analysis of “Word pairs” is less prone to individual judgement. Examining which word pairs appear and how often (words can be analysed in threes or fours, too) is very important because words often mean very little in singularity, but convey the world in pairs. An examination of the “Word pair” list throws up some interesting and definitive insights.
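For readers who want to see what counting word pairs involves, here is a minimal sketch. It treats a word pair as two adjacent words in a tweet, which is my assumption about how such a list is built; NodeXL's exact rules (for example around stop-words) may differ. The two sample tweets are invented.

```python
import re
from collections import Counter

def word_pairs(text):
    """Return the adjacent word pairs in a piece of text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(tokens, tokens[1:]))

# Invented sample tweets, for illustration only.
tweets = [
    "Ministry secretly corrects data discrepancies",
    "Opposition seeks answers on data discrepancies",
]

pair_counts = Counter(pair for tweet in tweets for pair in word_pairs(tweet))
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)  # ('data', 'discrepancies') 2
```

Extending `zip(tokens, tokens[1:])` to `zip(tokens, tokens[1:], tokens[2:])` gives groups of three words in the same way.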

The top 20 word pairs have a total appearance count of 16493, of which the negative-sounding word pairs contribute a healthy 7066, or about 43%.

The following word pairs appear in the top 20 and cry “negative” off the screen! (We could not find any word pair that even moans “positive”!)

  • discrepancies + budget
  • data + discrepancies
  • ministry + secretly
  • secretly + corrects
  • seeks + answers
  • data + fudging
  • fudging + bjp
  • govt + secretly
  • secretly + changes

The rest of the word pairs do not convey any definite sentiment. Overall, our initial conclusion stands re-validated.

This was an example of how to conduct a basic Sentiment Analysis on Twitter data using NodeXL.

