Sentiment Analysis: a basic application using NodeXL
To make good use of my time during the lockdown, and to better apply
some of the techniques I learnt at the Indian School of Business, I decided
to run a sentiment analysis on #budget2020 using data from Twitter. The
time frame chosen is the period just after the Union Budget was presented
in Parliament. I had extracted the data during Term 7, immediately after
being introduced to the art of Social Listening. This is a very elementary
treatment of a far more powerful technique, aimed at building a basic
understanding of the topic.
What is Sentiment Analysis?
Any brand, business or body that interacts with society at large can
expect reactions to its actions (or words). For instance, when Nestle
launches a new Maggi variant, say "Fusian Bangkok", it can expect people
to like it, dislike it or remain indifferent to the launch. Measuring the
market's reaction properly can lead to interesting insights. Nestle can
always gauge the response to its new launch from its direct sales reports,
in terms of purchases, but there is so much more to know. The company might
want to know which quality of the product is delighting customers and which
is alienating them. Such information can be used to modify the product
offering, change the advertising narrative or take any other corrective or
augmentative business decision.
In today's world, communication has essentially become digital.
Businesses communicate with their users digitally, and vice versa.
People speak about brands on social media channels and in targeted group
conversations, or react to newsfeeds, commenting freely about how they feel.
This is a potential gold mine of consumer feedback for any business. It is
in this context that Sentiment Analysis comes in. It provides a quantitative
means of measuring primarily the following:
- How does the market/target audience feel about a business's actions -
positive or negative, angry or sad, delighted or disappointed?
- Why does the user feel the way she feels - what drives the sentiment?
Sentiment Analysis relies on Text Analytics to analyse and understand the
words in user-generated text. Stop-words such as articles and prepositions
are generally omitted because they are ubiquitous, and only content words
such as nouns and adjectives are analysed, as they convey definite meanings.
A pre-defined lexicon or dictionary is used to assign positive or negative
numeric weights to the words in a comment (or review, etc.), so that every
comment ends up with a net sentiment score.
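The lexicon-based scoring described above can be sketched in a few lines of Python. This is only an illustration: the lexicon, the stop-word list and the sample comments are made up for the example and are not the lists NodeXL uses.

```python
import re

# Hypothetical mini-lexicon and stop-word list, for illustration only.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "poor": -1, "terrible": -2}
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "on", "to", "and", "but", "for", "this", "it"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Drop stop-words, then sum the lexicon weights of the remaining words.

    Words missing from the lexicon contribute 0 to the score.
    """
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return sum(LEXICON.get(t, 0) for t in tokens)


# Hypothetical comments, not taken from any real data set.
comments = [
    "The new variant is great and I love it",
    "Terrible taste, poor value",
    "It is a noodle",
]
for comment in comments:
    print(sentiment_score(comment), comment)
```

A comment with a positive net score reads as favourable, a negative one as unfavourable, and a score of zero as neutral or simply not covered by the lexicon.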
The case
In the context I have chosen, it is the market's reaction to the
Government of India's 2020 budget announcement that we shall analyse. The
social media platform from which we shall fetch the data is Twitter and the
tool for analysis is NodeXL.
Our goal will be to draw a conclusion regarding
whether Twitter users feel positive or negative about the budget.
Obviously, this is a sample study and cannot be treated as the final word
on the subject.
NodeXL – Twitter Search helped me extract 4521 data points or
tweets, related to the #budget2020 query. The generated network graph
visualization is shown below:
NodeXL network visualization: Twitter - tweets on #budget2020. Note the red dot in the middle: it's Mrs. Nirmala Sitharaman's tweet on the above hashtag.
The above represents the network of tweets, with each node or dot representing a tweet. Detached circles towards the periphery of the network mean standalone tweets. Whiskers or the lines between the dots represent retweets, mentions or replies between the connected tweets.
The Analysis
We used the “Graph Metrics – Words & Word pairs” feature to gauge
sentiment among Twitter users regarding Budget 2020. The basis for
classifying a tweet or a word as “Positive” or “Negative” is the built-in
list in NodeXL that specifies “positive words” and “negative words”.
The most evident output is as follows:
Words in Sentiment List#1: Positive | 2947
Words in Sentiment List#2: Negative | 2584
This essentially means that for "budget2020", the counts
of words flagged as positive and negative are very close. A more skewed
distribution would have indicated a definite ideological polarization and
imbalance.
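To put a number on "very close", the two counts can be turned into shares of all flagged words. A minimal sketch using the two counts reported above; the function name is my own.

```python
def sentiment_shares(positive, negative):
    """Return the positive and negative shares of all flagged words."""
    total = positive + negative
    return positive / total, negative / total


# Counts reported by NodeXL for the #budget2020 data set.
pos_share, neg_share = sentiment_shares(2947, 2584)
print(f"positive: {pos_share:.1%}, negative: {neg_share:.1%}")
# prints "positive: 53.3%, negative: 46.7%"
```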
The generated output also gives "Count" and "Salience" for each word.
While "Count" is self-explanatory, "Salience" is perhaps not. The salience
score is a prediction of what a human would consider most important in the
given text.
I sorted the data set by Salience, in descending order, so that the most
important words were at the top of the list. A closer examination of the
words appearing at the top shows that while there are many sentiment-neutral
words like "india", "speech", "data" and "money", there are also a few words
like "discrepancies" and "secretly", which are salient but do not feature in
NodeXL's list of positive/negative words ("trump" has been flagged Positive;
odd, but that's how things work!!). My understanding of the context suggests
that these two words indicate negative sentiment. If I correct the "Words in
Sentiment List#2: Negative" count to incorporate my intuition, the word
statistics shared previously change:
Words in Sentiment List#1: Positive | 2947
Words in Sentiment List#2: Negative | 5295
With this correction, the conclusion firms up: Twitter users appear to
hold an overall "negative sentiment" about "budget2020".
However, the above analysis is prone to false interpretation, since we have
selectively flagged two high-count words as negative, thereby unbalancing
the table.
A more detailed and accurate exercise should involve examining more words
(top down in the sorted salience table), so that more context-specific
words (for example "changa si", which implies good condition in Punjabi)
are flagged as either positive or negative. That way, one would be
considering more data before passing a statement on sentiment.
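The effect of flagging extra words by hand can be sketched as follows. The word counts and word lists here are hypothetical and chosen only for illustration; they are not the figures or the lists from the NodeXL output.

```python
# Hypothetical word counts and sentiment lists, for illustration only.
word_counts = {"growth": 120, "relief": 80, "fudging": 150, "slowdown": 60, "india": 400}
positive_words = {"growth", "relief"}
negative_words = {"slowdown"}


def flagged_totals(counts, positive, negative):
    """Return total occurrences of words flagged positive and negative."""
    pos = sum(n for word, n in counts.items() if word in positive)
    neg = sum(n for word, n in counts.items() if word in negative)
    return pos, neg


before = flagged_totals(word_counts, positive_words, negative_words)
# Manually flag a context-specific word that the stock list missed.
after = flagged_totals(word_counts, positive_words, negative_words | {"fudging"})
print("before:", before)
print("after: ", after)
```

A single high-count word is enough to flip the balance in this toy data, which is why the manual flagging should cover as many salient words as practical, on both the positive and the negative side.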
I have not done such an exhaustive analysis here, since the aim is not to
produce a research study but to develop a basic understanding.
While labeling each word as positive or negative can be a matter of
judgement, analysis of “Word pairs” is less prone to individual judgement.
Examining the appearances and frequencies of word pairs (words can be
analysed in threes or fours, too) is very important because words often mean
very little on their own, but convey the world in pairs. An examination of
the “Word pair” list throws up some interesting and definitive insights.
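Counting word pairs amounts to counting adjacent words (bigrams) across all tweets. A minimal sketch, assuming simple whitespace-and-punctuation tokenization; the sample tweets are made up, and NodeXL's own tokenization and filtering may well differ.

```python
import re
from collections import Counter


def word_pairs(text):
    """Return the list of adjacent word pairs (bigrams) in a text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(tokens, tokens[1:]))


# Hypothetical tweets, for illustration only.
tweets = [
    "Ministry secretly corrects data",
    "Govt secretly changes data",
    "Ministry secretly corrects budget",
]

pair_counts = Counter()
for tweet in tweets:
    pair_counts.update(word_pairs(tweet))

# Show the most frequent pairs first.
for pair, count in pair_counts.most_common(3):
    print(count, " + ".join(pair))
```

Pairs such as "ministry + secretly" carry a tone that neither word carries alone, which is what makes the pair list easier to read than the single-word list.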
The top 20 word pairs have a total appearance count of 16493, of which the
negative-sounding word pairs contribute a healthy 7066 (roughly 43%).
The following word pairs appear in the top 20 and cry “negative” off the
screen!! (We could not, though, find any word pair that even moans
“positive”!!)
· discrepancies + budget
· data + discrepancies
· ministry + secretly
· secretly + corrects
· seeks + answers
· data + fudging
· fudging + bjp
· govt + secretly
· secretly + changes
This was an example of how to conduct basic Sentiment Analysis
from Twitter data using NodeXL.