Classification of Airline Tweet Using Naïve-Bayes Classifier For Sentiment Analysis
Abstract—A wide range of customers, such as families, businesspeople, sportspeople and young people, travel by air. Hence, their feedback matters a lot. Direct customer feedback may be positive or negative, but analyzing customers' tweets, i.e., what the tweets express, is important for improving the service. Analyzing individual tweets becomes critical when the volume is high. Most of the time, tweets are ambiguous, depending on the nature of the customers: a positive person will always give positive tweets, whereas negative tweets come from negative persons. Ultimately, our work finds the sentiments of descriptive tweets from their words and expressions, in a quantitative format that shows whether customers are happy or not. The novel factor in this work is to examine ambiguous tweets and neutralize them according to the proposed algorithm. The complete work is based on a Twitter dataset of US airlines, on which different levels of mining and processing are performed to obtain the most accurate results. An improved sentiment analysis model based on the naïve Bayes classifier is proposed to classify tweets by sentiment and to neutralize ambiguous tweets into positive or negative ones.

Keywords—Sentimental analysis; Ambiguous words; Airline dataset; Naïve-Bayes classifier

I. INTRODUCTION

Sentiment analysis is an innovative technique that makes it possible to investigate the thoughts and opinions of any kind of user. It can be defined as: "Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes."

Twitter and other social networking sites can be used to learn about the behavior and opinions of users. They may help to track users' interests and connections with respect to their viewpoints. Consider, for example, a feedback system, in which the opinions and differing viewpoints of users can be observed from their feedback. A feedback system relies on marking, i.e., scoring on various points, from which relationships are calculated; however, descriptive feedback can also play an important role in understanding opinions. Descriptive or subjective feedback helps to understand users' demands and needs, because it is written in a subjective form that expresses the user's thoughts or views. Thus, it may be considered a great source for analysis.

This work examined different existing solutions that use data classification algorithms; these are the major approaches for extracting features from which the sentiments and opinions of different users are derived. Here, the Naïve Bayes classifier is used to classify user opinions (tweets). This paper presents a sentiment-based approach for calculating the polarity of ambiguous data. Sentiment analysis, or opinion mining, is an important task of Natural Language Processing (NLP) and has gained wide attention in recent years. This paper tackles the sentiment polarity categorization problem, which is the fundamental issue of sentiment analysis, and proposes an experimental approach to it in a descriptive way.

Contextual ambiguity gives rise to issues in polarity calculation for sentiment analysis. Different polarities are calculated for different contexts using opinion keywords, which is a big challenge for researchers in this area. Term-level features resolve this polarity issue ineffectively.

Earlier, opinion mining dealt effectively and progressively with different scenarios by developing document-level analysis [1][2][3]. In sentiment analysis, many issues arise with subjectivity detection and sentiment classification.

In general, sentiment analysis focuses on determining the writer's attitude with respect to the complete context of a document by calculating its polarity. This attitude of the speaker or writer reflects the user's judgment and the writer's emotional state at the time of writing. An important task of sentiment analysis is classifying the polarity of the text in a document; at the feature level, too, the opinion expressed in the document is classified as positive, negative or neutral. Sentiment classification can also express polarity as an emotional state, such as sad, happy or angry.

II. LITERATURE SURVEY
A. Existing Work
Trupthi et al. [1] described sentiment analysis and opinion mining, which explore users' sentiments and thinking. The authors use a Naive Bayes classifier for examining real Twitter data and for the
Authorized licensed use limited to: Somaiya University. Downloaded on December 28,2023 at 13:29:24 UTC from IEEE Xplore. Restrictions apply.
Although existing systems provide a good way to explore user sentiments, they still suffer from certain limitations:
• They treat words individually, tokenizing each sentence into words.
• The authors state, "Future enhancement to this work might be to use n-gram classification rather than limiting to uni-gram", to overcome this issue.
• The existing solution uses a unigram Naïve Bayes classifier, which considers word probabilities for both training and testing.
• Sentiment analysis of the whole sentence can help to get closer to the user's viewpoint.
• At classification time, they consider individual words rather than the actual sentence. They also suggest using an n-gram Naïve Bayes classifier rather than a unigram one.
• The authors do not consider ambiguity a serious problem and concentrate only on improving the performance of sentiment analysis.
• The existing solution may be compromised by taunting tweets or indirect comments, which use positive words for a negative purpose.

Using different algorithms, phrases can be annotated as positive or negative based on their frequency of occurrence, and from this the weight of each phrase can be calculated. Tweets are online social media data containing a variety of flaws that may hinder sentiment analysis. One limitation that arises here is the quality of opinions: because anyone is free to post, online spammers produce meaningless content. Another is truth: only opinions are taken as truth, and these can be positive, negative or neutral.

Sentiment polarity is classified at the sentence level and the tweet level, where the sentence level defines sentiments through positive and negative terms. The data required at the sentence level need factual content to convey the sentence, and this is where the significant issue of truth arises.

IV. PROPOSED METHODOLOGY
The proposed work includes several modules, and the flow of the work is explained through the proposed architecture. The steps involved in the proposed methodology are as follows.
Functionality and Design:
• Data Acquisition
• Human labelling
• Feature Extraction
• Classification
Step 1: Data Collection and Preparation:
Data can be collected either directly from the user or from an existing system. In the proposed work, we have used the Airline Tweet Dataset from the Kaggle data repository. A link to its detailed description is given below:
https://www.kaggle.com/crowdflower/twitter-airline-sentiment#Tweets.csv
Step 2: Data Pre-processing:
a. Data Cleaning: Data cleaning is done to remove redundant data, irrelevant tweets, image short paths and unwanted links.
b. Lemmatization: The Stanford lemmatizer library is used. Lemmatization relies on the vocabulary and morphological analysis of words: it removes inflectional endings and concentrates on the dictionary forms of words.
c. Tokenization: Here, sensitive data are replaced with non-sensitive equivalent data and are represented as tokens.
Step 3: Proposed Estimation:
a. Ambiguity-based polarity estimation: The polarity of each ambiguous word is checked, i.e., whether the word is positive, negative, partially positive, partially negative or neutral.
b. This work initially classifies tweets into three categories: positive, negative and ambiguous. Examples are given below:
Ambiguous tweet: "This is good airline service but comes with high tariff. They do not provide good service. Best part of this airline is they are always available."
Positive tweet: "This one is the best airline comes with awesome services and facility. 100 % recommendation for new once"
Negative tweet: "Worst services experience. Air hostess don't know about their responsibilities"
c. This work then investigates the ambiguous tweets and observes the sentiment of every word. After observing the sentiment of each individual word, it also checks the sentiment of the previous and next words. For an ambiguous word, it fixes the word's polarity based on whether the previous and next words are positive or negative.
d. The complete tweets are then forwarded to the Naïve Bayes classifier.
Step 4: Classifier:
a. Naïve Bayes classifier: It is used for large volumes of data. The Naïve Bayes classifier assumes that features are conditionally independent given the class label. It applies both the unigram and n-gram techniques and concludes results based on both.
Step 5: Tweets Classification:
a. Sentiment calculation: The polarity of the sentiments is calculated.
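The pre-processing and ambiguity-resolution steps (Steps 2 and 3) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the toy lexicon `LEXICON`, the cleaning regular expressions and all function names are hypothetical; the Stanford lemmatization step is omitted; and the rule for the initial three-way split (a tweet is ambiguous if it mixes positive and negative words or contains a context-dependent word) is our own. Each ambiguous word receives the sign of the summed polarity of its previous and next words, following Step 3c.

```python
import re

# Hypothetical toy lexicon (an assumption, not from the paper): +1 positive,
# -1 negative. Words absent from it are neutral (0); words mapped to None are
# ambiguous, i.e. their polarity depends on the surrounding words.
LEXICON = {
    "good": 1, "best": 1, "awesome": 1, "available": 1,
    "worst": -1, "not": -1, "delayed": -1, "tariff": -1,
    "high": None,  # "high tariff" vs. "high quality"
}


def clean(tweet: str) -> str:
    """Step 2a (sketch): drop links, image short paths, @mentions and non-letters."""
    tweet = re.sub(r"https?://\S+|pic\.twitter\.com/\S+", " ", tweet)
    tweet = re.sub(r"@\w+", " ", tweet)
    tweet = re.sub(r"[^A-Za-z\s]", " ", tweet)
    return tweet.lower()


def tokenize(tweet: str) -> list[str]:
    """Step 2c (sketch): whitespace tokenization of the cleaned tweet."""
    return clean(tweet).split()


def word_polarities(tokens: list[str]) -> list:
    """Step 3a (sketch): +1 / -1 / 0 per token, or None for ambiguous words."""
    return [LEXICON.get(t, 0) for t in tokens]


def initial_category(polarities: list) -> str:
    """Step 3b (assumed rule): label a tweet positive, negative or ambiguous."""
    has_pos = any(p == 1 for p in polarities)
    has_neg = any(p == -1 for p in polarities)
    has_amb = any(p is None for p in polarities)
    if has_amb or (has_pos and has_neg):
        return "ambiguous"
    if has_neg:
        return "negative"
    if has_pos:
        return "positive"
    return "ambiguous"  # no sentiment-bearing word at all


def resolve_ambiguous(polarities: list) -> list[int]:
    """Step 3c (sketch): give each ambiguous word the sign of its neighbours' sum."""
    resolved = []
    for i, p in enumerate(polarities):
        if p is not None:
            resolved.append(p)
            continue
        prev_p = polarities[i - 1] if i > 0 else 0
        next_p = polarities[i + 1] if i + 1 < len(polarities) else 0
        # A neighbour that is itself ambiguous (None) counts as 0.
        context = (prev_p or 0) + (next_p or 0)
        resolved.append((context > 0) - (context < 0))  # sign as +1, 0 or -1
    return resolved


if __name__ == "__main__":
    pols = word_polarities(tokenize("This is good airline service but comes with high tariff."))
    print(initial_category(pols))   # ambiguous
    print(resolve_ambiguous(pols))  # [0, 0, 1, 0, 0, 0, 0, 0, -1, -1]
```

In this example "high" becomes negative because its next word, "tariff", is negative in the toy lexicon; with a real opinion lexicon the same rule would apply unchanged.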
12. Set the threshold value and run the Naïve Bayes classification with unigrams.
13. Set the threshold value and run the Naïve Bayes classification with n-grams.
14. Calculate the average of the results of steps 12 and 13, and classify the tweets.
15. Output the classification of positive and negative tweets.
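Steps 12 to 14 combine a unigram run and an n-gram run of the Naïve Bayes classifier. The Python sketch below is one possible reading of those steps, written from scratch rather than taken from the paper. A multinomial Naïve Bayes with add-one (Laplace) smoothing scores each class c as log P(c) plus the sum of log P(g | c) over the tweet's n-grams g; that sum is where the independence assumption of Step 4 enters. The model is trained once on unigrams and once on bigrams (bigrams are our assumed value of n), the two posteriors for the positive class are averaged, and the average is compared with a threshold of 0.5, which is also an assumption because the threshold value is not stated. The names `NaiveBayes`, `ngrams` and `classify`, the label strings and the toy training data are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n: int = 1):
        self.n = n

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(ngrams(tokens, self.n))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens: list[str]) -> dict[str, float]:
        """Posterior P(class | tweet); n-grams never seen in training are skipped."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for g in ngrams(tokens, self.n):
                if g in self.vocab:
                    score += math.log((self.counts[c][g] + 1) / (self.totals[c] + v))
            scores[c] = score
        top = max(scores.values())  # subtract the max before exp for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def classify(tokens, unigram_model, ngram_model, threshold=0.5):
    """Steps 12-15 (sketch): average P(positive) of both models, then threshold."""
    p_pos = (unigram_model.predict_proba(tokens)["positive"]
             + ngram_model.predict_proba(tokens)["positive"]) / 2
    return "positive" if p_pos >= threshold else "negative"


if __name__ == "__main__":
    docs = [["best", "airline", "awesome", "service"],
            ["good", "service", "always", "available"],
            ["worst", "service", "experience"],
            ["not", "good", "service", "delayed"]]
    labels = ["positive", "positive", "negative", "negative"]
    uni = NaiveBayes(n=1).fit(docs, labels)
    bi = NaiveBayes(n=2).fit(docs, labels)
    print(classify(["awesome", "service"], uni, bi))   # positive
    print(classify(["worst", "experience"], uni, bi))  # negative
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one; with only two models, hard votes would tie whenever they disagree.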
V. EXPERIMENT ANALYSIS
The complete results were examined on five different data samples. The proposed solution has been examined and compared with previous results to show that it performs better than existing solutions. The sizes of the different data samples are shown in Table I.
TABLE I. DATASET
TABLE V. COMPUTATION TIME
Input      Computation time
Input 2    2459
Input 3    3569
Input 4    4963
Input 5    8050

Input      Recall
Input 1    0.89
Input 2    0.91
Input 3    0.92
Input 4    0.93
Input 5    0.96
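The recall values above, and the F-scores quoted in the conclusion, follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN) and F-score = 2 · precision · recall / (precision + recall). The small Python helper below computes them for a single class from lists of gold and predicted labels. The function name and label strings are illustrative, and since the paper does not say whether its figures are per-class, micro-averaged or macro-averaged, this sketch shows only the per-class case.

```python
def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard every ratio against an empty denominator (returns 0.0 instead).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["positive", "positive", "positive", "negative"]
    pred = ["positive", "positive", "negative", "positive"]
    print(precision_recall_f1(gold, pred))  # each value is 2/3 here
```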
5. A worst F-score of 0.79 and a best F-score of 0.90 have been observed, which denotes great performance on ambiguous tweets.

VII. FUTURE WORK
The following future work is foreseen for the proposed solution:
1. The proposed solution can be implemented using big data technology for large datasets.
2. The proposed algorithm can be used to classify tweets from categories other than airlines.
3. The proposed algorithm can be implemented with other classification techniques, such as SVM and KNN, to achieve better performance.

REFERENCES
[9] S. V. Wawre and S. N. Deshmukh, "Sentiment Classification using Machine Learning Techniques," International Journal of Science and Research (IJSR), vol. 5, no. 4, April 2016.
[10] A. D'Andrea, F. Ferri, P. Grifoni, and T. Guzzo, "Approaches, Tools and Applications for Sentiment Analysis Implementation," International Journal of Computer Applications (0975-8887), vol. 125, no. 3, September 2015.
[11] H. Zhang, W. Gan, and B. Jiang, "Machine Learning and Lexicon based Methods for Sentiment Classification: A Survey," IEEE, 2014, DOI: 10.1109/WISA.2014.55.
[12] P. D. Turney and M. L. Littman, "Measuring praise and criticism: inference of semantic orientation from association," ACM Trans. Inf. Syst., vol. 21, no. 4, pp. 315–346, 2003.