In the last years, Sentiment Analysis has become a hot-trend topic of scientific and market research in the field of Natural Language Processing (NLP) and Machine Learning. Following are the five useful things you need to know about Sentiment Analysis that are connected to Social Media, Datasets, Machine Learning, Visualizations, and Evaluation Methods applied by researchers and market experts.
1. Social Media are the main resource:
Sentiment Analysis examines the problem of studying texts, like posts and reviews, uploaded by users on microblogging platforms, forums, and electronic businesses, regarding the opinions they have about a product, service, event, person or idea.
The most common use of Sentiment Analysis is this of classifying a text to a class. Depending on the dataset and the reason, Sentiment Classification can be binary (positive or negative) or multi-class (3 or more classes) problem.
In addition, among researchers and stakeholders, you can find either similar or completely different opinions concerning the relation between emotion detection and sentiment analysis, depending on their perspective. However, regardless the result or approach, they all adopt the same techniques.
2.Before starting the Sentiment Analysis:
Many evaluations and labeled sentiment datasets have been created, especially for Twitter posts and Amazon product reviews.
The most popular and widespread are:
- Stanford Twitter Sentiment
- Sentiment Strength Twitter Dataset
- Amazon Reviews for Sentiment Analysis
- Large Movie Review Dataset
- Sanders Corpus
- SemEval (Semantic Evaluation) dataset.
Also, anyone using the APIs provided by many platforms and forums can crawl and collect data. The most famous API is that of Twitter.
An initial step in text and sentiment classification is pre-processing. A significant amount of techniques is applied to data in order to reduce the noise of text, reduce dimensionality, and assist in the improvement of classification effectiveness. The most popular techniques include:
- Remove numbers
- Part of speech tagging
- Remove punctuation
- Remove stopwords
3. How to classify Sentiment?
This approach, employes a machine-learning technique and diverse features to construct a classifier that can identify text that expresses sentiment. Nowadays, deep-learning methods are popular because they fit on data learning representations.
This method uses a variety of words annotated by polarity score, to decide the general assessment score of a given content. The strongest asset of this technique is that it does not require any training data, while its weakest point is that a large number of words and expressions are not included in sentiment lexicons.
The combination of machine learning and lexicon-based approaches to address Sentiment Analysis is called Hybrid. Though not commonly used, this method usually produces more promising results than the approaches mentioned above.
4. Evaluation metrics:
As a classification problem, Sentiment Analysis uses the evaluation metrics of Precision, Recall, F-score, and Accuracy. Also, average measures like macro, micro, and weighted F1-scores are useful for multi-class problems. Depending on the balance of classes of the dataset the most appropriate metric should be used.
5. Visualise Results:
To visualize the results of Sentiment Analysis, many people employ well-known techniques, such as graphs, histograms, and confusion matrices. Because of present multiple data domains and tasks, visualizations approaches like wordcloud, interactive maps, sparkline-style plots are also very popular.