m5241117 Masaki Endo
m5241125 Seiyu Majima
m5241144 Yusuke Namiki

Project Proposal

Problem Description

In recent years, social networking services (SNS) such as Twitter and Facebook have become popular and are now used all over the world. They have made it easy to communicate with people anywhere. However, they have also created a new problem: slander spreads through SNS and sometimes causes serious trouble.
We will build a model that predicts the sentiment of tweets. Such a model would be useful for filtering slander and for prompting users to reconsider a message before posting it.

Dataset

https://www.kaggle.com/c/tweet-sentiment-extraction/

Data Columns:
textID - unique ID for each piece of text
text - the text of the tweet
sentiment - the general sentiment of the tweet
selected_text - [train only] the text that supports the tweet's sentiment (we may not use this column)
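
As a minimal sketch of how these columns might be read, assuming the training data is a CSV file whose header row matches the column names above. The sample rows and the helper name load_tweets are illustrative assumptions, not taken from the dataset.

```python
import csv
import io

# Hypothetical sample standing in for the training CSV (rows are made up).
SAMPLE_CSV = """textID,text,sentiment,selected_text
id001,"I love this phone, it is great",positive,love this phone
id002,The train is late again,negative,late again
id003,Going to the store now,neutral,Going to the store now
"""

def load_tweets(fileobj):
    """Return (texts, labels) from a CSV with 'text' and 'sentiment' columns."""
    texts, labels = [], []
    for row in csv.DictReader(fileobj):
        text = (row.get("text") or "").strip()
        if not text:
            continue  # skip rows with missing text
        texts.append(text)
        labels.append(row["sentiment"])
    return texts, labels

texts, labels = load_tweets(io.StringIO(SAMPLE_CSV))
print(labels)
```

For a real file, the same function could be called as load_tweets(open(path, newline="", encoding="utf-8")), where path points to the downloaded training CSV. The selected_text column is simply ignored here.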

Methodology/Algorithm

According to related work, RNN models perform well when combined with word embeddings, so we will first build an RNN model.
In addition, we will try other models such as LSTM and GRU and look for the one that gives the highest accuracy.
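
Whichever recurrent model is used, an embedding layer expects each tweet as a fixed-length sequence of integer word indices. The following is a minimal sketch of that preprocessing step, assuming simple whitespace tokenization; the function names, the reserved indices, and the max_len parameter are illustrative choices, not part of any library.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices: padding and out-of-vocabulary words

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocab(texts, min_count=1):
    """Map each word seen at least min_count times to an integer index >= 2."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = {}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab) + 2
    return vocab

def encode(text, vocab, max_len):
    """Convert text to a fixed-length list of indices, truncating or padding."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))

vocab = build_vocab(["good day", "bad day"])
encoded = encode("good day today", vocab, max_len=5)
print(encoded)
```

The vocabulary should be built from the training texts only, so that words seen only at test time map to UNK. The same encoded sequences can then be fed to the RNN, LSTM, and GRU variants, which keeps the comparison between models fair.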

"Sentiment Analysis Based on Deep Learning: A Comparative Study",Nhan Cach Dang, María N. Moreno-García, and Fernando De la Prieta,Electronics 2020, 9, 48

This paper applies DNN, CNN, and RNN models and compares their results.
The paper concludes that the RNN model with word embedding performs better than the other models. A combination of CNN and LSTM achieves the best accuracy; however, that approach has a high computational cost, so the paper cannot name a single best model when both accuracy and computational cost are considered.

Evaluation Plan

The dataset contains tweet text and sentiment labels. We use the text as the model input and the sentiment labels as the targets. By comparing each model's predictions with the labels, we obtain its accuracy. We will evaluate and compare the accuracy of each model. In addition, we will measure the training computation time for future reference.
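
The two measurements in this plan can be sketched as follows. This is a minimal illustration under stated assumptions: dummy_predict is a hypothetical stand-in for a trained model, and the labels are made up. In practice the timed call would wrap each model's training routine.

```python
import time

def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for a trained model: always predicts "neutral".
def dummy_predict(texts):
    return ["neutral"] * len(texts)

labels = ["positive", "neutral", "negative", "neutral"]
preds, seconds = timed(dummy_predict, ["a", "b", "c", "d"])
print(accuracy(preds, labels), seconds)
```

Accuracy should be computed on held-out data that the model did not see during training. A constant predictor like the one above also gives a useful baseline: any trained model should beat the accuracy of always predicting the most frequent class.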
