# NLP Final Project (Project proposal)
## Artemii Bykov, DS-01
###### tags: `NLP`, `ML`, `Supervised learning`, `LSTM`, `Stance detection`, `Twitter`
#### Link to [Colab]()
[ToC]
### State the problem
The objective is to identify the stance of Twitter users towards rumour tweets. Given a rumourous tweet (the source) and the replies that follow it, the model should classify the stance of each tweet, including the source tweet. Each tweet takes one of the following stances:
* **Support**: responding user supports the veracity of the rumour
* **Deny**: responding user denies the veracity of the rumour
* **Query**: responding user demands additional evidence
* **Comment**: responding user’s tweet is not useful in determining the veracity of the rumour
### Proposed solution
* Remove URLs, user aliases, and similar noise from the tweet text
* Use a pretrained Word2Vec model for word embeddings
* Find marker words such as negations, swear words, etc.
* Use an LSTM layer to predict the stance of each tweet (source or reply). Why LSTM? A Twitter discussion is a tree in which each reply depends in some way on the tweets before it, so the prediction should use the previous tweets as well as the current input. That is why an LSTM looks like a good choice (I hope) for this purpose. I expect to process the tree branch by branch (source + replies)
* Apply softmax at the output to get a probability for each class
* I also plan to add Dropout layers, but I need to read more about them first
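
The preprocessing steps and the branch-wise processing listed above could look roughly like the sketch below. It is only an illustration under my own assumptions: the names `clean_tweet`, `marker_features`, and `branches` are hypothetical, the word lists are tiny placeholders, the question-mark flag is an extra marker I added, and the reply tree is assumed to be a dict mapping a tweet id to the ids of its direct replies. The real dataset format may differ.

```python
import re

# Hypothetical placeholder lexicons; a real project would need fuller lists.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
SWEAR_WORDS = {"damn", "hell"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Strip URLs and @mentions, lowercase, and collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def marker_features(text):
    """Return binary flags for negation words, swear words and question marks."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {
        "has_negation": int(bool(tokens & NEGATION_WORDS)),
        "has_swear": int(bool(tokens & SWEAR_WORDS)),
        "has_question": int("?" in text),  # assumed extra marker
    }


def branches(replies, root):
    """List every root-to-leaf path (source + replies) in a reply tree.

    `replies` maps a tweet id to the ids of its direct replies (assumed format).
    """
    children = replies.get(root, [])
    if not children:
        return [[root]]
    return [[root] + path
            for child in children
            for path in branches(replies, child)]


if __name__ == "__main__":
    tree = {"source": ["reply1", "reply2"], "reply1": ["reply3"]}
    print(clean_tweet("@bbc Is this TRUE? http://t.co/abc"))
    print(marker_features("That is not true?"))
    print(branches(tree, "source"))
```

Each branch returned by `branches` would then be one input sequence for the LSTM: every tweet on the path is cleaned, embedded with Word2Vec, and combined with its marker flags before being fed to the network in order.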