NLP Final Project (Project proposal)

# NLP Final Project (Project proposal)
## Artemii Bykov, DS-01
###### tags: `NLP`, `ML`, `Supervised learning`, `LSTM`, `Stance detection`, `Twitter`
#### Link to [Colab]()

[ToC]

### State the problem
The objective is to identify the stance of Twitter users towards rumour tweets. Given a rumourous tweet (source) and the replies that follow it, the model should classify the stance of each tweet (including the source tweet). The stance of a tweet can be one of the following:
* **Support**: the responding user supports the veracity of the rumour
* **Deny**: the responding user denies the veracity of the rumour
* **Query**: the responding user demands additional evidence
* **Comment**: the responding user's tweet is not useful in determining the veracity of the rumour

### Proposed solution
* Remove URLs, user aliases and other noise
* Use a pretrained Word2Vec model for word embeddings
* Find marker words such as negations, swear words, etc.
* Use an LSTM layer to predict the type (stance) of each tweet, whether it is a reply or the source. Why LSTM? A Twitter discussion is a tree in which each reply depends in some way on the previous ones, so the prediction should draw not only on the current input but also on the earlier tweets. That is why an LSTM looks like a good choice (I hope) for this purpose. I expect to process the tree branch by branch (source + replies)
* Use softmax to output the probability of each class
* In addition, I plan to use Dropout layers, but I need to read more about them first
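
The cleaning and marker-word steps of the proposed solution could look roughly like the sketch below. It is only an illustration under several assumptions: "alias" is read as an `@mention`, and the regular expressions, the word lists (`NEGATION_WORDS`, `SWEAR_WORDS`) and the function names (`clean_tweet`, `tokenize`, `marker_features`) are hypothetical placeholders, not part of the proposal itself.

```python
import re

# Hypothetical patterns, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")          # assumes "alias" means an @mention
TOKEN_RE = re.compile(r"[a-z0-9']+|\?")   # words, plus "?" as its own token

# Placeholder word lists; a real project would use larger lexicons.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
SWEAR_WORDS = {"damn", "hell"}


def clean_tweet(text):
    """Remove URLs and @mentions, keep hashtag words, lowercase the rest."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return " ".join(text.lower().split())


def tokenize(text):
    """Clean the tweet and split it into lowercase tokens."""
    return TOKEN_RE.findall(clean_tweet(text))


def marker_features(tokens):
    """Binary flags for marker words found among the tokens."""
    return {
        "has_negation": any(
            t in NEGATION_WORDS or t.endswith("n't") for t in tokens
        ),
        "has_swear": any(t in SWEAR_WORDS for t in tokens),
        "has_question": "?" in tokens,
    }


tokens = tokenize("@user That's NOT true, see http://t.co/abc #hoax?")
features = marker_features(tokens)
```

The tokens would then be mapped to Word2Vec vectors, and the marker flags could be appended to each tweet's representation before it is fed to the LSTM. How exactly the two are combined is left open here.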