### Problem Statement
- Stage 1: Tweet sentiment analysis
- Bullish (buy the stock), Bearish (sell the stock), Neutral (do nothing).
- Stage 2: Stock price movement
- Perhaps only work on TESLA
### Method
- Tweet sentiment analysis:
- FinBERT:
- The positive label corresponds to buy and the negative label corresponds to sell, not necessarily positive and negative in the traditional sense.
- Fine-tuned on the "[zeroshot/twitter-financial-news-sentiment](https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)" dataset
- Twitter-roBERTa-base for Sentiment Analysis
- BERT
- Stock price movement
- Features:
- Adjusted Closing Price from t-n to t-1 days (from yfinance)
- Weighted sentiment scores from stage 1 from t-n to t-1 days (n: lookback window size)
- -1: Bearish, 0: Neutral, 1: Bullish
- Aggregate the sentiment scores of all tweets in a day, weighting each by its number of retweets, then normalize
- Model: Random Forest Classifier
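
The daily aggregation step above can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `daily_weighted_sentiment`, the `(date, score, retweets)` tuple layout, and the fallback to a plain mean when a day has no retweets are all assumptions.

```python
from collections import defaultdict


def daily_weighted_sentiment(tweets):
    """Aggregate per-tweet scores (-1, 0, 1) into one score per day.

    tweets: iterable of (date, score, retweets) tuples (hypothetical layout).
    Returns {date: retweet-weighted average score in [-1, 1]}.
    """
    by_day = defaultdict(list)
    for date, score, retweets in tweets:
        by_day[date].append((score, retweets))

    result = {}
    for date, items in by_day.items():
        total_retweets = sum(r for _, r in items)
        if total_retweets > 0:
            # Weight each tweet by its retweets, normalize by the day's total.
            result[date] = sum(s * r for s, r in items) / total_retweets
        else:
            # Assumption: if no tweet was retweeted, use an unweighted mean.
            result[date] = sum(s for s, _ in items) / len(items)
    return result


tweets = [
    ("2022-01-03", 1, 30),   # bullish, 30 retweets
    ("2022-01-03", -1, 10),  # bearish, 10 retweets
    ("2022-01-03", 0, 0),    # neutral, no retweets
    ("2022-01-04", -1, 0),
    ("2022-01-04", 1, 0),
]
daily = daily_weighted_sentiment(tweets)
```

Dividing by the day's total retweets keeps every daily score in the same [-1, 1] range as the per-tweet labels, so days with heavy tweet volume do not dominate the feature scale.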
### Result
- Tweet Sentiment Analysis
- The syntax and language on Twitter are noticeably different from those of news articles and other financial text on the internet. Therefore, preprocessing and fine-tuning are important
- Stock price movement
- Only the fine-tuned FinBERT is used, since it has the best performance on stage 1
- This weighted sentiment idea parallels the idea of using attention mechanisms.
- Analysis of lookback window size
- They eventually chose n = 14
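
To make the role of the lookback window n concrete, here is a rough sketch of how the feature rows could be built from prices and daily sentiment for days t-n to t-1. The function name `build_lookback_rows` and the up/down label (1 if the price on day t is above day t-1) are assumptions for illustration; the notes do not specify how the movement label is defined.

```python
def build_lookback_rows(prices, sentiments, n):
    """Build one feature row per day t from the previous n days.

    prices, sentiments: equal-length sequences ordered by day.
    Returns (X, y) where X[i] = prices[t-n:t] + sentiments[t-n:t]
    and y[i] = 1 if prices[t] > prices[t-1], else 0 (assumed label).
    """
    if len(prices) != len(sentiments):
        raise ValueError("prices and sentiments must have the same length")

    X, y = [], []
    for t in range(n, len(prices)):
        # Features only use days t-n .. t-1, so day t itself never leaks in.
        X.append(list(prices[t - n:t]) + list(sentiments[t - n:t]))
        y.append(1 if prices[t] > prices[t - 1] else 0)
    return X, y


prices = [10.0, 11.0, 10.5, 12.0, 11.5]
sentiments = [0.2, -0.1, 0.5, 0.0, 0.3]
X, y = build_lookback_rows(prices, sentiments, n=2)
```

Rows in this shape could then be passed to a classifier such as a random forest. Re-running the construction for several values of n and comparing validation scores is one way the window size could be analyzed.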
### How can we improve
- Use LSTM to deal with the lookback window
- Combine different financial text datasets to make more features
- Feature importance
- Use this [dataset](https://www.kaggle.com/datasets/utkarshxy/stock-markettweets-lexicon-data/data)
$\text{score} = (-1) \cdot P(\text{negative}) + 1 \cdot P(\text{positive})$
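
As a small worked example of this score, the sketch below maps a classifier's class probabilities to a single value in [-1, 1]. The dictionary keys `"negative"`, `"neutral"`, and `"positive"` are assumed label names; the neutral probability contributes with weight 0 and so drops out.

```python
def sentiment_score(probs):
    """Collapse class probabilities into one score in [-1, 1].

    probs: dict with "negative" and "positive" probabilities
    (assumed key names); any "neutral" entry has weight 0.
    """
    return (-1) * probs["negative"] + 1 * probs["positive"]


example = {"negative": 0.1, "neutral": 0.2, "positive": 0.7}
score = sentiment_score(example)  # -0.1 + 0.7
```

Compared with a hard label in {-1, 0, 1}, this continuous score keeps the model's confidence, so a weakly bullish tweet counts for less than a strongly bullish one.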