# Project Survey
## MVP

## Logics

## Data Accessibility
### Twitter
* [Streaming API](https://dev.twitter.com/streaming/public): the application establishes a connection with the streaming endpoints and is delivered a feed of tweets
* POST status
  * [Example response from request using API](https://dev.twitter.com/rest/reference/post/statuses/update)
  * [Without Media](https://dev.twitter.com/rest/reference/post/statuses/update)
  * [With Media](https://dev.twitter.com/rest/reference/post/statuses/update_with_media)
* GET status
  * [User Timeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline)
* [Connecting to Twitter's API](https://dev.twitter.com/overview/api/tls)
* [API Rate Limits](https://dev.twitter.com/rest/public/rate-limiting)
* [Working with Timelines](https://dev.twitter.com/rest/public/timelines)
* [Github: Tweepy](https://github.com/tweepy/tweepy)
* [Rate Limits](https://dev.twitter.com/rest/public/rate-limits)
### Facebook
- [Public Feed API](https://developers.facebook.com/docs/public_feed)
:::info
Access to the Public Feed API is restricted to a limited set of media publishers and usage requires prior approval by Facebook. You cannot apply to use the API at this time.
:::
- [Graph API](https://developers.facebook.com/docs/graph-api)
### Slack
- [Slack API](https://api.slack.com/web)
- [Similar project: Digest.AI](https://digest.ai/)
## Corpus
- [Twitter Sentiment Corpus](http://www.sananalytics.com/lab/twitter-sentiment/)
- [Github: twitter-corpus](https://github.com/bwbaugh/twitter-corpus)
- [Github: chat_corpus](https://github.com/Marsan-Ma/chat_corpus)
- [Ubuntu Dialogue Corpus v1.0](http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/)
## Projects
- [suweet](https://github.com/bass3m/suweet): 87% similar to our project (lol), but written in Clojure.
- [twitter-trends-summarizer](https://github.com/yuva29/twitter-trends-summarizer)
- [Twitter-Topic-Modeling](https://github.com/jrn1989/Twitter-Topic-Modeling)
- [Twitter Word2vec pretrained model](http://www.fredericgodin.com/software/)
- [Github: ChatterBot](https://github.com/gunthercox/ChatterBot)
- [Github: ai-chatbot-framework](https://github.com/alfredfrancis/ai-chatbot-framework)
- [Github: MemNN -- Memory Networks implementations](https://github.com/facebook/MemNN)
- [Github: Key Value Memory Networks](https://github.com/siyuanzhao/key-value-memory-networks): Includes a tutorial on Memory Networks for NLP (ICML 2016).
- [Chatbot platform: Recast.ai](https://recast.ai/)
- [Github: Summarize It](https://github.com/yask123/Summarize-it): Summarizes chats in Slack.
  - Uses [TextRank](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf)
- [Github: awesome-bots](https://github.com/BotCube/awesome-bots)
- [Reddit: autotldr](https://www.reddit.com/user/autotldr)
- [Gensim: Phrase detection](https://radimrehurek.com/gensim/models/phrases.html)
- [Github: Newspaper3k](https://github.com/codelucas/newspaper): Article scraping & curation
- [Github: py-web-search](https://github.com/codelucas/newspaper): A Python module to fetch and parse results from different search engines.
- [Github: Multi-document summarization tool relying on ILP and sentence fusion](https://github.com/sildar/potara)
- And [many more](https://www.google.com.tw/search?q=multi-document+summarization+github&rlz=1C5CHFA_enTW729TW729&oq=multi-document+summarization+github&aqs=chrome..69i57j69i60.17795j0j1&sourceid=chrome&ie=UTF-8)
- [Gensim: Single-document summarization and keyword extraction](https://rare-technologies.com/text-summarization-with-gensim/)
- [Gensim: LDA](https://rare-technologies.com/tag/lda/)
- [New Gensim feature: Author-topic modeling. LDA with metadata.](https://rare-technologies.com/new-gensim-feature-author-topic-modeling-lda-with-metadata/)
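
As a rough illustration of the TextRank idea that Summarize It (listed above) relies on, here is a minimal pure-Python sketch. It is not the code from that repository. The sentence splitter, the function names (`split_sentences`, `similarity`, `textrank_summary`), and the defaults (`top_n=2`, 50 iterations) are assumptions made for this example. The log-normalized word-overlap similarity and the damping factor of 0.85 follow the linked TextRank paper.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Shared-word count normalised by the log of both sentence lengths."""
    # Very short sentences are skipped; this also avoids log(0) and a
    # zero denominator when both sentences have a single word.
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    overlap = len(set(words_a) & set(words_b))
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, kept in original order."""
    sentences = split_sentences(text)
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Fixed number of weighted PageRank iterations.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    demo = ("Cats chase mice. Dogs chase cats. Mice eat cheese. "
            "The weather is nice today.")
    print(textrank_summary(demo, top_n=2))
```

For chat or tweet data, each message could be treated as one "sentence"; a real pipeline would likely also need stop-word removal and better tokenization than this sketch provides.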
## Kaggle Kernels about Twitter
- [Twitter US Airline Sentiment](https://www.kaggle.com/crowdflower/twitter-airline-sentiment/kernels?sortBy=votes&after=71865)
- [Social Cluster Analysis in R](https://www.kaggle.com/msjgriffiths/social-cluster-analysis-in-r)
## Tutorials
- [Video: Reasoning, Attention and Memory](http://videolectures.net/deeplearning2016_chopra_attention_memory/)
- [Video: Dynamic Neural Networks for Question Answering from Stanford CS224d](https://www.youtube.com/watch?v=T3octNTE7Is&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=17)
- [Deep Learning for Chatbots, Part 1 – Introduction](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)
- [Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow](http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/)
- [Implementing Dynamic memory networks -- YerevaNN](https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/#memory-networks)
- [Mining Twitter Data](https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/)
## Conference Track
### Summarization
- [TREC 2017: Real-Time Summarization (RTS) (ongoing)](http://trecrts.github.io/): Real-Time Summarization (RTS) began at TREC 2016 and represents a merger of the Microblog (MB) track, which ran from 2010 to 2015, and the Temporal Summarization (TS) track, which ran from 2013 to 2015.
- [Survey: Overview of the TREC 2016 RTS Track](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_TREC2016.pdf)
- [Solutions for TREC 2016 RTS](http://trec.nist.gov/pubs/trec25/trec2016.html)
- Other [tracks](http://trec.nist.gov/pubs/call2017.html) in TREC 2017 (the *Complex Answer Retrieval* track may be relevant).
### Dialogue Management
- Dialog State Tracking Challenge (DSTC)
  - [DSTC5](http://workshop.colips.org/dstc5/)
  - [DSTC6 (ongoing)](http://workshop.colips.org/dstc6/)
  - [Github: dstc5](https://github.com/seokhwankim/dstc5)
## Papers
keywords: `event/topic detection`, `event/topic tracking`, `neural text summarization`, `information retrieval`, `question answering/QA`, `chatbot`, `dialog system`, `conversational summarization`
### Post Summarization
- [A Survey On Short Text Summarization Of Comment Streams On Social Network Sites](http://ijarcet.org/wp-content/uploads/IJARCET-VOL-4-ISSUE-11-4147-4151.pdf): Mentions event summarization using tweets.
  - "Posts and responses in Microblogs are more similar to a multi-persons dialogue corpus."
- [IMASS: An Intelligent Microblog Analysis and Summarization System](http://www.aclweb.org/anthology/P11-4023): How do we provide an informative presentation interface?
  - Clustering algorithms: (incremental) k-means, OPTICS, BIRCH (implemented in [sklearn](http://scikit-learn.org/stable/))
  - Topic modeling: LDA (see [sklearn](http://scikit-learn.org/stable/) or [gensim](https://radimrehurek.com/gensim/models/ldamodel.html))
- [Event Summarization using Tweets](https://pdfs.semanticscholar.org/dd0b/ab42fa2c63f99effcc50410093d7ebd887cf.pdf): Most relevant to our project among the papers mentioned here. The methods are more complicated; see the *Algorithms* part.
- A Neural Model for Joint Event Detection and Summarization. [IJCAI 2017] (not currently available online)
- [Event Detection on Curated Tweet Streams](https://cs.uwaterloo.ca/~jimmylin/publications/Ghelani_etal_SIGIR2017.pdf) [SIGIR 2017]
- [Extractive and Abstractive Event Summarization over Streaming Web Text](https://www.ijcai.org/Proceedings/16/Papers/575.pdf) [IJCAI 2016]
- [Event Representations for Automated Story Generation with Deep Neural Nets](https://arxiv.org/pdf/1706.01331v1.pdf)
- [Learning Approaches for Detecting and Tracking News Events](https://www.cs.cmu.edu/~jgc/publication/Learning_Approaches_Detecting_Tracking_IEEE_1999.pdf) [IEEE 1999]
- [DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents](http://aclweb.org/anthology/P16-1049)
- [Comparing Twitter Summarization algorithms for multiple post summaries](http://www.cs.uccs.edu/~jkalita/papers/2011/InouyeDavidSocialComm2011.pdf)
- [Overview of the TREC 2016 Real-Time Summarization Track](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_TREC2016.pdf)
- [Event Detection Based Approach for Soccer Video Summarization Using Machine learning](http://www.sersc.org/journals/IJMUE/vol7_no2_2012/5.pdf)
- [Cluster Analysis of Twitter Data: A Review of Algorithms](https://e-space.mmu.ac.uk/617901/1/ICAART_2017_110_CR_final.pdf)
### Dialogue Management
keywords: `state tracking`, `neural dialogue management`, `memory network`
- [Github: awesome-dialogue-management](https://github.com/bnsblue/awesome-dialogue-management)
- [Github: Neural Network Dialog System Papers](https://github.com/snakeztc/NeuralDialogPapers)
- [The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems](https://arxiv.org/pdf/1506.08909.pdf)
- [A Survey of Available Corpora for Building Data-Driven Dialogue Systems](https://arxiv.org/pdf/1512.05742.pdf)
- [Dialog state tracking, a machine reading approach using Memory Network](https://arxiv.org/pdf/1606.04052.pdf)
### Multi-Document Summarization
- [Multi-Document Based Summarization](https://nlp.stanford.edu/courses/cs224n/2010/reports/ssandeep-venuk-gkparai.pdf)
- [Graph Based Multi-Document Based Summarization](https://arxiv.org/pdf/1706.06681.pdf)
- [Multi-Document Summarization By Sentence Extraction](https://www.cs.cmu.edu/~jgc/publication/MultiDocument_Summarization_Sentence_ANLP_2000.pdf)
- [Abstractive Multi-Document Summarization via Phrase Selection and Merging](https://arxiv.org/pdf/1506.01597.pdf)
- [A review of recent progress in multi document summarization](https://paginas.fe.up.pt/~prodei/dsie15/web/papers/dsie15_submission_8.pdf)
- [A Visual Analytics Approach for Summarizing Tweets](http://david-hawking.net/SIRIP2014_Proceedings/SIRIP%2714-A%20Visual%20Analytics%20Approach%20to%20Summarizing%20Tweets.pdf)
- [And many more](http://www.arxiv-sanity.com/search?q=multi-document)
- [DL Text Summarization](https://github.com/lipiji/app-dl) (many deep learning multi-doc summarization papers)
### Topic/Text Visualization
- [Web demo: Text Visualization Browser](http://textvis.lnu.se/)
- [IPython: Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html)
- Paper (inspiring figures): Visualization of Text Streams: A Survey
- [Paper (inspiring figures): TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling](http://www.datalab.uci.edu/papers/topicnets.pdf)
## Statistics to obtain
- Keywords
- Most active users
- Tweet/Reddit interactions
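
A minimal sketch of how the three statistics above could be computed with the Python standard library. The record layout (dicts with `user` and `text` keys), the tiny stop-word list, the helper names, and the choice to approximate interactions by counting @mentions are all assumptions made for illustration; they are not the schema of any particular API.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was"}


def top_keywords(posts, n=3):
    """Most frequent words, ignoring @mentions, URLs, and stop words."""
    counts = Counter()
    for post in posts:
        # Drop mentions and links so they are not counted as keywords.
        cleaned = re.sub(r"@\w+|https?://\S+", " ", post["text"].lower())
        for word in re.findall(r"[a-z']+", cleaned):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


def most_active_users(posts, n=3):
    """Users ranked by number of posts."""
    return Counter(post["user"] for post in posts).most_common(n)


def interaction_counts(posts):
    """Count (author, mentioned user) pairs based on @mentions."""
    pairs = Counter()
    for post in posts:
        for mentioned in re.findall(r"@(\w+)", post["text"]):
            pairs[(post["user"], mentioned)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    sample = [
        {"user": "alice", "text": "Loving the new summarization demo @bob"},
        {"user": "bob", "text": "@alice thanks! summarization is hard"},
        {"user": "alice", "text": "Topic detection next https://example.com"},
    ]
    print(top_keywords(sample, 1))
    print(most_active_users(sample, 1))
    print(dict(interaction_counts(sample)))
```

Replies, retweets, or Reddit comment threads could feed the same `Counter`-based approach once the corresponding fields are known.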
### Twitter Sentiment
- [Inferring tweet quality](http://www.evanmiller.org/inferring-tweet-quality-from-retweets.html)
- [Twitter Sentiment Analysis](http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/)
- [University of Michigan Twitter Classification](https://inclass.kaggle.com/c/si650winter11)
- [Weather Sentiment](https://www.kaggle.com/c/crowdflower-weather-twitter/discussion/6488#35651)
- [Deep Neural Network Sentiment](https://github.com/xiaohan2012/twitter-sent-dnn)
## Data Visualization
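- [Data Visualization: Do It This Way! Tips on Using Charts – Basic Charts (in Chinese)](http://blog.infographics.tw/2016/08/tips-on-basic-charts/)
- [Data Visualization: Python on the Front End! Building Interactive Web Visualizations with Bokeh and Python (in Chinese)](http://blog.infographics.tw/2016/04/interactive-visualization-with-bokeh-and-python/)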
- [Seaborn](https://seaborn.pydata.org/examples/index.html)
- [ML visualisation tools](http://moderndata.plot.ly/machine-learning-visualizations-made-in-python-and-r/)