# Project Survey

## MVP

![MVP Planning](https://i.imgur.com/lG8hccx.jpg)

## Logic

![](https://i.imgur.com/gAajpvY.jpg)

## Data Accessibility

### Twitter

* [Streaming API](https://dev.twitter.com/streaming/public): the application opens a connection to a streaming endpoint and receives a feed of tweets.
* POST status
    * [Example response from request using API](https://dev.twitter.com/rest/reference/post/statuses/update)
* GET status
    * [Without Media](https://dev.twitter.com/rest/reference/post/statuses/update)
    * [With Media](https://dev.twitter.com/rest/reference/post/statuses/update_with_media)
    * [User Timeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline)
* [Connecting to Twitter's API](https://dev.twitter.com/overview/api/tls)
* [API Rate Limits](https://dev.twitter.com/rest/public/rate-limiting)
* [Working with Timelines](https://dev.twitter.com/rest/public/timelines)
* [GitHub: Tweepy](https://github.com/tweepy/tweepy)
* [Rate Limits](https://dev.twitter.com/rest/public/rate-limits)

### Facebook

- [Public Feed API](https://developers.facebook.com/docs/public_feed)

:::info
Access to the Public Feed API is restricted to a limited set of media publishers and usage requires prior approval by Facebook. You cannot apply to use the API at this time.
:::

- [Graph API](https://developers.facebook.com/docs/graph-api)

### Slack

- [Slack API](https://api.slack.com/web)
- [Similar project: Digest.AI](https://digest.ai/)

## Corpus

- [Twitter Sentiment Corpus](http://www.sananalytics.com/lab/twitter-sentiment/)
- [GitHub: twitter-corpus](https://github.com/bwbaugh/twitter-corpus)
- [GitHub: chat_corpus](https://github.com/Marsan-Ma/chat_corpus)
- [Ubuntu Dialogue Corpus v1.0](http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/)

## Projects

- [suweet](https://github.com/bass3m/suweet): 87% similar to our project (lol), but written in Clojure.
- [twitter-trends-summarizer](https://github.com/yuva29/twitter-trends-summarizer)
- [Twitter-Topic-Modeling](https://github.com/jrn1989/Twitter-Topic-Modeling)
- [Twitter Word2vec pretrained model](http://www.fredericgodin.com/software/)
- [GitHub: ChatterBot](https://github.com/gunthercox/ChatterBot)
- [GitHub: ai-chatbot-framework](https://github.com/alfredfrancis/ai-chatbot-framework)
- [GitHub: MemNN -- Memory Networks implementations](https://github.com/facebook/MemNN)
- [GitHub: Key Value Memory Networks](https://github.com/siyuanzhao/key-value-memory-networks): Includes a tutorial on Memory Networks for NLP (ICML 2016).
- [Chatbot platform: Recast.ai](https://recast.ai/)
- [GitHub: Summarize It](https://github.com/yask123/Summarize-it): Summarizes chats in Slack.
    - Uses [TextRank](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf)
- [GitHub: awesome-bots](https://github.com/BotCube/awesome-bots)
- [Reddit: autotldr](https://www.reddit.com/user/autotldr)
- [Gensim: Phrase detection](https://radimrehurek.com/gensim/models/phrases.html)
- [GitHub: Newspaper3k](https://github.com/codelucas/newspaper): Article scraping & curation
- [GitHub: py-web-search](https://github.com/codelucas/newspaper): A Python module to fetch and parse results from different search engines.
- [GitHub: Multi-document summarization tool relying on ILP and sentence fusion](https://github.com/sildar/potara)
    - And [many more](https://www.google.com.tw/search?q=multi-document+summarization+github&rlz=1C5CHFA_enTW729TW729&oq=multi-document+summarization+github&aqs=chrome..69i57j69i60.17795j0j1&sourceid=chrome&ie=UTF-8)
- [Gensim: Single-document summarization and keyword extraction](https://rare-technologies.com/text-summarization-with-gensim/)
- [Gensim: LDA](https://rare-technologies.com/tag/lda/)
- [New Gensim feature: Author-topic modeling. LDA with metadata.](https://rare-technologies.com/new-gensim-feature-author-topic-modeling-lda-with-metadata/)

## Kaggle Kernels about Twitter

- [Twitter US Airline Sentiment](https://www.kaggle.com/crowdflower/twitter-airline-sentiment/kernels?sortBy=votes&after=71865)
- [Social Cluster Analysis in R](https://www.kaggle.com/msjgriffiths/social-cluster-analysis-in-r)

## Tutorials

- [Video: Reasoning, Attention and Memory](http://videolectures.net/deeplearning2016_chopra_attention_memory/)
- [Video: Dynamic Neural Networks for Question Answering from Stanford CS224d](https://www.youtube.com/watch?v=T3octNTE7Is&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=17)
- [Deep Learning for Chatbots, Part 1 – Introduction](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)
- [Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow](http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/)
- [Implementing Dynamic memory networks -- YerevaNN](https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/#memory-networks)
- [Mining Twitter Data](https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/)

## Conference Track

### Summarization

- [TREC 2017: Real-Time Summarization (RTS) (ongoing)](http://trecrts.github.io/): Real-Time Summarization (RTS) began at TREC 2016 and represents a merger of the Microblog (MB) track, which ran from 2010 to 2015, and the Temporal Summarization (TS) track, which ran from 2013 to 2015.
    - [Survey: Overview of the TREC 2016 RTS Track](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_TREC2016.pdf)
    - [Solutions for TREC 2016 RTS](http://trec.nist.gov/pubs/trec25/trec2016.html)
- The other [tracks](http://trec.nist.gov/pubs/call2017.html) in TREC 2017. (The *Complex Answer Retrieval Track* may be relevant.)
### Dialogue Management

- Dialog State Tracking Challenge (DSTC)
    - [DSTC5](http://workshop.colips.org/dstc5/)
    - [DSTC6 (ongoing)](http://workshop.colips.org/dstc6/)
    - [GitHub: dstc5](https://github.com/seokhwankim/dstc5)

## Papers

keywords: `event/topic detection`, `event/topic tracking`, `neural text summarization`, `information retrieval`, `question answering/QA`, `chatbot`, `dialog system`, `conversational summarization`

### Post Summarization

- [A Survey On Short Text Summarization Of Comment Streams On Social Network Sites](http://ijarcet.org/wp-content/uploads/IJARCET-VOL-4-ISSUE-11-4147-4151.pdf): Mentions event summarization using tweets.
    - "Posts and responses in Microblogs are more similar to a multi-persons dialogue corpus."
- [IMASS: An Intelligent Microblog Analysis and Summarization System](http://www.aclweb.org/anthology/P11-4023): How do we provide an informative presentation interface?
    - Clustering algorithms: (incremental) k-means, OPTICS, BIRCH (these are implemented in [sklearn](http://scikit-learn.org/stable/)).
    - Topic modeling: LDA (see [sklearn](http://scikit-learn.org/stable/) or [gensim](https://radimrehurek.com/gensim/models/ldamodel.html)).
- [Event Summarization using Tweets](https://pdfs.semanticscholar.org/dd0b/ab42fa2c63f99effcc50410093d7ebd887cf.pdf): Most relevant to our project. (The methods are more complicated; see the paper's *Algorithms* part.)
- A Neural Model for Joint Event Detection and Summarization. [IJCAI 2017] (not currently available online)
- [Event Detection on Curated Tweet Streams](https://cs.uwaterloo.ca/~jimmylin/publications/Ghelani_etal_SIGIR2017.pdf) [SIGIR 2017]
- [Extractive and Abstractive Event Summarization over Streaming Web Text](https://www.ijcai.org/Proceedings/16/Papers/575.pdf) [IJCAI 2016]
- [Event Representations for Automated Story Generation with Deep Neural Nets](https://arxiv.org/pdf/1706.01331v1.pdf)
- [Learning approaches for Detecting and Tracking News Events](https://www.cs.cmu.edu/~jgc/publication/Learning_Approaches_Detecting_Tracking_IEEE_1999.pdf) [IEEE 1999]
- [DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents](http://aclweb.org/anthology/P16-1049)
- [Comparing Twitter Summarization algorithms for multiple post summaries](http://www.cs.uccs.edu/~jkalita/papers/2011/InouyeDavidSocialComm2011.pdf)
- [Overview of the TREC 2016 Real-Time Summarization Track](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_TREC2016.pdf)
- [Event Detection Based Approach for Soccer Video Summarization Using Machine learning](http://www.sersc.org/journals/IJMUE/vol7_no2_2012/5.pdf)
- [Cluster Analysis of Twitter Data: A Review of Algorithms](https://e-space.mmu.ac.uk/617901/1/ICAART_2017_110_CR_final.pdf)

### Dialogue Management

keywords: `state tracking`, `neural dialogue management`, `memory network`

- [GitHub: awesome-dialogue-management](https://github.com/bnsblue/awesome-dialogue-management)
- [GitHub: Neural Network Dialog System Papers](https://github.com/snakeztc/NeuralDialogPapers)
- [The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems](https://arxiv.org/pdf/1506.08909.pdf)
- [A Survey of Available Corpora for Building Data-Driven Dialogue Systems](https://arxiv.org/pdf/1512.05742.pdf)
- [Dialog state tracking, a machine reading approach using Memory Network](https://arxiv.org/pdf/1606.04052.pdf)

### Multi-Document Summarization

- [Multi-Document Based Summarization](https://nlp.stanford.edu/courses/cs224n/2010/reports/ssandeep-venuk-gkparai.pdf)
- [Graph Based Multi-Document Based Summarization](https://arxiv.org/pdf/1706.06681.pdf)
- [Multi-Document Summarization By Sentence Extraction](https://www.cs.cmu.edu/~jgc/publication/MultiDocument_Summarization_Sentence_ANLP_2000.pdf)
- [Abstractive Multi-Document Summarization via Phrase Selection and Merging](https://arxiv.org/pdf/1506.01597.pdf)
- [A review of recent progress in multi document summarization](https://paginas.fe.up.pt/~prodei/dsie15/web/papers/dsie15_submission_8.pdf)
- [A Visual Analytics Approach for Summarizing Tweets](http://david-hawking.net/SIRIP2014_Proceedings/SIRIP%2714-A%20Visual%20Analytics%20Approach%20to%20Summarizing%20Tweets.pdf)
- [And many more](http://www.arxiv-sanity.com/search?q=multi-document)
- [DL Text Summarization](https://github.com/lipiji/app-dl) (many deep learning multi-doc summarization papers)

### Topic/Text Visualization

- [Web demo: Text Visualization Browser](http://textvis.lnu.se/)
- [IPython: Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html)
- [Paper (inspiring figures): Visualization of Text Streams: A Survey]()
- [Paper (inspiring figures): TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling](http://www.datalab.uci.edu/papers/topicnets.pdf)

## Statistics to obtain

- Keywords
- Most active users
- Tweet/Reddit interactions

### Twitter Sentiment

- [Inferring tweet quality](http://www.evanmiller.org/inferring-tweet-quality-from-retweets.html)
- [Twitter Sentiment Analysis](http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/)
- [University of Michigan Twitter Classification](https://inclass.kaggle.com/c/si650winter11)
- [Weather Sentiment](https://www.kaggle.com/c/crowdflower-weather-twitter/discussion/6488#35651)
- [Deep Neural Network Sentiment](https://github.com/xiaohan2012/twitter-sent-dnn)

## Data Visualization

- [Data Visualization: Do It This Way! Tips for Using Charts: Basic Charts (in Chinese)](http://blog.infographics.tw/2016/08/tips-on-basic-charts/)
- [Data Visualization: Python on the Front End! Building Interactive Web Visualizations with Bokeh and Python (in Chinese)](http://blog.infographics.tw/2016/04/interactive-visualization-with-bokeh-and-python/)
- [Seaborn](https://seaborn.pydata.org/examples/index.html)
- [ML visualisation tools](http://moderndata.plot.ly/machine-learning-visualizations-made-in-python-and-r/)
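
As a rough starting point for two of the items under *Statistics to obtain* (keywords and most active users), here is a minimal sketch using only the Python standard library. It assumes the collected posts are already available as `(user, text)` pairs; the sample data, the stop-word list, and the function names are hypothetical and only for illustration, not part of any API listed above.

```python
import re
from collections import Counter

# Hypothetical sample data: (user, text) pairs standing in for collected posts.
SAMPLE_POSTS = [
    ("alice", "Memory networks for question answering"),
    ("bob", "Summarization of tweets with TextRank"),
    ("alice", "Event summarization using tweets"),
    ("carol", "Topic modeling of tweets with LDA"),
    ("alice", "LDA and TextRank for summarization"),
]

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOPWORDS = {"for", "of", "with", "and", "using", "the", "a"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(posts, n=3):
    """Return the n most frequent tokens across all posts as (token, count)."""
    counts = Counter()
    for _user, text in posts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def most_active_users(posts, n=3):
    """Return the n users with the most posts as (user, post_count)."""
    return Counter(user for user, _text in posts).most_common(n)


if __name__ == "__main__":
    print(top_keywords(SAMPLE_POSTS))
    print(most_active_users(SAMPLE_POSTS))
```

Raw frequency counts are only a baseline; the phrase detection and topic modeling tools listed under *Projects* would likely give more useful keywords on real data.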