# Data Analysis Weekly Project - First Milestone

### Key Questions

- What is your data about?

  These datasets contain information about all audio-video recordings of TED Talks uploaded to the official TED.com website until September 21st, 2017. The TED main dataset contains information about all talks, including number of views, number of comments, descriptions, speakers and titles. The TED transcripts dataset contains the transcripts for all talks available on TED.com.

- Who are your audiences?
  * TED or other media
  * Content/production department in different media company
  * Marketer
  * Researcher
  * TED Talk audience?

- Describe your given dataset? (E.g. what columns does it have? What are they about?)
  * total: 17 columns
  * comments (int): The number of first level comments made on the talk
  * description (str): A blurb of what the talk is about
  * duration (int): The duration of the talk in seconds
  * event (str): The TED/TEDx event where the talk took place
  * film_date (int): The Unix timestamp of the filming
  * languages (int): The number of languages in which the talk is available
  * main_speaker (str): The first named speaker of the talk
  * name (str): The official name of the TED Talk. Includes the title and the speaker.
  * num_speaker (int): The number of speakers in the talk
  * published_date (int): The Unix timestamp for the publication of the talk on TED.com
  * ratings (str): A stringified dictionary of the various ratings given to the talk (inspiring, fascinating, jaw dropping, etc.)
  * related_talks (str): A list of dictionaries of recommended talks to watch next
  * speaker_occupation (str): The occupation of the main speaker
  * tags (str): The themes associated with the talk
  * title (str): The title of the talk
  * url (str): The URL of the talk
  * views (int): The number of views on the talk

- Based on your analysis so far, is there any error in the data? Is there any feature that should be processed further?
  * Published date/film date (convert from Unix timestamp to a readable date and time)
  * Ratings
  * Related talks
  * Tags

- What main points do you want to focus your analysis on? Are they interesting? Are they relevant to the audiences?
  * Main points could be finding insights about the world of TED: topics, distribution of talks, emphasis of TED Talks, most favored topics, etc.
  * Can we predict the views of a talk from its topic, using tags, keywords, duration of the talk, number of languages and number of speakers? (I find this challenging but interesting enough.)
  * At the same time, look for correlations between features (language & viewers, trendiness & topic, etc.)
  * Seek out external resources and scrape news websites for information about topics (extending the project to other sources)

- How do you divide the work in your team? How do you maintain communication?

Keywords

- __Data Storytelling__
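
The features flagged above for further processing (the Unix timestamps and the stringified ratings and tags) could be handled roughly as follows. This is a minimal sketch using only the Python standard library. The `sample_row` values are made up for illustration, and the exact layout of the stringified fields (here, Python-literal lists) is an assumption that should be checked against a real row first. The helper names `to_utc_datetime` and `parse_literal` are hypothetical.

```python
import ast
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds):
    """Convert a Unix timestamp (in seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def parse_literal(text):
    """Safely parse a stringified Python literal (list, dict, ...) into an object."""
    return ast.literal_eval(text)


# Hypothetical row for illustration only; the string layouts are assumptions.
sample_row = {
    "film_date": 1140825600,
    "published_date": 1151367060,
    "ratings": "[{'name': 'Funny', 'count': 10}, {'name': 'Inspiring', 'count': 25}]",
    "tags": "['education', 'creativity']",
}

film_date = to_utc_datetime(sample_row["film_date"])
published_date = to_utc_datetime(sample_row["published_date"])
ratings = parse_literal(sample_row["ratings"])
tags = parse_literal(sample_row["tags"])

# Collapse the parsed ratings into a simple {name: count} mapping.
rating_counts = {r["name"]: r["count"] for r in ratings}

print(film_date.date(), published_date.date())
print(rating_counts)
print(tags)
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, so a malformed or unexpected string raises an error rather than executing code. If the fields turn out to be valid JSON, `json.loads` would be the more natural choice.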