# A brief overview of Data Science and Artificial Intelligence

*Kirstie Whitaker, Alan Turing Institute*

Billions of gigabytes of data are generated globally every day. **Data science** is the drive to turn this data into useful information, and to understand its powerful impact on science, society, the economy and our way of life. The study of data science brings together researchers in computer science, mathematics, statistics, machine learning, engineering, philosophy, the digital humanities, economics and other social sciences.

| ![](https://camo.githubusercontent.com/a5a256d82788c08895aa4adb8645b3b4280c510f/687474703a2f2f692e696d6775722e636f6d2f62397859645a422e6a7067) |
|:---:|
| One way of defining data science is through its _interdisciplinarity_. It is the overlap between traditional research domains, mathematical and statistical knowledge, and creative software engineering (hacking skills). The danger zone in this illustration arises when data science tools are built or used without the expertise needed to choose between the many different analysis techniques that exist. |
| *Figure by Natalia Bilenko, modified from Drew Conway's diagram; http://berkeleysciencereview.com/how-to-become-a-data-scientist-before-you-graduate* |

There is no universally accepted definition of **artificial intelligence** or 'AI', but the term is often used to describe a machine or system performing tasks that would ordinarily require human (or other biological) brainpower, such as making sense of spoken language, learning behaviours or solving problems. There is a wide range of such systems, but broadly speaking they consist of computers running **algorithms** (sets of instructions written in a programming language), often drawing on data.

A traditional focus of AI research is building machines that can **play games** such as tic-tac-toe, chess or Go. These games have concrete rules, and it is easy to say whether a game has been won, lost or drawn.
AI research on game playing focuses on "intelligent" ways for a computer programme to investigate the billions of different options that a player could take at any point in time, and on how the programme can "learn" the best strategies to play the game.

**Machine learning** is a branch of artificial intelligence that allows computer systems to improve their performance by looking at examples, data and their own previous experience. These techniques can _interpolate_ (fill in the gaps) between the examples an algorithm has seen, so that it can predict outcomes it has not been trained on. For example, deep neural networks are used to translate text from one language to another (such as French to Japanese) without being given any rules about the syntax and grammar of either language, because they have "learned" from millions of examples.

AI systems may also **infer information** about the real world from the examples they have been given. They can do this by estimating how likely different outcomes are and using those estimates to build expectations in the form of a _probability distribution_. A good example of this type of model is Google's search algorithm. When you enter a search term, the algorithm presents its best guess at what you are looking for, based on a model built by looking at all the previous connections between search terms and pages on the web.

Finally, AI incorporates mathematical **optimisation**. Machines can search a huge range of options to calculate the best way to execute a plan. Optimisation is the process of selecting the best path through all the possibilities.

Robotics is an application of artificial intelligence that combines **machine learning** to scan the room and understand the surroundings, **inference** to generate the next step in a sequence that will **progress towards a goal** such as picking up an object, and a type of **optimisation** called _path finding_ to coordinate the robot's limbs quickly and efficiently.
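Path finding can be illustrated with a deliberately simplified sketch. The Python below uses breadth-first search to find a route with the fewest moves across a toy room drawn as a grid of characters, where `#` marks an obstacle. The grid encoding and the function name `shortest_path` are hypothetical. Real robots usually plan over continuous space with weighted costs and richer planning methods, so treat this as an illustration of the idea rather than a description of how any particular robot works.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search on a toy grid map (hypothetical encoding).

    grid is a list of equal-length strings; '#' marks an obstacle and any
    other character is free space. Returns the list of (row, col) cells
    from start to goal using the fewest moves, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # each cell's predecessor; doubles as "visited"
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the predecessors to rebuild the route.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        # Try the four neighbouring cells: down, up, right, left.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] != "#" and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None

# A small room with a wall segment in the middle row.
room = [
    "....",
    ".##.",
    "....",
]
print(shortest_path(room, (0, 0), (2, 3)))
```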
Importantly, there are many things that AI cannot do as of 2019. AI algorithms struggle to **reason** about the world, and they require millions of examples in situations where a human child would need to see only one or two. They do not respond well to the _context_ of their environment. Most AI applications require a "human in the mix" and expert guidance throughout their creation and deployment to ensure that the algorithm is answering the questions we want it to address.