# Introduction to AI - What you should know now...

###### tags: `Summary` `` v2.3 ``

`Thanks to all who have helped me to complete this document!`

## What is Artificial Intelligence?

* **What is Artificial Intelligence, and what is it not?**
    * Artificial Intelligence and (Deep) Machine Learning are not the same thing
    * Artificial Intelligence and Cognitive Science have developed into different research fields
    * AI is much more than Deep Learning
    * Definition by McCarthy:
        * "The science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable."
---
* **Artificial Intelligence is about building machines that exhibit rational behavior**
    * this does not mean that the methods they use must be particularly clever
    * nor does it mean that the machines use the same methods as humans (typically they don't)
---
* **What are the most important subfields of AI?**
    * Problem Solving and Search
    * Game Playing
    * Knowledge Representation
    * Planning
    * Machine Learning
        * Deep Learning
        * Neural Networks
    * Natural Language Understanding
    * Computer Vision
    * Robotics
---
## History of Artificial Intelligence
* **AI has a very long history**
    * both in speculation and in scientific investigation
* **AI builds upon contributions from many scientific disciplines**
    * logic, philosophy, psychology, neuroscience, economics, linguistics, control theory, mathematics
* **When is the official birthdate of AI?**
    * 1956, the Dartmouth Conference
* **Who are the founding fathers?**
    * John McCarthy, Marvin Minsky, Allen Newell, Herbert A. Simon, Arthur Samuel, ...
* **What were the early successes?**
    * **McCulloch and Pitts** model artificial neurons and show that logical operations can be modeled with such cells
    * **ENIAC**, the first Turing-complete digital computer
    * **Marvin Minsky and Dean Edmonds** constructed the first neural network computer
    * **Newell and Simon:** the General Problem Solver (successfully solved simple puzzles)
    * **Arthur Samuel:**
        * investigated game playing (checkers) with great success at IBM
        * pioneered ideas in game playing & machine learning
    * **John McCarthy:**
        * Lisp (the second-oldest high-level programming language)
        * the logic-oriented Advice Taker
    * **Marvin Minsky:** various students working on micro-worlds
---
* **Why did the initial expectations not work out?**
    * Progress was slower than the (unrealistic) expectations: Simon and Newell's predictions came more or less true after 40 (instead of 10) years, and in very different ways than they had imagined.
    * difficulty of knowledge representation
    * lack of scalability
    * underestimation of the combinatorial explosion in search
    * things that work well in micro-worlds often do not work in the real world
    * fundamental limitations on techniques and representations
    * many of the successes in AI depend on deep and intensive computation instead of modeling the human mind
---
## Puzzles and Search
* **Search is an important problem-solving technique in AI (and beyond)**
    * **How can a search problem be defined?**
        * Search is the process of trying out various possibilities in a systematic way so that in the end we can find a valid solution.
        * Search problems are defined with:
            * a state description
            * a goal function
    * **Simple exhaustive search:**
        * generate all possible states; no use of heuristics for guiding the search or of constraints for reducing the number of possibilities
        * test the goal function on each such state
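        A minimal generate-and-test sketch of this in Python (the toy state space and goal function are illustrative assumptions, not from the lecture):

        ```python
        from itertools import product

        def exhaustive_search(candidates, is_goal):
            """Simple exhaustive search: generate all possible states and
            test the goal function on each one (no heuristics, no pruning)."""
            for state in candidates:
                if is_goal(state):
                    return state
            return None

        # Toy example: find a 3-bit state whose bits sum to 2.
        print(exhaustive_search(product([0, 1], repeat=3), lambda s: sum(s) == 2))
        # -> (0, 1, 1)
        ```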
    * **Local search:**
        * keep a single current state and try to improve it by maximizing a heuristic evaluation
        * uses very little memory and often quickly finds solutions in large or infinite state spaces
        * but there is no guarantee of completeness
    * **What is backtracking?**
        * Backtracking is the basic search algorithm for CSPs (Constraint Satisfaction Problems)
        * extend the assignment one step at a time so that no constraint is violated; backtrack when no conflict-free extension is left
        * succeed as soon as a complete legal assignment is found
        * can solve n-queens problems for up to n ≈ 25
---
* **Heuristics may help to guide the search into more promising regions of the search space**
    * **How do simple search algorithms like hill-climbing or genetic algorithms work?**
        * **Hill-climbing search, aka greedy local search**
            * expand the current state (generate all neighbors)
            * move to the neighbor with the highest evaluation
            * stop when the evaluation goes down
            * main problem -> local optima
        * **Genetic algorithms**
            * start with k randomly generated states (the population)
            * a state is represented as a string over a finite alphabet
                * often a string of 0s and 1s
            * an evaluation function (fitness function) scores the states
            * produce the next generation by selection, cross-over, and mutation
            * very powerful, rather easy to implement, perform well, and are widely used
    * **What are local optima and what can you do about them?**
        * the algorithm stops as soon as it reaches the top of a hill
        * but it is actually looking for the highest mountain peak (the global optimum)
        * **Random-Restart Hill-Climbing:**
            * several iterations with different starting positions
        * **Stochastic Hill-Climbing:**
            * select the successor node randomly; better nodes have a higher probability of being selected
---
* **Constraints may help to avoid unnecessary search**
    * **What general-purpose heuristics are there for selecting constrained variables and their values?**
        * **Minimum Remaining Values Heuristic**
            * choose the variable with the fewest consistent values
        * **Degree Heuristic**
            * choose the variable with the most constraints on the remaining variables
        * **Least Constraining Value Heuristic**
            * choose the value that rules out the fewest values in the remaining variables
    * **How does constraint propagation work?**
        * maintain a set of possible values $D_i$ for each variable $X_i$
        * try to reduce the size of $D_i$ by identifying values that violate some constraints
        ![](https://i.imgur.com/Y0xbMqW.png)
---
* **Complex problems may be solved by reducing them to less complex problems**
    * **What is the key idea of divide-and-conquer?**
    ![](https://i.imgur.com/49YdToE.png)
---
## Game Playing
* **What are the differences between human and computer game playing?**
    * Humans play pattern-based, computers search-based. As a result, it is more difficult for a human to solve an unfamiliar chess puzzle that allows checkmate in 2 moves than to handle a position they recognize, even if the game still lasts 120 moves. For computers it is exactly the opposite.
---
* **What is a game tree and how is it constructed?**
    * A game tree represents all possible moves and game states of a game, from the selected starting position to all possible endings. It connects positions by moves: positions (game states) are the nodes and moves are the edges. The root of the tree is the initial game state; from there, an edge is drawn for each possible move of the player. This continues for each node until all leaves are finished game states. By applying heuristic evaluation functions, this tree can be used to find the best path (see the minimax sketch below).
    ![](https://i.imgur.com/BG8dhtd.png)
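    A minimal depth-limited minimax sketch in Python, the standard way to search such a tree with a heuristic evaluation at the leaves (the `moves` and `evaluate` functions are assumed placeholders, not from the lecture):

    ```python
    def minimax(state, depth, maximizing, moves, evaluate):
        """Depth-limited minimax over a game tree. `moves(state)` yields
        (move, successor) pairs; `evaluate` is a heuristic evaluation
        of a position from MAX's point of view."""
        successors = list(moves(state))
        if depth == 0 or not successors:        # depth limit or finished game
            return evaluate(state), None
        best_value = float('-inf') if maximizing else float('inf')
        best_move = None
        for move, child in successors:
            value, _ = minimax(child, depth - 1, not maximizing, moves, evaluate)
            if (maximizing and value > best_value) or \
               (not maximizing and value < best_value):
                best_value, best_move = value, move
        return best_value, best_move
    ```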
---
* **How does retrograde analysis work?**
    0. Generate all possible positions (preferably with a database; still a problem for bigger games).
    1. Find all positions that are won for player A:
        * `i.)` mark all terminal positions that are won for A
        * `ii.)` mark all positions from which A can move to a marked position
        * `iii.)` mark all positions in which B is to move and all moves lead to marked positions
        * `iv.)` if there are positions that have not been considered yet, go to `ii.)`
    2. Find all positions that are won for player B, analogous to 1.
    3. All remaining positions are draws.
---
* **Why is exhaustive search infeasible and what strategies can you employ to solve that problem?**
    * Shannon estimated that there are around $10^{120}$ different chess games. This number is larger than the number of atoms in the universe; the full game tree simply cannot be computed. The practical solution is one of Shannon's strategies.
* **What are Shannon's strategies?**
    * Cut the search off early and evaluate the positions at the cut-off depth with heuristic evaluation functions. Two examples of such strategies are:
        * **Shannon A:** brute-force search of all positions up to a certain depth
        * **Shannon B:** selective search; drop uninteresting paths, like humans do
    ![](https://i.imgur.com/NbflC9k.png)
---
* **What is the horizon effect and how can it be countered?**
    * The horizon effect comes from the fact that machines search only a small portion of the possible moves. The chosen move may therefore be an error that cannot be "seen" because its consequences lie outside the search perimeter (i.e., beyond the "horizon"). Solutions:
        * search extensions and related techniques, e.g. forward pruning, null-move pruning, iterative deepening, move ordering, transposition tables
---
* **What are the differences between selective search and (depth-limited) exhaustive search?**
    * Selective search uses techniques and algorithms so that it does not search all possible paths but only those that are relevant. (Depth-limited) exhaustive search searches all possible paths up to a certain depth.
* **What strategies have been used to combine these two?**
    * quiescence search, search extensions
---
* **What are the key ideas behind techniques such as transposition tables, Zobrist keys, null-move pruning, …?**
    * (might not include all concepts)
    * **Null-move pruning**
        * Add a null move to the search, i.e., assume that the current player does not make a move. If even the null move results in a cutoff, assume that making an actual move would do the same.
    * **Transposition Tables**
        * Different sequences of the same moves lead to the same position: find these transpositions and store the positions in a hash table, so that each position has to be computed/searched only once.
    * **Zobrist (Hash) Keys**
        * A way to index positions in the transposition table. Take a 3d array of arbitrarily long random integers (for example 64 bits or even longer), indexed by piece type, location, and color. Initialize a 64-bit hash key with 0. On each move, look up the random number of the moved piece on its old square and XOR it with the hash key, then XOR in the number of the piece on its new square (when a piece is first placed, there is no old square to XOR out). The biggest plus is that the key can be updated incrementally, as sketched below.
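        A minimal sketch of this incremental update in Python (the board encoding and names are illustrative assumptions):

        ```python
        import random

        random.seed(0)                            # fixed seed -> reproducible table
        PIECE_TYPES, COLORS, SQUARES = 6, 2, 64   # chess-sized example

        # 3d array of random 64-bit numbers, indexed by [piece][color][square]
        TABLE = [[[random.getrandbits(64) for _ in range(SQUARES)]
                  for _ in range(COLORS)] for _ in range(PIECE_TYPES)]

        def move_key(key, piece, color, from_sq, to_sq):
            """Incremental update for a simple (non-capturing) move:
            XOR out the old square, XOR in the new square."""
            key ^= TABLE[piece][color][from_sq]
            key ^= TABLE[piece][color][to_sq]
            return key

        key = 0                            # empty board
        key ^= TABLE[0][0][8]              # place a piece (XOR it in)
        key = move_key(key, 0, 0, 8, 16)   # move it from square 8 to 16
        ```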
    * **Forward Pruning**
        * Pruning as part of a heuristic that completely removes certain branches of the search tree. You can do this because the algorithm knows, or thinks (dangerous!), that the removed branch does not change the result of the search. Forward pruning in particular always involves some risk of overlooking something that influences the root score.
    * **Zermelo's Theorem**
        * Every deterministic, perfect-information, 2-player game falls into exactly one of three categories:
            * player 1 can force a win
            * player 2 can force a win
            * both players can force a draw
    * **Solving a game ultra-weakly**
        * prove whether the first player will win, lose, or draw from the initial position, given perfect play on both sides
            * by a non-constructive proof, which does not help in actual play
            * or by a complete minimax or alpha-beta search
    * **Solving a game weakly**
        * provide an algorithm that secures a win for one player, or a draw for either, against any possible moves of the opponent, from the initial starting position only
    * **Solving a game strongly**
        * provide an algorithm that can produce perfect play from any position $\Rightarrow$ mostly with some kind of database
    * **Quiescence Search**
        * evaluate only quiescent states
            * those are states without wild swings in value in the near future
            * states in the middle of an exchange are not quiescent
        * when the search depth is reached, check whether the state is quiescent
            * if not, search further until a quiescent state is found, then apply the heuristic evaluation
    * **Search Extensions**
        * when a forced move is found, search deeper: a forced move narrows the width of the tree
        * but be careful, the search must always terminate somehow
    * **Move Ordering**
    ![](https://i.imgur.com/Kjxu4Km.png)
---
* **Do deeper searches necessarily help?**
    * They do, but only up to a certain depth: the deeper the search, the smaller the additional benefit.
    ![](https://i.imgur.com/dXYydWd.png)
---
* **What have been the key events in game playing?**
    * 1912, Torres Quevedo's KRK machine
    * 1912, Zermelo's Theorem
    * 1950, Shannon's proposal
    * 1975, alpha-beta search
    * 1996, Garry Kasparov vs. Deep Blue, 1st match
    * 1997, Garry Kasparov vs. Deep Blue, 2nd match
    * 2015, AlphaGo beats Fan Hui
    * 2016, AlphaGo beats Lee Sedol
    * 2017, AlphaZero beats the best programs in Shogi, Chess, and Go
---
## Machine Learning
* **What is the relation between deep learning and AI?**
    * Machine learning is a subfield of AI, and deep learning is a subfield of machine learning. Deep learning is a form of machine learning that makes use of artificial neural networks.
---
* **How to define Machine Learning?**
    * (Mitchell 1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
---
* **What is overfitting and how can you avoid it?**
    * training-set error continues to decrease with an increasing number of training examples / number of epochs
        * an epoch is a complete pass through all training examples
    * test-set error will at some point start to increase because of overfitting
    * simple training protocol (see the sketch below):
        * keep a separate validation set to watch the performance
        * the validation set is different from both the training and the test set!
        * stop training when the error on the validation set goes up
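    A minimal sketch of this early-stopping protocol in Python (the model interface and the `train_one_epoch`/`error` helpers are assumed placeholders):

    ```python
    def train_with_early_stopping(model, train_set, val_set,
                                  train_one_epoch, error, patience=3):
        """Stop training once the validation error has not improved for
        `patience` consecutive epochs (simple early stopping)."""
        best_err, bad_epochs = float('inf'), 0
        while bad_epochs < patience:
            train_one_epoch(model, train_set)   # one pass over all training examples
            val_err = error(model, val_set)     # watch performance on held-out data
            if val_err < best_err:
                best_err, bad_epochs = val_err, 0
            else:
                bad_epochs += 1
        return model
    ```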
---
* **What is a perceptron and how does it predict?**
    * a single node connecting $n$ input signals $a_j$ with one output signal $a$; typically the signals are −1 or +1
---
* **What is the relation to Boolean functions?**
![](https://i.imgur.com/yzducJ7.png)
---
* **How can you optimize weights in a perceptron?**
![](https://i.imgur.com/nzNurZl.png)
---
* **What role does the activation function play in natural and in artificial neural networks?**
    * The activation function allows us to learn non-linear functions. Without activation functions, neural networks are purely linear.
![](https://i.imgur.com/Ac4wx0l.png)
---
* **What are the limitations of a single perceptron?**
    * A perceptron is a single linear classifier, so it cannot solve tasks that are not linearly separable.
---
* **What is the typical structure of an artificial neural network?**
![](https://i.imgur.com/RJh0KzE.png)
---
* **How does backpropagation generate error signals for the intermediate layers?**
![](https://i.imgur.com/r6lMIk6.png)
---
* **What are the key ideas of convolutional neural networks and recurrent neural networks?**
    * **Convolutional Neural Networks (CNN)**
        * convolution:
            * for each pixel of an image, a new feature is computed using a weighted combination of its n×n neighborhood
        * convolutions can be encoded as network layers
            * every possible 3×3 pixel window of the input image is connected to the corresponding pixel in the next layer
        * convolutional layers are at the heart of image recognition
            * several are stacked on top of each other and in parallel to each other
    * **Recurrent Neural Networks (RNN)**
        * allow processing of sequential data
        * by feeding the output of the network back into the next input
        * **Long Short-Term Memory (LSTM)**
            * adds "forgetting" to RNNs
            * good for mapping sequential input data to sequential output data
            * e.g., text to text, or time series to time series
---
## Machine Learning in Games
* **What is the key idea of reinforcement learning?**
    * learning of policies (action-selection strategies) based on feedback from the environment (reinforcement)
---
* **How do MENACE and TD-Gammon work?**
    * **MENACE:** learns to play Tic-Tac-Toe using 287 matchboxes filled with beads in 9 different colors (one color for each square). While playing, one selects the matchbox of the current position, draws a bead from the box, and plays the move indicated by its color. If the game is lost, all drawn beads are kept out (negative reinforcement); if the game is drawn, the beads are put back; if the game is won, the beads are doubled when put back (positive reinforcement).
    * **TD-Gammon:** learns to play backgammon. Improvements over MENACE:
        * faster convergence because of Temporal Difference (TD) learning - feedback is propagated back after every move
        * a neural network is used instead of matchboxes - this allows generalization
        * use of positional characteristics as features
---
* **What is the exploration vs. exploitation trade-off?**
    * Example, the multi-armed bandit problem: pull the best arm in order to maximize your earnings (exploitation) vs. try whether other arms are more promising (exploration).
    * solved by the UCB (upper confidence bound) algorithm, which combines:
        * an exploitation term - the average earnings of arm *i*
        * an exploration term - grows the less often arm *i* has been played
        * always play the arm with the highest UCB value
---
* **Why does deep search not work with chance games?**
    * Player A cannot directly maximize their gain because they don't know what B's legal actions will be. Standard game trees are therefore extended with chance nodes; an expected value (average) of the outcome needs to be computed at each chance node. A deep look-ahead is not feasible since the probability of reaching a given node shrinks with increasing depth.
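    A minimal expectiminimax-style sketch in Python showing how such chance nodes are evaluated (the node interface is an illustrative assumption):

    ```python
    def expecti_value(node, depth):
        """Evaluate a tree of MAX, MIN, and CHANCE nodes. At chance nodes,
        the value is the probability-weighted average of the children."""
        if depth == 0 or not node.children:
            return node.evaluate()                # heuristic leaf value
        values = [expecti_value(c, depth - 1) for c in node.children]
        if node.kind == 'max':
            return max(values)
        if node.kind == 'min':
            return min(values)
        # chance node: expected value over the outcome probabilities
        return sum(p * v for p, v in zip(node.probs, values))
    ```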
---
* **How can you approximate an expected value at a position?**
    * If exact probabilities for the outcomes at the chance nodes are not known, values can be sampled.
    ![](https://i.imgur.com/ameVBkT.png)
    * **Simulated search:**
        * we can limit the breadth of the search tree and sample some lines to the full depth
        * estimate the expected value of each move by counting the number of wins in a series of complete games
        * at each chance node, select one of the options at random (according to the probabilities); at MAX and MIN nodes, make a move
---
* **What is the Multi-Armed Bandit problem?**
    * find the arm (move) with the highest long-term utility from observations of immediate utilities, with as little regret as possible
---
* **What is Monte-Carlo search?**
    * The extreme case of simulation search: play a large number of games in which both players make their moves randomly, average the scores of these games, and make the move with the highest average score. Can also be used in deterministic games.
---
* **How can Monte-Carlo techniques be integrated with game tree search?**
    * **Monte-Carlo Tree Search (MCTS):**
        * incrementally build up a search tree, evaluate the leaf nodes via roll-outs, and evaluate promising moves more often than bad moves while still giving bad moves a chance (UCB algorithm) - the tree grows deeper in the regions of good moves
        * uses two policies:
            * tree policy: how the tree is traversed in order to find the best leaf to expand
            * roll-out/default policy: how the moves in the roll-out games are selected
        ![](https://i.imgur.com/Myawr6K.png)
    * **UCT search** - the best-known formulation of MCTS; it combines a UCB-based tree policy with random roll-outs.
    ![](https://i.imgur.com/YLynRPF.png)
---
* **How does AlphaGo work?**
    * AlphaGo combines MCTS with deep learning and reinforcement learning from self-play:
        * use a (fast) learned roll-out policy
        * use a depth limit in MCTS, where a learned evaluation function is applied
        * learn a fast but inaccurate roll-out policy from expert games
        * learn an accurate expert policy from expert games
        * refine the expert policy into a more accurate selection policy
        * use self-play with the selection policy to train a utility function
---
* **What is the key (name-giving) difference between AlphaGo and AlphaZero?**
    * the "Zero" versions started without any human knowledge, i.e., from random behavior
---
* **What is the difference between optimal and maximal play?**
    * For simple games we know optimal solutions, but optimal play is not maximal play: mind the difference between maximizing your winnings (exploiting your opponent's weaknesses) and playing optimally.
    * opponent modeling - try to predict the opponent's next move, try to predict which move the opponent predicts that your next move will be, ...

## Knowledge Representation and Reasoning
* **How does Watson work and where does its knowledge come from?**
![](https://i.imgur.com/SAKBzr9.png)
---
* **Why is logic not enough for knowledge representation?**
    * Formulating knowledge in logic has always been a big research field; one of its achievements is Prolog (which uses backward chaining). Datalog (which uses forward chaining) is a simplified version of Prolog.
---
* **What is the difference between forward chaining and backward chaining?**
![](https://i.imgur.com/JmR3ggT.png)
![](https://i.imgur.com/yOpa5GO.png)
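A minimal sketch of forward chaining over propositional Horn rules in Python (the rule format and the toy facts are illustrative assumptions): start from the known facts and keep firing rules whose bodies are satisfied until nothing new can be derived.

```python
def forward_chain(facts, rules):
    """Forward chaining: repeatedly apply rules (body -> head)
    until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)      # the rule fires, its head becomes a fact
                changed = True
    return derived

# Toy example:
rules = [({'parent', 'female'}, 'mother'), ({'mother'}, 'ancestor')]
print(forward_chain({'parent', 'female'}, rules))
# -> {'parent', 'female', 'mother', 'ancestor'} (in some order)
```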
---
* **What are ontologies and how can we construct them?**
![](https://i.imgur.com/aegIneD.png)
![](https://i.imgur.com/mC5EDHF.png)
---
* **What is Feigenbaum's Knowledge Acquisition Bottleneck?**
    * "The problem of knowledge acquisition is the critical bottleneck in artificial intelligence." - Edward Feigenbaum
    * meaning that the automated reasoning part is often easier than capturing the expert knowledge (knowledge is hard to formulate explicitly)
---
* **Why is manual KB construction as in Cyc so difficult?**
    * estimate by Douglas Lenat in 1984:
        * 350 person-years and 250,000 rules should do the job of collecting the essence of the world's knowledge
    * the present:
        * about 1,000 person-years and 21M axioms and rules have been needed
* **What is the main difference in the automated construction of knowledge bases in DBpedia and NELL?**
    * **DBpedia:**
        * extracts knowledge from Wikipedia using mappings and heuristics
    * **NELL:**
        * reads documents on the web and stores the knowledge it infers
        * input: an ontology, seed examples, a text corpus
        * output: facts, text, patterns
        * has a large degree of automation with occasional human feedback
---
* **What is the vision behind the Semantic Web as opposed to the WWW as we know it?**
    * **Semantic Web:**
        * consists of many datasets that are publicly available, all connected to each other, and stored in categories that refer to their topics

## Planning
* **What is the problem with using pure logic for planning (as in the situation calculus)?**
    * The key problem is how to represent change.
---
* **How are STRIPS operators defined?**
    * **STRIPS: Representation of States**
    ![](https://i.imgur.com/FA92LzK.png)
---
    * **STRIPS: Representation of Goals**
    ![](https://i.imgur.com/9ywV8fd.png)
---
    * **STRIPS: Representation of Actions**
    ![](https://i.imgur.com/3T2v40Q.png)
---
    * **Semantics of the STRIPS Language**
    ![](https://i.imgur.com/UCg96mP.png)
    ![](https://i.imgur.com/95fqWPy.png)
---
* **Why is regression planning often better than progression planning?**
![](https://i.imgur.com/1H9d21e.png)
---
* **What is the Sussman Anomaly?**
![](https://i.imgur.com/9qfptwH.png)
---
* **What is the key difference between state-space planning and plan-space planning?**
![](https://i.imgur.com/qJsAqbE.png)
---
* **What domain-independent heuristics can be used in each case?**
    * **Heuristics for state-space search (two approaches to finding an admissible search heuristic):**
        * **the optimal solution to a relaxed problem:**
            * remove all preconditions from the actions, or
            * remove only the delete lists and find a (minimal) set of actions that collectively achieve the goals
        * **the subgoal independence assumption:**
            * the cost of solving a conjunction of subgoals is approximated by the sum of the costs of solving them independently
            * only admissible if the coordination of subgoals causes additional complexity
    * **Heuristics for plan-space search:**
        * general heuristic: the number of distinct open preconditions
        * choosing a good precondition to refine next also has a strong impact
---
* **What are causal links and ordering constraints, and how can a conflict between them be recognized and resolved?**
![](https://i.imgur.com/nLFg0sN.png)

## Natural Language Processing
* **How does Eliza work? Why is its simple strategy often successful?**
![](https://i.imgur.com/qCqxYqO.png)
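As a reminder of the mechanism behind the image: Eliza matches keyword patterns, reflects pronouns, and fills canned response templates. A minimal sketch in Python (the patterns and responses are illustrative assumptions):

```python
import re

# A few illustrative keyword -> response-template rules
RULES = [(r'\bI need (.*)', 'Why do you need {}?'),
         (r'\bmy (\w+)',    'Tell me more about your {}.')]
REFLECT = {'i': 'you', 'my': 'your', 'me': 'you', 'am': 'are'}

def reflect(text):
    """Swap first- and second-person words (simple pronoun reflection)."""
    return ' '.join(REFLECT.get(w.lower(), w) for w in text.split())

def eliza(utterance):
    for pattern, template in RULES:
        match = re.search(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(reflect(match.group(1)))
    return 'Please go on.'   # content-free default keeps the dialog moving

print(eliza('I need my coffee'))  # -> Why do you need your coffee?
```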
---
* **What are fundamental problems that need to be addressed in NLP?**
![](https://i.imgur.com/b27bJyx.png)
---
* **Be aware of ambiguity across all levels of natural language (words, syntax, semantics, ...)**
![](https://i.imgur.com/R7co5NX.png)
---
* **What is the intuition behind the vector-space model?**
    * see the question below
* **What is the difference between information retrieval and information extraction?**
![](https://i.imgur.com/Q5kunEc.png)
![](https://i.imgur.com/Wgb6Hka.png)
![](https://i.imgur.com/0oBdxQs.png)
---
* **What is the Bag-of-Words Model?**
![](https://i.imgur.com/zYuHDXr.png)
---
* **Probabilistic text classification: given document d, from which bag was it generated? (business, sports, politics, ...)**
    * works with a simple Naive Bayes classifier for text (assumes a document is a sequence of terms); makes independence assumptions about the probability with which a word occurs in documents of a class
---
* **What are semantic embeddings and how are they learned?**
    * **semantic embedding of words (Word2Vec)**
        * **key idea:** find a distributed word representation, i.e., each word is represented as a lower-dimensional, non-sparse vector; this allows, e.g., computing cosine similarities between words
        * general approach: train a (deep) neural network in a supervised way, using the context of a word as additional input
        * **2 variants of Word2Vec:**
            * **continuous bag of words (CBOW):** predict the current word from a window of surrounding words
            * **skip-gram:** use the current word to predict the context window
---
* **Why is machine translation hard?**
    * remaining problems of machine translation: generation of long coherent texts, poetry, song lyrics, understanding puns and jokes, ...
---
* **What is end-to-end learning?**
![](https://i.imgur.com/jd9wUkM.png)

## Interpretable AI
* **What is Interpretability?**
![](https://i.imgur.com/1gmL5vc.png)
---
* **Why is Interpretability in AI important?**
![](https://i.imgur.com/o7WAuXZ.png)
---
* **Is complexity a good measure for interpretability? What other aspects are there?**
    * Results suggest that, at least in some cases, understandability is negatively correlated with the complexity, or the size, of a model.
    * **MDL and understandability**
        * Minimum Description Length explanations may be predictive, but they do not need to be interpretable
    * **understandability and rule length**
        * Humans sometimes prefer longer explanations (they bet on the longer sequence). They don't operate on the logical level but construct a mental image of the instances covered by a pattern.
    * **structured concepts**
        * Most rule learning algorithms learn flat theories; however, structured concepts are often more interpretable.
---
* **What are cognitive biases and why can they be useful for understanding interpretability?**
![](https://i.imgur.com/2lz3bdm.png)
---
* **What are Occam's Razor and the Minimum Description Length Principle?**
    * **Occam's Razor**
        * simpler concepts are better
        * simpler theories are easier to falsify
        * empirically, we know that simpler theories perform better
    * **Minimum Description Length Principle (MDL)**
        * idea: measure how good an explanation is by measuring its description length together with the number of errors one makes when using that rule to describe the data
        * the best hypothesis is the one that compresses the data the most
        * the length of the data is measured relative to a hypothesis (program), plus the length of this hypothesis itself
        ![](https://i.imgur.com/E3zEKuC.png)
        * MDL is viewed as a generalisation of Occam's Razor
        * frequently used as a selection/pruning criterion in rule learning
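        In symbols (a standard two-part formulation, not verbatim from the slides): choose the hypothesis $H$ that minimizes

        $$L(H) + L(D \mid H)$$

        where $L(H)$ is the coding length of the hypothesis itself and $L(D \mid H)$ is the length of the data when encoded with the help of $H$ (essentially the exceptions/errors that remain).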
---
* **What strategies can be used to obtain interpretable knowledge from black-box models?**
    * **extract white-box models from black-box models**
        * **extraction of global models:** try to find a white-box model that is equivalent to the black-box model
            * **key advantage:** the black-box model can (accurately) label as many examples as necessary for learning a good white-box model
        * **extraction of local explanations:** try to find a white-box model that explains a given data point (not the entire data space)
---
* **What is the basic idea behind LIME?**
    * find a local explanation for a given example
    ![](https://i.imgur.com/ocmmu7c.png)

## Philosophical Questions
* **What did Turing want to demonstrate with his test?**
    * Turing wanted a practical way to find out whether a machine is intelligent or not; a test was simpler than defining a long, controversial list of prerequisites.
---
* **What is the Physical Symbol Systems Hypothesis?**
    * "A physical symbol system has the necessary and sufficient means for general intelligent action." The hypothesis says that any system, artificial or organic, that works this way is intelligent. This would mean that humans work in the same way as AI, and both can be considered intelligent.
---
* **What is the Chinese Room Argument and how does it relate to the Turing Test?**
    * The Chinese Room Argument tries to render the Turing Test invalid. It proposes the following thought experiment: in a room sits a man who cannot speak Chinese. Through a slot in the door, notes written in Chinese are passed in. With the help of a book of rules, the man answers these questions. He does not know what he is answering; he is simply manipulating symbols. $\Rightarrow$ Just as the room does not understand any Chinese, a computer that passes the Turing Test cannot really think. The Turing Test cannot be used to discover "true" intelligence, only "simulated" intelligence.
    * The argument distinguishes between "true" and "simulated" intelligence; this choice of words is intentional.
---
* **What is commonly perceived as the difference between "Weak" and "Strong" AI?**
    * Weak AI is not universally intelligent; it is designed to solve one and only one problem. Strong AI aims at building a universally intelligent agent (one able to pass the Turing Test).
---
* **How does the Mind-Body Problem relate to AI?**
    * Can minds be built on computer hardware, or is there something missing, for example a soul?
---
* **Do you find it convincing?**
    * This is something everyone has to answer for themselves.