# Notes about "Taking the Human Out of the Loop: A Review of Bayesian Optimization" [:link:][paper_one]
[paper_one]:
https://ieeexplore.ieee.org/document/7352306
###### tags: `Bibliography review`
Big data applications involve large volumes of data, complex structures, many users, and intricate storage architectures. Bayesian optimization is a powerful tool for the joint optimization of design choices that has gained great popularity in recent years. The quoted paper presents a review of Bayesian optimization.
## Introduction
Designing experiments is a major challenge, especially when large amounts of data and many variables are involved. These designs are fraught with choices that are often complex and highly interacting, which makes it hard for humans to take the right one without computational or mathematical assistance. Bayesian optimization has emerged to assist with such design problems.
As an example, IBM's optimization tool CPLEX exposes at least 76 parameters to tune, a hard task for any professional who wants to use this tool for optimization.
A second example comes from the games industry. On one side is the content provider, on the other the users, and in the middle the analytics company. The analytics company must develop procedures to automatically design game variants across millions of users. The objective is to enhance the user experience and improve the content provider's revenue.
Software engineers are frequently faced with hundreds of parameter choices in their programs. Bayesian optimization can be used to construct optimal programs, that is, programs that run faster or find better solutions.
Mathematically, Bayesian optimization deals with finding a global maximizer (or minimizer) of an unknown objective function $f$:
$$
x^\star = \arg\max_{x \in \chi} f(x)
$$
where $\chi$ is some design space of interest. In global optimization, $\chi$ is often a compact set, but Bayesian optimization can be applied to more unusual search spaces involving categorical or conditional inputs, or even combinatorial search spaces. Often $f$ is a black-box function with the single requirement that it can be evaluated at any point $x$. Each evaluation produces a noisy output $y \in \mathbb{R}$ such that $\mathbb{E}[y \mid f(x)] = f(x)$; that is, we can only observe the function through point-wise unbiased observations $y$. The algorithm needs only these evaluations and need not be provided the gradients of $f$. At iteration $n$, the optimizer queries $f$ at a point $x_{n+1}$ and observes $y_{n+1}$. After $N$ queries it makes a final recommendation $\bar{x}_N$, which represents the algorithm's best estimate of the maximizer.
In other words, we pick a value $x_{n+1}$, evaluate the function $f$ at this point, observe the value $y_{n+1}$, and repeat this process for $N$ points; the algorithm then returns its best estimate of the maximizer of $f$.
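As a concrete illustration of this loop, here is a minimal sketch in Python. The one-dimensional objective, the Gaussian-process surrogate (via scikit-learn), and the simple optimism-based acquisition rule are all illustrative assumptions, not prescriptions from the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def f(x):                       # unknown objective, hidden from the optimizer
    return -(x - 0.3) ** 2

def observe(x):                 # point-wise unbiased observation: E[y | f(x)] = f(x)
    return f(x) + rng.normal(scale=0.05)

X = [[rng.uniform(0, 1)]]       # queried points x_1, ..., x_n
y = [observe(X[0][0])]          # noisy observations y_1, ..., y_n

for n in range(20):
    # posterior belief over f given the history (alpha = observation noise variance)
    model = GaussianProcessRegressor(alpha=0.05 ** 2).fit(X, y)
    cand = rng.uniform(0, 1, size=(200, 1))            # candidate points in chi
    mu, sd = model.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 2.0 * sd)]            # acquisition: optimism bonus
    X.append(list(x_next))                             # query f at x_{n+1} ...
    y.append(observe(x_next[0]))                       # ... and observe y_{n+1}

x_bar = X[int(np.argmax(y))]    # simple recommendation rule: best observation so far
print("recommended x:", x_bar)
```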
In the context of big data, $f$ could be the accuracy of an object recognition system and $x$ the hyperparameters of the network, with stochastically observed accuracy $y = f(x)$ on a particular dataset. Because Bayesian optimization is very data efficient, it is especially useful in situations like this, where we have no access to the derivatives of $f$ with respect to $x$. Bayesian optimization takes advantage of the full history of the optimization to make the search efficient.
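For a tuning scenario like this, a hedged usage sketch with the scikit-optimize library might look as follows; the library choice, the two hyperparameters, and the stand-in error function are assumptions for illustration:

```python
from skopt import gp_minimize

# Stand-in for "train the network and measure validation error"; in practice
# this would fit a model and return its error on a particular dataset.
def validation_error(params):
    learning_rate, width = params
    return (learning_rate - 0.01) ** 2 + (width - 64) ** 2 / 1e4

res = gp_minimize(validation_error,
                  dimensions=[(1e-4, 1e-1, "log-uniform"),   # learning rate
                              (16, 256)],                    # hidden-layer width
                  n_calls=25, random_state=0)
print("best hyperparameters:", res.x, "error:", res.fun)
```

Note that `gp_minimize` never asks for gradients of `validation_error`; it only requires that the black box can be evaluated at a point, matching the setting above.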
In particular, Bayesian optimization starts from a prior belief over the possible objective functions and refines this belief at each iteration as data are observed; the Bayesian posterior represents our updated beliefs. The search is guided by an acquisition function $\alpha_n : \chi \rightarrow \mathbb{R}$ that leverages the uncertainty in the posterior to balance exploration and exploitation.
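As a concrete example of such a function, the sketch below computes expected improvement, one common acquisition criterion, from the posterior mean $\mu$ and standard deviation $\sigma$ at a candidate point. The closed form is standard for Gaussian posteriors and is shown here only for illustration:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """EI for maximization: how much we expect to improve on the best
    observation y_best, given posterior mean mu and std sigma."""
    sigma = np.maximum(sigma, 1e-12)   # avoid division by zero
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)
```

At each iteration the next query point $x_{n+1}$ is chosen by maximizing such an acquisition over $\chi$, which is cheap compared with evaluating $f$ itself.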
## Bayesian optimization with parametric models