
Chapter 13: Learning (deep) continuous functions

It's the same idea of updating prior beliefs, just applied to neural networks.

Key idea

The parameters of a neural network come from one or more Gaussian distributions (the prior). Given some data, we can update that prior to come up with networks that fit the data better.

var dm = 10 // size of hidden layer

// One-hidden-layer network from a scalar x to a scalar y:
var makeFn = function(M1,M2,B1){
  return function(x){
    return T.toScalars(
      // M2 * sigmoid(M1 * x + B1); x is a scalar input, so T.mul scales every entry of M1 by x
      T.dot(M2,T.sigmoid(T.add(T.mul(M1,x),B1)))
    )[0]}
}

var observedData = [{"x":-4,"y":69.76636938284166},{"x":-3,"y":36.63586217969598},{"x":-2,"y":19.95244368751754},{"x":-1,"y":4.819485497724985},{"x":0,"y":4.027631414787425},{"x":1,"y":3.755022418210824},{"x":2,"y":6.557548104903805},{"x":3,"y":23.922485493795072},{"x":4,"y":50.69924692420815}]

var inferOptions = {method: 'optimize', samples: 100, steps: 3000, optMethod: {adam: {stepSize: 0.1}}} // variational optimization: 3000 Adam steps, then 100 samples from the learned guide

var post = Infer(inferOptions,
  function() {  
    // standard-normal priors over the network weights and biases
    var M1 = sample(DiagCovGaussian({mu: zeros([dm, 1]), sigma: ones([dm,1])}))
    var B1 = sample(DiagCovGaussian({mu: zeros([dm, 1]), sigma: ones([dm,1])}))
    var M2 = sample(DiagCovGaussian({mu: zeros([1, dm]), sigma: ones([1,dm])}))
    
    var f = makeFn(M1,M2,B1)
    
    // likelihood: each observed y is a noisy (Gaussian) reading of the network's output at x
    var obsFn = function(datum){
      observe(Gaussian({mu: f(datum.x), sigma: 0.1}), datum.y)
    }
    mapData({data: observedData}, obsFn)

    return {M1: M1, M2: M2, B1: B1}
  }
)

print("observed data:")
viz.scatter(observedData)

// draw one network function from the (approximate) posterior over weights
var postFnSample = function(){
  var p = sample(post)
  return makeFn(p.M1,p.M2,p.B1)
}
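
As a quick check (a sketch I'm adding, not in the original notes), we can draw one function from the posterior, evaluate it on a grid of x values, and plot the resulting curve with viz.scatter, in the same {x, y} format as the data:

var fn = postFnSample()
var xs = _.range(-4, 4.1, 0.1) // grid covering the observed x range
var curve = map(function(x){return {x: x, y: fn(x)}}, xs)
print("one posterior function:")
viz.scatter(curve)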

Notice two things in the model above:

  1. How the parameters of the network are sampled from a DiagCovGaussian (a multivariate Gaussian with diagonal covariance).
  2. How we observe the data: observe(Gaussian({mu: f(datum.x), sigma: 0.1}), datum.y)

The second step is the key to updating the parameters. In a non-Bayesian setup we would write down a loss function and backpropagate through it. Here, instead, each observed y is scored under a Gaussian centered at the output of the neural net, and inference adjusts the weights to make these likelihoods (together with the prior) high.
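
Written out (my formulation of what the program above implies, with θ = (M1, B1, M2)), the target is the posterior over weights:

$$
p(\theta \mid D) \;\propto\; p(\theta)\,\prod_{i} \mathcal{N}\!\big(y_i \;\big|\; f_\theta(x_i),\ 0.1^2\big)
$$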

Note: As the width of the hidden layer goes to infinity, the network approaches a Gaussian process.

Infinitely “wide” neural nets (with Gaussian priors on the weights) yield a model where f(x) is Gaussian distributed for each x, and, it turns out, any finite collection f(x1), ..., f(xn) is jointly Gaussian, which is exactly the definition of a Gaussian process.
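
One way to see this empirically (a sketch I'm adding, not from the notes): sample many networks from the prior defined above and look at the joint distribution of their values at two inputs. As the hidden layer gets wider, this scatter approaches a bivariate Gaussian.

// draw networks from the prior and record (f(0), f(1)) for each
var priorFnSample = function(){
  var M1 = sample(DiagCovGaussian({mu: zeros([dm, 1]), sigma: ones([dm,1])}))
  var B1 = sample(DiagCovGaussian({mu: zeros([dm, 1]), sigma: ones([dm,1])}))
  var M2 = sample(DiagCovGaussian({mu: zeros([1, dm]), sigma: ones([1,dm])}))
  return makeFn(M1,M2,B1)
}
var pairs = repeat(500, function(){
  var g = priorFnSample()
  return {x: g(0), y: g(1)}
})
viz.scatter(pairs)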


Deep generative models

Many interesting problems are unsupervised: we get a bunch of examples and want to understand them by capturing their distribution.

Notice how this works: a low-dimensional latent vector is pushed through the network, and its output is used as the mean of a distribution over (x, y) pairs.

var hd = 10 // size of hidden layer
var ld = 2  // size of the latent space
var outSig = Vector([0.1, 0.1]) // observation noise on the (x, y) output

// makeFn is redefined for this model: x is now a latent vector (so we use
// T.dot instead of T.mul), and the output stays a length-2 tensor instead of
// being collapsed to a scalar.
var makeFn = function(M1,M2,B1){
  return function(x){return T.dot(M2,T.sigmoid(T.add(T.dot(M1,x),B1)))}
}

var post = Infer(inferOptions,
  function() {  
    // Delta guides make the approximate posterior a point mass, i.e. we optimize
    // point (MAP-style) estimates of the network weights:
    var M1 = sample(DiagCovGaussian({mu: zeros([hd,ld]), sigma: ones([hd,ld])}), {
      guide: function() {return Delta({v: param({dims: [hd, ld]})})}})
    var B1 = sample(DiagCovGaussian({mu: zeros([hd, 1]), sigma: ones([hd,1])}), {
      guide: function() {return Delta({v: param({dims: [hd, 1]})})}})
    var M2 = sample(DiagCovGaussian({mu: zeros([2,hd]), sigma: ones([2,hd])}), {
      guide: function() {return Delta({v: param({dims: [2,hd]})})}})
    
    var f = makeFn(M1,M2,B1)
    // push a standard-normal latent vector through the network to get one predicted (x, y) mean
    var sampleXY = function(){return f(sample(DiagCovGaussian({mu: zeros([ld, 1]), sigma: ones([ld,1])})))}

    // one latent draw, hence one predicted mean, per observed data point
    var means = repeat(observedData.length, sampleXY)
    var obsFn = function(datum,i){
      observe(DiagCovGaussian({mu: means[i], sigma: outSig}), Vector([datum.x, datum.y]))
    }
    mapData({data: observedData}, obsFn)

    return {means: means, 
            pp: repeat(100, sampleXY)}
  }
)

Note:

  1. The output of the network is not a scalar anymore; it is a vector of length two, standing for an (x, y) pair.
  2. sampleXY pushes a standard-normal latent vector through the network; calling it once per data point gives the means.
  3. Each observed point is treated as a noisy draw around its mean: observe(DiagCovGaussian({mu: means[i], sigma: outSig}), Vector([datum.x, datum.y])). A sketch for inspecting the resulting posterior predictive samples follows below.
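
As a minimal sketch (my addition, not in the original notes), the pp samples returned by the program can be turned back into {x, y} points and compared to the data:

var p = sample(post)
var ppPoints = map(function(v){
  var s = T.toScalars(v) // each pp entry is a length-2 tensor
  return {x: s[0], y: s[1]}
}, p.pp)
print("posterior predictive samples:")
viz.scatter(ppPoints)
print("observed data:")
viz.scatter(observedData)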

Minibatches and amortized inference

Minibatches: the idea is that randomly sub-sampling the data on each optimization step gives a good enough approximation to the objective on the whole data set, at a fraction of the cost.

But if we split the data into mini-batches, the per-datum latent variables (the ones used to sample the means) are only touched when their data point happens to be in the current batch, so we need guides for them that keep improving across batches rather than starting over; this is what amortized inference provides. A sketch of the batchSize option, and of an amortized guide, follows below.
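
In WebPPL, sub-sampling is the batchSize option of mapData. As a minimal sketch (my addition, assuming the models above), the mapData call becomes:

// score only 3 randomly chosen data points on each optimization step
mapData({data: observedData, batchSize: 3}, obsFn)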

ToDo: Read “Amortized Inference in Probabilistic Reasoning” (Gershman & Goodman, 2014)
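
For concreteness, here is a hedged sketch (my illustration, not from the notes or the textbook) of what an amortized guide could look like in the generative model above: rather than indexing one free latent per data point, each latent's guide is computed from its datum by a small recognition network (the W and b below are hypothetical parameters) whose weights are shared across all data points, so they keep improving no matter which mini-batch is drawn. This obsFn would replace the means/obsFn lines inside the model:

var obsFn = function(datum){
  // per-datum latent with a data-dependent (amortized) guide
  var z = sample(DiagCovGaussian({mu: zeros([ld, 1]), sigma: ones([ld, 1])}), {
    guide: function(){
      // hypothetical recognition network: map the observed (x, y) to the guide mean for z
      var W = param({dims: [ld, 2]})
      var b = param({dims: [ld, 1]})
      return DiagCovGaussian({mu: T.add(T.dot(W, Vector([datum.x, datum.y])), b),
                              sigma: ones([ld, 1])})
    }})
  observe(DiagCovGaussian({mu: f(z), sigma: outSig}), Vector([datum.x, datum.y]))
}
mapData({data: observedData, batchSize: 3}, obsFn)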

tags: probabilistic-models-of-cognition