
[**:house: Home**](https://hackmd.io/s/rkkDP_l4M) | [:boy: **About**](https://hackmd.io/s/B149Z8v7b) | [**:microscope: Researches**](https://hackmd.io/s/rJPFNKlVz) | [**:rocket: Side projects**](https://hackmd.io/s/H1aS2qe4G) | [**:airplane: Life gallery**](https://hackmd.io/s/HJN4JslNM)

---

# [NOTE] Spatial Analysis and Gaussian Processes (Kriging)

*<div style="text-align: center;" markdown="1">`geostatistics` `LATEX`</div>*

## Introduction

Kriging (Gaussian process) interpolation predicts the value at an unobserved location in spatial analysis. Given the uncertainty and trend of the spatial distribution, the value at an unknown location can be predicted. However, it is a *lazy learning* algorithm: it memorizes the observations and measures the correlation between the observed locations and the unknown location. The distance between any two locations is the relevant variable for spatial correlation. The spatial correlation as a function of distance is called the ***variogram*** or ***covariogram***.

Moreover, kriging interpolation is a regression model, and the concept consists of two major parts: ***error minimization*** and the ***variogram***. The error minimization step provides the optimization model that reduces the error between prediction and observation. In the regression model, the result of the optimization is the set of weights that each observation contributes to the prediction. The weights, in general, are determined by the variogram between the unknown location and all observations. Depending on the regression and variogram models chosen, kriging interpolation can be adapted to a wide range of problems.
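As a rough illustration of "spatial correlation as a function of distance", here is a minimal sketch, assuming NumPy, of a binned empirical semivariogram: half the mean squared difference of the values over all pairs whose separation falls into a distance bin. The function name `empirical_semivariogram`, the bin edges, and the toy data are hypothetical choices for illustration; they are not taken from the talk or from PyKrige.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Binned empirical semivariogram (illustrative sketch).

    coords    : (n, d) array of locations
    values    : (n,) array of observations Z(x)
    bin_edges : increasing distance bin edges (a hypothetical choice of lags)
    Returns (lag_centers, gamma, counts); gamma is NaN for empty bins.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)

    # Use each unordered pair (i < j) exactly once.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    # Bin k holds distances in [bin_edges[k], bin_edges[k + 1]).
    which = np.digitize(dist, bin_edges) - 1
    inside = (which >= 0) & (which < n_bins)

    counts = np.bincount(which[inside], minlength=n_bins)
    sums = np.bincount(which[inside], weights=sq_diff[inside], minlength=n_bins)

    # gamma(h) = sum of squared differences / (2 * number of pairs)
    gamma = np.full(n_bins, np.nan)
    filled = counts > 0
    gamma[filled] = sums[filled] / (2.0 * counts[filled])

    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma, counts


# Toy 1-D example: four equally spaced points (made-up values).
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
values = np.array([1.0, 2.0, 4.0, 7.0])
lags, gamma, counts = empirical_semivariogram(coords, values, [0.5, 1.5, 2.5, 3.5])
print(lags, gamma, counts)
```

In this toy data the semivariogram grows with the lag, which is the behaviour one would typically expect when nearby points are more alike than distant ones. Real data usually needs wider bins and enough pairs per bin for the estimate to be stable.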
## My outreach talk

* Event : [Meetup - R MLDM Group (2019.09.23)](https://www.meetup.com/Taiwan-R/events/264959486/)
* Topic : Make your own Kriging Interpolation Algorithm with Python
* Slide : [outreach_2019-09-24_kriging_meetup.pdf](https://drive.google.com/file/d/1HbhfXLvDXdJrfkQRy3CepnhIF5WoSW8p/view?usp=sharing)

## Definition of Variogram and Covariogram

The variogram, $V(h)$, is defined as the variance of the difference between the values at any two points $(x_i,\,x_j)$ separated by a distance $h$. The *semivariogram* is commonly used in calculations under the isotropy assumption, i.e. no dependence on direction:

$$
\begin{split}
\gamma(h) &= \frac{V(h)}{2} \\
&= \frac{E\left((Z(x_i)-Z(x_j))^2\right)_{(i,\,j)\in N(h)}}{2}\\
&=\frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}\left(Z(x_i)-Z(x_j)\right)^2 \\
&= \frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}\left[(Z(x_i)-m(h))-(Z(x_j)-m(h))\right]^2 \\
&=\frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}\left[(Z(x_i)-m(h))^2-2(Z(x_i)-m(h))(Z(x_j)-m(h))+(Z(x_j)-m(h))^2\right] \\
&=\frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}{\left[(Z(x_i)-m(h))^2+(Z(x_j)-m(h))^2\right]} - C(h)
\end{split}
$$

where $N(h)$ is the set of pairs $(i,\,j)$ separated by distance (lag) $h$, i.e. $Z(x_j)=Z(x_i+h)$, and $|N(h)|$ is the number of such pairs. The leading factor of $1/2$ (the "semi" in semivariogram) is commonly explained as removing the double counting of $(i,\,j)$ and $(j,\,i)$ at the same lag. $m(h)$ is the mean of the values at $x_i$ and $x_j$ over all pairs separated by distance $h$:

$$
m(h)=\frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}\left[Z(x_i)+Z(x_j)\right]\ .
$$

The last term is called the *covariogram* and is defined as

$$
\begin{split}
C(h) &= \frac{1}{|N(h)|}\sum_{(i,\,j)\in N(h)}(Z(x_i)-m(h))(Z(x_j)-m(h)) \\
&= \frac{1}{|N(h)|}\sum_{(i,\,j)\in N(h)}Z(x_i)Z(x_j) - m(h)^2
\end{split}
$$

If the variance is assumed to depend only on the lag $h$ and not on the location $x$, the expected value of $Z(x)$ is the same at every location, i.e.

$$
E\{Z(x_i)\} = E\{Z(x_i+h)\}=Z_0
$$

In this case, $m(h)$ reduces to $Z_0$, and the first term reduces to the variance of $Z(x)$, denoted $V_Z$, i.e.
$$
V_Z=\frac{1}{2|N(h)|}\sum_{(i,\,j)\in N(h)}{\left[(Z(x_i)-Z_0)^2+(Z(x_j)-Z_0)^2\right]}
$$

Thus, the semivariogram can be simplified to

$$
\gamma(h) = V_Z - C(h)
$$

Ideally, the semivariogram increases with distance, while the covariance decreases. A valid covariance matrix is positive semi-definite, while the corresponding variogram is conditionally negative definite (not positive definite). Moreover, in practice the variogram is estimated from the observations as a function of $h$ over distance ranges (lag bins). The result can be kept discrete (array-like) or fitted as a continuous function, depending on the purpose.

## References

- [Wikipedia - Kriging](https://en.wikipedia.org/wiki/Kriging)
- [Course lists](https://www.youtube.com/playlist?list=PLjoNU9Txin7lsQp-bgosLH4jY4rf7YBlx)
- [Slides](https://www.dropbox.com/sh/m2d1jymmfz70cvc/AADbbKErf6ZEXTEVa9Q3b-uda?dl=0)
- [PyKrige](https://github.com/bsmurphy/PyKrige)

<br>

---

[:ghost: Github](https://github.com/juifa-tsai) | [:busts_in_silhouette: Linkedin ](https://www.linkedin.com/in/jui-fa-tsai-08ba0a93)