# Computer Vision

Computer vision is about analyzing images and videos to extract knowledge from them. Often these images are of real scenes, such as a street scene with cars through which an autonomous vehicle has to navigate; they can also be other kinds of images, such as an X-ray of a human head, where image analysis extracts the things of interest for medical applications. Essentially, the goal is image and video understanding: labelling and tracking the interesting things in an image as they move.

Computational photography: capturing light from a scene to record it in a photograph or some other novel artifact that showcases the scene. Image analysis is used to support the capture and display of the scene in novel ways. Computer vision, by contrast, is about interpretation and analysis of the scene: what is the content of the image, who is there, what is in it, what is happening.

![](https://i.imgur.com/BxmOOHE.png)

![](https://i.imgur.com/thSNGqr.jpg)

### OCR (Optical Character Recognition)

![](https://i.imgur.com/ogHIYAB.png)

More advanced systems: banks have machines that can read handwritten digits.

### Face Detection

![](https://i.imgur.com/zgCRkwx.jpg)

Cameras can now detect when a person blinks or smiles, and can even recognize who you are, i.e. face recognition.

## Object Recognition

A smartphone can detect monuments and statues, search the web, and pull up information about them. Google Glass: using object recognition with the camera to give information about the object being viewed.

## Special Effects

A scan of somebody's face is taken with a laser or other method, and models are built from those scans. These models can then be lit from different sides and directions.

## Motion Capture

Markers on the face are tracked by cameras, and 3D geometry is used to work out where to place the face.

![](https://i.imgur.com/AoULkaR.jpg)

Aerial imagery can also be used to build models.

## Smart Cars

Computer vision is used to automatically recognize signs and identify pedestrians. The system alerts the driver if pedestrians are too close.

![](https://i.imgur.com/3CGzfpG.png)

## Video Games

Microsoft Kinect: a depth sensor that builds depth images of a scene. Darker pixels are farther away, brighter pixels are closer, and gray is in between. Skeletal descriptions can be produced from these depth images, i.e. the skeletal geometry of people can be recovered.

![](https://i.imgur.com/H92NMYq.png)

## Vision Is Not Image Processing

Seeing is not the same as measuring properties of the image. Seeing is forming a percept of what is in the world based on the measurements made by an image sensor.
## Images as Functions

An image can be thought of as a function f(x, y) that maps pixel coordinates to intensity values (or to RGB triples for color images).

![](https://i.imgur.com/FPknQXx.jpg)

![](https://i.imgur.com/q9bDiKF.jpg)

![](https://i.imgur.com/fK2ezuC.png)

![](https://i.imgur.com/h2hEFpY.jpg)

![](https://i.imgur.com/y2DjQmA.jpg)

![](https://i.imgur.com/318Inz9.jpg)

![](https://i.imgur.com/vFXKjz5.jpg)

![](https://i.imgur.com/U8hkFNl.jpg)

![](https://i.imgur.com/rBJgI1D.jpg)

![](https://i.imgur.com/T9YEAe6.jpg)

![](https://i.imgur.com/ltO7iBP.png)

![](https://i.imgur.com/aDFlUaS.jpg)

![](https://i.imgur.com/oLR7VHr.png)

![](https://i.imgur.com/WgyINhS.jpg)

![](https://i.imgur.com/oKHY2iI.jpg)

![](https://i.imgur.com/nyEFdOF.jpg)

![](https://i.imgur.com/8VWvRW3.jpg)

![](https://i.imgur.com/PTqRYzN.jpg)

![](https://i.imgur.com/a0O25QN.jpg)

![](https://i.imgur.com/OHn9pli.png)

## Filters as Templates

### Intro

Moving a template across the entire image and comparing it to the window of the image it covers is a process known as "template matching". Template matching is implemented with two-dimensional cross-correlation, which is closely related to convolution. You can still think of the image as a function, but the function's value is no longer just the intensity at a pixel; it describes some property of the pixels in a local neighborhood of the image.

### 1D Correlation

The correlation output is maximal at the point where the filter most closely resembles the original signal.

![](https://i.imgur.com/xNwE7S1.png)

### Template Matching

![](https://i.imgur.com/X36ajoj.png)

![](https://i.imgur.com/NnktfYV.png)

![](https://i.imgur.com/9UDzqLb.png)

There is no flipping in correlation; in other words, correlation is convolution without flipping.

![](https://i.imgur.com/vUtWxOe.png)

This is not the required result: we want the cross-correlation to be maximum for A, but here it is minimum. To solve this problem we use normalization.

![](https://i.imgur.com/rROjlfy.png)

By normalizing, the cross-correlation becomes insensitive to changes in brightness. The brightest point in the output image corresponds to the matched location of the template.

![](https://i.imgur.com/yc01Pk2.png)

### Edge Detection

![](https://i.imgur.com/o0maqA9.png)

![](https://i.imgur.com/UdgqvpG.png)

![](https://i.imgur.com/Cb2XCii.png)

![](https://i.imgur.com/HL7lBS0.png)

![](https://i.imgur.com/NogZzPh.png)

![](https://i.imgur.com/eT3QeRr.png)

![](https://i.imgur.com/yuvCEzm.png)

![](https://i.imgur.com/j8zDwzG.png)

![](https://i.imgur.com/nEBqCqu.png)

![](https://i.imgur.com/I4TIamd.png)

### Hough Transform

![](https://i.imgur.com/IZi89BA.jpg)

![](https://i.imgur.com/OwCCCOZ.jpg)

![](https://i.imgur.com/SUtSu93.jpg)

![](https://i.imgur.com/qzPeVxF.jpg)

![](https://i.imgur.com/9Yh8Cfc.jpg)

![](https://i.imgur.com/aTp6uPD.jpg)

![](https://i.imgur.com/1BFTjhx.png)

Smooth to remove noise before detecting edges.

![](https://i.imgur.com/RsTokUD.png)

Noise can cause the accumulator to accidentally find peaks.

![](https://i.imgur.com/QyJsFgd.png)

Reference: https://towardsdatascience.com/lines-detection-with-hough-transform-84020b3b1549
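To make the images-as-functions idea concrete, here is a minimal Python sketch; the file name `example.png` and the coordinates are placeholder assumptions, not values from the notes.

```python
# A minimal sketch of treating an image as a function f(x, y);
# "example.png" and the coordinates below are placeholders.
import cv2
import numpy as np

img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)  # 2-D uint8 array

# The "function value" at row y, column x is the stored intensity.
y, x = 50, 120
print("f(x=%d, y=%d) = %d" % (x, y, img[y, x]))

# Because the image is a function, we can transform it pointwise, e.g.
# brighten it and clip back into the displayable 0..255 range.
brighter = np.clip(img.astype(np.int32) + 40, 0, 255).astype(np.uint8)
```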
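For the template-matching discussion above, here is a minimal NumPy sketch of normalized cross-correlation, assuming `image` and `template` are 2-D grayscale arrays; the brute-force double loop is written for clarity rather than speed.

```python
# A minimal sketch of normalized cross-correlation template matching with
# NumPy; `image` and `template` are assumed to be 2-D grayscale arrays.
import numpy as np

def normalized_cross_correlation(image, template):
    """Score every placement of `template` on `image` with NCC in [-1, 1]."""
    th, tw = template.shape
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))

    t = template - template.mean()        # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())

    for yy in range(out_h):
        for xx in range(out_w):
            window = image[yy:yy + th, xx:xx + tw]
            w = window - window.mean()    # zero-mean window
            w_norm = np.sqrt((w ** 2).sum())
            if t_norm > 0 and w_norm > 0:
                scores[yy, xx] = (w * t).sum() / (w_norm * t_norm)
    return scores

# The brightest point of the score map is the best match location, e.g.:
# best_y, best_x = np.unravel_index(scores.argmax(), scores.shape)
```

Subtracting the mean of the template and of each window is what makes the score insensitive to brightness changes, as described in the section above.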
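The edge-detection and Hough-transform pipeline above (smooth, detect edges, vote for lines in (rho, theta) space) can be sketched with OpenCV as follows; the file name `road.png` and all thresholds are illustrative assumptions, not values from the notes.

```python
# A short sketch of the smooth -> edge-detect -> Hough-vote pipeline using
# OpenCV; "road.png" and the thresholds are placeholder assumptions.
import cv2
import numpy as np

img = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)

blurred = cv2.GaussianBlur(img, (5, 5), 1.5)   # smooth to remove noise
edges = cv2.Canny(blurred, 50, 150)            # binary edge map

# Each detected line is returned as a (rho, theta) pair from the accumulator.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)

if lines is not None:
    for rho, theta in lines[:, 0]:
        # Convert (rho, theta) back to two points on the line for drawing.
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
        p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
        cv2.line(img, p1, p2, 255, 2)
```

Raising the accumulator threshold reduces the chance of the accidental peaks mentioned above, at the cost of missing weaker lines.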