UPENDRA KUMAR

@imkushwaha

Joined on Sep 28, 2021

  • Article on the Selective Search algorithm used in R-CNN: https://www.geeksforgeeks.org/selective-search-for-object-detection-r-cnn/
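As a quick companion to the linked article, here is a minimal sketch of generating Selective Search proposals with OpenCV's contrib module (assumptions: opencv-contrib-python is installed and "image.jpg" is a placeholder input path; the article itself may use a different implementation):

```python
# Minimal sketch: Selective Search region proposals via OpenCV contrib.
import cv2

img = cv2.imread("image.jpg")  # placeholder path
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # faster mode; switchToSelectiveSearchQuality() also exists
rects = ss.process()               # array of (x, y, w, h) region proposals
print(f"{len(rects)} proposals; first: {rects[0]}")
```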
  • Introduction: Mask R-CNN is a 2017 work by Kaiming He et al. It performs instance segmentation while performing object detection, achieves excellent results, and won the COCO 2016 challenge without any tricks. Its network design is also relatively simple: on the basis of Faster R-CNN, a third branch for predicting segmentation masks is added to the original two branches (classification + coordinate regression). [Figure: Mask R-CNN architecture] So why does this network get such good results, and what are its details? These are introduced one by one below. Before introducing Mask R-CNN, first understand what segmentation is, because that is what Mask R-CNN does. There are several different kinds of segmentation, and of these Mask R-CNN performs instance segmentation. Semantic segmentation: classifying an image pixel by pixel.
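For a concrete view of the three branches, here is a minimal sketch using torchvision's pretrained Mask R-CNN (assuming torchvision ≥ 0.13 for the `weights` argument; the random input image is only illustrative):

```python
# Minimal sketch: pretrained Mask R-CNN from torchvision.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)          # dummy RGB image in [0, 1]
with torch.no_grad():
    out = model([image])[0]              # one dict per input image
# The three output heads mirror the three branches described above:
print(out["labels"].shape)               # classification branch
print(out["boxes"].shape)                # box-regression branch
print(out["masks"].shape)                # mask branch: (N, 1, H, W) per-instance masks
```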
  • YOLOv1. 1. The main idea of target detection: Unlike the R-CNN series, YOLO treats target detection as a regression problem and directly uses a single network for classification and box regression. The specific method: divide the image into S × S grid cells; each cell predicts the positions (x, y, w, h) of B bboxes, a confidence score (trained toward the intersection over union with the ground truth), and class probabilities. The output dimension is S × S × (B × 5 + C), where C is the number of categories. No matter how many objects a cell contains, each cell predicts only one set of class probabilities. At test time, the conditional class probability is multiplied by the confidence of the predicted box to give the confidence that each box contains an object of a certain class; this score represents both the category probability and the prediction accuracy of the box, as in the worked sketch below. 2. Overall network structure: The base network is modeled on GoogLeNet, but instead of its Inception modules it alternates 1 × 1 and 3 × 3 convolutional layers.
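A worked sketch of YOLOv1's output layout and test-time scoring, using the paper's values S = 7, B = 2, C = 20 (the random tensor stands in for real network output):

```python
# Worked sketch: YOLOv1 output tensor layout and class-specific scores.
import torch

S, B, C = 7, 2, 20
pred = torch.rand(S, S, B * 5 + C)              # S*S*(B*5 + C) = 7*7*30
boxes = pred[..., :B * 5].reshape(S, S, B, 5)   # (x, y, w, h, confidence) per box
cls_prob = pred[..., B * 5:]                    # one set of class probabilities per cell
conf = boxes[..., 4]                            # objectness confidence (trained toward IoU)
# Test time: class-specific confidence = P(class | object) * box confidence
scores = cls_prob.unsqueeze(2) * conf.unsqueeze(-1)   # shape (S, S, B, C)
print(pred.shape, scores.shape)
```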
  • Users: a regular user (has access only within their home directory) and the root user or superuser (admin). The shell prompt indicates which one is active: $ for a regular user, # for the root/sudo user. Absolute path example: cd /bin/fo1
  • Also known as GoogLeNet, it is a 22-layer network that won the 2014 ILSVRC championship. The design motivation: the performance of a deep network can generally be improved by increasing the size of the network and of the dataset, but this inflates the number of parameters (making overfitting easy), uses computing resources inefficiently, and makes producing high-quality datasets an expensive problem. Its design philosophy is to replace full connectivity with a sparse architecture, and to try to make the interior of the convolutions sparse as well. The main idea is to design an Inception module and increase the depth and width of the network by repeatedly stacking these modules; GoogLeNet mainly extends them in depth, as the sketch below illustrates.
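A minimal sketch of one Inception module in PyTorch; the channel counts below are illustrative choices, not the exact GoogLeNet configuration:

```python
# Minimal sketch: an Inception module with four parallel branches.
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)                        # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1),        # 1x1 reduce, then 3x3
                                nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1),        # 1x1 reduce, then 5x5
                                nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))  # pool, then 1x1

    def forward(self, x):
        # Widen the network: run the branches in parallel, concatenate on channels.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.rand(1, 192, 28, 28)
print(Inception(192, 64, 96, 128, 16, 32, 32)(x).shape)  # -> (1, 256, 28, 28)
```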
  • Introduction: ResNet is a network structure proposed by Kaiming He, Jian Sun, and others at Microsoft Research Asia in 2015; it won first place in the ILSVRC-2015 classification task and, at the same time, first place in the ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation tasks. It was a sensation at the time. ResNet, also known as the residual neural network, adds the idea of residual learning to the traditional convolutional neural network, which solves the problems of gradient vanishing and of accuracy degradation (on the training set) in deep networks, so that networks can be made deeper and deeper while both accuracy and speed remain under control. The problems caused by increasing depth: The first problem brought by increasing depth is gradient explosion/vanishing. As the number of layers increases, the backpropagated gradient becomes unstable under repeated multiplication, becoming particularly large or particularly small; of the two, gradient vanishing occurs more often. To overcome it, many solutions have been devised, such as using BatchNorm, replacing the activation function with ReLU, and using Xavier initialization; it can be said that gradient vanishing has been well solved. A minimal residual block is sketched below.
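A minimal sketch of the residual idea in PyTorch: the skip connection adds the input back onto the branch output, so very deep stacks keep a well-behaved gradient path (layer sizes here are illustrative):

```python
# Minimal sketch: a residual block that learns F(x) and outputs F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)     # residual learning: add the identity back

x = torch.rand(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)      # shape is preserved: (1, 64, 56, 56)
```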
  • VGG-Net Introduction: VGG stands for the Visual Geometry Group, part of the Department of Engineering Science at Oxford University. It has released a series of convolutional network models beginning with VGG, applicable to face recognition and image classification, from VGG16 to VGG19. The original purpose of VGG's research on the depth of convolutional networks was to understand how depth affects the accuracy of large-scale image classification and recognition (e.g., the VGG-Very-Deep-16 CNN). To deepen the network while avoiding too many parameters, small 3x3 convolution kernels are used in all layers. The network structure: The input to VGG is a fixed-size 224x224 RGB image. The mean RGB value, computed over the training-set images, is subtracted from each pixel, and the image is then fed into the VGG convolutional network. 3x3 or 1x1 filters are used, and the convolution stride is fixed. There are always 3 VGG fully connected layers; the variants range from VGG11 to VGG19 according to the total number of convolutional + fully connected layers. The smallest, VGG11, has 8 convolutional layers and 3 fully connected layers; the largest, VGG19, has 16 convolutional layers and 3 fully connected layers. In addition, VGG does not place a pooling layer after every convolutional layer; there are 5 pooling layers in total, distributed after different convolutional layers. One such stage is sketched below. [Figure: VGG structure diagram]
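A sketch of one VGG-style stage under the conventions above: repeated 3x3 convolutions followed by a single 2x2 max-pool (the channel counts are illustrative):

```python
# Minimal sketch: one VGG-style stage of stacked 3x3 convs plus a pool.
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, stride=2))   # one of VGG's 5 pooling layers
    return nn.Sequential(*layers)

x = torch.rand(1, 3, 224, 224)                 # VGG's fixed 224x224 RGB input
print(vgg_stage(3, 64, 2)(x).shape)            # -> (1, 64, 112, 112)
```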
  • AlexNet was designed by Alex Krizhevsky, a student of Hinton, and won the 2012 ImageNet competition. It was also after that year that more and deeper neural networks were proposed, such as the excellent VGG and GoogLeNet. Its official model reaches a top-1 accuracy of 57.1%, and its top-5 accuracy reaches 80.2%, which is already quite outstanding compared with traditional machine-learning classification algorithms. [Table: AlexNet network structure] Why does AlexNet achieve better results? The ReLU activation function is used: f(x) = max(0, x), demonstrated below.
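A quick demonstration that ReLU is just f(x) = max(0, x), shown here with PyTorch (the sample values are arbitrary):

```python
# ReLU zeroes out negatives and passes positives through unchanged.
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(torch.relu(x))             # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
print(torch.clamp(x, min=0))     # equivalent elementwise max(0, x)
```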
  • LeNet-5, from the paper "Gradient-Based Learning Applied to Document Recognition", is a very efficient convolutional neural network for handwritten character recognition. Authors: Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Published in: Proceedings of the IEEE (1998). Structure of the LeNet network: LeNet-5 is a small network that contains the basic modules of deep learning: the convolutional layer, the pooling layer, and the fully connected layer. It is the basis of other deep-learning models. Here we analyze LeNet-5 in depth and, through worked examples, deepen the understanding of the convolutional and pooling layers. A minimal sketch follows.
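A minimal LeNet-5 sketch in PyTorch following the classic layout (two conv + pool stages, then three fully connected layers); the Tanh activations and average pooling approximate the original design, which used slightly different nonlinearities and subsampling:

```python
# Minimal sketch: LeNet-5 for 32x32 single-channel character images.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # 32x32 -> 28x28
            nn.AvgPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # 14x14 -> 10x10
            nn.AvgPool2d(2),                              # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

print(LeNet5()(torch.rand(1, 1, 32, 32)).shape)   # -> (1, 10)
```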
  • Fast R-CNN
  • The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU). The inputs and output are now numbers (instead of binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs (z = w1·x1 + w2·x2 + ⋯ + wn·xn = xᵀw), then applies a step function to that sum and outputs the result: h_w(x) = step(z), where z = xᵀw. The most common step functions used are the Heaviside function and the sign function. A single TLU can be used for simple linear binary classification: it computes a linear combination of the inputs and, if the result exceeds a threshold, outputs the positive class, otherwise the negative class (just like a Logistic Regression classifier or a linear SVM). A minimal sketch is given below.
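A minimal TLU sketch in NumPy, with a tiny training loop using the classic perceptron update rule (the AND dataset and learning rate are illustrative choices; the constant first input column acts as a bias term, a small addition to the z = xᵀw form above):

```python
# Minimal sketch: a TLU with a Heaviside step, trained by the perceptron rule.
import numpy as np

def tlu(x, w):
    return np.heaviside(x @ w, 1.0)   # step on the weighted sum; step(0) = 1 here

# Logical AND: linearly separable, so the perceptron rule converges.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # bias + 2 inputs
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(3)
for _ in range(20):                    # perceptron learning rule: w += lr*(y - pred)*x
    for xi, yi in zip(X, y):
        w += 0.1 * (yi - tlu(xi, w)) * xi

print(tlu(X, w))                       # -> [0. 0. 0. 1.]
```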