---
tags: machine-learning
---
# AlexNet: Summary and Implementation
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/0.png?token=AMAXSKLOJ332DVKUCQFEDVS6WMFAW">
</div>
>This post is divided into 2 sections: Summary and Implementation.
>
>We are going to have an in-depth review of the [ImageNet Classification with Deep Convolutional Neural Networks][paper] paper, which introduces the AlexNet architecture.
>
> The implementation uses Keras as the framework. To see the full implementation,
> please refer to this [repository].
>
> Also, if you want to read other "Summary and Implementation", feel free to
> check them at my [blog](https://ferdinandmom.engineer/deep-learning/).
# I) Summary
**DISCLAIMER:**
- We will use the weights/biases of a pretrained Caffe model from this [website].
- Local Response Normalization will not be implemented since Keras no longer supports it.
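
For reference, the normalization that is skipped here can be sketched in plain NumPy. The sketch follows the formula and constants given in the paper (k = 2, n = 5, alpha = 1e-4, beta = 0.75). The function name `local_response_norm` and the channels-last `(H, W, C)` layout are assumptions made for this illustration; they are not taken from the repository.

```python
import numpy as np


def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Hypothetical helper: LRN across channels for an array of shape (H, W, C)."""
    channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a, dtype=np.float64)
    for i in range(channels):
        # Neighbouring channels around channel i, clipped at the borders.
        # n // 2 is used as the half-window (an integer reading of n / 2).
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / denom
    return out


# Toy activations: 2 x 2 spatial positions, 8 channels, all equal to 1.
activations = np.ones((2, 2, 8))
normalized = local_response_norm(activations)
```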
---
AlexNet architecture:
- **5 Convolutional layers**.
- **3 Fully connected layers**.
- **3 Overlapping Max pooling layers**.
- **ReLU** as the activation function for hidden layers.
    - Avoids vanishing gradients for positive values.
    - Cheaper to compute than sigmoid and tanh.
    - Converges faster than sigmoid and tanh.
- **Softmax** as the activation function for the output layer.
- **About 60,000,000 trainable parameters**.
- **Cross-entropy** as the cost function.
- **Mini-batch gradient descent with Momentum optimizer**.
    - Batch size: 128.
    - Momentum: 0.9.
    - Weight decay: 0.0005.
    - Learning rate: 0.01, the same for all layers, divided by 10 whenever the validation error stops improving.
- **Local Response Normalization**.
    - It helps with generalization.
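
The optimizer settings above fit into the update rule given in the paper: the velocity is `v <- 0.9 * v - 0.0005 * lr * w - lr * grad`, and the weights are then moved by `w <- w + v`. Below is a minimal NumPy sketch of one such step. The function name `sgd_momentum_step` and the toy values are made up for illustration.

```python
import numpy as np


def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """Hypothetical helper: one momentum update with weight decay."""
    # The velocity keeps 90% of its previous value, then adds the weight decay
    # term and the gradient term, both scaled by the learning rate.
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    return w_new, v_new


w = np.array([1.0, -2.0])      # toy weights
v = np.zeros_like(w)           # velocity starts at zero
grad = np.array([0.5, 0.5])    # toy mini-batch gradient
w, v = sgd_momentum_step(w, v, grad)
```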
---
AlexNet details:
- Trained on the **ILSVRC-2012** dataset (1.2 million training images, 50,000 validation images, and 150,000 test images).
- Trained for roughly **90 epochs**.
- **Weight initialization**: zero-mean Gaussian distribution with a standard deviation of 0.01.
- **Bias initialization**: 1 for the 2nd/4th/5th conv layers and the fully-connected hidden layers, 0 for the remaining layers.
---
AlexNet inputs:
- **RGB image of size 256 x 256**. Training/test images of any other size need to be resized.
    - Example: image_size = 1024 x 500 => the smaller dimension is resized to 256 and the central 256 x 256 patch is cropped out.
- The 256 x 256 RGB image is then **cropped to 227 x 227** (cf. the Data Augmentation part). The paper says 224 x 224, but with an 11 x 11 kernel, a stride of 4 and no padding, only 227 gives the 55 x 55 output of the first convolution.
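
The resizing step can be checked with a few lines of plain Python. This is only a sketch of the arithmetic: the helper name `resize_then_center_crop_box` is hypothetical, and it returns the sizes and the crop box instead of touching pixel data.

```python
def resize_then_center_crop_box(width, height, target=256):
    """Hypothetical helper: resized (width, height) and the central crop box."""
    # Scale so that the smaller side becomes `target`.
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    # Central target x target patch as (left, top, right, bottom).
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


size, box = resize_then_center_crop_box(1024, 500)
print(size, box)  # (524, 256) (134, 0, 390, 256)
```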
---
AlexNet is prone to overfitting. To prevent that, the paper uses:
- **Dropout**.
    - 50% dropout rate, applied to the first two fully connected layers.
- **Data Augmentation**.
    - **Translations and horizontal reflections (mirroring)**: extract random 227 x 227 crops from the 256 x 256 images.
        - Translation on 1 image: (256−227)∗(256−227) = 841 possible images.
        - Mirroring: x2 the training set size.
        - New training set size = 1.2 million * 2 * 841 = 1.2 million * 1682 images.
    - **Altering the intensities of RGB channels**: perform PCA on the set of RGB pixel values throughout the ImageNet training set, then add to each image multiples of the principal components, scaled by the corresponding eigenvalues and by a random Gaussian factor (mean 0, standard deviation 0.1). Doing this approximately captures an important property of natural images: object identity is invariant to changes in the intensity and color of the illumination.
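
Both augmentations can be sketched with NumPy. The helper names `random_crop_and_flip` and `pca_color_jitter` are hypothetical, and the random arrays stand in for real ImageNet pixels, so this shows the mechanics only and does not reproduce the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_crop_and_flip(image, crop=227):
    """Hypothetical helper: random crop x crop patch, mirrored half of the time."""
    h, w, _ = image.shape
    # The upper bound of rng.integers is exclusive, so offsets go up to h - crop.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch


def pca_color_jitter(image, eigvals, eigvecs, sigma=0.1):
    """Hypothetical helper: add sum_i alpha_i * lambda_i * p_i to every pixel."""
    alphas = rng.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)  # one RGB offset for the whole image
    return image + shift


# PCA of RGB values; random pixels stand in for the training set here.
pixels = rng.random((1000, 3))
cov = np.cov(pixels, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvectors are the columns

image = rng.random((256, 256, 3))
augmented = pca_color_jitter(random_crop_and_flip(image), eigvals, eigvecs)
```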
<ins>**Remark**:</ins>
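- According to the paper, the augmented images are never stored: they are generated on the CPU while the GPU trains on the previous batch, so each epoch still covers about 1.2 million images rather than the full augmented set. Here is why training on every augmented image would be impractical:
    - Suppose one forward/backward pass takes 0.001s per image. It would take (0.001 * 1,200,000 * 1682 * 90) / (60 * 60 * 24 * 365) ~= **5.76 years** to train the model.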
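
The estimate above can be reproduced directly. The 0.001s per image is the hypothetical timing from the remark, not a measured value.

```python
seconds_per_image = 0.001                   # assumed cost of one forward/backward pass
images = 1_200_000                          # ILSVRC-2012 training images
augmentation_factor = 2 * (256 - 227) ** 2  # mirroring x translations
epochs = 90
seconds_per_year = 60 * 60 * 24 * 365

years = seconds_per_image * images * augmentation_factor * epochs / seconds_per_year
print(f"{augmentation_factor} variants per image, about {years:.2f} years")
# prints: 1682 variants per image, about 5.76 years
```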
---
![legend]
![alexnet-model]
# II) Implementation
### 1) Architecture build
```python
# Assumes TensorFlow 2 with its bundled Keras.
from tensorflow.keras.layers import Concatenate, Conv2D, Lambda


def grouped_conv(input_val, name, half, filters, kernel_size, strides=1, padding='valid'):
    """
    Performs a grouped convolution: the input channels are split into 2 groups,
    each group is convolved separately and the 2 outputs are concatenated.

    Parameters:
    -input_val: previous layer.
    -name: name of the convolution.
    -half: Number of input channels in each group.
    -filters: Number of filters for each convolution.
    -kernel_size: Kernel size used for each convolution.
    -strides: stride. Default value is 1.
    -padding: 'valid'(default) or 'same'.

    Returns:
    -conv: concatenation of the 2 convolution outputs.
    """
    # Split the channels-last input in two along the channel axis.
    input_val_1 = Lambda(lambda x: x[:, :, :, :half])(input_val)
    input_val_2 = Lambda(lambda x: x[:, :, :, half:])(input_val)

    conv_1 = Conv2D(filters=filters,
                    kernel_size=kernel_size,
                    strides=strides,
                    padding=padding,
                    activation='relu',
                    name=name + '_1')(input_val_1)

    conv_2 = Conv2D(filters=filters,
                    kernel_size=kernel_size,
                    strides=strides,
                    padding=padding,
                    activation='relu',
                    name=name + '_2')(input_val_2)

    conv = Concatenate(name=name)([conv_1, conv_2])

    return conv
```
```python
# Assumes TensorFlow 2 with its bundled Keras; grouped_conv is defined above.
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Model


def AlexNet():
    x = Input((227, 227, 3))

    # conv1 + overlapping max pooling (3 x 3 window, stride 2).
    conv1 = Conv2D(filters=96,
                   kernel_size=(11, 11),
                   strides=4,
                   activation='relu',
                   name='conv1')(x)
    pool1 = MaxPooling2D(pool_size=3,
                         strides=2)(conv1)

    # conv2: 2 groups of 48 input channels, 128 filters each.
    conv2 = grouped_conv(input_val=pool1,
                         name='conv2',
                         half=48,
                         filters=128,
                         kernel_size=5,
                         padding='same')
    pool2 = MaxPooling2D(pool_size=3,
                         strides=2)(conv2)

    conv3 = Conv2D(filters=384,
                   kernel_size=(3, 3),
                   padding='same',
                   activation='relu',
                   name='conv3')(pool2)

    # conv4 and conv5: 2 groups of 192 input channels each.
    conv4 = grouped_conv(input_val=conv3,
                         name='conv4',
                         half=192,
                         filters=192,
                         kernel_size=3,
                         padding='same')
    conv5 = grouped_conv(input_val=conv4,
                         name='conv5',
                         half=192,
                         filters=128,
                         kernel_size=3,
                         padding='same')
    pool5 = MaxPooling2D(pool_size=3,
                         strides=2)(conv5)

    # Classifier: 2 hidden fully connected layers and a 1000-way softmax.
    flatten = Flatten()(pool5)
    fc6 = Dense(4096, activation='relu', name='fc6')(flatten)
    fc7 = Dense(4096, activation='relu', name='fc7')(fc6)
    fc8 = Dense(1000, activation='softmax', name='fc8')(fc7)

    model = Model(inputs=x, outputs=fc8)

    return model
```
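
As a sanity check on the architecture above, the feature-map sizes and the parameter count can be worked out by hand. The sketch below uses the usual output-size formula `(size + 2 * pad - kernel) // stride + 1`. The helper names `conv_out` and `conv_params` are made up for this example. The layers with `padding='same'` and stride 1 keep the spatial size, so only conv1 and the pooling layers appear in the size computation.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Hypothetical helper: output side length of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_params(kernel, in_channels, out_channels, groups=1):
    """Hypothetical helper: weights + biases of a (grouped) convolution."""
    weights = kernel * kernel * (in_channels // groups) * out_channels
    return weights + out_channels


size = conv_out(227, 11, stride=4)  # conv1 -> 55
size = conv_out(size, 3, stride=2)  # pool1 -> 27 (conv2 keeps 27)
size = conv_out(size, 3, stride=2)  # pool2 -> 13 (conv3, conv4, conv5 keep 13)
size = conv_out(size, 3, stride=2)  # pool5 -> 6
flat = size * size * 256            # input size of fc6

total = (
    conv_params(11, 3, 96)                  # conv1
    + conv_params(5, 96, 256, groups=2)     # conv2
    + conv_params(3, 256, 384)              # conv3
    + conv_params(3, 384, 384, groups=2)    # conv4
    + conv_params(3, 384, 256, groups=2)    # conv5
    + flat * 4096 + 4096                    # fc6
    + 4096 * 4096 + 4096                    # fc7
    + 4096 * 1000 + 1000                    # fc8
)
print(flat, total)  # 9216 60965224
```

The total is consistent with the roughly 60 million trainable parameters listed in the summary, and most of them sit in fc6.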
### 2) Evaluating
```python
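import cv2
import matplotlib.pyplot as plt
import numpy as np

# `model`, `imgs` (images loaded with cv2, hence BGR) and `class_names`
# are assumed to be defined earlier.
# Per-channel mean subtracted from the input, in cv2's BGR channel order.
imagenet_mean = np.array([104., 117., 124.], dtype=np.float32)

fig2 = plt.figure(figsize=(30,10))

for i, image in enumerate(imgs):
    # Preprocess: resize to the network input size and subtract the mean.
    img = cv2.resize(image.astype(np.float32), (227,227))
    img -= imagenet_mean
    img = img.reshape((1,227,227,3))

    probs = model.predict(img)
    class_name = class_names[np.argmax(probs)]

    # One subplot per image (the 1 x 4 grid assumes 4 test images).
    fig2.add_subplot(1,4,i+1)
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("Class: " + class_name + ", probability: %.4f" %probs[0,np.argmax(probs)], fontsize=13)
    plt.axis('off')

    # List the 5 most probable classes, from most to least likely.
    plt.text(0, 240, 'Top-5 Accuracy:')
    x, y = 10, 260
    for idx in np.argsort(probs)[0][-5::][::-1]:
        plt.text(x, y, s ='- {}, probability: {:.4f}'.format(class_names[idx], probs[0, idx]), fontsize=12)
        y += 20
    print()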
```
![evaluating-image]
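
The top-5 selection used in the evaluation loop relies on `np.argsort`, which sorts in ascending order, so the last 5 indices are taken and reversed. Here is the same idea in isolation. The helper name `top_k`, the class names and the probabilities are made up for the example.

```python
import numpy as np


def top_k(probs, class_names, k=5):
    """Hypothetical helper: the k most probable (class, probability) pairs."""
    # argsort is ascending: reverse it and keep the first k indices.
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]


probs = np.array([0.05, 0.60, 0.10, 0.20, 0.03, 0.02])
names = ["cat", "dog", "fox", "wolf", "lynx", "bear"]
top5 = top_k(probs, names)
print([name for name, _ in top5])  # ['dog', 'wolf', 'fox', 'cat', 'lynx']
```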
[paper]: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[repository]: https://github.com/3outeille/Research-Paper-Summary/tree/master/src/architecture/alexnet/tensorflow_2
[website]: http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/
[legend]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/1.png?token=AMAXSKITCLT6SFLZYGG2Y5K6WMET6
[alexnet-model]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/2.png?token=AMAXSKM4VWEFT6JHLEZQ5GS6WMFC4
[evaluating-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/3.png?token=AMAXSKKDIH5N43DQIKYFDHS6WMFOE
