---
tags: machine-learning
---
# AlexNet: Summary and Implementation
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/0.png?token=AMAXSKLOJ332DVKUCQFEDVS6WMFAW">
</div>
>This post is divided into 2 sections: Summary and Implementation.
>
>We are going to have an in-depth review of the [ImageNet Classification with Deep Convolutional Neural Networks][paper] paper, which introduces the AlexNet architecture.
>
> The implementation uses Keras as the framework. To see the full implementation,
> please refer to this [repository].
>
> Also, if you want to read other "Summary and Implementation" posts, feel free to
> check them out on my [blog](https://ferdinandmom.engineer/deep-learning/).
# I) Summary
**DISCLAIMER:**
- We will use the weights/biases of a pretrained Caffe model from this [website].
- Local Response Normalization will not be implemented because Keras no longer supports it.
---
AlexNet architecture:
- **5 Convolutional layers**.
- **3 Fully connected layers**.
- **3 Overlapping Max pooling layers**.
- **ReLU** as activation function for hidden layers.
    - Avoids vanishing gradients for positive values.
    - Cheaper to compute than sigmoid and tanh.
    - Converges faster than sigmoid and tanh.
- **Softmax** as activation function for the output layer.
- **About 60 million trainable parameters**.
- **Cross-entropy** as cost function.
- **Mini-batch gradient descent with Momentum optimizer**.
    - Batch size: 128.
    - Momentum: 0.9.
    - Weight decay: 0.0005.
    - Learning rate: 0.01, shared by all layers and divided by 10 whenever the validation error stops improving.
- **Local Response Normalization**.
    - It helps with generalization.
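
The optimizer settings above plug into the update rule given in the paper: `v <- 0.9 * v - 0.0005 * lr * w - lr * grad`, then `w <- w + v`. Below is a minimal NumPy sketch of a single step. The helper name `sgd_momentum_step` and the toy objective are illustrative assumptions and are not part of the original implementation.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update following the rule in the AlexNet paper (hypothetical helper)."""
    # The velocity accumulates momentum, weight decay and the gradient step.
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    return w_new, v_new

# Toy example: minimize 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(3):
    w, v = sgd_momentum_step(w, v, grad=w)
```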
---
AlexNet details:
- Trained on the **ILSVRC-2012** dataset (1.2 million training images, 50,000 validation images, and 150,000 testing images).
- Trained for roughly **90 epochs**.
- **Weight initialization**: zero-mean Gaussian distribution with a standard deviation of 0.01.
- **Bias initialization**: 1 for the 2nd/4th/5th conv layers and the fully-connected hidden layers, 0 for the remaining layers.
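
As a rough illustration of the initialization scheme, here is a NumPy sketch. The helper `init_layer` and the kernel shape layout `(height, width, in_channels, filters)` are assumptions made for this example. The post itself loads pretrained weights instead of initializing them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(weight_shape, bias_one):
    """Hypothetical helper: N(0, 0.01^2) weights and a constant bias of 1 or 0."""
    weights = rng.normal(loc=0.0, scale=0.01, size=weight_shape)
    biases = np.full(weight_shape[-1], 1.0 if bias_one else 0.0)
    return weights, biases

# conv1 gets a bias of 0, conv2 a bias of 1.
w1, b1 = init_layer((11, 11, 3, 96), bias_one=False)
w2, b2 = init_layer((5, 5, 48, 256), bias_one=True)
```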
---
AlexNet inputs:
- **RGB image of size 256 x 256**. Training/test images of any other size need to be resized.
    - Example: image_size = 1024 x 500 => the smaller dimension is resized to 256 and the resulting image is center-cropped to obtain a 256 x 256 image.
- The 256 x 256 RGB image is then **cropped to 227 x 227** (cf. Data Augmentation part). The paper says 224 x 224, but only 227 x 227 yields the 55 x 55 output of the first convolution (kernel 11, stride 4, no padding).
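
The resize-then-crop geometry for the 1024 x 500 example can be sketched in plain Python. The helper `resize_and_center_crop_box` is hypothetical, and it assumes the crop is taken from the center, as the paper describes. It only computes sizes and coordinates; the actual resizing would be done with an image library.

```python
def resize_and_center_crop_box(width, height, target=256):
    """Hypothetical helper: resized (width, height) and the centered crop box."""
    # Scale so that the shorter side becomes `target`.
    scale = target / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Center the target x target window inside the resized image.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

size, box = resize_and_center_crop_box(1024, 500)
print(size, box)  # (524, 256) (134, 0, 390, 256)
```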
---
AlexNet is prone to overfitting. To prevent that:
- **Dropout**.
    - 50% dropout rate (applied to the first two fully-connected layers in the paper).
- **Data Augmentation**.
    - **Translations and horizontal reflections (mirroring)**: Extract random 227 x 227 crops from 256 x 256 images.
        - Translation on 1 image: (256−227)∗(256−227) = 841 possible images.
        - Mirroring: x2 the training set size.
        - New training set size = 1.2 million * 2 * 841 = 1.2 million * 1682 images.
    - **Altering the intensities of RGB channels**: perform PCA on the set of RGB pixel values throughout the ImageNet training set, then add random multiples of the principal components to each image. This approximately captures an important property of natural images: object identity is invariant to changes in the intensity and color of the illumination.
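
Both augmentations can be sketched in NumPy. The function names, the random stand-in image and the single-image PCA are assumptions made to keep the example short: the paper computes the PCA over the whole training set and draws each coefficient from a Gaussian with standard deviation 0.1.

```python
import numpy as np

def random_crop_and_flip(image, rng, crop=227):
    """Random crop x crop patch, mirrored horizontally with probability 0.5."""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # reverse the width axis
    return patch

def pca_color_jitter(image, eigvals, eigvecs, rng, sigma=0.1):
    """Add sum_i alpha_i * lambda_i * p_i to every pixel, alpha_i ~ N(0, sigma)."""
    alphas = rng.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)  # one RGB offset shared by all pixels
    return image + shift

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))  # stand-in for one preprocessed training image

# Eigen-decomposition of the 3 x 3 covariance matrix of RGB values.
pixels = image.reshape(-1, 3)
eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))

augmented = pca_color_jitter(random_crop_and_flip(image, rng), eigvals, eigvecs, rng)
```
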
<ins>**Remark**:</ins>
- According to the paper, the augmented images are generated on the fly rather than stored, so each epoch still covers only the 1.2 million training images and not all 1682 variants of each one. Enumerating every variant would be impractical:
    - Suppose one forward/backward pass takes 0.001s per image. Training would take (0.001 * 1,200,000 * 1682 * 90) / (60 * 60 * 24 * 365) ~= **5.8 years**.
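
The estimate can be checked with a few lines of Python. The 0.001s per pass is the hypothetical figure from the remark, not a measured value.

```python
crops_per_image = (256 - 227) ** 2        # 841 translations, as counted above
variants_per_image = crops_per_image * 2  # 1682 once mirroring is included

# Seconds needed for 90 epochs over every variant at 0.001s per image.
seconds = 0.001 * 1_200_000 * variants_per_image * 90
years = seconds / (60 * 60 * 24 * 365)
print(variants_per_image, round(years, 2))  # 1682 5.76
```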
---
![legend]
![alexnet-model]
# II) Implementation
### 1) Architecture build
```python
from tensorflow.keras.layers import Concatenate, Conv2D, Lambda

def grouped_conv(input_val, name, half, filters, kernel_size, strides=1, padding='valid'):
    """
    Performs a grouped convolution with 2 groups.

    Parameters:
    -input_val: previous layer.
    -name: name of the convolution.
    -half: Number of input channels for each convolution.
    -filters: Number of filters for each convolution.
    -kernel_size: Kernel size used for each convolution.
    -strides: stride. Default value is 1.
    -padding: 'valid'(default) or 'same'.

    Returns:
    -conv: concatenation of the 2 convolution outputs.
    """
    # Split the input channels into two halves, one per group.
    input_val_1 = Lambda(lambda x: x[:, :, :, :half])(input_val)
    input_val_2 = Lambda(lambda x: x[:, :, :, half:])(input_val)

    conv_1 = Conv2D(filters=filters,
                    kernel_size=kernel_size,
                    strides=strides,
                    padding=padding,
                    activation='relu',
                    name=name + '_1')(input_val_1)
    conv_2 = Conv2D(filters=filters,
                    kernel_size=kernel_size,
                    strides=strides,
                    padding=padding,
                    activation='relu',
                    name=name + '_2')(input_val_2)

    # Concatenate the two groups along the channel axis.
    conv = Concatenate(name=name)([conv_1, conv_2])
    return conv
```
```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Model

def AlexNet():
    x = Input((227, 227, 3))

    # conv1 + overlapping max pooling (3 x 3 window, stride 2)
    conv1 = Conv2D(filters=96,
                   kernel_size=(11, 11),
                   strides=4,
                   activation='relu',
                   name='conv1')(x)
    pool1 = MaxPooling2D(pool_size=3,
                         strides=2)(conv1)

    # conv2: grouped convolution, 2 x 128 filters over 2 x 48 channels
    conv2 = grouped_conv(input_val=pool1,
                         name='conv2',
                         half=48,
                         filters=128,
                         kernel_size=5,
                         padding='same')
    pool2 = MaxPooling2D(pool_size=3,
                         strides=2)(conv2)

    conv3 = Conv2D(filters=384,
                   kernel_size=(3, 3),
                   padding='same',
                   activation='relu',
                   name='conv3')(pool2)

    # conv4 and conv5: grouped convolutions over 2 x 192 channels
    conv4 = grouped_conv(input_val=conv3,
                         name='conv4',
                         half=192,
                         filters=192,
                         kernel_size=3,
                         padding='same')
    conv5 = grouped_conv(input_val=conv4,
                         name='conv5',
                         half=192,
                         filters=128,
                         kernel_size=3,
                         padding='same')
    pool5 = MaxPooling2D(pool_size=3,
                         strides=2)(conv5)

    # Classifier: two hidden fully-connected layers and a 1000-way softmax
    flatten = Flatten()(pool5)
    fc6 = Dense(4096, activation='relu', name='fc6')(flatten)
    fc7 = Dense(4096, activation='relu', name='fc7')(fc6)
    fc8 = Dense(1000, activation='softmax', name='fc8')(fc7)

    model = Model(inputs=x, outputs=fc8)
    return model
```
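
To sanity-check the model above, the spatial sizes and the parameter count can be traced in plain Python. This is only a sketch: the helpers `conv_out` and `conv_params` are hypothetical and not part of the original repository. It also shows why the input has to be 227 x 227: a 224 x 224 input gives a non-integer output size for the first convolution.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, filters, groups=1):
    """Weights plus biases of a (possibly grouped) convolution."""
    return kernel * kernel * (in_channels // groups) * filters + filters

print((224 - 11) / 4 + 1)               # 54.25, not a valid size
size = conv_out(227, 11, stride=4)      # conv1 -> 55
size = conv_out(size, 3, stride=2)      # pool1 -> 27 (conv2 keeps 27 with 'same')
size = conv_out(size, 3, stride=2)      # pool2 -> 13 (conv3 to conv5 keep 13)
size = conv_out(size, 3, stride=2)      # pool5 -> 6
flat = size * size * 256                # 9216 inputs to fc6

total = (conv_params(11, 3, 96)                  # conv1
         + conv_params(5, 96, 256, groups=2)     # conv2
         + conv_params(3, 256, 384)              # conv3
         + conv_params(3, 384, 384, groups=2)    # conv4
         + conv_params(3, 384, 256, groups=2)    # conv5
         + (flat * 4096 + 4096)                  # fc6
         + (4096 * 4096 + 4096)                  # fc7
         + (4096 * 1000 + 1000))                 # fc8
print(flat, total)  # 9216 60965224
```

The total is consistent with the roughly 60 million trainable parameters listed in the summary.
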
### 2) Evaluating
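```python
import cv2
import matplotlib.pyplot as plt
import numpy as np

# `model`, `imgs` (BGR images, e.g. loaded with cv2.imread) and `class_names`
# are assumed to be defined beforehand.

# Per-channel ImageNet mean, in BGR order.
imagenet_mean = np.array([104., 117., 124.], dtype=np.float32)

fig2 = plt.figure(figsize=(30,10))

for i, image in enumerate(imgs):
    # Preprocess: resize to the network input size and subtract the mean.
    img = cv2.resize(image.astype(np.float32), (227,227))
    img -= imagenet_mean
    img = img.reshape((1,227,227,3))

    probs = model.predict(img)
    class_name = class_names[np.argmax(probs)]

    # One subplot per image (the 1 x 4 grid assumes 4 images).
    fig2.add_subplot(1,4,i+1)
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("Class: " + class_name + ", probability: %.4f" %probs[0,np.argmax(probs)], fontsize=13)
    plt.axis('off')

    # List the 5 most probable classes below the image.
    plt.text(0, 240, 'Top-5 Accuracy:')
    x, y = 10, 260
    for idx in np.argsort(probs)[0][-5::][::-1]:
        plt.text(x, y, s ='- {}, probability: {:.4f}'.format(class_names[idx], probs[0, idx]), fontsize=12)
        y += 20
    print()
```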
![evaluating-image]
[paper]: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[repository]: https://github.com/3outeille/Research-Paper-Summary/tree/master/src/architecture/alexnet/tensorflow_2
[website]: http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/
[legend]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/1.png?token=AMAXSKITCLT6SFLZYGG2Y5K6WMET6
[alexnet-model]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/2.png?token=AMAXSKM4VWEFT6JHLEZQ5GS6WMFC4
[evaluating-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/alexnet/3.png?token=AMAXSKKDIH5N43DQIKYFDHS6WMFOE