Building AlexNet from Scratch with PyTorch
In the world of deep learning, AlexNet holds a special place as a groundbreaking convolutional neural network (CNN) that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it demonstrated the power of deep learning and GPUs for image classification. In this blog post, we’ll implement AlexNet from scratch using PyTorch, explore its architecture, and provide a working code example.
What is AlexNet?
AlexNet is a deep CNN designed to classify images into 1000 categories. It features five convolutional layers, three max-pooling layers, and three fully connected layers, with ReLU activations and dropout for regularization. Its use of large (11x11) kernels in the first layer, overlapping pooling, and GPU acceleration made it a milestone in computer vision.
Let’s dive into the implementation!
Prerequisites
Before running the code, ensure you have the following installed:
- PyTorch: For building and running the model (pip install torch).
- torchsummary: For visualizing the model summary (pip install torchsummary).
- A Python environment (e.g., version 3.8+).
The AlexNet Architecture
AlexNet expects an input image of size 227x227 pixels with 3 color channels (RGB). Its structure can be broken into two main parts:
- Feature Extraction: A series of convolutional and pooling layers.
- Classification: Fully connected layers to produce class predictions.
Here’s the breakdown:
- Conv1: 96 filters (11x11), stride 4, padding 2 → ReLU → MaxPool (3x3, stride 2).
- Conv2: 256 filters (5x5), padding 2 → ReLU → MaxPool (3x3, stride 2).
- Conv3: 384 filters (3x3), padding 1 → ReLU.
- Conv4: 384 filters (3x3), padding 1 → ReLU.
- Conv5: 256 filters (3x3), padding 1 → ReLU → MaxPool (3x3, stride 2).
- FC Layers: 9216 → 4096 → 4096 → 1000, with dropout (p=0.5) and ReLU.
The output of the final convolutional layer is flattened to 256 * 6 * 6 = 9216 features (for a 227x227 input), feeding into the classifier.
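As a quick sanity check (a sketch added for illustration; the out_size helper is not part of the original code), we can trace the spatial size through the network with the usual formula floor((W + 2*padding - kernel) / stride) + 1:

def out_size(w, kernel, stride, padding=0):
    # Output size of a conv/pool layer along one spatial dimension
    return (w + 2 * padding - kernel) // stride + 1

w = 227
w = out_size(w, 11, 4, 2)   # Conv1    -> 56x56
w = out_size(w, 3, 2)       # MaxPool1 -> 27x27
w = out_size(w, 5, 1, 2)    # Conv2    -> 27x27
w = out_size(w, 3, 2)       # MaxPool2 -> 13x13
w = out_size(w, 3, 1, 1)    # Conv3/4/5 keep 13x13
w = out_size(w, 3, 2)       # MaxPool3 -> 6x6
print(256 * w * w)          # 9216

With the dimensions confirmed, here is the full implementation: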
import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary  # Assuming this is already installed


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # Layer 1: Convolution + ReLU + MaxPool
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Layer 2: Convolution + ReLU + MaxPool
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Layer 3: Convolution + ReLU
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # Layer 4: Convolution + ReLU
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Layer 5: Convolution + ReLU + MaxPool
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),  # Explicitly set dropout probability
            nn.Linear(256 * 6 * 6, 4096),  # 9216 input features
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)  # Explicitly flatten to 9216
        x = self.classifier(x)
        return x


# Instantiate the model
alexnet_model = AlexNet(num_classes=1000)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
alexnet_model = alexnet_model.to(device)

# Print model structure
print(alexnet_model)

# Print summary (assuming input size of 3x227x227)
summary(alexnet_model, (3, 227, 227))
Output screenshot: the printed AlexNet model structure and the torchsummary layer-by-layer summary.
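As a final check (a minimal smoke-test sketch; the tensor names, dummy labels, and hyperparameters are illustrative and not from the original post), we can push a random 227x227 batch through the model and take a single optimizer step using the torch.optim module imported earlier:

# Forward pass on a random batch to verify the output shape
dummy_input = torch.randn(1, 3, 227, 227, device=device)
with torch.no_grad():
    output = alexnet_model(dummy_input)
print(output.shape)  # Expected: torch.Size([1, 1000])

# One illustrative training step on random labels
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(alexnet_model.parameters(), lr=0.01, momentum=0.9)
labels = torch.randint(0, 1000, (1,), device=device)
optimizer.zero_grad()
loss = criterion(alexnet_model(dummy_input), labels)
loss.backward()
optimizer.step()
print(loss.item())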