
Building AlexNet from Scratch with PyTorch: A Step-by-Step Guide


In the world of deep learning, AlexNet holds a special place as a groundbreaking convolutional neural network (CNN) that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it demonstrated the power of deep learning and GPUs for image classification. In this blog post, we’ll implement AlexNet from scratch using PyTorch, explore its architecture, and provide a working code example.

What is AlexNet?

AlexNet is a deep CNN designed to classify images into 1000 categories. It features five convolutional layers, three max-pooling layers, and three fully connected layers, with ReLU activations and dropout for regularization. Its innovative use of large kernels, overlapping pooling, and GPU acceleration made it a milestone in computer vision.

Let’s dive into the implementation!

Prerequisites

Before running the code, ensure you have the following installed:

  • PyTorch: For building and running the model (pip install torch).
  • torchsummary: For visualizing the model summary (pip install torchsummary).
  • A Python environment (e.g., version 3.8+).
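A quick way to confirm the environment is ready is to check the installed PyTorch version and whether a CUDA GPU is visible (a minimal sketch; the exact version string will vary on your machine):

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA GPU can be used
```

If `cuda.is_available()` prints `False`, the model will still run on CPU, just more slowly.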

The AlexNet Architecture

AlexNet expects an input image of size 227x227 pixels with 3 color channels (RGB). Its structure can be broken into two main parts:

  1. Feature Extraction: A series of convolutional and pooling layers.
  2. Classification: Fully connected layers to produce class predictions.

Here’s the breakdown:

  • Conv1: 96 filters (11x11), stride 4, padding 2 → ReLU → MaxPool (3x3, stride 2).
  • Conv2: 256 filters (5x5), padding 2 → ReLU → MaxPool (3x3, stride 2).
  • Conv3: 384 filters (3x3), padding 1 → ReLU.
  • Conv4: 384 filters (3x3), padding 1 → ReLU.
  • Conv5: 256 filters (3x3), padding 1 → ReLU → MaxPool (3x3, stride 2).
  • FC Layers: 9216 → 4096 → 4096 → 1000, with dropout (p=0.5) and ReLU.

The output of the final convolutional layer is flattened to 256 * 6 * 6 = 9216 features (for a 227x227 input), feeding into the classifier.
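The 6x6 spatial size can be verified by tracing the standard output-size formula for convolution and pooling, out = floor((in + 2*padding - kernel) / stride) + 1, through every layer:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pool layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

s = 227
s = conv_out(s, 11, stride=4, padding=2)  # Conv1    -> 56
s = conv_out(s, 3, stride=2)              # MaxPool1 -> 27
s = conv_out(s, 5, padding=2)             # Conv2    -> 27
s = conv_out(s, 3, stride=2)              # MaxPool2 -> 13
s = conv_out(s, 3, padding=1)             # Conv3    -> 13
s = conv_out(s, 3, padding=1)             # Conv4    -> 13
s = conv_out(s, 3, padding=1)             # Conv5    -> 13
s = conv_out(s, 3, stride=2)              # MaxPool3 -> 6
print(s, 256 * s * s)  # prints: 6 9216
```

This is exactly why the first fully connected layer takes 9216 input features.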


import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary  # pip install torchsummary

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()

        self.features = nn.Sequential(
            # Layer 1: Convolution + ReLU + MaxPool
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Layer 2: Convolution + ReLU + MaxPool
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Layer 3: Convolution + ReLU
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),

            # Layer 4: Convolution + ReLU
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Layer 5: Convolution + ReLU + MaxPool
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),  # Explicitly set dropout probability
            nn.Linear(256 * 6 * 6, 4096),  # 9216 input features
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)  # Explicitly flatten to 9216
        x = self.classifier(x)
        return x

# Instantiate the model
alexnet_model = AlexNet(num_classes=1000)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
alexnet_model = alexnet_model.to(device)

# Print model structure
print(alexnet_model)

# Print summary (assuming input size of 3x227x227)
summary(alexnet_model, (3, 227, 227))
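As a quick sanity check, a dummy batch can be pushed through the feature extractor alone to confirm it really produces the 256x6x6 map the classifier expects. This standalone sketch simply mirrors the `features` block from the class above:

```python
import torch
import torch.nn as nn

# Standalone copy of the feature extractor above, used only to verify
# that a 3x227x227 input yields a 256x6x6 feature map.
features = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, 2),
)

x = torch.randn(1, 3, 227, 227)  # dummy batch of one RGB image
print(features(x).shape)         # torch.Size([1, 256, 6, 6])
```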

[Output screenshot: the printed AlexNet model structure and the torchsummary layer-by-layer table.]

Conclusion

Implementing AlexNet in PyTorch is a great way to understand CNNs and deep learning fundamentals. This code provides a foundation you can extend: try training it on a dataset such as CIFAR-10 or ImageNet. Note that smaller images (e.g., CIFAR-10's 32x32) must first be resized to 227x227, and grayscale datasets such as MNIST would also need their single channel expanded to three.
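A minimal training-step sketch is shown below. The tiny `Sequential` model and the random tensors are stand-ins so the snippet runs in a second; for real experiments, substitute `AlexNet(num_classes=10)` from above and a real `DataLoader` (resizing images to 227x227):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model so this sketch runs quickly; swap in AlexNet(num_classes=10)
# from the post for real experiments.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One synthetic CIFAR-10-shaped batch; replace with batches from a real
# DataLoader (and resize to 227x227 when using AlexNet).
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

for epoch in range(2):  # loop skeleton: one optimization step per "epoch"
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

The same zero_grad / forward / backward / step pattern applies unchanged once the real model and data are plugged in.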

