Part II: Convolutional Neural Networks

The revolution in visual understanding


Overview

Part II covers the deep learning revolution in computer vision. Starting with the landmark AlexNet paper (co-authored by Ilya Sutskever himself), we trace the evolution of CNN architectures through ResNet and beyond.

Chapters

#    Chapter                                 Key Concept
6    AlexNet - The ImageNet Breakthrough     Deep learning works at scale
7    CS231n - CNNs for Visual Recognition    Comprehensive CNN foundations
8    Deep Residual Learning (ResNet)         Skip connections enable depth
9    Identity Mappings in ResNets            Optimal residual unit design
10   Dilated Convolutions                    Multi-scale context aggregation

The Evolution

AlexNet (2012)     →  8 layers, ReLU, Dropout
       ↓
VGG (2014)         →  Deeper (19 layers), smaller filters
       ↓
ResNet (2015)      →  152 layers via skip connections
       ↓
Modern CNNs        →  Efficient architectures, attention
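The building blocks named in this diagram can be sketched in a few lines of PyTorch. The snippet below is an illustrative assumption, not code from the chapters: an AlexNet-style layer pairs a large-stride convolution with ReLU, while a VGG-style stage stacks small 3x3 filters instead.

```python
import torch
import torch.nn as nn

# AlexNet-style first layer (illustrative): large 11x11 filters with stride 4,
# followed by ReLU. Dropout, which AlexNet applied in its fully connected
# layers, is omitted here for brevity.
alexnet_style = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),
)

# VGG-style stage (illustrative): two stacked 3x3 convolutions cover the same
# receptive field as one 5x5 convolution, with fewer parameters and an extra
# nonlinearity in between.
vgg_style = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
print(alexnet_style(x).shape)     # torch.Size([1, 96, 54, 54])
print(vgg_style(x).shape)         # torch.Size([1, 64, 224, 224])
```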

Key Takeaway

Depth matters, but only with the right architectural innovations. Skip connections, proper normalization, and careful design allow networks to learn hierarchical visual features automatically.
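As a rough sketch of how skip connections and normalization fit together, here is a minimal residual block in PyTorch. The layer sizes and structure are assumptions for illustration; the exact designs are covered in Chapters 8 and 9.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = F(x) + x.

    The identity shortcut lets gradients flow directly through the addition,
    which is what makes very deep stacks of these blocks trainable.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)   # the "proper normalization" piece
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: add the input back in

# Stacking many identical blocks is how ResNets reach 100+ layers.
net = nn.Sequential(*[ResidualBlock(64) for _ in range(10)])
x = torch.randn(1, 64, 56, 56)
print(net(x).shape)   # torch.Size([1, 64, 56, 56])
```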

Prerequisites

  • Part I foundations (helpful but not required)
  • Basic understanding of linear algebra (matrix operations)
  • Familiarity with gradient descent

What You’ll Be Able To Do After Part II

  • Understand how CNNs learn visual features
  • Implement and train image classifiers (see the training-loop sketch after this list)
  • Explain why deeper networks work (with ResNet)
  • Choose appropriate CNN architectures for tasks
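For the second point, a minimal training loop looks roughly like the sketch below. The dataset, model size, and hyperparameters are placeholder assumptions, not a prescription from the chapters.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# A deliberately tiny CNN classifier; real architectures are discussed in Part II.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)

# MNIST is used here only because it downloads quickly; any image dataset works.
train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=128, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(2):                      # a couple of epochs is enough to see learning
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```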


Educational content based on public research papers. All original papers are cited with links to their sources.