Part II: Convolutional Neural Networks
The revolution in visual understanding
Overview
Part II covers the deep learning revolution in computer vision. Starting with the landmark 2012 AlexNet paper (co-authored by Ilya Sutskever), we trace the evolution of CNN architectures through VGG and ResNet to dilated convolutions.
Chapters
| # | Chapter | Key Concept |
|---|---|---|
| 6 | AlexNet - The ImageNet Breakthrough | Deep learning works at scale |
| 7 | CS231n - CNNs for Visual Recognition | Comprehensive CNN foundations |
| 8 | Deep Residual Learning (ResNet) | Skip connections enable depth |
| 9 | Identity Mappings in ResNets | Optimal residual unit design |
| 10 | Dilated Convolutions | Multi-scale context aggregation |
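To make the "multi-scale context" idea from Chapter 10 concrete, here is a minimal sketch of the receptive-field arithmetic for stacked convolutions. It assumes stride 1 and a square kernel of the same size in every layer; the function name `receptive_field` is made up for this illustration and does not come from any library.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1  # a single output unit sees one input position before any layer
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Four ordinary 3x3 layers versus four layers with doubling dilation.
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Both stacks have the same number of layers and weights, but doubling the dilation makes the receptive field grow exponentially with depth rather than linearly.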
The Evolution
AlexNet (2012) → 8 layers, ReLU, Dropout
↓
VGG (2014) → Deeper (up to 19 layers), small 3×3 filters
↓
ResNet (2015) → Up to 152 layers via skip connections
↓
Modern CNNs → Efficient architectures, attention
Key Takeaway
Depth matters, but it only pays off with the right architectural innovations. Skip connections, proper normalization, and careful design make very deep networks trainable, which lets them learn hierarchical visual features automatically.
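The effect of a skip connection on gradient flow can be illustrated with a toy scalar model. This is only a sketch under strong assumptions: each "layer" is a single `tanh(w * x)` unit with one shared weight `w`, and the gradient is estimated by finite differences. It is not the ResNet architecture itself, and all names here are made up for the example.

```python
import math


def plain_layer(x, w):
    # Ordinary layer: the output replaces the input.
    return math.tanh(w * x)


def residual_layer(x, w):
    # Residual layer: the input is added back through a skip connection.
    return x + math.tanh(w * x)


def grad_through_stack(layer, x, w, depth, eps=1e-6):
    """Central-difference estimate of d(output)/d(input) through `depth` layers."""
    def forward(v):
        for _ in range(depth):
            v = layer(v, w)
        return v
    return (forward(x + eps) - forward(x - eps)) / (2 * eps)


depth = 50
g_plain = grad_through_stack(plain_layer, x=0.5, w=0.1, depth=depth)
g_res = grad_through_stack(residual_layer, x=0.5, w=0.1, depth=depth)
print(f"plain stack gradient:    {g_plain:.3e}")  # vanishingly small
print(f"residual stack gradient: {g_res:.3e}")    # stays at or above 1
```

In the plain stack each layer multiplies the gradient by a factor of at most `w`, so it shrinks geometrically with depth. In the residual stack each layer contributes a factor of 1 plus a non-negative term, so the signal from the input never vanishes.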
Prerequisites
- Part I foundations (helpful but not required)
- Basic understanding of linear algebra (matrix operations)
- Familiarity with gradient descent
What You’ll Be Able To Do After Part II
- Understand how CNNs learn visual features
- Implement and train image classifiers
- Explain why skip connections (ResNet) make much deeper networks trainable
- Choose an appropriate CNN architecture for a given task
Table of contents
- Chapter 6 - AlexNet - The ImageNet Breakthrough
- Chapter 7 - CS231n - Convolutional Neural Networks for Visual Recognition
- Chapter 8 - Deep Residual Learning for Image Recognition
- Chapter 9 - Identity Mappings in Deep Residual Networks
- Chapter 10 - Multi-Scale Context Aggregation by Dilated Convolutions