Part VI: Scaling and Efficiency

Training neural networks at scale


Overview

Part VI covers the practical challenges of training neural networks at massive scale. These papers show how to handle large datasets and very large models, and how to train them efficiently across many devices: essential knowledge for modern deep learning.

Chapters

#    Chapter                                     Key Concept
24   Deep Speech 2                               End-to-end speech at scale
25   Scaling Laws for Neural Language Models     How performance scales
26   GPipe - Pipeline Parallelism                Training giant models

The Scaling Journey

Deep Speech 2 (2015)  →  End-to-end speech, multi-GPU
       ↓
Scaling Laws (2020)   →  Understanding how to scale
       ↓
GPipe (2018)          →  Pipeline parallelism
       ↓
Modern LLMs           →  GPT-3, GPT-4, and beyond
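
To make the pipeline-parallelism step in the journey above concrete, here is a minimal, framework-free sketch of a GPipe-style forward schedule. It does not run a real model; it only enumerates which stage handles which micro-batch at each time step, assuming equal per-stage cost, to show why splitting a mini-batch into micro-batches shrinks the idle "bubble". The function names and the uniform-cost assumption are illustrative, not GPipe's actual API.

# Sketch of a GPipe-style forward schedule (illustrative, not GPipe's API).
# Stage s processes micro-batch m at time step s + m, so the pipeline takes
# num_stages + num_microbatches - 1 steps per forward pass.

def gpipe_forward_schedule(num_stages: int, num_microbatches: int):
    """Return one dict per time step mapping stage index -> micro-batch id (None if idle)."""
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        step = {}
        for s in range(num_stages):
            m = t - s  # micro-batch m reaches stage s at time step s + m
            step[s] = m if 0 <= m < num_microbatches else None
        schedule.append(step)
    return schedule

def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of stage-time slots left idle in the forward pass (equal stage costs assumed)."""
    total_slots = num_stages * (num_stages + num_microbatches - 1)
    busy_slots = num_stages * num_microbatches
    return 1.0 - busy_slots / total_slots

if __name__ == "__main__":
    stages, microbatches = 4, 8
    for t, step in enumerate(gpipe_forward_schedule(stages, microbatches)):
        row = "  ".join(
            f"stage{s}:{'mb%02d' % m if m is not None else '----'}" for s, m in step.items()
        )
        print(f"t={t:2d}  {row}")
    print(f"bubble fraction: {bubble_fraction(stages, microbatches):.2f}")

With 4 stages and 8 micro-batches the bubble is roughly 27% of stage-time; with 32 micro-batches it falls below 9%, which is why pipeline schemes like GPipe rely on many micro-batches per mini-batch.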

Key Takeaway

Scaling neural networks requires understanding the trade-offs between compute, data, and model size, along with efficient distributed training strategies for models that don't fit on a single GPU.
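
As one worked example of those trade-offs, the sketch below evaluates the power-law form popularized by the Scaling Laws paper (Chapter 25): held-out loss falls as a power law in model size N and dataset size D. The exponents and constants are approximate values reported by Kaplan et al. (2020) and are included only for illustration; the helper function names are mine.

# Back-of-the-envelope power laws in the spirit of Kaplan et al. (2020).
# Constants are approximate values from that paper; treat them as illustrative.

def loss_from_params(n_params: float, alpha_n: float = 0.076, n_c: float = 8.8e13) -> float:
    """Approximate L(N) = (N_c / N) ** alpha_N, assuming data is not the bottleneck."""
    return (n_c / n_params) ** alpha_n

def loss_from_tokens(n_tokens: float, alpha_d: float = 0.095, d_c: float = 5.4e13) -> float:
    """Approximate L(D) = (D_c / D) ** alpha_D, assuming model size is not the bottleneck."""
    return (d_c / n_tokens) ** alpha_d

if __name__ == "__main__":
    # Each doubling of N multiplies the predicted loss by 2 ** -0.076, roughly 0.95:
    # steady but small gains per doubling, hence exponentially growing budgets.
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"params = {n:.0e}  ->  predicted loss ~ {loss_from_params(n):.3f}")

The point of the exercise: each doubling of model size buys only a few percent lower loss, so meaningful improvements require order-of-magnitude increases in parameters, data, and compute, planned jointly.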

Prerequisites

  • Parts I-V (helpful for understanding design principles)
  • Basic understanding of distributed systems
  • Interest in large-scale training

What You’ll Be Able To Do After Part VI

  • Understand scaling laws and resource allocation
  • Design distributed training strategies
  • Appreciate the engineering behind large models
  • See how theory meets practice at scale

Educational content based on public research papers. All original papers are cited with links to their sources.