11-17-2025, 01:11 PM
Thread 5 — Training Large Models: Optimisers, Learning Rates & Loss Landscapes
The Hidden Mechanics Behind Model Training
Modern AI models aren’t just built — they’re grown, shaped through millions of tiny adjustments.
This thread explains the advanced machinery behind training deep models.
1. The Loss Landscape
A model’s training loss can be pictured as a giant multidimensional surface over its weights.
Each point on the surface corresponds to one particular set of weights, and its height is the loss at those weights.
The goal of training:
find low valleys (good solutions) on this landscape.
This landscape is:
• huge
• chaotic
• full of ridges, basins, and flat regions
Understanding it is key to training powerful models.
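As a concrete (if tiny) illustration, here is a minimal PyTorch sketch that probes a 1-D slice of that surface: it moves the weights along a random direction and records the loss at each point. `model`, `loss_fn`, `x`, and `y` are assumed placeholders for your own network, loss, and a batch of data.

```python
import torch

def loss_along_direction(model, loss_fn, x, y, steps=21, radius=1.0):
    """Evaluate the loss along one random straight line through the
    current weights: a 1-D slice of the loss landscape."""
    params = list(model.parameters())
    direction = [torch.randn_like(p) for p in params]      # random direction in weight space
    originals = [p.detach().clone() for p in params]       # remember where we started

    losses = []
    for alpha in torch.linspace(-radius, radius, steps):
        with torch.no_grad():
            for p, d, o in zip(params, direction, originals):
                p.copy_(o + alpha * d)                      # move to a nearby point
            losses.append(loss_fn(model(x), y).item())      # height of the surface there
    with torch.no_grad():                                   # restore the original weights
        for p, o in zip(params, originals):
            p.copy_(o)
    return losses
```

Plotting the returned losses against the step values gives a rough cross-section of the landscape around the current weights.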
2. Gradient Descent — The Core Idea
At each step:
• compute the gradient of the loss (it points in the direction of steepest ascent)
• move the weights a small step the other way, i.e. slightly downhill
Basic form:
SGD — Stochastic Gradient Descent: w ← w − η · ∇L(w), where η is the learning rate and the gradient is estimated on a random mini-batch (that is the “stochastic” part).
Simple but powerful.
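To make the update rule concrete, here is a hand-rolled SGD step in PyTorch, roughly what torch.optim.SGD does internally without momentum. `model`, `loss_fn`, and the batch tensors are placeholders.

```python
import torch

def sgd_step(model, loss_fn, x_batch, y_batch, lr=0.01):
    """One stochastic gradient descent step: gradient, then a small move downhill."""
    loss = loss_fn(model(x_batch), y_batch)   # loss on this mini-batch
    model.zero_grad()
    loss.backward()                           # compute gradients of the loss
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad              # step downhill, scaled by the learning rate
    return loss.item()
```

In practice you would use torch.optim.SGD rather than writing this yourself; the point is that one step is just “gradient, then a small move against it”.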
3. Advanced Optimisers
Modern models use smarter algorithms:
• Adam — adapts each parameter’s step size using running estimates of the gradient’s first and second moments
• AdamW — Adam with decoupled weight decay (the decay is applied directly to the weights rather than folded into the gradient)
• RMSProp — divides updates by a running average of squared gradients, which stabilises learning
• LAMB / Lion — designed for very large-batch and extremely large-model training
Optimisers improve training speed and stability.
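A typical setup, sketched with PyTorch’s built-in AdamW. The hyperparameter values are illustrative only, and `model`, `loader`, and `loss_fn` are assumed to exist elsewhere.

```python
import torch

# AdamW with decoupled weight decay; values here are illustrative, not prescriptive.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,
    betas=(0.9, 0.95),     # running-average coefficients for the two moment estimates
    weight_decay=0.1,      # applied directly to the weights, not via the gradient
)

for x, y in loader:        # `loader` is a placeholder DataLoader
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

The weight_decay argument here shrinks the weights directly at each step, which is exactly the “decoupled” behaviour that distinguishes AdamW from plain Adam.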
4. Learning Rate Scheduling
The learning rate controls the “step size” during training.
Too high → unstable
Too low → painfully slow
Schedulers include:
• warmup
• cosine decay
• exponential decay
• cyclical schedules
These dramatically improve performance.
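One common combination is linear warmup followed by cosine decay, sketched below as a LambdaLR multiplier in PyTorch. `optimizer` and the step counts are placeholders for whatever the real run uses.

```python
import math
import torch

# Linear warmup then cosine decay, expressed as a multiplier on the base learning rate.
def warmup_cosine(step, warmup_steps=1_000, total_steps=100_000):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                  # ramp 0 -> 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))       # 1 -> 0 along a cosine curve

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)

# Inside the training loop, call scheduler.step() after each optimizer.step().
```

Warmup keeps the earliest updates small while the optimiser’s statistics settle; the cosine tail then gently anneals the rate towards zero.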
5. Batch Size Effects
Small batches:
• noisier gradient estimates
• often better generalisation
Large batches:
• smoother, more stable gradients
• better hardware utilisation, so faster training
• standard for huge models (usually paired with learning-rate warmup and scaling)
Choosing the batch size, and matching the learning rate to it, is an empirical science of its own.
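A related trick worth knowing is gradient accumulation, which emulates a large batch when memory only allows small ones. A minimal sketch, again with placeholder `model`, `loss_fn`, `optimizer`, and `loader`:

```python
# Gradient accumulation: sum gradients over several small batches before
# each optimiser step, so the effective batch is accum_steps times larger.
accum_steps = 8

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps   # scale so the accumulated sum is an average
    loss.backward()                             # gradients accumulate in p.grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```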
6. Regularisation Techniques
Used to prevent overfitting:
• dropout (randomly zeroes activations during training)
• weight decay (penalises large weights)
• label smoothing (softens the one-hot targets)
• data augmentation (trains on perturbed copies of the inputs)
Essential for robust models.
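Here is a small sketch of how the first three plug into a PyTorch model, with a comment showing where data augmentation would go. The architecture and numbers are illustrative only.

```python
import torch
import torch.nn as nn

# Dropout, label smoothing, and weight decay wired into a toy classifier.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.1),                                  # dropout: randomly zero activations
    nn.Linear(256, 10),
)

loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)      # label smoothing on the targets

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              weight_decay=0.01)        # weight decay on the parameters

# Data augmentation lives on the input side, e.g. with torchvision:
# transforms.Compose([transforms.RandomCrop(28, padding=2),
#                     transforms.RandomHorizontalFlip(),
#                     transforms.ToTensor()])
```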
7. Training Large Language Models
LLMs require:
• distributed training across many GPUs
• data, tensor and pipeline parallelism
• mixed precision (FP16/BF16)
• gradient checkpointing (recomputing activations to save memory)
And enormous compute power.
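A rough sketch of the single-GPU pieces of that list: BF16 mixed precision via autocast plus gradient checkpointing on one block. Everything named here (`block`, `head`, `loss_fn`, `optimizer`, `loader`) is a placeholder, and the distributed/parallel machinery is deliberately left out.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Mixed precision forward pass plus activation recomputation for one block.
for x, y in loader:
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # Recompute the block's activations during the backward pass instead
        # of storing them, trading extra compute for lower memory use.
        hidden = checkpoint(block, x, use_reentrant=False)
        loss = loss_fn(head(hidden), y)
    loss.backward()
    optimizer.step()
```

With BF16 a loss scaler is usually unnecessary; FP16 training typically adds torch.cuda.amp.GradScaler on top of this loop.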
Final Thoughts
Behind every modern AI is a complex training system.
Understanding these tools gives insight into the engineering that powers today’s intelligent models.