Training Large Models — Optimisers, Learning Rates & Loss Landscapes
Thread 5 — Training Large Models: Optimisers, Learning Rates & Loss Landscapes

The Hidden Mechanics Behind Model Training

Modern AI models aren’t just built — they’re grown, shaped through millions of tiny adjustments. 
This thread explains the advanced machinery behind training deep models.



1. The Loss Landscape

A model’s training loss can be pictured as a giant multidimensional surface. 
Each point on the surface corresponds to a particular setting of the weights, and its height is the loss at that setting.

The goal of training:
find low valleys (good solutions) on this landscape.

This landscape is:
• huge 
• chaotic 
• full of ridges, basins, and flat regions 

Understanding it is key to training powerful models.
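To make the picture concrete, here is a minimal pure-Python sketch. The one-weight linear model and the tiny dataset are hypothetical; the point is that each candidate weight is one location on the landscape, and its loss is the height there:

```python
# A tiny "landscape" for a one-weight model y_hat = w * x.
# Each candidate weight w maps to a loss value; training searches
# for the lowest point (here, the valley sits at w = 2).

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs with y = 2x

def loss(w):
    # mean squared error of the model y_hat = w * x
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Sample the landscape at a few weight settings.
samples = {w: loss(w) for w in [0.0, 1.0, 2.0, 3.0]}
```

Real landscapes have billions of axes instead of one, which is why they can hold the ridges, basins, and flat regions described above.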



2. Gradient Descent — The Core Idea

At each step:
• compute the gradient (the direction of steepest ascent) 
• move the weights a small step in the opposite direction, i.e. downhill 

Basic form:
SGD — Stochastic Gradient Descent, which estimates the gradient from a random mini-batch at each step.

Simple but powerful.
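The loop above fits in a few lines of plain Python on a toy objective f(w) = (w − 3)², whose gradient is 2(w − 3). The function names and hyperparameters are illustrative, and because we use the exact gradient rather than a mini-batch estimate, this is plain (deterministic) gradient descent:

```python
# Gradient descent on f(w) = (w - 3)^2. The gradient is 2*(w - 3);
# each step moves w a little way downhill, towards the minimum at w = 3.

def grad(w):
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

w_final = gradient_descent(w=0.0)  # converges close to 3.0
```

The "stochastic" in SGD just means the gradient in this loop is estimated from a random mini-batch instead of the full dataset.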



3. Advanced Optimisers

Modern models use smarter algorithms:

• Adam — momentum plus per-parameter adaptive learning rates 
• AdamW — Adam with decoupled weight decay 
• RMSProp — scales updates by a running average of squared gradients 
• LAMB / Lion — designed for extremely large models and batch sizes 

Optimisers improve training speed and stability.
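To show what "smarter" means, here is a minimal pure-Python sketch of the Adam update rule applied to the same toy objective f(w) = (w − 3)². The default betas are the usual ones; the learning rate and step count are chosen for this toy problem:

```python
import math

# Adam keeps two running averages per weight: the first moment m
# (momentum) and the second moment v (squared gradients), then takes
# a bias-corrected, per-parameter-scaled step.

def adam_minimise(grad, w, steps=1000, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g          # first moment: momentum
        v = b2 * v + (1 - b2) * g * g      # second moment: adaptivity
        m_hat = m / (1 - b1 ** t)          # bias correction (early steps)
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_final = adam_minimise(lambda w: 2.0 * (w - 3.0), w=0.0)
```

AdamW differs only in where weight decay is applied: it is subtracted from the weights directly rather than folded into the gradient g.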



4. Learning Rate Scheduling

The learning rate controls the “step size” during training.

Too high → unstable 
Too low → painfully slow 

Schedulers include:
• warmup 
• cosine decay 
• exponential decay 
• cyclical schedules 

These dramatically improve performance.
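The first two schedules in the list are often combined. A minimal sketch of linear warmup followed by cosine decay; the function name and default values are illustrative, not from any particular library:

```python
import math

def lr_at(step, total_steps=1000, warmup_steps=100, base_lr=3e-4):
    if step < warmup_steps:
        # linear warmup: ramp from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # cosine decay: fall smoothly from base_lr down to 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Warmup avoids taking huge steps while the optimiser's statistics are still uninitialised; the cosine tail lets the weights settle into a minimum.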



5. Batch Size Effects

Small batches:
• noisy gradients 
• good generalisation 

Large batches:
• stable 
• fast 
• used for huge models 

Knowing when to use which (and how to scale the learning rate to match) is a science in itself.
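One bridge between the two regimes is gradient accumulation: average the gradients of several small "micro-batches" before making a single weight update. With equal-sized micro-batches this reproduces the full-batch gradient exactly. A pure-Python sketch with a hypothetical one-weight model:

```python
# Gradient accumulation: the average of per-micro-batch mean gradients
# equals the full-batch mean gradient when the micro-batches are equal-sized.

def grad_on_batch(w, batch):
    # MSE gradient for the model y_hat = w * x
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(float(x), 2.0 * x) for x in range(1, 9)]          # y = 2x
micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]

w = 0.0
accum = sum(grad_on_batch(w, b) for b in micro_batches) / len(micro_batches)
full = grad_on_batch(w, data)
# accum matches full up to floating-point rounding
```

This is how very large effective batch sizes are trained on hardware that can only fit a small batch in memory at once.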



6. Regularisation Techniques

Used to prevent overfitting:
• dropout 
• weight decay 
• label smoothing 
• data augmentation 

Essential for robust models.
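As one concrete example from the list, label smoothing replaces a hard one-hot target with a slightly softened distribution, which discourages the model from becoming overconfident. A minimal sketch (the function name is illustrative):

```python
def smooth_labels(one_hot, eps=0.1):
    # Move a fraction eps of the probability mass from the true class
    # to a uniform distribution over all k classes.
    k = len(one_hot)
    return [(1.0 - eps) * p + eps / k for p in one_hot]

smoothed = smooth_labels([1.0, 0.0, 0.0])
# the true class keeps most of the mass; the rest is spread evenly,
# and the result still sums to 1
```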



7. Training Large Language Models

LLMs require:
• distributed training across many accelerators 
• data, tensor, and pipeline parallelism 
• mixed precision (FP16/BF16) 
• gradient checkpointing 

And enormous compute power.
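The last item deserves a number. Gradient checkpointing trades compute for memory: instead of storing one activation per layer for the backward pass, store only every k-th layer's activation and recompute the rest on the way back. A back-of-the-envelope sketch (layer counts are illustrative):

```python
# Activation memory with and without gradient checkpointing.
# A depth-L network normally stores L activations for backprop;
# checkpointing every k layers stores only ceil(L / k) of them,
# at the cost of roughly one extra forward pass of recomputation.

def activations_stored(layers, checkpoint_every=None):
    if checkpoint_every is None:
        return layers                         # vanilla: store everything
    return -(-layers // checkpoint_every)     # ceil division: checkpoints only

full = activations_stored(48)        # a 48-layer model: 48 activations
ckpt = activations_stored(48, 8)     # checkpoint every 8 layers: 6 activations
```

An 8× memory reduction for about 1.3–1.5× the compute is a trade most large-scale runs happily make.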



Final Thoughts

Behind every modern AI is a complex training system. 
Understanding these tools gives insight into the engineering that powers today’s intelligent models.