When people try to improve neural networks, they often ask:
Should I collect more data, design a new architecture, or tune hyperparameters?
A common rule of thumb says: Better Data/Features >>> Better Architecture >> Better Hyperparameters. Let’s unpack this from first principles.
1. What actually determines neural network performance?
At the core, any learning system depends on three levers:
- Data: what signal is available
- Model: how well it can represent that signal
- Training: how efficiently it learns that signal
These map to physics-like principles:
- Data → Information availability
- Architecture → Representational capacity
- Hyperparameters → Optimization dynamics
2. Why “Better Data” dominates
A network can only learn patterns that are present in its data. If the dataset is small, noisy, or missing key features, no architecture can extract what isn't there. More or cleaner data raises the signal-to-noise ratio, reducing overfitting and improving generalization. Information availability is the most fundamental bottleneck.
Garbage in → Garbage out. More meaningful data in → Better learning out.
But how would we know whether a dataset has a high signal-to-noise ratio?
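One rough, illustrative check (a toy sketch, not a standard metric): see whether a deliberately simple model beats chance on held-out data. If even a trivial classifier recovers structure, the features carry usable signal; if it hovers at chance, the signal may not be there. The synthetic task and all names below are assumptions for illustration:

```python
import random
random.seed(0)

def make_split(n, signal):
    """Toy binary task with one feature whose class separation is
    controlled by `signal` (0 = pure noise). Illustrative only."""
    data = [(random.gauss(signal * y, 1.0), y)
            for y in (random.choice((0, 1)) for _ in range(n))]
    return data[: n // 2], data[n // 2 :]   # train / held-out halves

def nearest_centroid_accuracy(train, test):
    """A deliberately simple model: classify by the closer class mean."""
    means = {c: (lambda xs: sum(xs) / len(xs))([x for x, y in train if y == c])
             for c in (0, 1)}
    hits = sum(1 for x, y in test
               if min(means, key=lambda c: abs(x - means[c])) == y)
    return hits / len(test)

for signal in (0.0, 0.5, 2.0):
    train, test = make_split(20_000, signal)
    acc = nearest_centroid_accuracy(train, test)
    print(f"signal={signal}: simple-baseline accuracy ~{acc:.2f}")
```

When `signal=0.0` the baseline sits near 0.50 (chance): there is nothing to learn, and no bigger model would change that. As `signal` grows, even this crude model pulls away from chance, which is the tell-tale of a usable signal-to-noise ratio.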
3. Why “Better Architecture” comes next
Once the data is rich enough, architecture decides how efficiently patterns are represented.
Convolutions exploit spatial structure, transformers exploit relationships — each introduces the right inductive bias for the task.
A poor architecture wastes data; a good one compresses it into the right abstractions.
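To make the inductive-bias point concrete, here is a minimal pure-Python sketch (the 1-D signal and kernel values are invented for illustration) of why convolution suits spatial structure: the same few weights are reused at every position, so a shifted input yields a shifted output, with far fewer parameters than a dense layer would need:

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (strictly, cross-correlation): the same
    small kernel is applied at every position -- weight sharing."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

x = [0, 0, 1, 0, 0, 0, 0]        # a single spike
kernel = [1, 2, 1]               # 3 shared weights, wherever the spike is

shifted = [0] + x[:-1]           # same pattern, moved one step right
print(conv1d(x, kernel))         # -> [1, 2, 1, 0, 0]
print(conv1d(shifted, kernel))   # -> [0, 1, 2, 1, 0]  (same response, shifted)

# Parameter count: a dense layer mapping this 7-value input to a
# 7-value output needs 7*7 = 49 weights; the conv needs just 3.
```

The shifted output is the equivariance that makes convolutions data-efficient on spatial tasks: the architecture does not have to re-learn the same pattern at every location.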
4. Why “Better Hyperparameters” come last
Hyperparameters (learning rate, batch size, regularization, etc.) don’t add new information or representation power — they control how well the model explores its loss surface.
They can make training stable or unstable, fast or slow, but cannot overcome bad data or a wrong architecture.
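A toy illustration of that claim, assuming nothing beyond minimizing f(x) = x² by gradient descent: the learning rate changes only the dynamics (slow, fast, or divergent), never the information in the problem or the location of the optimum:

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x). The optimum is x = 0 for
    every learning rate; only the trajectory toward it changes."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: x after 50 steps = {gradient_descent(lr):.3g}")
```

With `lr=0.01` convergence is slow, with `lr=0.4` it is fast, and with `lr=1.1` the iterates oscillate and blow up: stability and speed change, but the target the model is trying to reach does not.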
5. The deeper pattern
Improvement flows from information → representation → optimization. Each stage builds on the previous one. If the information itself is weak, downstream cleverness can’t fix it.
So in practice:
- Data defines the ceiling.
- Architecture approaches it.
- Hyperparameters fine-tune it.
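The ceiling metaphor can be made literal with a toy experiment (the 20% noise level is an assumption chosen for illustration): if a fifth of the labels are flipped, even an oracle that knows the true rule tops out near 80% accuracy, and no architecture or tuning can recover the rest:

```python
import random
random.seed(0)

FLIP = 0.2   # assumed noise level: 20% of labels flipped at random

def noisy_dataset(n):
    """True rule is sign(x); a fixed fraction of labels is flipped."""
    out = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        y = 1 if x > 0 else -1
        if random.random() < FLIP:
            y = -y              # irreducible label noise
        out.append((x, y))
    return out

data = noisy_dataset(100_000)
# Even the *perfect* model sign(x) cannot exceed 1 - FLIP on this data:
oracle_acc = sum((1 if x > 0 else -1) == y for x, y in data) / len(data)
print(f"oracle accuracy ~{oracle_acc:.2f} (ceiling set by the data: {1 - FLIP})")
```

The oracle lands near 0.80, the ceiling the data imposes; a better architecture or learning rate can only close the gap up to that ceiling, never raise it.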
Bottom line: The hierarchy isn’t an iron law — sometimes tuning beats redesigning — but as a guiding principle, always start with data, then architecture, then training knobs.
That order aligns with the natural flow of learning itself:
Discover → Represent → Refine.