Training Models - Sage Wisdom

Training Neural Networks

The following is solid advice from Andrej Karpathy on training neural networks, with some specific tips for each step. Read his post here: A Recipe for Training Neural Networks (by @karpathy)

Understand the data. Become one with it. Be the ball.
Set up an end-to-end pipeline, with some tips
- fix a random seed
- simplify
- add significant digits to eval
- verify loss at init
- init "well"
- human baseline
- input-independent baseline - does random or no data perform worse, as expected?
- overfit one batch
- verify decreasing training loss - push buttons you would expect to break things and see if they do. tap on the gauges when they look right, to see if it's real.
- use backprop to chart dependencies - find bugs in arch by doing basic tests
- generalize a special case - don't bite off more than you can chew. start with a special case, then generalize, reasonably.
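
A few of the pipeline checks above can be tried on a toy problem. The sketch below is a minimal, pure-Python softmax classifier; it is not code from Karpathy's post, and the data, sizes, learning rate, and function names are all made up for illustration. It fixes the seed, checks that the loss at init equals log(n_classes) for a zero-initialized final layer, then tries to overfit one tiny batch.

```python
import math
import random

random.seed(0)  # fix the random seed so every run sees the same data

# Hypothetical toy sizes, chosen only for illustration.
NUM_CLASSES = 3
NUM_FEATURES = 4
BATCH_SIZE = 4

# One tiny batch: random features with arbitrary labels.
batch_x = [[random.gauss(0.0, 1.0) for _ in range(NUM_FEATURES)]
           for _ in range(BATCH_SIZE)]
batch_y = [i % NUM_CLASSES for i in range(BATCH_SIZE)]

# Zero-initialized final layer, so the softmax output is uniform at init.
weights = [[0.0] * NUM_FEATURES for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    """Class probabilities of the linear model for one input vector."""
    logits = [sum(w * xi for w, xi in zip(weights[c], x)) + biases[c]
              for c in range(NUM_CLASSES)]
    return softmax(logits)


def mean_loss():
    """Mean cross-entropy loss over the single batch."""
    return sum(-math.log(predict(x)[y])
               for x, y in zip(batch_x, batch_y)) / BATCH_SIZE


def gradient_step(learning_rate):
    """One full-batch gradient descent step on the cross-entropy loss."""
    grad_w = [[0.0] * NUM_FEATURES for _ in range(NUM_CLASSES)]
    grad_b = [0.0] * NUM_CLASSES
    for x, y in zip(batch_x, batch_y):
        probs = predict(x)
        for c in range(NUM_CLASSES):
            # d(loss)/d(logit_c) = p_c - 1[c == y]
            delta = (probs[c] - (1.0 if c == y else 0.0)) / BATCH_SIZE
            grad_b[c] += delta
            for j in range(NUM_FEATURES):
                grad_w[c][j] += delta * x[j]
    for c in range(NUM_CLASSES):
        biases[c] -= learning_rate * grad_b[c]
        for j in range(NUM_FEATURES):
            weights[c][j] -= learning_rate * grad_w[c][j]


# Verify loss at init: a uniform softmax should give -log(1 / n_classes).
initial_loss = mean_loss()
expected_initial_loss = math.log(NUM_CLASSES)

# Overfit one batch: keep training on the same few examples.
for _ in range(2000):
    gradient_step(learning_rate=0.1)
final_loss = mean_loss()

print(f"loss at init: {initial_loss:.6f} (expected {expected_initial_loss:.6f})")
print(f"loss after overfitting one batch: {final_loss:.6f}")
```

If the init loss is off, or the loss on one batch refuses to keep dropping, there is probably a bug somewhere in the pipeline, and it is much cheaper to find it here than in a full training run.
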
Overfit
Regularize
- moar data
- creative data
- dropout
- weight decay
- early stopping
- decrease batch size
- moar weights
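
Of the regularizers above, early stopping is the easiest to sketch without any framework. The version below is one common patience-based variant, not necessarily what the post has in mind; the function name `early_stop`, the `patience` parameter, and the loss history are hypothetical.

```python
def early_stop(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` epochs in a row. If that never happens, stop_epoch is the
    last epoch in the sequence.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Made-up validation losses, one per epoch.
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
best_epoch, stop_epoch = early_stop(history, patience=3)
print(best_epoch, stop_epoch)  # prints "2 5"
```

Note that the made-up history improves again at epoch 6, after the stop. That is the usual trade-off with a small patience: you keep the checkpoint from `best_epoch` and accept that you might have missed a later dip.
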
Tune your hyperparameters
Squeeze out the juice w/ ensembles and lengthened training
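
For the ensembling step, one simple approach (an assumption here, the notes don't say which scheme to use) is to average the class probabilities from several independently trained models and take the argmax. The function name and the probabilities below are made up for illustration.

```python
def ensemble_average(prob_lists):
    """Average per-class probabilities from several models for one input."""
    num_models = len(prob_lists)
    num_classes = len(prob_lists[0])
    return [sum(probs[c] for probs in prob_lists) / num_models
            for c in range(num_classes)]


# Hypothetical softmax outputs of three models for the same input.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
averaged = ensemble_average(model_probs)
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
print(predicted_class)  # prints "0"
```

Here the second model alone would pick class 1, but the average still lands on class 0, which is the whole point of the ensemble.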

Go read that post. And this one, and the rest of his site, probably.

Yes, You Should Understand Backprop (by @karpathy)