Yes you should understand backprop

Some eye candy: a computational graph of a Batch Norm layer with a forward pass (black) and backward pass (red). (borrowed from this post)

Vanishing gradients on sigmoids

z = 1/(1 + np.exp(-np.dot(W, x))) # forward pass
dx = np.dot(W.T, z*(1-z)) # backward pass: local gradient for x
dW = np.outer(z*(1-z), x) # backward pass: local gradient for W
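
To make the problem concrete, here is a minimal sketch (the shapes, seed, and the 100x weight scaling below are made up for illustration): the local gradient z*(1-z) peaks at 0.25, and once the weights are large enough to saturate the sigmoids it collapses towards zero, so almost no gradient flows back into x and W.

import numpy as np

np.random.seed(0)
x = np.random.randn(50)            # hypothetical input vector
W = 0.1 * np.random.randn(10, 50)  # hypothetical, modestly scaled weight matrix

z = 1/(1 + np.exp(-np.dot(W, x)))  # forward pass, as above
print(np.max(z*(1-z)))             # close to the 0.25 maximum: sigmoids sit in their linear regime

z_sat = 1/(1 + np.exp(-np.dot(100.0 * W, x)))  # much larger weights saturate the sigmoids
print(z_sat*(1-z_sat))             # mostly ~0: saturated units pass almost no gradient through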

Dying ReLUs

z = np.maximum(0, np.dot(W, x)) # forward pass
dW = np.outer(z > 0, x) # backward pass: local gradient for W
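
A minimal sketch of the failure mode (the input and weight values below are contrived on purpose): if a large update knocks a neuron's weights into a region where its pre-activation is negative for every input, z is always zero, the z > 0 mask zeroes the gradient, and the neuron never updates again.

import numpy as np

np.random.seed(0)
x = np.abs(np.random.randn(50))        # hypothetical non-negative inputs (e.g. outputs of an earlier ReLU)
W = -np.abs(np.random.randn(10, 50))   # weights pushed all-negative, e.g. by one oversized update

z = np.maximum(0, np.dot(W, x))        # forward pass, as above: every pre-activation is negative
dW = np.outer(z > 0, x)                # backward pass, as above: the z > 0 mask is all False

print(z.any())                         # False: these neurons never fire on this input
print(np.abs(dW).sum())                # 0.0: no gradient reaches W, so the dead neurons stay dead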

Exploding gradients in RNNs
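
During backprop through time the gradient on the hidden state is multiplied by the (transposed) recurrent matrix once per time step, so its norm shrinks or grows roughly geometrically with the spectral radius of that matrix. A minimal sketch of the effect, assuming a vanilla RNN and ignoring the elementwise nonlinearity; the hidden size, sequence length, and the two scales below are made up:

import numpy as np

np.random.seed(0)
H, T = 100, 100                        # hypothetical hidden size and sequence length
Whh = np.random.randn(H, H)
Whh /= np.max(np.abs(np.linalg.eigvals(Whh)))  # rescale so the spectral radius is exactly 1

dh = np.random.randn(H)                # gradient arriving at the last hidden state
for scale in [0.9, 1.1]:
    g = dh.copy()
    for _ in range(T):                 # backprop through time: one matrix multiply per step
        g = np.dot((scale * Whh).T, g)
    print(scale, np.linalg.norm(g))    # shrinks towards 0 at 0.9, blows up at 1.1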

Spotted in the Wild: DQN Clipping

def clipped_error(x):
  return tf.select(tf.abs(x) < 1.0,
                   0.5 * tf.square(x),
                   tf.abs(x) - 0.5) # condition, true, false
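
The piecewise objective above is the Huber loss: its derivative with respect to x is x inside [-1, 1] and ±1 outside, so minimizing it is equivalent to clipping the gradient of the squared error term to [-1, 1]. A quick numpy check (the sample errors and the helper name clipped_error_np are made up):

import numpy as np

def clipped_error_np(x):
    # the same piecewise objective as above, written in numpy
    return np.where(np.abs(x) < 1.0, 0.5 * np.square(x), np.abs(x) - 0.5)

x = np.linspace(-3.0, 3.0, 13)         # made-up error values
eps = 1e-6
num_grad = (clipped_error_np(x + eps) - clipped_error_np(x - eps)) / (2 * eps)
print(np.allclose(num_grad, np.clip(x, -1.0, 1.0), atol=1e-4))  # True: the gradient is the clipped error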

In conclusion
