Description
As in #72
Use backprop to chart dependencies. Your deep learning code will often contain complicated, vectorized, and broadcasted operations. A relatively common bug I’ve come across a few times is that people get this wrong (e.g. they use a view instead of transpose/permute somewhere) and inadvertently mix information across the batch dimension. It is a depressing fact that your network will typically still train okay because it will learn to ignore data from the other examples. One way to debug this (and other related problems) is to set the loss to be something trivial like the sum of all outputs of example i, run the backward pass all the way to the input and ensure that you get a non-zero gradient only on the i-th input. The same strategy can be used to e.g. ensure that your autoregressive model at time t only depends on 1..t-1. More generally, gradients give you information about what depends on what in your network, which can be useful for debugging.
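For reference, a minimal sketch of this check in PyTorch. The model, shapes, and the chosen example index here are placeholders, not anything from this repo; any architecture that should treat batch elements independently can be dropped in.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; replace with the network under test.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

batch_size, i = 4, 2                      # check dependencies of example i
x = torch.randn(batch_size, 8, requires_grad=True)

out = model(x)
loss = out[i].sum()                       # trivial loss: sum of all outputs of example i
loss.backward()

# Only row i of the input gradient should be non-zero; any other non-zero
# row means information is mixing across the batch dimension.
leaked = x.grad.abs().sum(dim=1)
print(leaked)
assert leaked[i] > 0
assert torch.all(leaked[torch.arange(batch_size) != i] == 0), \
    "gradient leaked across the batch dimension"
```

The same pattern works for the autoregressive case: sum the output at time t, backprop to the input sequence, and check that gradients are zero for positions >= t.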
This could be implemented either as a step or as a network modifier; to be discussed.