---
title: Forward Propagation
sidebar_label: Forward Propagation
description: "Understanding how data flows from the input layer to the output layer to generate a prediction."
tags: [deep-learning, neural-networks, forward-propagation, math]
---

**Forward Propagation** is the process by which a neural network transforms input data into an output prediction. It is the "inference" stage where data flows through the network layers, undergoing linear transformations and non-linear activations until it reaches the final layer.

## 1. The Step-by-Step Flow

In a dense (fully connected) network, the signal moves from left to right. For every neuron in a hidden or output layer, two distinct steps occur:

### Step A: The Linear Transformation (Z)
The neuron takes all inputs from the previous layer, multiplies each one by its respective weight, and adds a bias term. This is simply a weighted sum of the inputs plus a bias.

$$
z = \sum_{i=1}^{n} (w_i \cdot x_i) + b
$$

Where:

- $x_i$ = input features from the previous layer
- $w_i$ = weights associated with each input
- $b$ = bias term
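
As a quick sanity check, here is a minimal NumPy sketch of this weighted sum for a single neuron. The values of `x`, `w`, and `b` are made up for illustration.

```python
import numpy as np

# Hypothetical inputs, weights, and bias for a single neuron
x = np.array([0.5, 0.1, -0.2])   # inputs from the previous layer
w = np.array([0.4, -0.6, 0.9])   # one weight per input
b = 0.1                          # bias term

# z = sum(w_i * x_i) + b
z = np.dot(w, x) + b
print(z)  # 0.5*0.4 + 0.1*(-0.6) + (-0.2)*0.9 + 0.1 ≈ 0.06
```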

### Step B: The Non-Linear Activation (A)
The result $z$ is passed through an **Activation Function** (like ReLU or Sigmoid). This step is crucial because it allows the network to learn complex, non-linear patterns.

$$
a = \sigma(z)
$$
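
Continuing that single-neuron sketch, this is how two common activation functions could be applied to $z$; the helper functions below are illustrative, not part of the original snippet.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Keeps positive values, zeroes out negatives
    return np.maximum(0, z)

z = 0.06             # value from the previous sketch
print(sigmoid(z))    # ~0.515
print(relu(z))       # 0.06
```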

## 2. Forward Propagation in Matrix Form

In practice, we don't calculate one neuron at a time. We use **Linear Algebra** to calculate entire layers simultaneously. This is why GPUs (which are great at matrix math) are so important for Deep Learning.

If $W^{[1]}$ is the weight matrix for the first layer and $X$ is our input vector:

$$
Z^{[1]} = W^{[1]} \cdot X + b^{[1]}
$$

Then, we apply the activation function:

$$
A^{[1]} = \sigma(Z^{[1]})
$$

This output $A^{[1]}$ then becomes the "input" for the next layer.
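
Here is a minimal sketch of this vectorized view for a hypothetical layer with 3 inputs and 2 hidden neurons. The weights are initialized randomly, so the exact outputs will vary from run to run.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([0.5, 0.1, -0.2])   # input vector, shape (3,)

W1 = np.random.randn(2, 3)       # weight matrix: 2 neurons x 3 inputs
b1 = np.random.randn(2)          # one bias per neuron, shape (2,)

Z1 = W1 @ X + b1                 # shape (2,): both neurons computed at once
A1 = sigmoid(Z1)                 # shape (2,): becomes the input to the next layer

print(Z1.shape, A1.shape)        # (2,) (2,)
```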

## 3. A Visual Example

Imagine a simple network with 1 Hidden Layer:

```mermaid
graph LR
    %% Input Layer
    X1["$$x_1$$"] -->|"$$w_{11}^{[1]}$$"| H1
    X2["$$x_2$$"] -->|"$$w_{12}^{[1]}$$"| H1
    X3["$$x_3$$"] -->|"$$w_{13}^{[1]}$$"| H1

    X1 -->|"$$w_{21}^{[1]}$$"| H2
    X2 -->|"$$w_{22}^{[1]}$$"| H2
    X3 -->|"$$w_{23}^{[1]}$$"| H2

    %% Hidden Layer
    H1["$$z_1^{[1]} \\ a_1^{[1]} = \sigma(z_1^{[1]})$$"]
    H2["$$z_2^{[1]} \\ a_2^{[1]} = \sigma(z_2^{[1]})$$"]

    %% Output Layer
    H1 -->|"$$w_1^{[2]}$$"| Y
    H2 -->|"$$w_2^{[2]}$$"| Y

    Y["$$z^{[2]} \\ \hat{y} = \sigma(z^{[2]})$$"]

    %% Bias annotations
    B1["$$b^{[1]}$$"] -.-> H1
    B1 -.-> H2
    B2["$$b^{[2]}$$"] -.-> Y
```

1. **Input:** Your features (e.g., pixel values of an image).
2. **Hidden Layer:** Extracts abstract features (e.g., edges or shapes).
3. **Output Layer:** Provides the final guess (e.g., "This is a dog with 92% probability").

## 4. Why "Propagate"?

The term "propagate" is used because the output of one layer is the input of the next. The information "spreads" through the network. Each layer acts as a filter, refining the raw data into more meaningful representations until a decision can be made at the end.

## 5. Implementation in Pure Python (NumPy)

This snippet demonstrates the math behind a single forward pass for a network with one hidden layer.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# 1. Inputs (3 features)
X = np.array([0.5, 0.1, -0.2])

# 2. Weights and Biases (Hidden Layer with 2 neurons)
W1 = np.random.randn(2, 3)
b1 = np.random.randn(2)

# 3. Weights and Biases (Output Layer with 1 neuron)
W2 = np.random.randn(1, 2)
b2 = np.random.randn(1)

# --- FORWARD PASS ---

# Layer 1 (Hidden)
z1 = np.dot(W1, X) + b1
a1 = sigmoid(z1)

# Layer 2 (Output)
z2 = np.dot(W2, a1) + b2
prediction = sigmoid(z2)

print(f"Model Prediction: {prediction}")
```
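
Note that the snippet uses sigmoid in both layers purely to keep the math simple; in practice, hidden layers typically use ReLU, while the output activation depends on the task (for example, sigmoid for binary classification or softmax for multi-class).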

## 6. What happens next?

Forward propagation gives us a prediction. However, at the start the weights are random, so the prediction is essentially a guess. To make the model "learn," we must:

1. Compare the prediction to the truth using a **Loss Function** (a minimal sketch follows below).
2. Send the error backward through the network using **Backpropagation**.
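
As a taste of step 1, here is a minimal sketch of binary cross-entropy, a common loss for a sigmoid output; the prediction and label values below are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0); confident wrong answers are penalized heavily
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = 1.0    # ground-truth label ("it really is a dog")
y_pred = 0.92   # the network's forward-pass output

print(binary_cross_entropy(y_true, y_pred))  # ~0.083
```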

## References

* **DeepLearning.AI:** [Neural Networks and Deep Learning (Week 2)](https://www.coursera.org/learn/neural-networks-deep-learning)
* **Khan Academy:** [Matrix Multiplication Foundations](https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:matrices)

---

**We have the prediction. Now, how do we tell the network it made a mistake?** Head over to the [Backpropagation](./backpropagation.mdx) guide to learn how neural networks learn from their errors!