codeharborhub
diff --git a/docs/machine-learning/deep-learning/neural-network-basics/activation-functions.mdx b/docs/machine-learning/deep-learning/neural-network-basics/activation-functions.mdx
Lines changed: 84 additions & 0 deletions
diff --git a/docs/machine-learning/deep-learning/neural-network-basics/backpropagation.mdx b/docs/machine-learning/deep-learning/neural-network-basics/backpropagation.mdx
Lines changed: 116 additions & 0 deletions
diff --git a/docs/machine-learning/deep-learning/neural-network-basics/forward-propagation.mdx b/docs/machine-learning/deep-learning/neural-network-basics/forward-propagation.mdx
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,84 @@
+---
+title: Activation Functions
+sidebar_label: Activation Functions
+description: "Why we need non-linearity and a deep dive into Sigmoid, Tanh, ReLU, and Softmax."
+tags: [deep-learning, neural-networks, activation-functions, relu, sigmoid]
+---
+
+An **Activation Function** is a mathematical function applied to a neuron's weighted sum of inputs; its result becomes the neuron's output. Its primary job is to introduce **non-linearity** into the network. Without non-linear activations, no matter how many layers you stack, the whole network reduces to a single linear transformation, no more expressive than a simple linear regression model.
+
+## 1. Why do we need Non-Linearity?
+
+Real-world data is rarely a straight line. If we only used linear transformations ($z = wx + b$), the composition of multiple layers would just be another linear transformation. 
+
+Non-linear activation functions allow the network to "bend" the decision boundary to fit complex patterns like images, sound, and human language.
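+
+The collapse of stacked linear layers can be checked directly. Two layers without an activation give $W_2(W_1x + b_1) + b_2 = (W_2W_1)x + (W_2b_1 + b_2)$, which is again a single layer of the form $Wx + b$. The NumPy sketch below confirms this numerically; the layer sizes and random weights are arbitrary choices for illustration.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+x = rng.standard_normal(4)  # one input with 4 features
+
+# Two "layers" with no activation function in between
+W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
+W2, b2 = rng.standard_normal((3, 5)), rng.standard_normal(3)
+two_layers = W2 @ (W1 @ x + b1) + b2
+
+# The equivalent single linear layer
+W = W2 @ W1
+b = W2 @ b1 + b2
+one_layer = W @ x + b
+
+print(np.allclose(two_layers, one_layer))  # True
+```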
+
+## 2. Common Activation Functions
+
+### A. Sigmoid
+The Sigmoid function squashes any input value into a range between **0 and 1**. 
+* **Formula:** $\sigma(z) = \frac{1}{1 + e^{-z}}$
+* **Best For:** The output layer of binary classification models.
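+* **Downside:** It suffers from the **Vanishing Gradient** problem. Its derivative is at most 0.25 (at $z = 0$) and approaches zero for inputs of large magnitude, so a saturated neuron receives almost no gradient and learns very slowly, if at all.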
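+
+To see the saturation concretely, here is a small NumPy sketch of Sigmoid and its derivative, which follows from the formula above as $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. The sample inputs are arbitrary.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def sigmoid_grad(z):
+    # Derivative of the sigmoid: sigma(z) * (1 - sigma(z))
+    s = sigmoid(z)
+    return s * (1 - s)
+
+z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
+print(sigmoid(z))       # outputs stay strictly between 0 and 1
+print(sigmoid_grad(z))  # largest (0.25) at z = 0, about 4.5e-05 at z = +/-10
+```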
+
+### B. ReLU (Rectified Linear Unit)
+
+ReLU is the default choice for hidden layers in modern deep learning.
+* **Formula:** $f(z) = \max(0, z)$
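+* **Pros:** It is computationally very cheap (a single comparison), and its gradient is exactly 1 for positive inputs, so it does not shrink gradients the way Sigmoid does. This helps against vanishing gradients.
+* **Cons:** "Dying ReLU": the gradient is 0 for negative inputs, so if a neuron's input is negative for every training example, it always outputs 0 and its weights stop updating.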
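+
+A matching sketch for ReLU shows both points: the slope is exactly 1 for positive inputs and exactly 0 for negative ones. Here the derivative at $z = 0$ is taken to be 0, a common convention since ReLU is not differentiable at that point; the sample pre-activation values are made up.
+
+```python
+import numpy as np
+
+def relu(z):
+    return np.maximum(0, z)
+
+def relu_grad(z):
+    # 1 where z > 0, otherwise 0 (the value at exactly z == 0 is a convention)
+    return (z > 0).astype(float)
+
+z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
+print(relu(z))       # negative inputs are clipped to 0, positive ones pass through
+print(relu_grad(z))  # slope 0 for z <= 0, slope 1 for z > 0
+
+# A "dead" neuron: its pre-activation is negative for every training example,
+# so every gradient it passes back is 0 and its incoming weights never change
+pre_activations = np.array([-1.2, -0.3, -4.0])
+print(relu_grad(pre_activations).sum())  # 0.0
+```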
+
+### C. Tanh (Hyperbolic Tangent)
+
+Similar to Sigmoid, but it squashes values between **-1 and 1**.
+* **Formula:** $\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
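+* **Pros:** It is "zero-centered": its outputs are symmetric around 0 rather than always positive, which often makes training faster than with Sigmoid.
+* **Cons:** Like Sigmoid, it saturates for large positive or negative inputs, so it also suffers from vanishing gradients.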
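+
+Evaluating `np.tanh` on symmetric inputs shows both the zero-centering and the saturation. Its derivative, $1 - \tanh^2(z)$, peaks at 1 (versus 0.25 for Sigmoid) but still approaches 0 for large $|z|$. The sample inputs are arbitrary.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def tanh_grad(z):
+    # Derivative of tanh: 1 - tanh(z)^2
+    return 1 - np.tanh(z) ** 2
+
+z = np.linspace(-3, 3, 7)  # symmetric inputs: -3, -2, ..., 3
+print(np.tanh(z).mean())   # about 0: tanh outputs are centered on zero
+print(sigmoid(z).mean())   # about 0.5: sigmoid outputs are always positive
+print(tanh_grad(0.0))      # 1.0, the maximum slope
+print(tanh_grad(5.0))      # close to 0: tanh also saturates
+```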
+
+
+
+## 3. Comparison Table
+
+| Function | Range | Common Use Case | Main Issue |
+| :--- | :--- | :--- | :--- |
+| **Sigmoid** | (0, 1) | Binary Classification Output | Vanishing Gradient |
+| **Tanh** | (-1, 1) | Hidden Layers (legacy) | Vanishing Gradient |
+| **ReLU** | [0, $\infty$) | Hidden Layers (Standard) | Dying Neurons |
+| **Softmax** | (0, 1) | Multi-class Output | Only used in Output layer |
+
+## 4. The Softmax Function (Multi-class)
+
+When you have more than two categories (e.g., classifying an image as a Cat, Dog, or Bird), we use **Softmax** in the final layer. It turns the raw outputs (logits) into a probability distribution that sums up to **1.0**.
+
+$$
+\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^K e^{z_j}}
+$$
+
+Where:
+
+* $\mathbf{z}$ = vector of raw class scores (logits)
+* $K$ = total number of classes
+* $\text{softmax}(\mathbf{z})_i$ = probability of class $i$
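+
+A direct NumPy translation of the formula is sketched below. One detail is not in the formula but is standard practice: subtracting $\max_j z_j$ from every logit before exponentiating leaves the result unchanged (the factor $e^{-\max_j z_j}$ cancels between numerator and denominator) while preventing overflow for large logits. The logit values are made up for illustration.
+
+```python
+import numpy as np
+
+def softmax(z):
+    # Shift by the max logit for numerical stability; the output is unchanged
+    exp_z = np.exp(z - np.max(z))
+    return exp_z / exp_z.sum()
+
+logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for Cat, Dog, Bird
+probs = softmax(logits)
+print(probs)        # approximately [0.659 0.242 0.099]
+print(probs.sum())  # 1.0 (up to floating-point rounding)
+
+# Without the shift, np.exp(1000.0) overflows to inf; with it, this still works
+print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
+```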
+
+## 5. Implementation with Keras
+
+```python
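+from tensorflow.keras import Input, Sequential
+from tensorflow.keras.layers import Dense
+
+# Binary classification: ReLU for the hidden layer, Sigmoid for the output
+# (20 input features is an arbitrary example size)
+model = Sequential()
+model.add(Input(shape=(20,)))
+model.add(Dense(64, activation='relu'))
+model.add(Dense(1, activation='sigmoid'))
+
+# Alternatively, a separate model for multi-class classification (3 classes) with Softmax
+multi_model = Sequential()
+multi_model.add(Input(shape=(20,)))
+multi_model.add(Dense(64, activation='relu'))
+multi_model.add(Dense(3, activation='softmax'))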
+```
+
+---
+
+## References
+
+* **CS231n:** [Linear Classifiers and Activations](https://cs231n.github.io/neural-networks-1/)
+
+---
+
+**Now that you know how neurons fire, how do we measure how "wrong" their firing pattern is compared to the ground truth?**
@@ -0,0 +1,116 @@
+---
+title: "Backpropagation: How Networks Learn"
+sidebar_label: Backpropagation
+description: "Demystifying the heart of neural network training: The Chain Rule, Gradients, and Error Attribution."
+tags: [deep-learning, neural-networks, backpropagation, calculus, gradient-descent]
+---
+
+**Backpropagation** (short for "backward propagation of errors") is the central algorithm that allows neural networks to learn. If [Forward Propagation](./forward-propagation) is how the network makes a guess, Backpropagation is how it works out how much each weight and bias contributed to the error, so that those parameters can be adjusted to do better next time.
+
+## 1. The High-Level Concept
+
+Imagine you are a manager of a large factory (the network). At the end of the day, the final product is defective (a high **Loss**). To fix the problem, you don't just blame the person at the exit door; you trace the mistake backward through every department to find out who contributed most to the error and tell them to adjust their process.
+
+## 2. The Four Steps of Training
+
+Backpropagation is the third step in the general training loop:
+
+1.  **Forward Pass:** Calculate the prediction ($y_{pred}$).
+2.  **Loss Calculation:** Calculate the error using a [Loss Function](./loss-functions) (e.g., $L = (y_{actual} - y_{pred})^2$).
+3.  **Backward Pass (Backpropagation):** Calculate the **Gradient** of the loss with respect to every weight and bias in the network.
+4.  **Weight Update:** Adjust the weights slightly in the opposite direction of the gradient.
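+
+The four steps can be sketched as a plain-Python loop for the one-weight model $y_{pred} = w \cdot x$ with the squared-error loss above. The gradient is worked out by hand, $\frac{\partial L}{\partial w} = -2x\,(y_{actual} - w x)$; the learning rate of 0.01 and the 20 iterations are arbitrary choices for this illustration.
+
+```python
+# Toy data: the "right" weight is 12 / 5 = 2.4
+x, y_actual = 5.0, 12.0
+w = 2.0               # initial weight
+learning_rate = 0.01  # step size (an arbitrary choice for this example)
+
+for _ in range(20):
+    y_pred = w * x                         # 1. Forward pass
+    loss = (y_actual - y_pred) ** 2        # 2. Loss calculation
+    grad_w = -2 * x * (y_actual - y_pred)  # 3. Backward pass: dL/dw by hand
+    w = w - learning_rate * grad_w         # 4. Update: step against the gradient
+
+print(round(w, 4))  # 2.4
+```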
+
+## 3. The Secret Sauce: The Chain Rule
+
+Mathematically, we want to find out how much the Loss ($L$) changes when we change a specific weight ($w$). This is the derivative $\frac{\partial L}{\partial w}$.
+
+Because the weight is buried deep inside the network, we use the **Chain Rule** from calculus to "unpeel" the layers:
+
+$$
+\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \text{out}} \cdot \frac{\partial \text{out}}{\partial \text{net}} \cdot \frac{\partial \text{net}}{\partial w}
+$$
+
+Where:
+
+- $\text{out}$ = output of the neuron
+- $\text{net}$ = weighted sum input to the neuron
+
+By applying the chain rule repeatedly, we can propagate the error gradient backward through the network.
+
+This allows us to calculate the error contribution of a neuron in the 10th layer, and then use that result to calculate the error of a neuron in the 9th layer, and so on, all the way back to the input.
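+
+For a single neuron, the three factors of the chain rule can be computed one by one. This sketch assumes a sigmoid activation and the squared-error loss from the training steps above, with made-up values for $x$, $w$, $b$ and the target; it then compares the chain-rule result with a finite-difference estimate of $\frac{\partial L}{\partial w}$.
+
+```python
+import math
+
+def sigmoid(z):
+    return 1 / (1 + math.exp(-z))
+
+# Made-up values for one input, one weight, one bias and the target
+x, w, b, y = 1.5, 0.8, 0.2, 1.0
+
+def loss(weight):
+    # Squared error of a single sigmoid neuron: L = (y - sigmoid(weight*x + b))^2
+    return (y - sigmoid(weight * x + b)) ** 2
+
+net = w * x + b     # weighted sum input
+out = sigmoid(net)  # neuron output
+
+dL_dout = -2 * (y - out)     # dL/d(out)
+dout_dnet = out * (1 - out)  # d(out)/d(net): derivative of the sigmoid
+dnet_dw = x                  # d(net)/dw
+dL_dw = dL_dout * dout_dnet * dnet_dw
+
+# Sanity check with a central finite difference
+eps = 1e-6
+numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
+print(dL_dw, numeric)  # the two values agree to many decimal places
+```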
+
+## 4. Visualizing the Gradient Flow
+
+Information flows backward through the same paths it took during the forward pass.
+
+```mermaid
+graph RL
+    %% Output Layer
+    Y["$$\hat{y}$$"] -->|"$$\frac{\partial L}{\partial z^{[2]}}$$"| H1
+    Y -->|"$$\frac{\partial L}{\partial z^{[2]}}$$"| H2
+
+    %% Hidden Layer
+    H1["$$a_1^{[1]}$$"] -->|"$$\frac{\partial L}{\partial z_1^{[1]}}$$"| X1
+    H1 -->|"$$\frac{\partial L}{\partial z_1^{[1]}}$$"| X2
+    H1 -->|"$$\frac{\partial L}{\partial z_1^{[1]}}$$"| X3
+
+    H2["$$a_2^{[1]}$$"] -->|"$$\frac{\partial L}{\partial z_2^{[1]}}$$"| X1
+    H2 -->|"$$\frac{\partial L}{\partial z_2^{[1]}}$$"| X2
+    H2 -->|"$$\frac{\partial L}{\partial z_2^{[1]}}$$"| X3
+
+    %% Input Layer
+    X1["$$x_1$$"]
+    X2["$$x_2$$"]
+    X3["$$x_3$$"]
+
+    %% Loss
+    L["$$L(\hat{y}, y)$$"] --> Y
+
+```
+
+In this diagram, the arrows represent the flow of gradients backward through the network. Each neuron receives gradients from the neurons it feeds into, allowing it to compute how much it contributed to the final loss.
+
+**Quick overview of the steps during backpropagation:**
+
+1. Start at the output layer and compute the gradient of the loss with respect to the output.
+2. Use the chain rule to propagate this gradient backward through each layer.
+3. At each neuron, compute the gradient with respect to its weights and biases.
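+
+The three steps can be written out by hand for the small network in the diagram (3 inputs, 2 hidden neurons, 1 output). This NumPy sketch assumes sigmoid activations in both layers and the squared-error loss $L = (y - \hat{y})^2$; the random weights, the input and the target are arbitrary. The last lines check one gradient against a finite-difference estimate.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def forward(params, x):
+    W1, b1, W2, b2 = params
+    a1 = sigmoid(W1 @ x + b1)   # hidden activations, shape (2,)
+    a2 = sigmoid(W2 @ a1 + b2)  # prediction y_hat, shape (1,)
+    return a1, a2
+
+def loss_fn(params, x, y):
+    _, a2 = forward(params, x)
+    return float(np.sum((y - a2) ** 2))
+
+def backward(params, x, y):
+    _, _, W2, _ = params
+    a1, a2 = forward(params, x)
+    # 1. Gradient of the loss w.r.t. the output
+    dL_da2 = -2 * (y - a2)
+    # 2. Chain rule through the output sigmoid: sigma'(z) = a * (1 - a)
+    dL_dz2 = dL_da2 * a2 * (1 - a2)
+    # 3. Gradients for the output layer's weights and bias
+    dW2 = np.outer(dL_dz2, a1)
+    db2 = dL_dz2
+    # Repeat one layer back: each hidden neuron collects gradient via its outgoing weight
+    dL_da1 = W2.T @ dL_dz2
+    dL_dz1 = dL_da1 * a1 * (1 - a1)
+    dW1 = np.outer(dL_dz1, x)
+    db1 = dL_dz1
+    return dW1, db1, dW2, db2
+
+rng = np.random.default_rng(0)
+x = np.array([0.5, 0.1, -0.2])
+y = np.array([1.0])
+params = [rng.standard_normal((2, 3)), rng.standard_normal(2),
+          rng.standard_normal((1, 2)), rng.standard_normal(1)]
+dW1, db1, dW2, db2 = backward(params, x, y)
+
+# Finite-difference check on a single weight, W1[0, 1]
+eps = 1e-6
+plus = [p.copy() for p in params]
+plus[0][0, 1] += eps
+minus = [p.copy() for p in params]
+minus[0][0, 1] -= eps
+numeric = (loss_fn(plus, x, y) - loss_fn(minus, x, y)) / (2 * eps)
+print(dW1[0, 1], numeric)  # the two numbers should agree closely
+```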
+
+## 5. The Vanishing Gradient Problem
+
+In very deep networks, as we multiply many small derivatives together using the chain rule, the gradient can become extremely small by the time it reaches the first layers.
+
+* **Result:** The early layers stop learning because their weights are barely changing.
+* **A Common Fix:** This is one reason **ReLU** replaced Sigmoid in hidden layers: its derivative is 1 for positive inputs, so it doesn't "squash" gradients the way Sigmoid (whose derivative is at most 0.25) does.
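+
+A back-of-the-envelope sketch makes the effect visible. Even in Sigmoid's best case ($z = 0$, where its derivative reaches its maximum of 0.25), chaining one such factor per layer shrinks the gradient geometrically, while ReLU's derivative of 1 for positive inputs leaves it unchanged. This deliberately ignores the weight terms in the chain rule, which can also shrink or grow the gradient.
+
+```python
+import math
+
+def sigmoid(z):
+    return 1 / (1 + math.exp(-z))
+
+sigmoid_slope = sigmoid(0.0) * (1 - sigmoid(0.0))  # 0.25, the largest possible value
+relu_slope = 1.0                                  # ReLU derivative for any z > 0
+
+for depth in (1, 5, 10, 20):
+    print(depth, sigmoid_slope ** depth, relu_slope ** depth)
+# At depth 10 the sigmoid factor is already below 1e-6; the ReLU factor stays 1.0
+```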
+
+## 6. Simple Implementation Logic
+
+In modern libraries like PyTorch or TensorFlow, you don't have to write the calculus yourself: they compute gradients with **automatic differentiation** (in PyTorch this engine is called **Autograd**).
+
+```python
+# A minimal, runnable example of PyTorch autograd
+import torch
+
+# 1. Initialize weights with 'requires_grad'
+w = torch.tensor([2.0], requires_grad=True)
+x = torch.tensor([5.0])
+y_actual = torch.tensor([12.0])
+
+# 2. Forward Pass
+y_pred = w * x
+
+# 3. Calculate Loss
+loss = (y_actual - y_pred)**2
+
+# 4. BACKPROPAGATION (The Magic Step)
+loss.backward()
+
+# 5. Check the gradient: dL/dw = -2 * x * (y_actual - w * x) = -2 * 5 * (12 - 10) = -20
+print(f"Gradient of loss w.r.t w: {w.grad}")  # tensor([-20.])
+# The negative sign means increasing 'w' would reduce 'loss'
+
+```
+
+---
+
+**Now that we have the "Gradients" (the direction of change), how do we actually move the weights to reach the minimum error?**
@@ -0,0 +1,141 @@
+---
+title: Forward Propagation
+sidebar_label: Forward Propagation
+description: "Understanding how data flows from the input layer to the output layer to generate a prediction."
+tags: [deep-learning, neural-networks, forward-propagation, math]
+---
+
+**Forward Propagation** is the process by which a neural network transforms input data into an output prediction. Data flows through the network layer by layer, undergoing linear transformations and non-linear activations until it reaches the final layer. It is the only computation needed at inference time, and it is also the first step of every training iteration.
+
+## 1. The Step-by-Step Flow
+
+In a dense (fully connected) network, the signal moves from left to right. For every neuron in a hidden or output layer, two distinct steps occur:
+
+### Step A: The Linear Transformation (Z)
+The neuron takes all inputs from the previous layer, multiplies them by their respective weights, and adds a bias term. This is essentially a multi-dimensional linear equation.
+
+$$
+z = \sum_{i=1}^{n} (w_i \cdot x_i) + b
+$$
+
+Where:
+
+- $x_i$ = input features from the previous layer
+- $w_i$ = weights associated with each input
+- $b$ = bias term
+
+### Step B: The Non-Linear Activation (A)
+The result $z$ is passed through an **Activation Function** (like ReLU or Sigmoid). This step is crucial because it allows the network to learn complex, non-linear patterns.
+
+$$
+a = \sigma(z)
+$$
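+
+Both steps for one neuron fit in a few lines. The input, weights and bias below are made-up numbers, and Sigmoid stands in for $\sigma$; the explicit loop mirrors the summation in Step A, and `np.dot` computes the same sum in one call.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+x = np.array([0.5, 0.1, -0.2])  # inputs from the previous layer
+w = np.array([0.4, -0.6, 0.9])  # one weight per input
+b = 0.1                         # bias
+
+# Step A: weighted sum plus bias, written as an explicit loop...
+z_loop = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
+# ...and the same computation as a dot product
+z = np.dot(w, x) + b
+
+# Step B: non-linear activation
+a = sigmoid(z)
+print(z, a)  # z is about 0.06, a is slightly above 0.5
+```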
+
+## 2. Forward Propagation in Matrix Form
+
+In practice, we don't calculate one neuron at a time. We use **Linear Algebra** to calculate entire layers simultaneously. This is why GPUs (which are great at matrix math) are so important for Deep Learning.
+
+If $W^{[1]}$ is the weight matrix for the first layer (one row per neuron, one column per input feature) and $X$ is our input vector:
+
+$$
+Z^{[1]} = W^{[1]} \cdot X + b^{[1]}
+$$
+
+Then, we apply the activation function:
+
+$$
+A^{[1]} = \sigma(Z^{[1]})
+$$
+
+This output $A^{[1]}$ then becomes the "input" for the next layer.
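+
+The matrix form computes exactly what the per-neuron formula would, with one row of $W^{[1]}$ per neuron. The NumPy sketch below checks this for the 3-input, 2-neuron layer from the example; the random weights and Sigmoid as $\sigma$ are illustrative assumptions.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+W1 = rng.standard_normal((2, 3))  # 2 neurons x 3 inputs: row j holds neuron j's weights
+b1 = rng.standard_normal(2)       # one bias per neuron
+X = np.array([0.5, 0.1, -0.2])    # input vector
+
+# One neuron at a time, using the single-neuron formula
+Z_per_neuron = np.array([np.dot(W1[j], X) + b1[j] for j in range(2)])
+
+# The whole layer at once, using the matrix form
+Z1 = W1 @ X + b1
+A1 = 1 / (1 + np.exp(-Z1))        # sigmoid activation, applied element-wise
+
+print(np.allclose(Z_per_neuron, Z1))  # True
+print(A1.shape)                       # (2,): one activation per neuron, the input to layer 2
+```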
+
+## 3. A Visual Example
+
+Imagine a simple network with 1 Hidden Layer:
+
+```mermaid
+graph LR
+    %% Input Layer
+    X1["$$x_1$$"] -->|"$$w_{11}^{[1]}$$"| H1
+    X2["$$x_2$$"] -->|"$$w_{12}^{[1]}$$"| H1
+    X3["$$x_3$$"] -->|"$$w_{13}^{[1]}$$"| H1
+
+    X1 -->|"$$w_{21}^{[1]}$$"| H2
+    X2 -->|"$$w_{22}^{[1]}$$"| H2
+    X3 -->|"$$w_{23}^{[1]}$$"| H2
+
+    %% Hidden Layer
+    H1["$$z_1^{[1]} \\ a_1^{[1]} = \sigma(z_1^{[1]})$$"]
+    H2["$$z_2^{[1]} \\ a_2^{[1]} = \sigma(z_2^{[1]})$$"]
+
+    %% Output Layer
+    H1 -->|"$$w_1^{[2]}$$"| Y
+    H2 -->|"$$w_2^{[2]}$$"| Y
+
+    Y["$$z^{[2]} \\ \hat{y} = \sigma(z^{[2]})$$"]
+
+    %% Bias annotations
+    B1["$$b^{[1]}$$"] -.-> H1
+    B1 -.-> H2
+    B2["$$b^{[2]}$$"] -.-> Y
+
+```
+
+1. **Input:** Your features (e.g., pixel values of an image).
+2. **Hidden Layer:** Extracts abstract features (e.g., edges or shapes).
+3. **Output Layer:** Provides the final guess (e.g., "This is a dog with 92% probability").
+
+## 4. Why "Propagate"?
+
+The term "propagate" is used because the output of one layer is the input of the next. The information "spreads" through the network. Each layer acts as a filter, refining the raw data into more meaningful representations until a decision can be made at the end.
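+
+Because each layer's output is the next layer's input, a forward pass over any number of dense layers is just a loop. In this sketch the helper name `forward`, the layer sizes (3 inputs, hidden layers of 4 and 2 neurons, 1 output) and the use of Sigmoid in every layer are illustrative assumptions.
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    return 1 / (1 + np.exp(-z))
+
+def forward(x, layers):
+    # layers is a list of (W, b) pairs; each layer's output becomes the next input
+    a = x
+    for W, b in layers:
+        a = sigmoid(W @ a + b)
+    return a
+
+rng = np.random.default_rng(42)
+sizes = [3, 4, 2, 1]  # input, two hidden layers, output
+layers = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
+          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
+
+x = np.array([0.5, 0.1, -0.2])
+y_hat = forward(x, layers)
+print(y_hat.shape, y_hat)  # shape (1,), a value between 0 and 1
+```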
+
+## 5. Implementation in Python (NumPy)
+
+This snippet demonstrates the math behind a single forward pass for a network with one hidden layer.
+
+```python
+import numpy as np
+
+def sigmoid(x):
+    return 1 / (1 + np.exp(-x))
+
+# 1. Inputs (3 features)
+X = np.array([0.5, 0.1, -0.2])
+
+# 2. Weights and Biases (Hidden Layer with 2 neurons)
+W1 = np.random.randn(2, 3) 
+b1 = np.random.randn(2)
+
+# 3. Weights and Biases (Output Layer with 1 neuron)
+W2 = np.random.randn(1, 2)
+b2 = np.random.randn(1)
+
+# --- FORWARD PASS ---
+
+# Layer 1 (Hidden)
+z1 = np.dot(W1, X) + b1
+a1 = sigmoid(z1)
+
+# Layer 2 (Output)
+z2 = np.dot(W2, a1) + b2
+prediction = sigmoid(z2)
+
+print(f"Model Prediction: {prediction}")
+
+```
+
+## 6. What happens next?
+
+Forward propagation gives us a prediction. However, at the start the weights are random, so the prediction is essentially a guess and usually wrong. To make the model "learn," we must:
+
+1. Compare the prediction to the truth using a **Loss Function**.
+2. Send the error backward through the network using **Backpropagation**.
+
+## References
+
+* **DeepLearning.AI:** [Neural Networks and Deep Learning (Week 2)](https://www.coursera.org/learn/neural-networks-deep-learning)
+* **Khan Academy:** [Matrix Multiplication Foundations](https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:matrices)
+
+---
+
+**We have the prediction. Now, how do we tell the network it made a mistake?** Head over to the [Backpropagation](./backpropagation.mdx) guide to learn how neural networks learn from their errors!