Primitive Neural Network (NN From Scratch)

Welcome to the Neural Network From Scratch project! In this project, we built a simple, yet powerful neural network from the ground up, without relying on libraries like TensorFlow or Keras. Instead, we used Numpy and linear algebra to understand the raw mechanics behind neural networks. Let's dive in! 🤖💡

Project Overview

The goal of this project was to implement a basic neural network with 3 layers:

  1. Input Layer: 784 nodes, one for each pixel of the 28x28 MNIST images.
  2. Hidden Layer: 10 nodes with ReLU activation, which learn intermediate patterns in the data.
  3. Output Layer: 10 nodes, one for each digit (0-9), with Softmax activation to produce class probabilities.

Why Build a Neural Network From Scratch?

Building a neural network from scratch is not only fun, but it also gives you a deeper understanding of the algorithms behind machine learning. Instead of relying on pre-built frameworks, we manually implement key components like forward propagation, backward propagation, and activation functions, including ReLU and Softmax.

The Dataset

For this project, we used the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits. Our task was to build a model that can classify these digits based on the pixel values. This makes it a classification problem.


Key Concepts and Code Implementation ⚙️

Here’s a brief overview of the important parts of the code:

1. Data Preprocessing

Before training, the data is shuffled and split into training and development sets. The pixel values are also normalized to a range between 0 and 1.

import numpy as np

# `data` is assumed to be the MNIST training set as an (m, 785) array:
# the label in column 0 and the 784 pixel values in the remaining columns.
data = np.array(data)
m, n = data.shape
np.random.shuffle(data)  # Shuffle before splitting

# Development set (1000 samples)
data_dev = data[0:1000].T
Y_dev = data_dev[0]
X_dev = data_dev[1:n]
X_dev = X_dev / 255.  # Normalize pixel values to [0, 1]

# Training set (remaining samples)
data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n]
X_train = X_train / 255.  # Normalize pixel values to [0, 1]
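
The snippet above assumes `data` has already been loaded into memory. One way to get there, assuming the MNIST training set is available as a CSV with the label in the first column (the filename below is just a placeholder), is:

import numpy as np
import pandas as pd

# Placeholder path: a CSV with one label column followed by 784 pixel columns
data = pd.read_csv("train.csv")
data = np.array(data)  # shape (m, 785)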

2. Network Initialization

We initialize the weights and biases for the neural network using random values. This is where the magic starts!

def init_params():
    W1 = np.random.rand(10, 784) - 0.5  # Weights for layer 1
    b1 = np.random.rand(10, 1) - 0.5  # Bias for layer 1
    W2 = np.random.rand(10, 10) - 0.5  # Weights for layer 2
    b2 = np.random.rand(10, 1) - 0.5  # Bias for layer 2
    return W1, b1, W2, b2
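
As a quick sanity check, the parameter shapes match the 784-10-10 architecture described above:

W1, b1, W2, b2 = init_params()
print(W1.shape, b1.shape, W2.shape, b2.shape)
# (10, 784) (10, 1) (10, 10) (10, 1)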

3. Forward Propagation

We calculate activations at each layer to determine the output of the network. The ReLU activation function is applied to the hidden layer, and Softmax is used at the output layer to produce probabilities.

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1  # Weighted sum for hidden layer
    A1 = ReLU(Z1)  # Apply ReLU activation
    Z2 = W2.dot(A1) + b2  # Weighted sum for output layer
    A2 = softmax(Z2)  # Apply Softmax to get probabilities
    return Z1, A1, Z2, A2
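
`ReLU` and `softmax` are used above but not shown in this snippet. A minimal sketch consistent with the column-per-example layout (the max subtraction in `softmax` is just for numerical stability) could look like:

def ReLU(Z):
    # Element-wise max(0, z)
    return np.maximum(Z, 0)

def softmax(Z):
    # Column-wise softmax: each column of Z is one example
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)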

4. Backward Propagation

We compute the gradients for each weight and bias, helping the network adjust during training. This step allows the model to learn from its errors!

def backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y):
    one_hot_Y = one_hot(Y)  # One-hot encode the labels
    dZ2 = A2 - one_hot_Y  # Error at output layer
    dW2 = 1 / m * dZ2.dot(A1.T)  # Gradients for W2
    db2 = 1 / m * np.sum(dZ2)  # Gradients for b2
    dZ1 = W2.T.dot(dZ2) * ReLU_deriv(Z1)  # Error at hidden layer
    dW1 = 1 / m * dZ1.dot(X.T)  # Gradients for W1
    db1 = 1 / m * np.sum(dZ1)  # Gradients for b1
    return dW1, db1, dW2, db2
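
`backward_prop` relies on the global `m` (the number of training examples from the preprocessing step) and on two helpers not shown here. A minimal sketch of those helpers could look like:

def one_hot(Y):
    # Turn integer labels of shape (m,) into a one-hot matrix of shape (num_classes, m)
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    return one_hot_Y.T

def ReLU_deriv(Z):
    # Derivative of ReLU: 1 where Z > 0, else 0
    return Z > 0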

5. Training the Network

We use gradient descent to minimize the loss and optimize the network's weights and biases over multiple iterations.

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_prop(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 10 == 0:
            print("Iteration:", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2
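
`update_params`, `get_predictions`, and `get_accuracy` are also not shown above. A minimal sketch, assuming plain gradient-descent updates and argmax predictions, could look like:

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # One gradient-descent step with learning rate alpha
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

def get_predictions(A2):
    # Pick the class with the highest probability for each column (example)
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    # Fraction of examples whose predicted digit matches the label
    return np.sum(predictions == Y) / Y.size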

6. Results

After training the model for 500 iterations of gradient descent with a learning rate of 0.1, we reached an accuracy of roughly 86%. Not bad for a simple neural network trained from scratch!
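
With these hyperparameters, the whole training run comes down to a single call (assuming `X_train` and `Y_train` from the preprocessing step):

W1, b1, W2, b2 = gradient_descent(X_train, Y_train, alpha=0.1, iterations=500)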

  • Testing the Model: We tested the model by making predictions on a few images. It classified 3 out of 4 of them correctly, showing that it has learned useful patterns from the data. The helpers below generate a prediction for a single training image and display it:

import matplotlib.pyplot as plt  # Used to display the image being classified

def make_predictions(X, W1, b1, W2, b2):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = get_predictions(A2)
    return predictions

def test_prediction(index, W1, b1, W2, b2):
    # Classify a single training image and show it alongside its true label
    current_image = X_train[:, index, None]
    prediction = make_predictions(current_image, W1, b1, W2, b2)
    label = Y_train[index]
    print("Prediction:", prediction)
    print("Label:", label)

    current_image = current_image.reshape((28, 28)) * 255
    plt.gray()
    plt.imshow(current_image, interpolation='nearest')
    plt.show()
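
As a quick follow-up check, the same helpers can score the 1000-sample development set held out during preprocessing (a small sketch, assuming the trained parameters from the call above):

# Inspect a single example
test_prediction(0, W1, b1, W2, b2)

# Accuracy on the 1000-sample development set
dev_predictions = make_predictions(X_dev, W1, b1, W2, b2)
print(get_accuracy(dev_predictions, Y_dev))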

7. Conclusion

This project has been an exciting journey of understanding how neural networks function at a fundamental level. We were able to create a basic neural network, train it on the MNIST dataset, and achieve an accuracy of 86%.

Future Work

  • Fine-tune the model by adjusting the learning rate or using a dynamic (scheduled) learning rate.
  • Add more hidden layers to improve the model’s learning capacity.
  • Train the model for more epochs to achieve better performance.

Objective

The main objective of this project was to gain a deeper understanding of neural networks, and this knowledge will be incredibly useful in more advanced machine learning tasks.

Credits

A big thank you to Samson Zhang for his tutorial that helped me understand how neural networks work from scratch. If you're interested, you can watch his full video here.
