This project explores the fundamental ideas behind machine learning, with a particular emphasis on a simple linear regression model. It implements the two primary approaches for finding the line of best fit for a given dataset: the analytic solution and gradient descent. The main goal is to understand how these methods operate and to compare their effectiveness on synthetic datasets.
- Analytic Solution: Calculates the optimal weights for the regression model analytically using the normal equation.
- Gradient Descent: Iteratively adjusts the weights to minimize the cost function, showcasing the practical application of this widely used optimization technique.
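The two methods above can be sketched with plain numpy. This is a minimal illustration of the idea, not the project's actual code; the synthetic data, learning rate, and iteration count are assumptions chosen for the example.

```python
import numpy as np

# Synthetic data: y = X @ [3, -2] + 0.5 plus small noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.5 + rng.normal(scale=0.01, size=100)

# Append a column of ones so the intercept is learned as an extra weight.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# 1) Analytic solution via the normal equation: w = (X^T X)^{-1} X^T y.
w_analytic = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# 2) Gradient descent on the mean-squared-error cost.
w_gd = np.zeros(Xb.shape[1])
lr, num_iter = 0.01, 5000          # assumed hyperparameters
for _ in range(num_iter):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w_gd - y)
    w_gd -= lr * grad

print(w_analytic)  # both should be close to [3.0, -2.0, 0.5]
print(w_gd)
```

With enough iterations and a suitable learning rate, gradient descent converges to essentially the same weights the normal equation produces in one step.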
- Python 3.x
- numpy
This project uses synthetic datasets provided in .in files for training the linear regression models. Each .in file contains multiple lines, each representing a data point.
A line consists of space-separated real numbers, where the last number is the target variable (y) and the preceding numbers are the feature variables (x1, x2, ..., xM).
| x1 | x2 | y |
|---|---|---|
| 14 | 20 | 69 |
| 16 | 3 | -1 |
| 24 | 30 | 99 |
| 11 | 62 | 240 |
| 30 | -4 | -43 |
In this example, each line represents a data point with two features and one target variable.
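A file in this format can be loaded directly with `numpy.loadtxt`. This sketch recreates the example table above as a file (the `example.in` filename is illustrative):

```python
import numpy as np

# Recreate the example data file from the table above (illustrative path).
sample = """14 20 69
16 3 -1
24 30 99
11 62 240
30 -4 -43
"""
with open("example.in", "w") as f:
    f.write(sample)

data = np.loadtxt("example.in")  # one data point per line
X = data[:, :-1]                 # all columns but the last: features x1 .. xM
y = data[:, -1]                  # last column: target variable
print(X.shape, y.shape)          # → (5, 2) (5,)
```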
Hyperparameters for the gradient descent method are specified in .json files. Each .json file corresponds to an .in file and contains the learning rate and the number of iterations.
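A hyperparameter file of this shape can be read with Python's standard `json` module. The sketch below writes and reads back an example file (the `params.json` filename is illustrative); note that the keys contain spaces, so they must be accessed with the exact strings used in the file:

```python
import json

# Write an example hyperparameter file (illustrative path and values).
with open("params.json", "w") as f:
    json.dump({"learning rate": 0.0001, "num iter": 1000}, f)

# Read it back; keys contain spaces, so use the exact key strings.
with open("params.json") as f:
    params = json.load(f)

lr = params["learning rate"]
num_iter = params["num iter"]
print(lr, num_iter)  # → 0.0001 1000
```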
```json
{
  "learning rate": 0.0001,
  "num iter": 1000
}
```

```json
{
  "learning rate": 0.01,
  "num iter": 1000
}
```

Clone this repository to your local machine:
```bash
git clone https://github.com/SatvikVarshney/LinearRegressionFromScratch.git
```

After cloning, navigate to the project directory:

```bash
cd LinearRegressionFromScratch
```

Install the required dependencies:

```bash
pip install -r requirements.txt
```