**Introduction**

One of the most fundamental algorithms in machine learning is linear regression. Linear regression fits a straight line that maps input features (X) to output labels (y).

In linear regression, each output label is expressed as a weighted sum of the input features plus a bias. These weights and biases are the model parameters: they are initialized randomly and then adjusted during each training cycle through the dataset. One full pass over the training data, with the parameters updated along the way, is called an epoch. We must train the model for many epochs so that the weights and biases can capture the linear relationship between the input features and the output labels.
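As a minimal sketch of the "weighted sum plus bias" idea: the weight 1.8 and bias 32 below are the known Celsius-to-Fahrenheit constants, used only for illustration (a trained model would have to learn them), and the input values are hypothetical.

```python
import numpy as np

# Illustrative parameters: the exact relation is F = 1.8 * C + 32.
w = 1.8   # weight
b = 32.0  # bias

celsius = np.array([0.0, 37.0, 100.0])  # hypothetical input features

# Each label is the weighted input plus the bias.
fahrenheit = w * celsius + b
print(fahrenheit)
```

Training amounts to finding values of `w` and `b` that make this line match the data as closely as possible.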

So now, let's get started with the implementation using PyTorch.

**Implementation**

**Importing Libraries**

The `torch.nn` module helps us create and train neural networks, so we need it.

```
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
```

We need pandas, NumPy, and scikit-learn's `MinMaxScaler` because we have to preprocess the dataset we will be using, and Matplotlib to plot the results.

**Reading Data**

`df=pd.read_csv("TSLA.csv")`

The dataset I am going to use is Celsius-to-Fahrenheit data, which can be found __here__.

**Data preprocessing**

```
X_train = df.iloc[:,0].values
y_train = df.iloc[:,-1].values
```

**Scaling the data**

```
sc = MinMaxScaler()
sct = MinMaxScaler()
X_train=sc.fit_transform(X_train.reshape(-1,1))
y_train =sct.fit_transform(y_train.reshape(-1,1))
```

Because the values span a wide and varying range, we scale them. `MinMaxScaler` rescales each column to the range [0, 1].

If you skip this step for this particular example, you will most likely get inf or nan loss values later when training the model. That indicates backpropagation is failing numerically, and it results in a flawed model.
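To see what the scaler does, here is a small sketch on three hypothetical Celsius values (not taken from the dataset). It also shows that `inverse_transform` undoes the scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical Celsius column; the scaler expects a 2-D array.
values = np.array([-40.0, 0.0, 100.0]).reshape(-1, 1)

scaler = MinMaxScaler()                      # default range is [0, 1]
scaled = scaler.fit_transform(values)        # minimum -> 0, maximum -> 1
restored = scaler.inverse_transform(scaled)  # back to the original units

print(scaled.ravel())
print(restored.ravel())
```

Using separate scalers for the features and the labels, as above with `sc` and `sct`, keeps each column's minimum and maximum available for the inverse step later.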

**Converting Numpy arrays to Tensors**

```
X_train = torch.from_numpy(X_train.astype(np.float32)).view(-1,1)
y_train = torch.from_numpy(y_train.astype(np.float32)).view(-1,1)
```

We must ensure that `X_train` and `y_train` are both two-dimensional. On a tensor, `view` works like `reshape` in NumPy and gives us the 2-D shape we need.
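A quick sketch of the conversion on a hypothetical three-element array shows how the shape and dtype change:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])               # hypothetical 1-D data, dtype float64
t = torch.from_numpy(arr.astype(np.float32))  # shape (3,), dtype torch.float32
col = t.view(-1, 1)                           # shape (3, 1): one row per sample, one feature

print(t.shape, col.shape, col.dtype)
```

The `-1` tells `view` to infer that dimension from the number of elements, so the same line works for any dataset size.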

**Model Building**

We can build the model in two ways. The first way is given below:

```
input_size = 1
output_size = 1

class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        # one input feature and one output value
        self.linear = torch.nn.Linear(input_size, output_size)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

model = LinearRegressionModel()
```

The second way is:

`model = nn.Linear(input_size , output_size)`

In both cases, we're using `nn.Linear`, which creates our linear layer. It applies a linear transformation to the data, y = w*x + b, where y is the label, x is the feature, w is the weight, and b is the bias. One layer is enough for our data because Celsius and Fahrenheit have a linear relationship. When the relationship is non-linear, we add further layers with non-linear activation functions, such as a sigmoid, to account for it.
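As a small check of what the layer computes, the sketch below compares the output of `nn.Linear` with the same transformation written out by hand. The seed and the input values are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)       # arbitrary seed so the random initialization is repeatable
layer = nn.Linear(1, 1)    # one input feature, one output

x = torch.tensor([[0.0], [0.5], [1.0]])  # hypothetical scaled inputs, shape (3, 1)

# nn.Linear stores a weight matrix and a bias vector and computes x @ W.T + b.
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual))
```

Both the weight and the bias are randomly initialized, which is why an untrained model produces meaningless predictions.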

**Loss Function and Optimizer**

The loss function in this case is the mean squared error (MSE). Our goal is to minimize the loss, which we do with an optimizer, here stochastic gradient descent (SGD). SGD takes the model parameters (weights and biases) and a learning rate.

```
learning_rate = 0.0001
l = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
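To make the loss concrete, this sketch computes the mean squared error by hand and compares it with `nn.MSELoss`. The prediction and target tensors are hypothetical.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([[0.1], [0.4], [0.9]])  # hypothetical predictions
y_true = torch.tensor([[0.0], [0.5], [1.0]])  # hypothetical targets

loss_fn = nn.MSELoss()                        # default reduction is the mean
by_hand = ((y_pred - y_true) ** 2).mean()     # average of the squared errors

print(loss_fn(y_pred, y_true).item(), by_hand.item())
```

Each prediction is off by 0.1, so every squared error is 0.01 and both values come out at about 0.01.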

**Training**

```
num_epochs = 100
for epoch in range(num_epochs):
    # forward pass (the inputs themselves do not need gradients)
    y_pred = model(X_train)
    # calculate the loss
    loss = l(y_pred, y_train)
    # backward propagation: calculate gradients
    loss.backward()
    # update the weights
    optimizer.step()
    # clear the gradients accumulated by loss.backward()
    optimizer.zero_grad()
    print('epoch {}, loss {}'.format(epoch, loss.item()))
```

Forward pass: we compute y_pred from the current weights and the feature values.

Loss: once we have y_pred, we measure the prediction error using MSE.

Backpropagation: the gradients of the loss with respect to the weights and bias are computed.

Step: the optimizer updates the weights and bias using those gradients.

Zero grad: finally, we clear the gradients from this iteration so they do not accumulate into the next one.
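The update step can be checked by hand. The sketch below assumes plain SGD (no momentum, no weight decay), where each parameter becomes `parameter - lr * gradient`. The data, seed, and learning rate are illustrative choices, not the values used above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                     # arbitrary seed for repeatability
model = nn.Linear(1, 1)
lr = 0.1                                 # illustrative learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.tensor([[0.0], [0.5], [1.0]])  # hypothetical scaled inputs
y = torch.tensor([[0.0], [0.5], [1.0]])  # hypothetical scaled targets

loss = nn.MSELoss()(model(x), y)         # forward pass and loss
loss.backward()                          # fills .grad on the weight and bias

# What one plain SGD update should do to the weight.
expected_w = model.weight.detach() - lr * model.weight.grad

optimizer.step()                         # apply the update
optimizer.zero_grad()                    # reset gradients for the next iteration

print(torch.allclose(model.weight.detach(), expected_w))
```

If `zero_grad()` were skipped, the next `backward()` call would add its gradients to the old ones, and the updates would no longer follow the current loss.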

**Visualization**

We no longer need gradients, so we detach the predictions from the computation graph with `detach()` before converting them to NumPy. Let's look at the first 100 data points to see how good the model is.

```
predicted = model(X_train).detach().numpy()
plt.scatter(X_train.detach().numpy()[:100] , y_train.detach().numpy()[:100])
plt.plot(X_train.detach().numpy()[:100] , predicted[:100] , "red")
plt.xlabel("Celsius")
plt.ylabel("Fahrenheit")
plt.show()
```

Notice how the predictions improve, and the loss decreases, as the number of epochs grows. Other ways to improve the result include adjusting the learning rate, trying different weight initialization techniques, and so on.

Finally, test the model with a known Celsius value to check whether it predicts the correct Fahrenheit value. Because the values have been scaled, a new input must first be scaled with `sc.transform()`; use `sc.inverse_transform()` and `sct.inverse_transform()` to recover the original values.
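One way this check could look is sketched below. It is self-contained, so it builds a synthetic stand-in dataset from the exact formula F = 1.8 * C + 32 instead of reading the CSV. It also assumes a larger learning rate (0.1) and more epochs (2000) than the values above, chosen only so this small example converges.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the dataset: exact Celsius/Fahrenheit pairs.
celsius = np.arange(-40.0, 101.0, 5.0).reshape(-1, 1)
fahrenheit = celsius * 1.8 + 32.0

sc, sct = MinMaxScaler(), MinMaxScaler()
X = torch.from_numpy(sc.fit_transform(celsius).astype(np.float32))
y = torch.from_numpy(sct.fit_transform(fahrenheit).astype(np.float32))

torch.manual_seed(0)  # arbitrary seed for repeatability
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
# Assumed settings for this sketch only.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Scale the query with the feature scaler, predict, then undo the label scaling.
query = sc.transform(np.array([[37.0]])).astype(np.float32)
with torch.no_grad():
    pred_scaled = model(torch.from_numpy(query)).numpy()
pred_f = sct.inverse_transform(pred_scaled)[0, 0]

print(pred_f)
```

The true value for 37 °C is 98.6 °F, so a prediction close to that suggests the model has learned the relationship.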