## Prediction through Multiple Linear Regression based Model

In this post, we make predictions using multiple linear regression (MLR). To implement MLR, we use a dataset with 5 columns: R&D Spend, Administration, Marketing Spend, State and Profit.

In this example, we predict Profit from the other 4 columns. The steps are as follows: import the libraries and the dataset, explore the dataset, encode the categorical State column, split the data into a training set and a test set, train the model, and predict the test set results. The data consists of 50 rows with no null values and one categorical column, State. The training set is used to fit the model, and the test set is held back to check how well the model predicts data it has not seen.

Step 1: Importing the libraries

Step 2: Importing the dataset

Step 3: Exploring the dataset

Step 4: Encoding categorical data

Step 5: Splitting the dataset into the Training set and Test set

Step 6: Training the Multiple Linear Regression model on the Training set

Step 7: Predicting the Test set results

**Importing the libraries**

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

**Importing the dataset**

dataset = pd.read_csv('50_Startups.csv')

X = dataset.iloc[:, :-1].values

y = dataset.iloc[:, -1].values

**Exploring the dataset**

dataset.head(5)
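Beyond `head`, a couple of quick checks can confirm whether the data has null values and which columns are categorical. The snippet below is a minimal sketch on a tiny made-up frame that only borrows the column names from the post; the `sample` variable and all of its values are assumptions for illustration, not rows from the real file.

```python
import pandas as pd

# Hypothetical rows using the same five column names as the post's dataset.
# The numbers are invented purely for illustration.
sample = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, 1500.0],
    "Administration": [500.0, 700.0, 600.0],
    "Marketing Spend": [300.0, 400.0, 350.0],
    "State": ["New York", "California", "Florida"],
    "Profit": [2000.0, 3500.0, 2700.0],
})

# Count missing values in each column.
null_counts = sample.isnull().sum()

# Any non-numeric column needs encoding before it reaches the regressor.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(sample.shape)
print(null_counts)
print(categorical_cols)
```

Running the same two checks on `dataset` should show which columns, if any, need encoding or cleaning before the next step.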

**Encoding categorical data**

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')

X = np.array(ct.fit_transform(X))
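To see what the encoder does to the feature matrix, here is a small sketch that applies the same `ColumnTransformer` setup to three made-up rows. The names `X_demo`, `ct_demo` and `X_encoded` and the values are assumptions for illustration. The point to notice is that the one-hot columns come first in the output, followed by the untouched numeric columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up feature rows: three numeric columns, then State at index 3.
X_demo = np.array([
    [1000.0, 500.0, 300.0, "New York"],
    [2000.0, 700.0, 400.0, "California"],
    [1500.0, 600.0, 350.0, "Florida"],
], dtype=object)

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [3])],
    remainder="passthrough",
)
X_encoded = np.array(ct_demo.fit_transform(X_demo))

# Categories are sorted, so the one-hot columns follow that order.
categories = ct_demo.named_transformers_["encoder"].categories_[0].tolist()

print(categories)
print(X_encoded.shape)  # one State column becomes three one-hot columns
print(X_encoded)
```

Each row now has exactly one 1 among its first three columns, marking its state, and the original numeric values follow in their original order.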

**Splitting the dataset into the Training set and Test set**

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
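As a quick illustration of what `test_size = 0.2` and `random_state = 0` do, the sketch below splits 50 placeholder rows. The arrays `X_demo` and `y_demo` are stand-ins, not the real data; only the row count and the split arguments mirror the post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 50 rows, 2 feature columns (values are arbitrary).
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

With 50 rows, 20% goes to the test set (10 rows) and the remaining 40 rows are used for training. Fixing `random_state` keeps the split the same from run to run.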

**Training the Multiple Linear Regression model on the Training set**

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)
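Once fitted, a `LinearRegression` model stores one coefficient per feature in `coef_` and the constant term in `intercept_`. The sketch below shows this on synthetic, noise-free data built from a known equation, so the recovered values can be checked. The data, the equation and the name `model` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3 + 2*x1 - 1*x2 + 0.5*x3 (no noise).
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(20, 3))
y_syn = 3.0 + 2.0 * X_syn[:, 0] - 1.0 * X_syn[:, 1] + 0.5 * X_syn[:, 2]

model = LinearRegression()
model.fit(X_syn, y_syn)

# The fitted parameters should match the equation used to build y_syn.
print(model.coef_)
print(model.intercept_)
```

The same two attributes are available on `regressor` after the `fit` call above. With the encoded data, the first coefficients belong to the one-hot State columns and the rest to the numeric columns.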

**Predicting the Test set results**

y_pred = regressor.predict(X_test)

np.set_printoptions(precision=2)

print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
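Printing predictions next to the actual values gives a visual check. A numeric summary can complement it. The sketch below uses two metrics from `sklearn.metrics` on small made-up arrays; the values in `y_true_demo` and `y_pred_demo` are assumptions for illustration, not results from this model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted values, for illustration only.
y_true_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_demo = np.array([12.0, 18.0, 33.0, 39.0])

# Average absolute gap between prediction and actual value.
mae = mean_absolute_error(y_true_demo, y_pred_demo)

# Share of the variance in the actual values explained by the predictions.
r2 = r2_score(y_true_demo, y_pred_demo)

print(mae)
print(r2)
```

Passing `y_test` and `y_pred` to the same functions would summarize the test set results from the step above.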