Linear regression for approximate solution of linear equation

This is a practical task that comes up in many real experiments. Suppose we have two linearly related values and need to find the relation between them. From the experiment we know only a few pairs of values, and these pairs are measured not precisely but with some error. We need to recover the original linear relation between the two values.

For example, let's consider the following data, which are expected to follow the equation y = k*x + b:

x   0   1   2   3
y   0   1   0   3

As you can see, these data are not ideal and do not lie on a single line, but we still need to approximate them with one. The easiest way is to choose the line that minimizes the sum of squared distances between the line and the measured points.

Manual solution of approximate linear equation

[Figure: Original data for fitting]
These are the original data points through which we need to find the best-fitting line.

So, we need to find the W₁ and W₀ in the equation y' = W₁*x + W₀ that give the minimal sum of squared distances between y' from our calculation and y from the experiment.

Mathematically speaking, we need to find the minimum of the function

L(W₁, W₀) = Σ(W₁*x_i + W₀ - y_i)²
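To make the function above concrete, here is a minimal sketch that evaluates it for the table data. The function name `loss` and the two candidate lines are illustrative choices only, not part of the original solution.

```python
# Data from the table above
xs = [0, 1, 2, 3]
ys = [0, 1, 0, 3]

def loss(w1, w0):
    """Sum of squared differences between w1*x + w0 and the measured y."""
    return sum((w1 * x + w0 - y) ** 2 for x, y in zip(xs, ys))

# Two arbitrary candidate lines (guesses, not the solution)
print(loss(1.0, 0.0))  # 4.0
print(loss(0.5, 0.0))  # 3.5
```

Different pairs (W₁, W₀) give different values of L, and the task is to find the pair for which this value is the smallest.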

To find the minimum, we need to set both partial derivatives to zero and solve the two resulting equations

∂L/∂W₁ = 0; ∂L/∂W₀ = 0

∂/∂W₁Σ(W₁*x_i + W₀ - y_i)² = 0; ∂/∂W₀Σ(W₁*x_i + W₀ - y_i)² = 0;

To make all the equations clear, let's do it manually here, substituting our table data right away:

L = (W₀)² + (W₁ + W₀ - 1)² + (2*W₁ + W₀)² + (3*W₁ + W₀ - 3)² = 14*W₁² + 4*W₀² + 10 + 12*W₁*W₀ - 20*W₁ - 8*W₀

Now we can take partial derivatives:

∂L/∂W₁ = 28*W₁ + 12*W₀ - 20 = 0

∂L/∂W₀ = 12*W₁ + 8*W₀ - 8 = 0
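As a sanity check, the two derivatives can be compared with a numerical estimate. The sketch below uses central finite differences at an arbitrarily chosen test point; the helper names and the step size `h` are assumptions made for illustration.

```python
xs = [0, 1, 2, 3]
ys = [0, 1, 0, 3]

def loss(w1, w0):
    """Sum of squared differences for the table data."""
    return sum((w1 * x + w0 - y) ** 2 for x, y in zip(xs, ys))

def numeric_gradient(w1, w0, h=1e-6):
    """Central finite differences for dL/dW1 and dL/dW0."""
    d_w1 = (loss(w1 + h, w0) - loss(w1 - h, w0)) / (2 * h)
    d_w0 = (loss(w1, w0 + h) - loss(w1, w0 - h)) / (2 * h)
    return d_w1, d_w0

def analytic_gradient(w1, w0):
    """The two partial derivatives written out above."""
    return 28 * w1 + 12 * w0 - 20, 12 * w1 + 8 * w0 - 8

w1, w0 = 2.0, -1.0  # arbitrary test point
print(numeric_gradient(w1, w0))   # close to the analytic values
print(analytic_gradient(w1, w0))  # (24.0, 8.0)
```

If the two printed pairs agree up to rounding error, the hand-derived derivatives are consistent with the function L.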

and we need to solve this system of equations.

W₁ = 0.8; W₀ = -0.2

How to solve a system of linear equations is described in a separate article.
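The same 2x2 system can also be solved numerically. This is a small sketch with `numpy.linalg.solve`, where the matrix holds the coefficients of W₁ and W₀ from the two derivative equations and the constant terms are moved to the right-hand side; the variable names `M` and `rhs` are illustrative.

```python
import numpy as np

# Coefficients of W1 and W0 in the two equations
M = np.array([[28.0, 12.0],
              [12.0,  8.0]])
# Constant terms moved to the right-hand side
rhs = np.array([20.0, 8.0])

w1, w0 = np.linalg.solve(M, rhs)
print(round(float(w1), 6), round(float(w0), 6))  # 0.8 -0.2
```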

[Figure: y = 0.8*x – 0.2]
Our best-fitting line is y = 0.8*x – 0.2.

Now that we understand the mathematics behind this procedure, we can use some Python tools to solve the task.

NumPy method for solving approximate systems: numpy.linalg.lstsq


import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 0, 3])

We need to rewrite our line equation y=W₁*x + W₀ as y = Ap, where A = [[x 1]] and p = [[W₁], [W₀]], and then solve it:


A = np.vstack([x, np.ones(len(x))]).T
print(A) # to see what it is inside
#[[0. 1.]
# [1. 1.]
# [2. 1.]
# [3. 1.]]
(w1,w0) = np.linalg.lstsq(A, y, rcond=None)[0]
print(w1,w0) # 0.7999999999999997 -0.19999999999999904

which is very close to our manual solution.
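Besides the solution, `numpy.linalg.lstsq` returns three more values: the sum of squared residuals, the rank of A, and the singular values of A. The sketch below unpacks all four for the same data; the variable names are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 0, 3])
A = np.vstack([x, np.ones(len(x))]).T

solution, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(solution)   # slope and intercept, about 0.8 and -0.2
print(residuals)  # one-element array: the minimal value of L, about 2.8
print(rank)       # 2
```

The residual value is the minimum of the function L from the manual solution, so it shows how far the data are from an ideal line.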

Sklearn method for solving approximate systems

Basically, sklearn performs the same procedure. The only difference is that you need to prepare the data in a slightly different format. It is convenient to keep the data in NumPy arrays again for easy manipulation. The LinearRegression.fit method expects the data in the following format: x = [[0],[1],[2],[3]], y = [0,1,0,3]

The easiest way is to reshape x with the parameters (-1, 1):


import numpy as np
from sklearn import linear_model as lm
lr = lm.LinearRegression()
x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 0, 3])
x=x.reshape(-1, 1) 
lr.fit(x, y)
w1 = lr.coef_[0]
w0 = lr.intercept_
print(w1,w0) # 0.7999999999999998 -0.19999999999999973
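Once the model is fitted, it can also evaluate the line at new points with `predict`, which expects the same two-dimensional layout as `fit`. A minimal sketch, where the new x values 4 and 5 are made up for illustration:

```python
import numpy as np
from sklearn import linear_model as lm

x = np.array([0, 1, 2, 3]).reshape(-1, 1)
y = np.array([0, 1, 0, 3])
lr = lm.LinearRegression().fit(x, y)

x_new = np.array([[4], [5]])  # hypothetical new measurements
y_new = lr.predict(x_new)
print(np.round(y_new, 6))  # approximately 3.0 and 3.8
```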

Performance comparison between sklearn and NumPy

[Figure: Best fitted line]
The best-fitting line calculated with numpy.linalg.lstsq (green) and with sklearn's LinearRegression.fit (red). Both methods give the same result, y = 1.5x – 2.3. The lines are so close that you can’t see the red one.

To measure the time difference between these two functions, I created an x, y dataset made of two clusters with 5000 points each (only 500 are shown in the picture) and solved it 10000 times with NumPy and with sklearn.


import numpy as np
from sklearn import linear_model as lm
import datetime
# create random data in two clusters, so that a linear trend is definitely present
x1 = np.random.rand(5000)*10
x2 = np.random.rand(5000)*10+10
x = np.concatenate([x1,x2])

y1 = np.random.rand(5000)*5
y2 = np.random.rand(5000)*5+20
y = np.concatenate([y1,y2])

# numpy solution
def nps(x,y):
    A = np.vstack([x, np.ones(len(x))]).T
    (w1,w0) = np.linalg.lstsq(A, y, rcond=None)[0]
    return(w1,w0)
    
# sklearn solution
lr = lm.LinearRegression()  # create the model once, outside the timed loop
def sks(x,y):
    xr=x.reshape(-1, 1) 
    lr.fit(xr, y)
    (w1, w0) = (lr.coef_[0], lr.intercept_)
    return(w1,w0)

start_time = datetime.datetime.now()
for i in range(10000):
    (nw1, nw0) = nps(x,y)
end_time = datetime.datetime.now()
print("numpy time = ", end_time - start_time)

start_time = datetime.datetime.now()
for i in range(10000):
    (sw1, sw0) = sks(x,y)
end_time = datetime.datetime.now()
print("sklearn time = ", end_time - start_time)

The result is fairly predictable: NumPy is almost 3 times faster than sklearn.

numpy time = 0:00:02.199247

sklearn time = 0:00:06.124603
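Timings taken with `datetime.datetime.now()` depend on the wall clock and on whatever else the machine is doing. As an alternative sketch (not the method used for the numbers above), the standard `timeit` module repeats the measurement several times so that the least noisy run can be picked. The smaller dataset and the repeat counts here are arbitrary choices to keep the example quick.

```python
import timeit
import numpy as np

# Two clusters, built the same way as in the test above but smaller
x = np.concatenate([np.random.rand(500) * 10, np.random.rand(500) * 10 + 10])
y = np.concatenate([np.random.rand(500) * 5, np.random.rand(500) * 5 + 20])

def nps(x, y):
    A = np.vstack([x, np.ones(len(x))]).T
    return np.linalg.lstsq(A, y, rcond=None)[0]

# 5 independent runs of 100 calls each; the minimum is the least disturbed run
times = timeit.repeat(lambda: nps(x, y), repeat=5, number=100)
best = min(times)
print("best time per call, seconds:", best / 100)
```

The same pattern can be applied to the sklearn function to compare both solvers under identical conditions.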

Published: 2022-06-18 02:28:10
Updated: 2022-06-18 02:29:50
