How to fit a line with a fixed point
A very common task in many scientific fields is to find the line that best fits a given set of data. Sometimes the task is a bit more complicated, because there are restrictions on the coefficients of the line or curve. In this topic I will show how to find the best-fit line and what to do when one of its coefficients must be fixed. I have already given one way of solving this problem, see the article about Ridge regression for approximate solution of linear equation, but now I need to add some extra notes.

Task: Best line with fixed point

This is a real scientific task: find the best-fit line $y = kx + b$ for our data $[x_i, y_i]$ with the requirement that the line must go through the point (0, 0). If the required point has other coordinates, we can first do a parallel shift of the coordinate system so that the origin moves to this point, which simplifies the calculations. If $y = kx + b$ passes through the point (0, 0), then we can find the coefficient b: $0 = k \cdot 0 + b$, so $b = 0$.

Solution: Best line with fixed point

So we need to find the minimum of the function $\sum_i (k x_i + b - y_i)^2$ when $b = 0$. Setting the derivative with respect to k to zero gives $2 \sum_i x_i (k x_i - y_i) = 0$, and solving this equation we obtain the formula:

$k = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$

You can see a short lecture about Linear regression analysis with fixed intercept. There is not much to show in coding this simple task, but I will give an example and compare the RMSD with the proper best solution without any fixed coefficients.

Sample of the task for best fitted line

We will use the libraries NumPy and matplotlib.
Then we need to prepare the task to solve: a random cloud of points, generated with a fixed random seed for repeatability.
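A minimal sketch of such a dataset is given here. The seed value, the number of points, the underlying trend and the noise level are all assumptions chosen for illustration, not the values used for the numbers quoted in this article.

```python
import numpy as np

# Fixed seed so the same cloud is generated on every run (the value is arbitrary).
rng = np.random.default_rng(42)

n_points = 50  # hypothetical sample size
x = rng.uniform(0.0, 10.0, n_points)

# Hypothetical underlying trend y = 0.5*x + 1 with Gaussian noise on top.
y = 0.5 * x + 1.0 + rng.normal(0.0, 1.0, n_points)

print(x[:3], y[:3])
```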
Best fit with NumPy polyfit()

To find the best polynomial fitted to our dataset we can just call the function polyfit, passing the order of the polynomial (1 for the linear case).
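The call could look like the sketch below. The dataset is the same assumed random cloud as before (hypothetical seed, size and trend), and the names k_free and b_free are only illustrative.

```python
import numpy as np

# Assumed sample data: a noisy line, reproducible thanks to the fixed seed.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 50)
y = 0.5 * x + 1.0 + rng.normal(0.0, 1.0, 50)

# Degree 1 gives a straight line; coefficients are returned highest power first.
k_free, b_free = np.polyfit(x, y, 1)
y_free = k_free * x + b_free

print(f"k = {k_free:.4f}, b = {b_free:.4f}")
```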
Best fit with fixed 0

We could also use the polyfit() function here, adding the point (0, 0) with a higher weight, but this is not necessary because we have the exact analytical solution, given by the formula $k = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$.
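A short sketch of this formula in NumPy follows. The helper fit_through_point is a hypothetical name introduced here, and its x0, y0 arguments implement the parallel shift of coordinates for a fixed point other than the origin. The data are the same assumed random cloud, and np.linalg.lstsq is used only as a cross-check.

```python
import numpy as np

def fit_through_point(x, y, x0=0.0, y0=0.0):
    """Least-squares line forced through (x0, y0); returns slope k and intercept b."""
    xs = np.asarray(x, dtype=float) - x0  # shift the origin to the fixed point
    ys = np.asarray(y, dtype=float) - y0
    k = np.sum(xs * ys) / np.sum(xs * xs)
    b = y0 - k * x0                       # intercept back in the original coordinates
    return k, b

# Assumed sample data (hypothetical seed, size and trend).
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 50)
y = 0.5 * x + 1.0 + rng.normal(0.0, 1.0, 50)

k_fixed, b_fixed = fit_through_point(x, y)  # fixed point (0, 0), so b_fixed is 0

# Cross-check: least squares with a single-column design matrix has no intercept term.
k_check = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(f"k_fixed = {k_fixed:.4f}, k_check = {k_check:.4f}")
```

The unweighted closed form is exact, so no artificial heavy-weight point at (0, 0) is needed.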
There is a visible difference between the two results: 0.3992235424197471 and 0.5436083924466607.

RMSD between datasets

We want to find how well our solution corresponds to the given dataset. By definition, RMSD = $\sqrt{\frac{1}{n}\sum_i (y_i - \bar{y})^2}$, the deviation from the mean value. We can do this calculation from scratch, or we can use some ready-made functions from the NumPy library. However, we need the RMSD not from a mean value, but between two datasets:

$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_i (y_{1,i} - y_{2,i})^2}$
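One possible way to write the RMSD between two datasets with NumPy is sketched below. The function name rmsd is an assumption made for this example, and the small input arrays are made up just to exercise it.

```python
import numpy as np

def rmsd(y1, y2):
    """Root-mean-square deviation between two equally long sets of Y values."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    return np.sqrt(np.mean((y1 - y2) ** 2))

# Toy values: only the last pair differs (by 2), so the result is sqrt(4/3).
print(rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```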
where y1 and y2 are NumPy arrays with Y values for the same X.

Final code with comparison of two best fits

That's it. We are ready to put all the code together.
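The steps above can be combined roughly as follows. This is a sketch on an assumed dataset (the seed, size, trend and noise are hypothetical), so its printed numbers will not match the values quoted in this article. Plotting with matplotlib is left out to keep it short.

```python
import numpy as np

def rmsd(y1, y2):
    """Root-mean-square deviation between two sets of Y values for the same X."""
    return np.sqrt(np.mean((np.asarray(y1) - np.asarray(y2)) ** 2))

# Assumed random cloud of points with a fixed seed for repeatability.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, 50)
y = 0.5 * x + 1.0 + rng.normal(0.0, 1.0, 50)

# Unrestricted best-fit line y = k*x + b.
k_free, b_free = np.polyfit(x, y, 1)

# Best-fit line forced through (0, 0): k = sum(x*y) / sum(x^2), b = 0.
k_fixed = np.sum(x * y) / np.sum(x * x)

# RMSD of each fitted line against the data.
rmsd_free = rmsd(y, k_free * x + b_free)
rmsd_fixed = rmsd(y, k_fixed * x)

print(f"free fit:  k = {k_free:.4f}, b = {b_free:.4f}, RMSD = {rmsd_free:.4f}")
print(f"fixed fit: k = {k_fixed:.4f}, b = 0,      RMSD = {rmsd_fixed:.4f}")
```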
Which will give the following result:
© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.