Visualization of high-range data with a pseudo-logarithm
One of the common problems in data presentation is how to deal with data that span a very wide range of values when it is important to show details at both the high and the low end. For example, we can have a very strong source of oscillations with relatively weak waves excited by this source, and we need to display both of them on the same plot. The solution to this problem is shown in the video about wide-ranged data.

Rescaling methods

Framework for experiments

For clarity, let's generate this situation numerically.
In the sketch below, I use a very steep, high-range sigmoid function multiplied by a simple oscillation function. I also make sure that for any value of X the function oscillates around the horizontal axis, i.e. takes both positive and negative values. I will show a few different ways of rescaling and plot them all on the same big figure, using the following subplot framework:
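The original listing is not reproduced here, so this is a minimal sketch of what the data generation and the subplot framework could look like; the sigmoid constants, the oscillation frequency and the 4x2 panel layout are assumptions, not the original values.

```python
import numpy as np
import matplotlib.pyplot as plt

# High-range sigmoid envelope (about 1e6 at low X, about 10 at high X)
# multiplied by a simple oscillation, so y is positive and negative at every X.
x = np.linspace(0.0, 10.0, 2000)
amplitude = 1.0e6 / (1.0 + np.exp(3.0 * (x - 3.0))) + 10.0
y = amplitude * np.sin(2.0 * np.pi * 2.0 * x)

# Subplot framework: one big figure with a panel for every rescaling method.
fig, axes = plt.subplots(4, 2, figsize=(12, 14))
axes = axes.ravel()
```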
Original graph

With this framework, we can plot the original function without any modifications. Let's also print the range of this function.
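A sketch of this step, following the framework above; the split point X = 5 between the "low" and "high" ranges is an assumption made only for the printout.

```python
# Panel 0: the original data without any rescaling.
axes[0].plot(x, y)
axes[0].set_title('Original data')

# Print the range of the function at low and high X.
print('low-X range :', y[x < 5].min(), y[x < 5].max())
print('high-X range:', y[x >= 5].min(), y[x >= 5].max())
```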
The range of this function for low X is of the order of millions, while for high X it is within only tens.
There are a few ways to rescale these data to get a better presentation across the whole range of values; let's explore them in detail.

Limits

The first obvious solution is to apply axis limits, omitting the detailed information at large values but resolving the low-value range.
In the code below, we limit the range of plotted values to the interval from -10 to 10.
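A minimal sketch of the limited plot, reusing the panels defined above.

```python
# Panel 1: the same data, but with the displayed range clipped to [-10, 10].
axes[1].plot(x, y)
axes[1].set_ylim(-10, 10)
axes[1].set_title('Limits [-10, 10]')
```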
Now it is possible to see the details in the low range of the function, but it is pretty difficult to see anything in the high range. For some analyses this is fine, when you do not need to plot the clipped data; for other data it is not acceptable. To resolve this problem we can try to use a logarithmic scale.

Logarithm scale

The logarithmic scale is a popular way to display data with exponential growth or, as in our case, with a very wide range. For simplicity of quantitative analysis, it is very common to take the logarithm with base 10. The first possible problem is that the logarithm is only defined for positive arguments. This is easy to resolve: calculate the logarithm of the positive values directly, and for negative values take the logarithm of the absolute value and change the sign of the result. In Python this is very convenient to do by applying positive and negative masks in NumPy.
The code below builds a positive mask for the values y > 0 (y is a NumPy array of results) and a negative mask for y < 0. Then it creates an array of the same size as the data, filled with zeros, either by simple multiplication or with the NumPy function zeros_like(). The base-10 logarithm is applied directly to the positive values; for the negative values it is applied to their absolute value and the sign of the result is flipped.
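A sketch of the sign-split logarithm described above, under the same assumed framework.

```python
# Masks for the positive and negative parts of the data.
positive = y > 0
negative = y < 0

# Zero-filled array of the same shape as y.
y_log = np.zeros_like(y)
y_log[positive] = np.log10(y[positive])      # log10 of the positive values
y_log[negative] = -np.log10(-y[negative])    # log10 of |y|, sign restored

axes[2].plot(x, y_log)
axes[2].set_title('Sign-split log10')
```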
This plot is better. First of all, it shows the behaviour of the function at both low and high values. It is also possible to estimate the real values as powers of 10. But it has strange peaks around the places where the function crosses the y = 0 line. This is because the value of the logarithm for arguments below 1 is negative and approaches \(-\infty\). That is why this method of scaling is not very practical for functions with both positive and negative values. We can resolve it by adding 1 to the argument of the logarithmic scaling function.

log1p scaling

The NumPy function log1p() returns the natural logarithm of one plus the input array, element-wise. You can also compute it as log(y + 1); the result is the same, but the dedicated function is faster and more accurate for small arguments. Again we need to split the values into positive and negative ranges. It is also important to remember that log1p() is a natural logarithm, so to convert it to a decimal logarithm the result must be divided by log(10).
It is not necessary to calculate the positive and negative masks again; we already have them from the previous step.
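A sketch of the sign-split log1p scaling, reusing the masks defined above.

```python
# log1p of |y|, sign restored, converted from natural to base-10 logarithm.
y_log1p = np.zeros_like(y)
y_log1p[positive] = np.log1p(y[positive]) / np.log(10.0)
y_log1p[negative] = -np.log1p(-y[negative]) / np.log(10.0)

axes[3].plot(x, y_log1p)
axes[3].set_title('Sign-split log1p / ln(10)')
```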
This is a perfect plot for a wide range of values. Another good way of representing this kind of data is a pseudo-logarithm.

Pseudo-logarithm

It is possible to design a logarithm-like function which is odd, approaches log10(x) at large values of x, and is approximately linear at small x:

\[y = \frac{\operatorname{arcsinh}(x/2)}{\ln(10)}\]

This function is close to the log1p scaling above:
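A sketch comparing the pseudo-logarithm with the sign-split log1p scaling from the previous step; overlaying both curves in one panel is an assumption about how the original comparison was plotted.

```python
# Pseudo-logarithm: odd, roughly linear near zero, ~log10(|y|) for large |y|,
# defined for all values, so no sign splitting is needed.
y_pseudo = np.arcsinh(y / 2.0) / np.log(10.0)

axes[4].plot(x, y_pseudo, label='arcsinh(y/2)/ln(10)')
axes[4].plot(x, y_log1p, '--', label='sign-split log1p')
axes[4].legend()
axes[4].set_title('Pseudo-logarithm vs log1p')
```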
It is possible to see that these functions are pretty similar, and either of them can be used.
The advantage of the pseudo-logarithm function is that it is defined for the whole range of arguments, so it is not necessary to split them into positive and negative sets.
Apart from this advantage, the two are very similar and share the same drawback: it is hard to read the exact values of small-magnitude arguments from the plot.

High power root scaling

Another way to rescale large-range data for display is a root of high power. For an odd power it is not necessary (in theory) to split the arguments into positive and negative subsets, whereas for even powers of the root the behaviour of the graph is much more complicated. You can watch the video about how to calculate roots of negative values. Unfortunately, Python is not very good at raising negative values to the power 1/(odd integer), so it is still necessary to split the dataset into positive and negative subsets.
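A sketch of odd-power root scaling; the choice of the 7th root is an assumption, any sufficiently high odd power behaves the same way.

```python
# Raising negative floats to a fractional power gives NaN in NumPy,
# so the sign is handled explicitly through the masks.
power = 7
y_root = np.zeros_like(y)
y_root[positive] = y[positive] ** (1.0 / power)
y_root[negative] = -((-y[negative]) ** (1.0 / power))

axes[5].plot(x, y_root)
axes[5].set_title('7th-root scaling')
```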
The result looks promising, but again it is difficult to clearly see the relation between the real and the displayed values. On the other hand, this scaling stretches values with absolute magnitude below 1, which allows us to observe really small features.
This function can also be used for scaling a wide-range function.

Combined scaling

Another prominent way of presenting data is to use different scaling procedures for different data ranges. Ideally, the scaled values should match at the border between the ranges, but sometimes this is impossible to achieve with standard scaling procedures. In that case it is necessary to point out the jumps on the graph, or to use a gap between the ranges to make the graph smooth. For example, we can keep the real values within the range [-5, 5] and take the pseudo-logarithm outside this range.
It is easy to build masks in NumPy for any such conditions by combining logical operations, as in the sketch below.
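A sketch of the combined scaling without a gap; because the raw values and the pseudo-logarithm do not match at y = ±5, the resulting curve shows the jumps discussed next.

```python
# Raw values inside [-5, 5], pseudo-logarithm outside; the two pieces
# do not meet at the border, so the plot jumps there.
inner = np.abs(y) <= 5.0
outer = ~inner

y_comb = np.zeros_like(y)
y_comb[inner] = y[inner]
y_comb[outer] = np.arcsinh(y[outer] / 2.0) / np.log(10.0)

axes[6].plot(x, y_comb)
axes[6].set_title('Combined scaling')

axes[7].axis('off')   # hide the unused panel of the assumed 4x2 grid
plt.tight_layout()
plt.show()
```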
As one can see, without a gap the graph has strong jumps when switching from one scaling to another, and the only way to avoid them is to omit some data in the switching area. For every particular task it is necessary to choose the scaling that best shows the desired parts of the graph.

2D example

The same principle is much better demonstrated on a 2D plot, which can be done with the imshow procedure from Matplotlib. I will not show the code required for this plot, only the comparison of a few different approaches applied to the same data.
As one can see from the image above, the different scaling methods can give slightly different results, and the way of scaling should be chosen based on the task to be solved and the features that need to be shown on the plot.