Performance comparison C++, Numpy and Cupy (CUDA) for differentiation
One interesting question is what performance Python can deliver compared with C/C++, and what is gained by running the same calculations on CUDA. To answer it, I calculated derivatives for different dataset sizes and different data types. How to calculate a derivative is explained in this video about derivatives and in this short article.

Performance was measured with the float32 and float64 data types, with array sizes limited by the available memory.

Computer parameters
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Memory: 8GB
GPU: NVIDIA GeForce GTX 1070, 8GB

For the GPU, the code was ported from NumPy to the CuPy library.

Performance for each data set was calculated as described in the video about linear regression and in this article.

C++ code:

This code was run with float and double arrays. It is also necessary to note that before the main calculation loop some dummy calculations are performed, to reduce the impact of cache instability and quick memory reallocation.
Similar code was written for Python and run with np.float32 and np.float64, using either import cupy as np or import numpy as np. To obtain more comparable values for CUDA, cycle = 20 was used, which increases the overall time 20-fold.
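As an illustration of what such a Python benchmark could look like, here is a minimal sketch. It is not the original script: the central-difference scheme, the sine test signal and the names derivative and time_derivative are my assumptions. The warm-up call plays the role of the dummy calculations mentioned above, and the cycles argument mirrors the cycle = 20 setting.

```python
import time

import numpy as np  # assumption: swap for `import cupy as np` to run on the GPU


def derivative(y, dx):
    """Central-difference derivative on the interior points (assumed scheme)."""
    return (y[2:] - y[:-2]) / (2 * dx)


def time_derivative(size, dtype, cycles=1):
    """Return wall-clock seconds spent on `cycles` derivative evaluations."""
    x = np.linspace(0, 1, size, dtype=dtype)
    y = np.sin(x)  # hypothetical test signal
    dx = dtype(1.0 / (size - 1))

    derivative(y, dx)  # warm-up pass, not timed

    start = time.perf_counter()
    for _ in range(cycles):
        derivative(y, dx)
    # Note: CuPy launches GPU work asynchronously, so a device
    # synchronisation would be needed here before stopping the clock.
    return time.perf_counter() - start


if __name__ == "__main__":
    for dtype in (np.float32, np.float64):
        seconds = time_derivative(1_000_000, dtype, cycles=20)
        print(dtype.__name__, seconds)
```

Because the same function body runs under both libraries, only the import line has to change between the CPU and GPU runs.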
Because of the memory limit, all calculations were performed with a maximum size of s_max = 400_000_000 elements for float32 and s_max = 200_000_000 for float64.
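The two limits correspond to the same amount of memory per array, which a quick calculation confirms. The helper name array_bytes is hypothetical, and how many arrays of this size the benchmark holds at once (input, output, temporaries) is not stated in the article.

```python
import numpy as np


def array_bytes(n_elements, dtype):
    """Memory needed for one contiguous array of the given size and type."""
    return n_elements * np.dtype(dtype).itemsize


f32 = array_bytes(400_000_000, np.float32)  # 4 bytes per element
f64 = array_bytes(200_000_000, np.float64)  # 8 bytes per element
print(f32, f64)  # both are 1_600_000_000 bytes
```

So each array takes 1.6 GB, and only a handful of them fit into the 8GB of host or GPU memory at the same time.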
The final performance results are given in the table below. Lower is faster.
The same results expressed as real mathematical performance (derivatives calculated per second).
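One way to turn timings into derivatives per second is to fit a straight line to time versus array size and take the inverse of the slope, in the spirit of the linear regression approach mentioned earlier. This is a sketch under that assumption; the function name derivatives_per_second is mine, and the exact fitting procedure used for the article may differ.

```python
import numpy as np


def derivatives_per_second(sizes, times, cycles=1):
    """Estimate throughput from a linear fit of time against array size.

    sizes  -- array sizes used in the benchmark
    times  -- measured wall-clock seconds for each size
    cycles -- how many passes each timing contains (e.g. 20 for the CUDA runs)
    """
    sizes = np.asarray(sizes, dtype=np.float64)
    per_pass = np.asarray(times, dtype=np.float64) / cycles
    # Slope is seconds per element; the intercept absorbs fixed overhead.
    slope, _intercept = np.polyfit(sizes, per_pass, 1)
    return 1.0 / slope
```

Using the slope rather than a single time/size ratio keeps constant costs such as function call or kernel launch overhead out of the throughput figure.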
Or, converted to relative values, with NumPy performance taken as the baseline. The 32/64 acceleration shows the speed-up gained by changing from 64-bit to 32-bit floats.
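The relative values can be sketched as simple ratios of throughput. I assume here that "relative" means throughput divided by the NumPy throughput for the same data type, and that the 32/64 acceleration is the float32 throughput divided by the float64 throughput; both function names are hypothetical.

```python
def relative_to_numpy(throughput, numpy_throughput):
    """Throughput as a multiple of the NumPy result (1.0 means equal to NumPy)."""
    return throughput / numpy_throughput


def acceleration_32_over_64(throughput_f32, throughput_f64):
    """Speed-up gained by switching from 64-bit to 32-bit floats."""
    return throughput_f32 / throughput_f64
```

A value above 1.0 means faster than the baseline in both cases.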
Conclusion
If you are happy with the memory usage, then CuPy is the best for float64 and NumPy is very good at float32. C++ is reasonable for use with float64. It is interesting to note that on float32 the overall performance of C++ is very low compared with Python.

One core usage
During this experiment only one CPU core was used: the top monitor showed a CPU load of 100% dedicated to this script.



© 2020 MyCoding.uk My blog about coding and further learning. This blog was written in pure Perl, and the frontend output is rendered with TemplateToolkit.