
3 ways of function caching in Python

If you have a deterministic, expensive-to-calculate function with a limited range of parameters, it is sometimes a good idea to cache this function, i.e. not to recalculate it for the same parameters twice. This can save a lot of calculation time in exchange for a small amount of memory.

Deterministic functions

Deterministic functions are functions whose output is entirely determined by the input. In other words, if you provide the same input to a deterministic function multiple times, you will always get the same result. There is no randomness or variability involved in the output.
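As a quick illustration (the two functions below are hypothetical examples, not used later in the article):


import random

def power(a, b):
    # Deterministic: the same (a, b) always gives the same result
    return a ** b

def noisy_power(a, b):
    # Non-deterministic: the result changes from call to call
    return a ** b + random.random()

Only functions like power are safe to cache; caching noisy_power would silently freeze its randomness at the first computed value.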

Libraries

We will use 3 libraries:


import timeit
from functools import lru_cache
import diskcache

timeit - Library for measuring function execution time

lru_cache from functools - Memory caching system

diskcache - Cache on the disk
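
Note that diskcache is a third-party package, so it may need to be installed first (e.g. with pip install diskcache); timeit and functools are part of the standard library.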

Our own cache

Pros of our own cache

Full control and full flexibility of the algorithm. The cache can be stored on the disk (e.g. as a pickle) and reused between different program runs.

Cons of our own cache

An extra algorithm needs to be written, tested and maintained alongside the logic of the code. It is also not the fastest implementation.

Code for our own cache

The idea is very simple and straightforward. We create a dictionary whose keys are tuples with the function parameters, and whose values are the results of the function calculation. Around the function, we write a wrapper which checks for the presence of these parameters in the cache dictionary and responds accordingly.


cache4func = {}

# Expensive deterministic function to be cached
def func1(a, b):
    res = 0
    for i in range(10_000_000):
        res += (a**b)**(1/b)
    return res

# Wrapper: compute and store the result on a cache miss,
# otherwise return the stored value
def func_cache(a, b):
    if (a, b) not in cache4func:
        cache4func[(a, b)] = func1(a, b)
    return cache4func[(a, b)]

In this code, func_cache is a wrapper function which checks the cache and then either returns the value from the cache or calls the function.

The call for time measurement will be as follows:


val = (3,4)
print(f"Time for {val}: {timeit.timeit(lambda: func_cache(*val), number=1)}sec")
val = (3,6)
print(f"Time for {val}: {timeit.timeit(lambda: func_cache(*val), number=1)}sec")
val = (3,4)
print(f"Time for {val}: {timeit.timeit(lambda: func_cache(*val), number=1000)}sec")
#Time for (3, 4): 16.08006967289839sec
#Time for (3, 6): 14.821766839013435sec
#Time for (3, 4): 0.0000005525050219148397sec

You can see the acceleration of the third call, caused by caching of the parameters. If you run this block again (in Jupyter, for example), all lines will execute equally fast because the results are already cached; so, for a real comparison, the cached version is called 1000 times (number=1000).

Later on, you can save this cache dictionary to the disk.
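A minimal sketch of such persistence, using pickle (the file name cache4func.pkl is an arbitrary choice):


import os
import pickle

CACHE_FILE = "cache4func.pkl"

# Save the cache dictionary to the disk
with open(CACHE_FILE, "wb") as f:
    pickle.dump(cache4func, f)

# On the next program run, load it back if the file exists
if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE, "rb") as f:
        cache4func = pickle.load(f)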

Python memory cache lru_cache

Pros of memory cache lru_cache

Maximal possible performance. Using this cache requires no coding attention: the decorator is extremely simple and robust to use.

Cons of memory cache lru_cache

Not much control over this cache: you can't save it on the disk for future use, or inspect and modify individual values. You can only control its size.

Usage of memory cache lru_cache

To use the lru_cache decorator, it is necessary to import it from functools and place the decorator in front of the function to be cached.


from functools import lru_cache
@lru_cache(maxsize=None)
def func2(a, b):
    res = 0
    for i in range(10_000_000):
        res += (a**b)**(1/b)
    return res

The only parameter that may need to be modified is maxsize - the maximal size of the cache, i.e. how many recent calls to store. maxsize=None means the cache can grow without limit.

Now, to use func2, we just call it without any extra manipulations; the decorator will do the job automatically.
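
For example, the same timeit measurement as before can be reused (a sketch; the exact numbers will depend on your machine):


val = (3, 4)
print(f"Time for {val}: {timeit.timeit(lambda: func2(*val), number=1)}sec")
print(f"Time for {val}: {timeit.timeit(lambda: func2(*val), number=1000)}sec")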

Measurement reveals that this decorator works 1.5-3 times faster than our dictionary-based cache.
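
One small addition to the cons above: although the cache itself cannot be saved or edited, a function wrapped with lru_cache does expose cache_info() and cache_clear(), so hit/miss statistics can at least be inspected:


print(func2.cache_info())  # CacheInfo(hits=..., misses=..., maxsize=None, currsize=...)
func2.cache_clear()        # drop all cached results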

Python diskcache

Pros of diskcache

Simplicity of usage. This decorator again does not require any complicated coding: just declare it, and Python will take care of everything. Data are stored on the disk drive, so they remain available between different program runs.

Cons of diskcache

Disk operations are slow in comparison with memory caching. You also need to check that the disk cache will not be interfered with by other code.

Usage of diskcache

As with all decorators, this one is simple to use as well. But before the first usage, it is necessary to declare the directory for the cache storage, and then place the decorator in front of the function to be cached.


import diskcache

# Directory where the cached results will be stored
disk_cache = diskcache.Cache("./mycache")

@disk_cache.memoize()
def func3(a, b):
    res = 0
    for i in range(10_000_000):
        res += (a**b)**(1/b)
    return res

The second call of the code will read the cache from the disk, which is very useful for very expensive functions. On average, reading from the disk is 100-1000 times slower than the memory cache, which in most cases is not critical.
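
A quick way to observe this persistence (a sketch; exact times depend on the machine and the disk):


val = (3, 4)
# First ever call: computed and written to ./mycache
print(f"Time for {val}: {timeit.timeit(lambda: func3(*val), number=1)}sec")
# Any later call, even after a program restart: read back from the disk
print(f"Time for {val}: {timeit.timeit(lambda: func3(*val), number=1)}sec")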

Watch our video about the usage of these different caching tools.

The final choice of a caching system for deterministic functions depends on the nature of the task and should be made for each problem individually.


Published: 2023-12-19 00:38:20
