API Reference

Core

@memoize(*, disable=None, **kwargs)

See charmonium.cache.Memoized.

Parameters
Return type

Callable[[Callable[[FuncParams], FuncReturn]], Memoized[FuncParams, FuncReturn]]

class Memoized(func: Callable[FuncParams, FuncReturn], *, group: MemoizedGroup = <charmonium.cache.util.Future object>, name: Optional[str] = None, use_obj_store: bool = True, use_metadata_size: bool = False, pickler: Optional[Pickler] = None, extra_func_state: Callable[[Callable[FuncParams, FuncReturn]], Any] = Constant(None)) -> None

Bases: Generic[charmonium.cache.util.FuncParams, charmonium.cache.util.FuncReturn]

Parameters
  • func (Callable[FuncParams, FuncReturn]) –

  • group (MemoizedGroup) –

  • name (Optional[str]) –

  • use_obj_store (bool) –

  • use_metadata_size (bool) –

  • pickler (Optional[Pickler]) –

  • extra_func_state (Callable[[Callable[FuncParams, FuncReturn]], Any]) –

Return type

None

__init__(func, *, group=<charmonium.cache.util.Future object>, name=None, use_obj_store=True, use_metadata_size=False, pickler=None, extra_func_state=Constant(None))

Construct a memoized function.

Parameters
  • group (MemoizedGroup) – see charmonium.cache.MemoizedGroup.

  • name (Optional[str]) – A key-to-lookup distinguishing this function from others. Defaults to the Python module and name.

  • extra_func_state (Callable[[Callable[FuncParams, FuncReturn]], Any]) – An extra state function. The return-value is a key-to-match after the function name.

  • use_obj_store (bool) – whether the objects should be put behind object store, a layer of indirection.

  • use_metadata_size (bool) – whether to include the size of the metadata in the size threshold calculation for eviction.

  • pickler (Optional[Pickler]) – A custom pickler to use with the index. Pickle types must include tuples of picklable types, hashable types, and the arguments (__cache_key__ and __cache_ver__, if defined).

  • func (Callable[FuncParams, FuncReturn]) –

Return type

None
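
The distinction between the key-to-lookup (name) and the key-to-match (the return value of extra_func_state) can be illustrated with a small stand-alone sketch. TinyMemo and everything inside it are hypothetical names invented for this illustration. It does not import charmonium.cache and is not how Memoized is implemented; it only assumes that a changed key-to-match makes the stored entries stale.

```python
from typing import Any, Callable, Dict, Tuple


class TinyMemo:
    """Hypothetical stand-in for a memoized function (not the library's code)."""

    def __init__(
        self,
        func: Callable[..., Any],
        *,
        name: str = "",
        extra_func_state: Callable[[Callable[..., Any]], Any] = lambda func: None,
    ) -> None:
        self.func = func
        # Key-to-lookup: defaults to the module and (qualified) name.
        self.name = name or f"{func.__module__}.{func.__qualname__}"
        self.extra_func_state = extra_func_state
        # name -> (state seen when the entries were stored, {args: result})
        self._index: Dict[str, Tuple[Any, Dict[Any, Any]]] = {}
        self.calls = 0  # how many times the wrapped function really ran

    def __call__(self, *args: Any) -> Any:
        state = self.extra_func_state(self.func)
        stored_state, entries = self._index.get(self.name, (state, {}))
        if stored_state != state:
            entries = {}  # key-to-match changed: drop the stale entries
        if args not in entries:
            self.calls += 1
            entries[args] = self.func(*args)
        self._index[self.name] = (state, entries)
        return entries[args]


version = {"n": 1}


def square(x: int) -> int:
    return x * x


memo = TinyMemo(square, extra_func_state=lambda func: version["n"])
memo(3)
memo(3)                      # served from the cache
calls_before = memo.calls
version["n"] = 2             # the extra state changes...
memo(3)                      # ...so the result is recomputed
calls_after = memo.calls
```

In this sketch the lookup by name always succeeds once the function has been called, but the entries are reused only while the extra state still matches.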

func: Callable[[FuncParams], FuncReturn]
name: str
group: MemoizedGroup
log_usage_report()
Return type

None

would_hit(*args, **kwargs)
Parameters
  • args (FuncParams.args) –

  • kwargs (FuncParams.kwargs) –

Return type

bool

class MemoizedGroup(*, obj_store=None, replacement_policy='gdsize', size=KiB(100.0), pickler=<module 'pickle'>, lock=None, fine_grain_persistence=False, fine_grain_eviction=False, extra_system_state=Constant(None), temporary=False)

Bases: object

A MemoizedGroup holds the memoization for multiple functions.

Parameters
  • obj_store (Optional[ObjStore]) –

  • replacement_policy (Union[str, ReplacementPolicy]) –

  • size (Union[int, str, bitmath.Bitmath]) –

  • pickler (Pickler) –

  • lock (Optional[RWLock]) –

  • fine_grain_persistence (bool) –

  • fine_grain_eviction (bool) –

  • extra_system_state (Callable[[], Any]) –

  • temporary (bool) –

Return type

None

__init__(*, obj_store=None, replacement_policy='gdsize', size=KiB(100.0), pickler=<module 'pickle'>, lock=None, fine_grain_persistence=False, fine_grain_eviction=False, extra_system_state=Constant(None), temporary=False)

Construct a memoized group. Use with Memoized.

Parameters
  • obj_store (Optional[ObjStore]) – The object store to use for return values.

  • replacement_policy (Union[str, ReplacementPolicy]) – See policies submodule for options. You can pass an object conforming to the ReplacementPolicy protocol or one of REPLACEMENT_POLICIES.

  • size (Union[int, str, bitmath.Bitmath]) – The size as an int (in bytes), as a string (e.g. “3 MiB”), or as a bitmath.Bitmath.

  • pickler (Pickler) – A de/serialization to use on the index, conforming to the Pickler protocol.

  • lock (Optional[RWLock]) – A ReadersWriterLock to achieve exclusion. If the lock is wrong but the obj_store is atomic, then the memoization is still correct, but it may not be able to borrow values that another machine computed. Defaults to a FileRWLock.

  • fine_grain_persistence (bool) – De/serialize the index at every access. This is useful if you need to update the cache for multiple simultaneous processes, but it compromises performance in the single-process case.

  • fine_grain_eviction (bool) – Maintain the cache’s size through eviction at every access (rather than just at the de/serialization points). This is useful if the cache’s size would not otherwise fit in memory, but it compromises performance if not needed.

  • extra_system_state (Callable[[], Any]) – A callable that returns “extra” system state. If the system state changes, the cache is dumped.

  • temporary (bool) – Whether the cache should be cleared at the end of the process. This is useful for tests.

Return type

None

time_cost: dict[str, timedelta]
time_saved: dict[str, timedelta]
temporary: bool
remove_orphans()

Remove data in the objstore that are not referenced by the index.

Orphans can accumulate if there are multiple processes. They might generate orphans if they crash or if there is a bug in my code (yikes!). If you notice accumulation of orphans, I recommend calling this function once or once-per-pipeline to clean them up.

However, this can compromise performance if you do it while peer processes are active. They may have a slightly different index-state, and you might remove something they wanted to keep. I recommend calling this before you fork off processes.

Return type

None
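
As a rough illustration of what removing orphans means, the sketch below models the object store as a plain dict and the index as a set of referenced keys. Both are simplifications, and remove_orphans_sketch is a hypothetical helper, not the library's method.

```python
from typing import Dict, Set


def remove_orphans_sketch(obj_store: Dict[int, bytes], referenced_keys: Set[int]) -> int:
    """Delete every object the index no longer references; return how many were removed."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    orphans = [key for key in obj_store if key not in referenced_keys]
    for key in orphans:
        del obj_store[key]
    return len(orphans)


store = {1: b"kept", 2: b"left behind by a crashed peer", 3: b"kept too"}
removed = remove_orphans_sketch(store, referenced_keys={1, 3})
```

The sketch also shows why running this next to active peers is risky: a key that is missing from this process's view of the index is deleted even if a peer still refers to it.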

Components

class ObjStore

Bases: object

An object-store is a persistent mapping from int to bytes.

__setitem__(key, val)
Parameters
  • key (int) –

  • val (bytes) –

Return type

None

__getitem__(key)
Parameters

key (int) –

Return type

bytes

__delitem__(key)
Parameters

key (int) –

Return type

None

clear()
Return type

None
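
A minimal sketch of an object with the four methods listed above, backed by a dict. MemoryObjStore is a hypothetical name, and the sketch deliberately does not subclass ObjStore so that it runs without the library. Unlike a real object store it is not persistent, so it would only make sense for experiments.

```python
from typing import Dict


class MemoryObjStore:
    """Hypothetical in-memory store; its contents vanish when the process exits."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}

    def __setitem__(self, key: int, val: bytes) -> None:
        self._data[key] = val

    def __getitem__(self, key: int) -> bytes:
        return self._data[key]

    def __delitem__(self, key: int) -> None:
        del self._data[key]

    def clear(self) -> None:
        self._data.clear()


store = MemoryObjStore()
store[42] = b"payload"
fetched = store[42]
del store[42]
```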

class DirObjStore(path, key_bytes=16)

Bases: charmonium.cache.obj_store.ObjStore

Use a directory in the filesystem as an object-store.

Each object is a file in the directory.

Note that this directory must not contain any other files.

Parameters
  • path (Path) –

  • key_bytes (int) –

Return type

None

__init__(path, key_bytes=16)
Parameters
  • path (Union[Path, str]) – the directory of the object store.

  • key_bytes (int) – the number of bytes to use as keys

Return type

None

path: Path
key_bytes: int
clear()
Return type

None

class ReplacementPolicy

Bases: object

A replacement policy for a cache

abstract add(key, entry)

Called when a key, entry pair is added to the index.

Parameters
  • key (Any) –

  • entry (Entry) –

Return type

None

abstract access(key, entry)

Called when a key, entry pair is accessed/used.

Update last-used-time here.

Parameters
  • key (Any) –

  • entry (Entry) –

Return type

None

invalidate(key, entry)

Called when a key is invalidated by a subkey-to-match.

This could be useful as a metric to develop a heuristic for how fast a versioned resource is changing.

Parameters
  • key (Any) –

  • entry (Entry) –

Return type

None

abstract evict()

Select a key, entry pair to evict.

Return type

tuple[Any, Entry]

abstract update(other)

Update self with contents of other, but self overrides other.

This is necessary because there could be multiple processes using the same MemoizedGroup. A different process may have made progress. We want to incorporate their progress into this process.

Parameters

other (ReplacementPolicy) –

Return type

None
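
To make the five hooks concrete, here is a sketch of a least-recently-used policy that follows the same method names. LRUPolicy is hypothetical. It treats entries as opaque values, does not subclass ReplacementPolicy (so it runs without the library), and makes two assumptions of its own: an invalidated key stops being an eviction candidate, and keys imported by update count as older than anything this process already tracks.

```python
from collections import OrderedDict
from typing import Any, Tuple


class LRUPolicy:
    """Hypothetical least-recently-used policy; entries are opaque here."""

    def __init__(self) -> None:
        # Least recently used first, most recently used last.
        self._order: "OrderedDict[Any, Any]" = OrderedDict()

    def add(self, key: Any, entry: Any) -> None:
        self._order[key] = entry
        self._order.move_to_end(key)

    def access(self, key: Any, entry: Any) -> None:
        # "Update last-used-time": move the key to the most-recent end.
        self._order[key] = entry
        self._order.move_to_end(key)

    def invalidate(self, key: Any, entry: Any) -> None:
        # Assumption: an invalidated key is no longer tracked.
        self._order.pop(key, None)

    def evict(self) -> Tuple[Any, Any]:
        # Pop the least recently used (key, entry) pair.
        return self._order.popitem(last=False)

    def update(self, other: "LRUPolicy") -> None:
        # Import keys only the other process knows about; self wins on conflicts.
        merged = OrderedDict(
            (key, entry) for key, entry in other._order.items() if key not in self._order
        )
        merged.update(self._order)
        self._order = merged


policy = LRUPolicy()
policy.add("a", "entry-a")
policy.add("b", "entry-b")
policy.access("a", "entry-a")  # "a" is now the most recently used
victim = policy.evict()        # so "b" is evicted first
```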

class GDSize

Bases: charmonium.cache.replacement_policies.ReplacementPolicy

GreedyDual-Size policy, described by [Cao et al].

Return type

None

add(key, entry)

Called when a key, entry pair is added to the index.

Parameters
  • key (Any) –

  • entry (Entry) –

Return type

None

access(key, entry)

Called when a key, entry pair is accessed/used.

Update last-used-time here.

Parameters
  • key (Any) –

  • entry (Entry) –

Return type

None

invalidate(key, entry)

Called when a key is invalidated by a subkey-to-match.

This could be useful as a metric to develop a heuristic for how fast a versioned resource is changing.

Parameters
  • key (Any) –

  • entry (Any) –

Return type

None

evict()

Select a key, entry pair to evict.

Return type

tuple[Any, Entry]

update(other)

Update self with contents of other, but self overrides other.

This is necessary because there could be multiple processes using the same MemoizedGroup. A different process may have made progress. We want to incorporate their progress into this process.

Parameters

other (ReplacementPolicy) –

Return type

None

class Pickler

Bases: object

loads(buffer)
Parameters

buffer (bytes) –

Return type

Any

dumps(obj)
Parameters

obj (Any) –

Return type

bytes
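
Any object with matching loads and dumps methods should fit this shape. As a sketch, the hypothetical CompressedPickler below wraps the standard pickle module with zlib compression. Whether compression pays off for a given index is an assumption to check, not a recommendation.

```python
import pickle
import zlib
from typing import Any


class CompressedPickler:
    """Hypothetical pickler that compresses pickle output with zlib."""

    def loads(self, buffer: bytes) -> Any:
        return pickle.loads(zlib.decompress(buffer))

    def dumps(self, obj: Any) -> bytes:
        return zlib.compress(pickle.dumps(obj))


pickler = CompressedPickler()
original = {"args": (1, 2), "result": [3] * 1000}
restored = pickler.loads(pickler.dumps(original))
```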

class RWLock

Bases: object

A Readers-Writer Lock guarantees N readers xor 1 writer.

This permits read-concurrency in the underlying resource when there is no writer.

property reader: Lock
property writer: Lock
class FileRWLock(path: PathLikeFrom) -> None

Bases: charmonium.cache.rw_lock.RWLock

Parameters

path (Union[str, PathLike]) –

Return type

None

__init__(path)

Creates a lockfile at path.

Parameters

path (Union[str, PathLike]) –

Return type

None

path: Union[str, PathLike]
property writer: Lock
property reader: Lock
class NaiveRWLock(lock)

Bases: charmonium.cache.rw_lock.RWLock

RWLock constructed from a regular Lock.

A true readers-writers lock permits read concurrency (N readers xor 1 writer), but in some cases, that may be more maintenance effort than it is worth. A NaiveRWLock permits 1 reader xor 1 writer.

Parameters

lock (Lock) –

Return type

None

__init__(lock)
Parameters

lock (Lock) –

Return type

None

property reader: Lock
property writer: Lock
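
The "1 reader xor 1 writer" behaviour can be sketched by handing out the same underlying lock for both roles. NaiveRWLockSketch is a hypothetical illustration built on threading.Lock. It assumes that a Lock is used as a context manager, which the Lock class documented here does not spell out.

```python
import threading


class NaiveRWLockSketch:
    """Hypothetical illustration: readers and writers share one ordinary lock."""

    def __init__(self, lock: threading.Lock) -> None:
        self._lock = lock

    @property
    def reader(self) -> threading.Lock:
        return self._lock

    @property
    def writer(self) -> threading.Lock:
        return self._lock


rw = NaiveRWLockSketch(threading.Lock())
with rw.reader:
    # While a reader holds the lock, a writer (or a second reader) cannot enter.
    writer_got_in = rw.writer.acquire(blocking=False)
```

The cost of this simplicity is that two readers also exclude each other, which a true readers-writer lock would avoid.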
class Lock

Bases: object

Helpers

class FileContents(path, comparison='crc32')

Bases: object

Wraps the path and its contents to make your function pure.

  • When FileContents is un/pickled, the contents of path get restored/snapshotted.

  • When FileContents is used as an argument, the path is the key and the contents are the version.

FileContents is os.PathLike, so you can open(FileContents("file"), "rb"). You won’t even know it’s not a string.

Since this changes the un/pickle protocol, this class might cause unexpected results when used with fine_grain_persistence.

Parameters
Return type

None

__init__(path, comparison='crc32')
Parameters
Return type

None

path: PathLike
comparison: str
__cache_key__()

Returns the path

Return type

str

__cache_ver__()

Returns the contents of the file

Return type

Any
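
The key/version split can be sketched without the library. FileVersion below is a hypothetical class that returns the path from __cache_key__ and, as an assumption suggested by comparison='crc32', a CRC32 checksum of the contents from __cache_ver__. It omits the un/pickle behaviour described above.

```python
import os
import tempfile
import zlib
from pathlib import Path


class FileVersion:
    """Hypothetical sketch: the path is the key, a CRC32 of the contents is the version."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def __fspath__(self) -> str:
        # Makes the object os.PathLike, so open() accepts it directly.
        return str(self.path)

    def __cache_key__(self) -> str:
        return str(self.path)

    def __cache_ver__(self) -> int:
        return zlib.crc32(self.path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    target = FileVersion(os.path.join(tmp, "data.txt"))
    target.path.write_bytes(b"first")
    key_before, ver_before = target.__cache_key__(), target.__cache_ver__()
    target.path.write_bytes(b"second")
    key_after, ver_after = target.__cache_key__(), target.__cache_ver__()
    with open(target, "rb") as handle:  # works because of __fspath__
        contents = handle.read()
```

Rewriting the file leaves the key unchanged but changes the version, which is what lets a cache treat the earlier result as stale.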

class TTLInterval(interval)

Bases: object

TTLInterval(td)() returns a value that changes once every td.

td may be a timedelta or a number of seconds.

It can be used as extra_system_state or extra_func_state. For example,

>>> import datetime
>>> from charmonium.cache import memoize, TTLInterval
>>> interval = TTLInterval(datetime.timedelta(seconds=0.5))
>>> # applies a 0.5-second TTL to just this function
>>> @memoize(extra_func_state=interval)
... def func():
...     pass

Underlying usage:

>>> import datetime, time
>>> interval = TTLInterval(datetime.timedelta(seconds=0.5))
>>> start = interval()
>>> start == interval()
True
>>> time.sleep(0.5)
>>> start == interval()
False
Parameters

interval (Union[int, float, timedelta]) –

Return type

None

__init__(interval)
Parameters

interval (Union[int, float, timedelta]) –

Return type

None
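
One way to obtain "a value that changes once every td" is to return the index of the current time window. TTLSketch below is a hypothetical illustration of that idea with an injectable clock, so the behaviour can be shown without sleeping. It assumes windows aligned to the clock's zero point, which may differ from how TTLInterval anchors its intervals.

```python
import datetime
import time
from typing import Callable, Union


class TTLSketch:
    """Hypothetical sketch: calling it returns the index of the current time window."""

    def __init__(
        self,
        interval: Union[int, float, datetime.timedelta],
        clock: Callable[[], float] = time.time,
    ) -> None:
        if isinstance(interval, datetime.timedelta):
            interval = interval.total_seconds()
        self.seconds = float(interval)
        self.clock = clock  # injectable for demonstration and testing

    def __call__(self) -> int:
        # Floor-divide the current time by the window length.
        return int(self.clock() // self.seconds)


now = [1000.0]  # fake clock, in seconds
ttl = TTLSketch(datetime.timedelta(seconds=0.5), clock=lambda: now[0])
first = ttl()
now[0] += 0.2
same_window = ttl() == first   # still inside the same 0.5-second window
now[0] += 0.5
next_window = ttl() == first   # a window boundary has been crossed
```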

with_attr(obj, attr_name, attr_val)
Parameters
  • obj (charmonium.cache.util._T) –

  • attr_name (str) –

  • attr_val (Any) –

Return type

charmonium.cache.util._T

Utils

class PathLike

Bases: object

Duck type of pathlib.Path

__truediv__(key)

Joins a segment onto this Path.

Parameters

key (str) –

Return type

Any

read_bytes()
Return type

bytes

write_bytes(data)
Parameters

data (bytes) –

Return type

int

mkdir(*, parents=Ellipsis, exist_ok=Ellipsis)
Parameters
  • parents (bool) –

  • exist_ok (bool) –

Return type

None

unlink(missing_ok)
Parameters

missing_ok (bool) –

Return type

None

iterdir()
Return type

Iterable[Any]

stat()
Return type

stat_result

property parent: Any
exists()
Return type

bool

resolve()
Return type

Any

property name: str
class Future(thunk)

Bases: Generic[charmonium.cache.util._T]

Parameters

thunk (Callable[[], _T]) –

Return type

None

__init__(thunk)
Parameters

thunk (Callable[[], charmonium.cache.util._T]) –

Return type

None

unwrap()
Return type

charmonium.cache.util._T

classmethod create(thunk)
Parameters

thunk (Callable[[], charmonium.cache.util._T]) –

Return type

charmonium.cache.util._T