Automatic dispatch¶
The primary function of autoray is to enable writing high level
array / tensor code that is agnostic to the backend arrays being supplied.
It does this via ‘automatic dispatch’, which has a few notable
differences to other approaches:
It is automatic - generally neither you or the backend array library needs to implement any dispatch logic, instead
autorayfinds, if neccesary ‘translates’, and then caches the relevant functions when they are first called.It is specialized for array functions and treats
numpyas the reference interface for call signatures of ‘equivalent’ functions, although it doesn’t rely or numpy or require it to be installed.Despite this, there is no fixed API as such - if a backend can be inferred, and the relevant function imported, a
docall is valid.
Basics¶
The main function of autoray is do, which takes a
function name followed by *args and **kwargs, and automatically looks up
(and caches) the correct backend function. There are four main ways that the
backend is inferred:
1. Automatic backend:
do('sqrt', x)
Here the backend is inferred from x. By default dispatch happens on the
first argument, but various functions (such as 'stack' and 'einsum')
know to dispatch on other arguments.
2. Backend ‘like’ another array:
do('random.normal', size=(2, 3, 4), like=x)
Here the backend is inferred from another array and can thus be implicitly
propagated, even when functions take no array arguments. Some creation routines
such as "eye" and "zeros" will also set the default dtype and / or
device to match like in this case.
3. Explicit backend:
do('einsum', eq, x, y, like='customlib')
Here one simply supplies the desired function backend explicitly.
4. Context manager
with backend_like('autoray.lazy'):
xy = do('tensordot', x, y, 1)
z = do('trace', xy)
Here you set a default backend for a whole block of code. This default
overrides method 1. above but 2. and 3. still take precedence. The argument to
backend_like can be a backend string or an example
array.
Namespace API¶
An alternative way to call functions is to get a namespace object using
get_namespace. There are two main ways to use this.
1. Late dispatch – alternative syntax to do:
Here dispatch still occurs according to above 4 patterns, but with a syntax similar to a standard array library.
xp = ar.get_namespace()
x = xp.ones((3, 4), like="torch")
xp.sqrt(x)
Hint
autoray also provides this ‘default namespace’ as an importable object like
so:
from autoray import numpy as np
np.fn_name(*args, like=like, **kwargs)
2. Early dispatch – specialized to a particular backend (and optionally
dtype and device)
Here get_namespace dispatches on a like argument:
# if `like` is an array, dtype and device will be picked up from it
xp = ar.get_namespace(like="torch", device="cuda", dtype="float32")
x = xp.ones((3, 4))
xp.sqrt(x)
Function calls then lookup and cache the specific backend function, removing
any dispatch overhead. This is a concise and fast way to use autoray within a
function where you are happy to lock in the backend. It is also similar how the
python array api works.
Manual dispatch functions¶
You can manually break the process into two steps with the following functions:
autoray.infer_backend- return the backend name for a single array.autoray.infer_backend_multi- return the backend name based on multiple arrays.autoray.get_lib_fn- return the actual function for a given backend and function name.
If you know you are going to use a function repeatedly, you can thus avoid the (albeit minor) overhead of dispatching each call separately, for instance:
def matmul_chain(*arrays):
# if the arrays might be a mix of backends, use infer_backend_multi,
# but here we just dispatch on the first array
backend = infer_backend(arrays[0])
fn = get_lib_fn(backend, 'matmul')
return functools.reduce(fn, arrays)
Other special functions¶
There are a few high level functions that might be preferred to attribute access, for reasons of consitency:
autoray.shape- return the shape of an array. In most casesx.shapeis fine, but this ensures the output istuple[int]and also works for builtins without callingnumpy.autoray.ndim- return the number of dimensions of an array.autoray.size- return the total number of elements in an arrayautoray.dag- return the adjoint of an array, i.e. the transpose with complex conjugation.
Functions for dealing with dtypes:
autoray.get_dtype_name- return the name of the dtype of an array as a stringautoray.to_backend_dtype- turn a string specified dtype into the equivalent dtype for a given backendautoray.astype- cast an array to a given dtype, specified as a string.
And for converting any array to a numpy array:
Hint
All of these can be called via do as well, e.g.
do('shape', x).
Backends¶
In autoray a backend internally is simply specified by a string.
By default, the backend of an array is name of the library that the class is
defined in, and the relevant functions are assumed to be in the namespace of
backend. If that is the case (e.g. cupy), then that library is already
compatible with autoray. Note all backend lookups are cached on
obj.__class__ for speed.
autoray also handles common cases where the functions are in a different
library or sub-module (such as jax -> jax.numpy). This requires a simple
mapping to be specified, which autoray does for various libraries.
You can explicitly register a backend name (and thus default location) for a
specific class with the function
register_backend:
register_backend(mylib.myobjs.MyClass, 'mylib.myfuncs')
Now when autoray encounters an instance of MyClass it will look for
functions in mylib.myfuncs instead of mylib. You could also use an
arbitrary name for the backend, and then alias it to the correct location
separately.
Note
autoray is aware of the scipy namespace and relevant submodules for
numpy, cupy, jax, for example:
do('scipy.linalg.exp', x)
Functions¶
Once a backend is inferred and the location of the relevant functions is
known, autoray tries to import and cache the relevant function from that
namespace. Many libraries (e.g. cupy, dask, jax, autograd, sparse,
…) actively mirror the numpy API, so there is little else to be done.
Some other libraries (e.g. tensorflow, pytorch, …) diverge from the
numpy API more, and yet have largely equivalent functions, simply defined in
slight different places with different names and / or signatures. autoray
has a simple translation mechanism for:
when functions are in a different module (e.g.
'trace' -> tensorflow.linalg.trace)when functions have a different name (e.g.
'sum' -> tensorflow.reduce_sum)when functions have a different signature (e.g.
tensordot(a, b, axes) -> torch.tensordot(a, b, dims))
If you want to directly provide a missing or alternative implementation of
some function for a particular backend you can swap one in with
register_function:
def my_custom_torch_svd(x):
import torch
print('Hello SVD!')
u, s, v = torch.svd(x)
return u, s, v.T
ar.register_function('torch', 'linalg.svd', my_custom_torch_svd)
x = ar.do('random.uniform', size=(3, 4), like='torch')
ar.do('linalg.svd', x)
# Hello SVD!
# (tensor([[-0.5832, 0.6188, -0.5262],
# [-0.5787, -0.7711, -0.2655],
# [-0.5701, 0.1497, 0.8078]]),
# tensor([2.0336, 0.8518, 0.4572]),
# tensor([[-0.4568, -0.3166, -0.6835, -0.4732],
# [-0.5477, 0.2825, -0.2756, 0.7377],
# [ 0.2468, -0.8423, -0.0993, 0.4687]]))
If you want to make use of the existing function you can supply wrap=True
in which case the custom function supplied should act like a decorator:
def my_custom_sum_wrapper(old_fn):
def new_fn(*args, **kwargs):
print('Hello sum!')
return old_fn(*args **kwargs)
return new_fn
ar.register_function('torch', 'sum', my_custom_sum_wrapper, wrap=True)
ar.do('sum', x)
# Hello sum!
# tensor(5.4099)
Though be careful, if you call register_function
again it will now wrap the
new function! Note you can combine
register_backend and
register_function to dynamically define array
types and functions from anywhere. See also
register_dispatch for
controlling which arguments are used to infer the backend for any function.
Composing new functions¶
Sometimes you want to define a function that is composed of many array
functions, but you want to dispatch at the level of the whole block, not each
individual call, or indeed use a completely different implementation. For
instance, you might want to use a numba or
pythran compiled version for
numpy.
The autoray.compose function allows you to do this. You
decorate a function, that forms the default implementation, then you can
register alternative implementations for specific backends. For instance:
from autoray import compose
from numba import njit
@compose
def my_func(x):
# get how many elements are needed to sum to 20
return ar.do('sum', ar.do('cumsum', x, 0) < 20)
# register a numba implementation
@my_func.register('numpy')
@njit
def my_func_numba(x):
s = 0.0
i = 0
while s < 20:
s += x[i]
i += 1
return i - 1
# any calls like this now dispatch to my_func_numba
do('my_func', x_numpy)
Deviations from numpy¶
As stated above, autoray does not have an explicit API, but where there exist
equivalent functions, autoray uses the call signature of numpy as a
reference. The following are deviations from this:
do('linalg.svd', x)-autoraydefaults tofull_matrices=False, since this is generally always desired, and many libraries do not even supportfull_matrices=True.
Comparison to alternatives¶
The Python Array API (
x.__array_namespace__()) is a proposed standard for array libraries to implement that many are converging to. It hopefully will be the de-facto approach for writing backend array agnostic code, and autoray.get_namespace provides a similar mechanism with this in mind. The current benefit ofautorayis that it does not require the backend library to explicitly support it, and it can dynamically compose and replace functions etc.The
__array_function__protocol has been suggested and now implemented innumpy. This is more orientated around array classes that explicitly subclassnumpy.ndarray.The uarray project appears to have similar goals but is still being developed.
functools.singledispatchis a general single dispatch mechanism, but it is slower and requires the user to explicitly register each function they want to dispatch on.plumis a general multiple dispatch mechanism, but again it would require registering every function for every backend explicitly.