localapi Package¶
Modules dealing with the local index-space view of DistArrays.
In other words, the view from an engine.
construct
Module¶
error
Module¶
-
exception
distarray.localapi.error.
IncompatibleArrayError
[source]¶ Bases:
distarray.error.DistArrayError
Exception class when arrays are incompatible.
-
exception
distarray.localapi.error.
InvalidBaseCommError
[source]¶ Bases:
distarray.error.DistArrayError
Exception class when an object expected to be an MPI.Comm object is not one.
-
exception
distarray.localapi.error.
InvalidDimensionError
[source]¶ Bases:
distarray.error.DistArrayError
Exception class when a specified dimension is invalid.
-
exception
distarray.localapi.error.
NullCommError
[source]¶ Bases:
distarray.error.DistArrayError
Exception class when an MPI communicator is NULL.
format
Module¶
Define a simple format for saving LocalArrays to disk with full information
about them. This format, .dnpy, draws heavily from the .npy format
specification from NumPy and from the data structure defined in the Distributed
Array Protocol.
Version numbering¶
The version numbering of this format is independent of DistArray’s and the Distributed Array Protocol’s version numberings.
Format Version 1.0¶
The first 6 bytes are a magic string: exactly \x93DARRY.
The next 1 byte is an unsigned byte: the major version number of the file
format, e.g. \x01.
The next 1 byte is an unsigned byte: the minor version number of the file
format, e.g. \x00. Note: the version of the file format is not tied
to the version of the DistArray package.
The next 2 bytes form a little-endian unsigned short int: the length of the header data HEADER_LEN.
The next HEADER_LEN bytes form the header data describing the
distribution of this chunk of the LocalArray. It is an ASCII string
which contains a Python literal expression of a dictionary. It is
terminated by a newline (\n) and padded with spaces (\x20) to
make the total length of magic string + 4 + HEADER_LEN be evenly
divisible by 16 for alignment purposes.
The dictionary contains two keys, both described in the Distributed Array Protocol:
- “__version__” : str
- Version of the Distributed Array Protocol used in this header.
- “dim_data” : tuple of dict
- One dictionary per array dimension; see the Distributed Array Protocol for the details of this data structure.
For repeatability and readability, the dictionary keys are sorted in alphabetic order. This is for convenience only. A writer SHOULD implement this if possible. A reader MUST NOT depend on this.
Following this header is the output of numpy.save for the underlying
data buffer. This contains the full output of save, beginning with
the magic number for .npy files, followed by the .npy header and
array data.
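The header layout described above can be sketched with the standard library alone. This is a minimal illustration, not the code in distarray.localapi.format: the helper names pack_dnpy_header and parse_dnpy_header are hypothetical, the protocol version string and the dim_data keys are illustrative values, and the numpy.save payload that follows the header is left out.

```python
import ast
import io
import struct

MAGIC_PREFIX = b"\x93DARRY"  # the 6-byte magic string described above


def pack_dnpy_header(protocol_version, dim_data, major=1, minor=0):
    """Build magic string + version bytes + HEADER_LEN + header (hypothetical helper)."""
    header = {"__version__": protocol_version, "dim_data": dim_data}
    # Emit keys in alphabetical order, as the format recommends for writers.
    text = "{" + ", ".join("%r: %r" % (k, header[k]) for k in sorted(header)) + "}"
    body = text.encode("ascii")
    # Pad with spaces so that 6 (magic) + 4 + HEADER_LEN is a multiple of 16;
    # the "+ 1" accounts for the terminating newline.
    unpadded = len(MAGIC_PREFIX) + 4 + len(body) + 1
    body += b" " * (-unpadded % 16) + b"\n"
    return (MAGIC_PREFIX
            + struct.pack("<BB", major, minor)  # two unsigned bytes
            + struct.pack("<H", len(body))      # little-endian unsigned short
            + body)


def parse_dnpy_header(fp):
    """Read the header back; leaves fp positioned just after it."""
    if fp.read(len(MAGIC_PREFIX)) != MAGIC_PREFIX:
        raise ValueError("not a .dnpy file: bad magic string")
    major, minor = struct.unpack("<BB", fp.read(2))
    (header_len,) = struct.unpack("<H", fp.read(2))
    # The header is a Python literal, so it can be evaluated safely.
    header = ast.literal_eval(fp.read(header_len).decode("ascii"))
    return (major, minor), header


# Illustrative values only: one block-distributed dimension.
dim_data = ({"dist_type": "b", "size": 16, "proc_grid_size": 4,
             "proc_grid_rank": 0, "start": 0, "stop": 4},)
raw = pack_dnpy_header("0.10.0", dim_data)
version, header = parse_dnpy_header(io.BytesIO(raw))
```

The reader looks keys up by name and does not rely on their order, in line with the MUST NOT requirement above.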
Notes
The .npy format, including reasons for creating it and a comparison
of alternatives, is described fully in the “npy-format” NEP and in the
module docstring for numpy.lib.format.
-
distarray.localapi.format.
magic
(major, minor, prefix=b'\x93DARRY')[source]¶ Return the magic string for the given file format version.
Parameters: - major (int in [0, 255]) –
- minor (int in [0, 255]) –
- prefix (bytes) – The magic prefix to concatenate with version number
Returns: magic
Return type: bytes
Raises: ValueError
– if the version cannot be formatted.
-
distarray.localapi.format.
read_localarray
(fp)[source]¶ Read a LocalArray from a .dnpy file.
Parameters: fp (file_like object) – If this is not a real file object, then this may take extra memory and time.
Returns: distbuffer – The Distributed Array Protocol structure created from the data on disk.
Return type: dict
Raises: ValueError
– If the data is invalid.
-
distarray.localapi.format.
read_localarray_header
(fp, version)[source]¶ Read an array header from a filelike object using the 1.0 file format version.
This will leave the file object located just after the header.
Parameters: - fp (filelike object) – A file object or something with a .read() method like a file.
- version (tuple of int) –
Returns: - __version__ (str) – Version of the Distributed Array Protocol used.
- dim_data (tuple) – A tuple containing a dictionary for each dimension of the underlying array, as described in the Distributed Array Protocol.
Raises: ValueError
– If the data is invalid.
-
distarray.localapi.format.
read_magic
(fp, prefix=b'\x93DARRY', prefix_len=2)[source]¶ Read the magic string to get the version of the file format.
Parameters: - fp (filelike object) –
- prefix (bytes) – Magic prefix to look for
- prefix_len (int) – Number of bytes in prefix
Returns: - major (int)
- minor (int)
-
distarray.localapi.format.
write_localarray
(fp, larr, version=(1, 0))[source]¶ Write a LocalArray to a .dnpy file, including a header.
The __version__ and dim_data keys from the Distributed Array Protocol are written to a header, then numpy.save is used to write the value of the buffer key.
Parameters: - fp (file_like object) – An open, writable file object, or similar object with a .write() method.
- larr (LocalArray) – The array to write to disk.
- version ((int, int), optional) – The version number of the file format. Default: (1, 0)
Raises: - ValueError – If the array cannot be persisted.
- Various other errors – If the underlying numpy array contains Python objects as part of its dtype, the process of pickling them may raise various errors if the objects are not picklable.
-
distarray.localapi.format.
write_localarray_header
(fp, d, version=None)[source]¶ Write the header for a localarray and return the version used
Parameters: - fp (filelike object) –
- d (dict) – This has the appropriate entries for writing its string representation to the header of the file.
- version (tuple or None) – None means use the oldest version that works. An explicit version will raise a ValueError if the format does not allow saving this data. Default: None
Returns: version – the file version which needs to be used to store the data
Return type: tuple of int
localarray
Module¶
The LocalArray data structure.
DistArray objects are proxies for collections of LocalArray objects (that usually reside on engines).
-
class
distarray.localapi.localarray.
GlobalIndex
(distribution, ndarray)[source]¶ Bases:
object
Object which provides access to global indexing on LocalArrays.
-
class
distarray.localapi.localarray.
GlobalIterator
(arr)[source]¶ Bases:
distarray.externals.six.Iterator
-
class
distarray.localapi.localarray.
LocalArray
(distribution, dtype=None, buf=None)[source]¶ Bases:
object
Distributed memory Python arrays.
-
__array_wrap__
(obj, context=None)[source]¶ Return a LocalArray based on obj.
This method constructs a new LocalArray object using the distribution from self and the buffer from obj.
This is used to construct return arrays for ufuncs.
-
__distarray__
()[source]¶ Returns the data structure required by the DAP.
DAP = Distributed Array Protocol
See the project’s documentation for the Protocol’s specification.
-
asdist_like
(other)[source]¶ Return a version of self that has shape, dist and grid_shape like other.
-
cart_coords
¶
-
comm
¶
-
comm_rank
¶
-
comm_size
¶
-
dim_data
¶
-
dist
¶
-
dtype
¶
-
classmethod
from_distarray
(comm, obj)[source]¶ Make a LocalArray from Distributed Array Protocol data structure.
An object that supports the Distributed Array Protocol will have a __distarray__ method that returns the data structure described here:
https://github.com/enthought/distributed-array-protocol
Parameters: obj (an object with a __distarray__ method or a dict) – If a dict, it must conform to the structure defined by the Distributed Array Protocol.
Returns: A LocalArray encapsulating the buffer of the original data. No copy is made.
Return type: LocalArray
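As a rough illustration of the kind of object from_distarray expects, the sketch below defines a hypothetical BlockChunk class whose __distarray__ method returns a dict with the __version__, buffer, and dim_data keys. The class, the version string, and the dim_data values are assumptions made for this example. The buffer is a standard-library array so the block runs without NumPy, and whether LocalArray accepts such a buffer is not verified here.

```python
from array import array


class BlockChunk(object):
    """Hypothetical exporter of one rank's block of a 1-D float array."""

    def __init__(self, values, global_size, grid_size, grid_rank, start):
        self._buffer = array("d", values)  # supports the buffer protocol
        self._dim_data = ({
            "dist_type": "b",              # block distribution
            "size": global_size,
            "proc_grid_size": grid_size,
            "proc_grid_rank": grid_rank,
            "start": start,
            "stop": start + len(values),
        },)

    def __distarray__(self):
        # One dict per call; the buffer is shared, not copied.
        return {
            "__version__": "0.10.0",       # illustrative protocol version
            "buffer": self._buffer,
            "dim_data": self._dim_data,
        }


chunk = BlockChunk([0.0, 1.0, 2.0, 3.0],
                   global_size=16, grid_size=4, grid_rank=0, start=0)
distbuffer = chunk.__distarray__()

# A consumer can wrap the exported buffer without copying it.
view = memoryview(distbuffer["buffer"])
view[0] = 5.0
```

Writing through the memoryview changes the exporter's data, which is the "no copy is made" behaviour the return value describes.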
-
global_shape
¶
-
global_size
¶
-
grid_shape
¶
-
itemsize
¶
-
local_data
¶
-
local_shape
¶
-
local_size
¶
-
nbytes
¶
-
ndarray
¶
-
ndim
¶
-
-
distarray.localapi.localarray.
arecompatible
(a, b)[source]¶ Do these arrays have the same compatibility hash?
-
distarray.localapi.localarray.
compact_indices
(dim_data)[source]¶ Given a dim_data structure, return a tuple of compact indices.
For every dimension in dim_data, return a representation of the indices indicated by that dim_dict; return a slice if possible, else, return the list of global indices.
Parameters: dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol. Only the indexing-related keys from this structure are used here.
Returns: index – Efficient structure usable for indexing into a numpy-array-like data structure.
Return type: tuple of slices and/or lists of int
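The "slice if possible, else list" rule can be sketched as follows. This is not the library's implementation: compact_indices_sketch and its helper are hypothetical names, and the dim_dict keys used (dist_type, start, stop, size, proc_grid_size, block_size, indices) are assumed to follow the Distributed Array Protocol.

```python
def compact_index_for_dim(dim_dict):
    """Return a slice when possible, else a list of global indices (sketch)."""
    dist_type = dim_dict["dist_type"]
    if dist_type == "b":
        # A block is one contiguous range of global indices.
        return slice(dim_dict["start"], dim_dict["stop"])
    if dist_type == "c" and dim_dict.get("block_size", 1) == 1:
        # Plain cyclic: every proc_grid_size-th index, beginning at start.
        return slice(dim_dict["start"], dim_dict["size"],
                     dim_dict["proc_grid_size"])
    if dist_type == "u":
        # Unstructured: no regular stride, so fall back to the explicit list.
        return list(dim_dict["indices"])
    raise ValueError("no compact form sketched for %r" % (dim_dict,))


def compact_indices_sketch(dim_data):
    """Apply the per-dimension rule to every dimension in dim_data."""
    return tuple(compact_index_for_dim(d) for d in dim_data)


# Illustrative three-dimensional example.
example_dim_data = (
    {"dist_type": "b", "size": 16, "start": 4, "stop": 8},
    {"dist_type": "c", "size": 10, "proc_grid_size": 3, "start": 1},
    {"dist_type": "u", "size": 6, "indices": [0, 2, 5]},
)
index = compact_indices_sketch(example_dim_data)
```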
-
distarray.localapi.localarray.
empty
(distribution, dtype=float)[source]¶ Create an empty LocalArray.
-
distarray.localapi.localarray.
empty_like
(arr, dtype=None)[source]¶ Create an empty LocalArray with a distribution like arr.
-
distarray.localapi.localarray.
fromndarray_like
(ndarray, like_arr)[source]¶ Create a new LocalArray like like_arr with buffer set to ndarray.
-
distarray.localapi.localarray.
load_dnpy
(comm, file)[source]¶ Load a LocalArray from a .dnpy file.
Parameters: file (file-like object or str) – The file to read. It must support seek() and read() methods.
Returns: result – A LocalArray encapsulating the data loaded.
Return type: LocalArray
-
distarray.localapi.localarray.
load_hdf5
(comm, filename, dim_data, key='buffer')[source]¶ Load a LocalArray from an .hdf5 file.
Parameters: - filename (str) – The filename to read.
- dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol, describing which portions of the HDF5 file to load into this LocalArray, and with what metadata.
- comm (MPI comm object) –
- key (str, optional) – The identifier for the group to load the LocalArray from (the default is ‘buffer’).
Returns: result – A LocalArray encapsulating the data loaded.
Return type: LocalArray
Note
For dim_data dimension dictionaries containing unstructured (‘u’) distribution types, the indices selected by the ‘indices’ key must be in increasing order. This is a limitation of h5py / hdf5.
-
distarray.localapi.localarray.
load_npy
(comm, filename, dim_data)[source]¶ Load a LocalArray from a .npy file.
Parameters: - filename (str) – The file to read.
- dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol, describing which portions of the .npy file to load into this LocalArray, and with what metadata.
- comm (MPI comm object) –
Returns: result – A LocalArray encapsulating the data loaded.
Return type: LocalArray
-
distarray.localapi.localarray.
local_reduction
(out_comm, reducer, larr, ddpr, dtype, axes)[source]¶ Entry point for reductions on local arrays.
Parameters: - reducer (callable) – Performs the core reduction operation.
- out_comm (MPI Comm instance.) – The MPI communicator for the result of the reduction. Is equal to MPI.COMM_NULL when this rank is not part of the output communicator.
- larr (LocalArray) – Input. Defined for all ranks.
- ddpr (sequence of dim-data dictionaries.) –
- axes (Sequence of ints or None.) –
Returns: When out_comm == MPI.COMM_NULL, returns None. Otherwise, returns the LocalArray section of the reduction result.
Return type: LocalArray or None
-
distarray.localapi.localarray.
max_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for max.
-
distarray.localapi.localarray.
mean_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for mean.
-
distarray.localapi.localarray.
min_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for min.
-
distarray.localapi.localarray.
ones
(distribution, dtype=float)[source]¶ Create a LocalArray filled with ones.
-
distarray.localapi.localarray.
save_dnpy
(file, arr)[source]¶ Save a LocalArray to a .dnpy file.
Parameters: - file (file-like object or str) – The file or filename to which the data is to be saved.
- arr (LocalArray) – Array to save to a file.
-
distarray.localapi.localarray.
save_hdf5
(filename, arr, key='buffer', mode='a')[source]¶ Save a LocalArray to a dataset in an .hdf5 file.
Parameters: - filename (str) – Name of file to write to.
- arr (LocalArray) – Array to save to a file.
- key (str, optional) – The identifier for the group to save the LocalArray to (the default is ‘buffer’).
- mode ({‘w’, ‘w-’, ‘a’}, optional) – File access mode (the default is ‘a’):
'w' – Create file, truncate if exists
'w-' – Create file, fail if exists
'a' – Read/write if exists, create otherwise
-
distarray.localapi.localarray.
set_printoptions
(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None)[source]¶
-
distarray.localapi.localarray.
std_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for std.
-
distarray.localapi.localarray.
sum_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for sum.
-
distarray.localapi.localarray.
var_reducer
(reduce_comm, larr, out, axes, dtype)[source]¶ Core reduction function for var.
maps
Module¶
Classes to manage the distribution-specific aspects of a LocalArray.
The Distribution class is the main entry point and is meant to be used by LocalArrays to help translate between local and global index spaces. It manages ndim one-dimensional map objects.
The one-dimensional map classes BlockMap, CyclicMap, BlockCyclicMap, and UnstructuredMap all manage the mapping tasks for their particular dimension. All are subclasses of MapBase. The reason for the several subclasses is to allow more compact and efficient operations.
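The local-to-global translation that these one-dimensional maps perform can be illustrated with plain arithmetic. The functions below are hypothetical and are not methods of BlockMap, CyclicMap, or BlockCyclicMap. They assume that start is the first global index owned by the rank and that grid_size is the number of ranks along the dimension.

```python
def block_global_index(local_index, start):
    """Block: the rank owns one contiguous run beginning at start."""
    return start + local_index


def cyclic_global_index(local_index, start, grid_size):
    """Cyclic: the rank owns every grid_size-th index beginning at start."""
    return start + local_index * grid_size


def block_cyclic_global_index(local_index, start, grid_size, block_size):
    """Block-cyclic: the rank owns every grid_size-th block of block_size."""
    block, offset = divmod(local_index, block_size)
    return start + block * grid_size * block_size + offset


# Rank 1 of 2 with blocks of 2 starts at global index 2 and owns 2, 3, 6, 7.
owned_block_cyclic = [block_cyclic_global_index(i, start=2, grid_size=2,
                                                block_size=2)
                      for i in range(4)]

# Rank 1 of 3 under a cyclic distribution starts at 1 and owns 1, 4, 7.
owned_cyclic = [cyclic_global_index(i, start=1, grid_size=3)
                for i in range(3)]
```

With block_size == 1 the block-cyclic formula reduces to the cyclic one, which is why a single slice can describe that case.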
-
class
distarray.localapi.maps.
BlockCyclicMap
(global_size, grid_size, grid_rank, start, block_size)[source]¶ Bases:
distarray.localapi.maps.MapBase
One-dimensional block cyclic map class.
-
dim_dict
¶
-
dist
= 'c'¶
-
global_iter
¶
-
global_slice
¶ Return a slice representing the global index space of this dimension; only possible for block_size == 1.
-
size
¶
-
-
class
distarray.localapi.maps.
BlockMap
(global_size, grid_size, grid_rank, start, stop)[source]¶ Bases:
distarray.localapi.maps.MapBase
One-dimensional block map class.
-
dim_dict
¶
-
dist
= 'b'¶
-
global_iter
¶
-
global_slice
¶ Return a slice representing the global index space of this dimension.
-
size
¶
-
-
class
distarray.localapi.maps.
CyclicMap
(global_size, grid_size, grid_rank, start)[source]¶ Bases:
distarray.localapi.maps.MapBase
One-dimensional cyclic map class.
-
dim_dict
¶
-
dist
= 'c'¶
-
global_iter
¶
-
global_slice
¶ Return a slice representing the global index space of this dimension.
-
size
¶
-
-
class
distarray.localapi.maps.
Distribution
(comm, dim_data)[source]¶ Bases:
object
Multi-dimensional Map class.
Manages one or more one-dimensional map classes.
-
cart_coords
¶
-
comm_rank
¶
-
comm_size
¶
-
dim_data
¶
-
dist
¶
-
classmethod
from_shape
(comm, shape, dist=None, grid_shape=None)[source]¶ Create a Distribution from a shape and optional arguments.
-
global_shape
¶
-
global_size
¶
-
global_slice
¶ Return a slice representing the global index space of this dimension.
-
grid_shape
¶
-
local_shape
¶
-
local_size
¶
-
ndim
¶
-
-
class
distarray.localapi.maps.
MapBase
[source]¶ Bases:
object
Base class for all one dimensional Map classes.
mpiutils
Module¶
Entry point for MPI.
-
distarray.localapi.mpiutils.
create_comm_of_size
(size=4)[source]¶ Create a subcommunicator of COMM_PRIVATE of given size.
proxyize
Module¶
random
Module¶
Pseudo-random number generation routines for local arrays.
This module provides a number of routines for generating random numbers, from a variety of probability distributions.
-
distarray.localapi.random.
beta
(a, b, distribution=None)[source]¶ Return an array with random numbers from the beta probability distribution.
Parameters: - a (float) – Parameter that describes the beta probability distribution.
- b (float) – Parameter that describes the beta probability distribution.
- distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns: An array with random numbers.
-
distarray.localapi.random.
label_state
(comm)[source]¶ Label/personalize the random generator state for the local rank.
This ensures that each separate engine, when using the same global seed, will generate a different sequence of pseudo-random numbers.
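One way to get rank-specific streams from a shared seed is to mix the rank into the seed before creating the generator. The sketch below shows that idea with the standard library only. It is not how label_state is implemented, and rank_generator is a hypothetical name.

```python
import hashlib
import random


def rank_generator(global_seed, rank):
    """Return a generator whose state depends on both seed and rank (sketch)."""
    # Hash seed and rank together so that nearby ranks get unrelated seeds.
    key = ("%d:%d" % (global_seed, rank)).encode("ascii")
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


# Four simulated ranks sharing the global seed 42.
streams = [[rank_generator(42, rank).random() for _ in range(3)]
           for rank in range(4)]
```

Each simulated rank draws a different sequence, while rerunning with the same seed and rank reproduces that rank's sequence.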
-
distarray.localapi.random.
normal
(loc=0.0, scale=1.0, distribution=None)[source]¶ Return an array with random numbers from a normal (Gaussian) probability distribution.
Parameters: - loc (float) – The mean (or center) of the probability distribution.
- scale (float) – The standard deviation (or width) of the probability distribution.
- distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns: An array with random numbers.
-
distarray.localapi.random.
rand
(distribution=None)[source]¶ Return an array with random numbers distributed over the interval [0, 1).
Parameters: distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns: An array with random numbers.
-
distarray.localapi.random.
randint
(low, high=None, distribution=None)[source]¶ Return random integers from low (inclusive) to high (exclusive).
Return random integers from the “discrete uniform” distribution in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).
Parameters: - low (int) – Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).
- high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
- distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns: An array with random numbers.
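The half-open interval and the high=None convention can be checked with a scalar stand-in built on the standard library. randint_sketch is a hypothetical helper. Unlike the function documented here, it returns a single integer and takes no distribution argument.

```python
import random


def randint_sketch(low, high=None, rng=random):
    """Draw one integer from [low, high), or from [0, low) if high is None."""
    if high is None:
        low, high = 0, low
    # randrange excludes its upper bound, matching the half-open interval.
    return rng.randrange(low, high)


one_argument = [randint_sketch(5) for _ in range(200)]      # values in [0, 5)
two_arguments = [randint_sketch(3, 7) for _ in range(200)]  # values in [3, 7)
```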
-
distarray.localapi.random.
randn
(distribution=None)[source]¶ Return a sample (or samples) from the “standard normal” distribution.
Parameters: distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns: An array with random numbers.