tiledbsoma.SparseNDArrayRead.blockwise
SparseNDArrayRead.blockwise(axis: int | Sequence[int], *, size: int | Sequence[int] | None = None, reindex_disable_on_axis: int | Sequence[int] | None = None, eager: bool = True) → SparseNDArrayBlockwiseRead
Returns an intermediate type to choose a blockwise iterator of a specific format.
Blockwise iterators yield results grouped by a user-specified axis. For example, a blockwise iterator with axis=0 will yield results containing all coordinates for a given “row” in the array, regardless of the read result_order (i.e., the sort order).
Blockwise iterators yield an array “block” in the user-specified format, together with the coordinates contained in that block.
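To make the grouping contract concrete, here is a minimal pure-Python sketch. It is not tiledbsoma's implementation. It assumes the stored values arrive as unsorted COO triples and that the coordinates selected on the iteration axis are known up front. `blockwise_by_axis` and its arguments are hypothetical. Every value whose coordinate on `axis` falls in a block lands in that block, whatever order the read produced it in.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

# (coordinate on dim 0, coordinate on dim 1, value); hypothetical stand-in for a read result
Triple = Tuple[int, int, float]


def blockwise_by_axis(
    triples: Sequence[Triple],
    axis_coords: Sequence[int],
    axis: int,
    size: int,
) -> Iterator[Tuple[List[int], List[Triple]]]:
    """Yield (block_coords, block_triples) pairs.

    block_coords holds up to `size` consecutive entries of `axis_coords`;
    block_triples holds every input triple whose coordinate on `axis` is in
    block_coords, no matter where it appeared in the (unsorted) input.
    """
    # Bucket the stored values by their coordinate on the iteration axis.
    by_coord: Dict[int, List[Triple]] = defaultdict(list)
    for t in triples:
        by_coord[t[axis]].append(t)

    # Step across the selected coordinates `size` at a time.
    for start in range(0, len(axis_coords), size):
        block_coords = list(axis_coords[start:start + size])
        block = [t for c in block_coords for t in by_coord.get(c, [])]
        yield block_coords, block
```

For example, `list(blockwise_by_axis(triples, [0, 1, 2], axis=0, size=2))` returns two blocks: one covering rows 0 and 1, and one covering row 2.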
All blockwise iterators will reindex coordinates (i.e., map them from soma_joinid to an integer in the range [0, N)), unless reindexing is specifically disabled for that axis, using the reindex_disable_on_axis argument. When reindexing:
* the primary iterator axis coordinates, as indicated by the axis argument, will be reindexed into the range [0, N), where N is the number of coordinates read for the block (controlled with the size argument);
* all other axes will be reindexed to [0, M), where M is the number of points read on that axis across all blocks.
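The two reindexing rules can be sketched for the 2-D case with axis=0. This is an illustration only, not tiledbsoma's code. `reindex_2d` is hypothetical, and it assumes the coordinates read on dimension 1 are passed in explicitly, with positions following the order in which they are given. Note that row positions restart at 0 in each block, while a column's position is the same in every block.

```python
from typing import Dict, List, Sequence, Tuple

# (soma_joinid on dim 0, soma_joinid on dim 1, value)
Triple = Tuple[int, int, float]


def reindex_2d(
    blocks: Sequence[Tuple[Sequence[int], Sequence[Triple]]],
    col_coords: Sequence[int],
) -> List[List[Triple]]:
    """Reindex 2-D blocks produced by iterating over axis 0.

    Each block is (row_coords, triples). Rows are mapped into [0, N) afresh for
    every block (N = len(row_coords)); columns share one mapping into [0, M)
    across all blocks (M = len(col_coords)).
    """
    # Secondary axis: one mapping shared by every block.
    col_map: Dict[int, int] = {joinid: i for i, joinid in enumerate(col_coords)}

    out: List[List[Triple]] = []
    for row_coords, triples in blocks:
        # Primary axis: a new mapping per block, so positions restart at 0.
        row_map = {joinid: i for i, joinid in enumerate(row_coords)}
        out.append([(row_map[r], col_map[c], v) for r, c, v in triples])
    return out
```

With blocks for rows [10, 11] and [12] and columns [300, 500, 900], row 12 becomes position 0 of its own block, and column 500 becomes position 1 in both blocks.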
Parameters:
axis – Required. The axis across which to yield blocks, indicated as the dimension number, e.g., axis=0 will step across soma_dim_0 (the first dimension).
size – Optional. Number of coordinates in each block yielded by the iterator. If omitted, a default is used: currently 2^16 (65,536) for dimension 0 and 2^8 (256) for all other dimensions. Defaults are subject to change and will likely remain relatively small.
reindex_disable_on_axis – Optional. Axis or sequence of axes which will _not_ be reindexed. Defaults to None, indicating all axes will be reindexed.
eager – Optional. If True, the iterator will read ahead (using multi-threading) to improve overall performance when iterating over a large result. Setting this flag to False will reduce memory consumption, at the cost of additional processing time.
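The eager flag trades memory for throughput in the usual read-ahead (prefetch) way. The following generic sketch is not tiledbsoma's implementation. `read_ahead` and the `load` callable are hypothetical, and the sketch assumes blocks can be loaded by index. While the caller works on block i, a background thread is already fetching block i+1, so up to two blocks are held at once. With eager=False, each block is loaded only when it is requested.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def read_ahead(load: Callable[[int], T], n_blocks: int, eager: bool = True) -> Iterator[T]:
    """Yield load(0), load(1), ... optionally prefetching one block ahead."""
    if not eager:
        # On demand: only the block being consumed is resident.
        for i in range(n_blocks):
            yield load(i)
        return
    if n_blocks == 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load, 0)
        for i in range(n_blocks):
            result = future.result()  # wait for block i
            if i + 1 < n_blocks:
                # Start reading block i+1 before handing block i to the caller.
                future = pool.submit(load, i + 1)
            yield result
```

Both modes yield the same blocks in the same order. Only the timing of the loads and the peak memory differ.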
Examples
A simple example iterating over the first 10,000 coordinates of the first dimension (SOMA slices include both endpoints, so slice(9999) selects coordinates 0 through 9999), yielding blocks as SciPy sparse matrices:
>>> import tiledbsoma
>>> with tiledbsoma.open("a_sparse_nd_array") as X:
...     for (obs_coords, var_coords), matrix in X.read(
...         coords=(slice(9999),)
...     ).blockwise(
...         axis=0, size=4999
...     ).scipy():
...         print(repr(matrix))
<4999x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 11509741 stored elements in Compressed Sparse Row format>
<4999x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 13760197 stored elements in Compressed Sparse Row format>
<2x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 3417 stored elements in Compressed Sparse Row format>
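The row counts of the three blocks printed by the example (4999, 4999 and 2) follow from simple arithmetic. This sketch assumes each block spans `size` consecutive coordinates of the selected range on axis 0 and that the last block takes the remainder. `block_row_counts` is a hypothetical helper, not part of tiledbsoma.

```python
from typing import List


def block_row_counts(start: int, stop: int, size: int) -> List[int]:
    """Number of axis-0 coordinates in each block for the inclusive range [start, stop]."""
    n = stop - start + 1  # SOMA slices include both endpoints
    return [min(size, n - offset) for offset in range(0, n, size)]


print(block_row_counts(0, 9999, 4999))  # [4999, 4999, 2]
```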
To stride over the second dimension, returning a CSC matrix, specify blockwise(axis=1). To iterate over COO matrices, on either axis, specify scipy(compress=False).
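A plausible reason the iteration axis pairs with the compressed format is that CSR and CSC store the same values and differ only in which axis is compressed into the indptr array. Compressing along the iteration axis gives each row (axis=0, CSR) or column (axis=1, CSC) of a block its own contiguous slice. COO (compress=False) keeps the uncompressed row, column and value lists. The following pure-Python sketch illustrates this. It is not SciPy's implementation, `compress` is a hypothetical helper, and it assumes there are no duplicate coordinates.

```python
from typing import List, Sequence, Tuple


def compress(
    rows: Sequence[int],
    cols: Sequence[int],
    vals: Sequence[float],
    shape: Tuple[int, int],
    axis: int,
) -> Tuple[List[int], List[int], List[float]]:
    """Build (indptr, indices, data) from COO input.

    axis=0 compresses rows (CSR layout); axis=1 compresses columns (CSC layout).
    """
    major, minor = (rows, cols) if axis == 0 else (cols, rows)
    n_major = shape[axis]

    # indptr[i]..indptr[i+1] delimits the entries of major slice i.
    indptr = [0] * (n_major + 1)
    for m in major:
        indptr[m + 1] += 1
    for i in range(n_major):
        indptr[i + 1] += indptr[i]

    # Entries ordered by major, then minor, coordinate.
    order = sorted(range(len(vals)), key=lambda k: (major[k], minor[k]))
    indices = [minor[k] for k in order]
    data = [vals[k] for k in order]
    return indptr, indices, data
```

For a 2x3 matrix with entries at (0, 2), (1, 0) and (1, 2), the CSR indptr has one slot per row ([0, 1, 3]). The CSC indptr has one slot per column ([0, 1, 1, 3]).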
Lifecycle
Maturing.