tiledbsoma.SparseNDArrayRead.blockwise

SparseNDArrayRead.blockwise(axis: int | Sequence[int], *, size: int | Sequence[int] | None = None, reindex_disable_on_axis: int | Sequence[int] | None = None, eager: bool = True) → SparseNDArrayBlockwiseRead

Returns an intermediate object from which a blockwise iterator of a specific format can be selected (for example, by calling scipy() on it).

Blockwise iterators yield results grouped by a user-specified axis. For example, a blockwise iterator with axis=0 will yield results containing all coordinates for a given “row” in the array, regardless of the read result_order (i.e., the sort order).

Blockwise iterators yield an array “block” in some user-specified format, as well as a list of coordinates contained in the individual block.

All blockwise iterators will reindex coordinates (i.e., map them from soma_joinid to an integer in the range [0, N)), unless reindexing is specifically disabled for that axis, using the reindex_disable_on_axis argument. When reindexing:

  • the primary iterator axis coordinates, as indicated by the axis argument, will be reindexed into the range [0, N), where N is the number of coordinates read for the block (controlled with the size argument).

  • all other axes will be reindexed to [0, M), where M is the number of points read on that axis across all blocks.
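
The reindexing rules above can be illustrated with a small pure-Python sketch. It is a conceptual model only, not tiledbsoma's implementation: the helper reindex_block and the sample data are hypothetical, and it assumes each soma_joinid maps to its position in the corresponding coordinate list.

```python
def reindex_block(block_joinids, minor_joinids, entries):
    """Map (row_joinid, col_joinid, value) entries to reindexed positions.

    block_joinids: joinids on the iterator axis that belong to this block.
    minor_joinids: joinids read on the other axis across all blocks.
    Assumption: a joinid is reindexed to its position in these lists.
    """
    row_pos = {joinid: i for i, joinid in enumerate(block_joinids)}  # [0, N)
    col_pos = {joinid: j for j, joinid in enumerate(minor_joinids)}  # [0, M)
    return [(row_pos[r], col_pos[c], v) for r, c, v in entries]


# Hypothetical data: one block holding rows 100 and 205 (N = 2),
# with columns 7, 42 and 90 read across all blocks (M = 3).
block_joinids = [100, 205]
minor_joinids = [7, 42, 90]
entries = [(100, 42, 1.5), (205, 7, 2.0), (205, 90, 0.5)]

reindexed = reindex_block(block_joinids, minor_joinids, entries)
print(reindexed)  # [(0, 1, 1.5), (1, 0, 2.0), (1, 2, 0.5)]
```

In this model, passing an axis in reindex_disable_on_axis would correspond to leaving the original soma_joinid values on that axis untouched.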

Parameters:
  • axis – Required. The axis across which to yield blocks, indicated as the dimension number, e.g., axis=0 will step across soma_dim_0 (the first dimension).

  • size – Optional. Number of coordinates in each block yielded by the iterator. A reasonable default will be provided if the argument is omitted. Current defaults are 2^16 for dimension 0 and 2^8 for all other dimensions. Defaults are subject to change and will likely remain relatively small.

  • reindex_disable_on_axis – Optional. Axis or sequence of axes which will not be reindexed. Defaults to None, indicating all axes will be reindexed.

  • eager – Optional. If True, the iterator will read ahead (using multi-threading) to improve overall performance when iterating over a large result. Setting this flag to False will reduce memory consumption, at the cost of additional processing time.
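
To see how size determines the number and length of blocks, the following sketch computes block lengths for a given number of selected coordinates. The helper block_lengths is hypothetical, and the sketch assumes blocks are formed as consecutive chunks of the selected coordinates, with a shorter final block when the count is not a multiple of size.

```python
def block_lengths(n_coords, size):
    """Return the number of coordinates in each block.

    Assumption: coordinates are split into consecutive chunks of `size`,
    and the last chunk holds whatever remains.
    """
    return [min(size, n_coords - start) for start in range(0, n_coords, size)]


# 10,000 selected coordinates on the iterator axis, read with size=4999.
lengths = block_lengths(10_000, 4_999)
print(lengths)  # [4999, 4999, 2]
```

Each entry corresponds to N for one block, that is, the length of the iterator axis in the block that is yielded.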

Examples

A simple example iterating over the first 10000 elements of the first dimension, into blocks of SciPy sparse matrices:

>>> import tiledbsoma
>>> with tiledbsoma.open("a_sparse_nd_array") as X:
...     for (obs_coords, var_coords), matrix in X.read(
...         coords=(slice(9999),)
...     ).blockwise(
...         axis=0, size=4999
...     ).scipy():
...         print(repr(matrix))
<4999x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 11509741 stored elements in Compressed Sparse Row format>
<4999x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 13760197 stored elements in Compressed Sparse Row format>
<2x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 3417 stored elements in Compressed Sparse Row format>

To stride over the second dimension, returning a CSC matrix, specify blockwise(axis=1). To iterate over COO matrices, on either axis, specify scipy(compress=False).

Lifecycle

Maturing.