tiledbsoma.DataFrame.read

DataFrame.read(coords: Sequence[None | bytes | Slice[bytes] | Sequence[bytes] | float | Slice[float] | Sequence[float] | int | Slice[int] | Sequence[int] | slice | Slice[slice] | Sequence[slice] | str | Slice[str] | Sequence[str] | datetime64 | Slice[datetime64] | Sequence[datetime64] | TimestampType | Slice[TimestampType] | Sequence[TimestampType] | Array | ChunkedArray | ndarray[Any, dtype[integer]] | ndarray[Any, dtype[datetime64]]] = (), column_names: Sequence[str] | None = None, *, result_order: ResultOrder | Literal['auto', 'row-major', 'column-major'] = ResultOrder.AUTO, value_filter: str | None = None, batch_size: BatchSize = BatchSize(count=None, bytes=None), partitions: ReadPartitions | None = None, platform_config: Dict[str, Mapping[str, Any]] | object | None = None) TableReadIter

Reads a user-defined subset of data, addressed by the dataframe indexing columns and optionally filtered, and returns the results as one or more Arrow tables.

Parameters:
  • coords – For each index dimension, which rows to read. Defaults to (), meaning no constraint – all IDs.

  • column_names – The named columns to read and return. Defaults to None, meaning no constraint – all column names.

  • result_order – Order of read results. This can be one of ‘row-major’, ‘column-major’, or ‘auto’. Defaults to ‘auto’.

  • value_filter – An optional value filter expression, given as a string, to apply to the results. Defaults to None, meaning no filter.

  • partitions – An optional ReadPartitions hint to indicate how results should be organized.
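As an illustration of how these parameters combine, here is a minimal sketch. The helper name read_filtered, the column name n_genes, and the path in the usage comment are hypothetical; the sketch assumes df is a DataFrame already opened for reading with a single index column, and that the returned TableReadIter provides concat() to merge all result tables into one.

```python
def read_filtered(df, ids, column_names=None, value_filter=None):
    """Read selected rows of an open DataFrame into a single Arrow table.

    `df` is assumed to be a tiledbsoma.DataFrame opened for reading and
    indexed on one column (for example soma_joinid).
    """
    reader = df.read(
        coords=(ids,),               # one entry per index dimension
        column_names=column_names,   # None means all columns
        value_filter=value_filter,   # None means no filter
        result_order="row-major",
    )
    # concat() merges every table produced by the iterator into one.
    return reader.concat()


# Hypothetical usage (path and column names are placeholders):
#
#   import tiledbsoma
#   with tiledbsoma.DataFrame.open("path/to/obs") as obs:
#       table = read_filtered(
#           obs,
#           slice(0, 99),                       # IDs 0 through 99, inclusive
#           column_names=["soma_joinid", "n_genes"],
#           value_filter="n_genes > 500",
#       )
```

Passing the already opened object keeps the helper independent of how the DataFrame was located or opened.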

Returns:

A TableReadIter that can be used to iterate through the result set.
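Because the result is an iterator of Arrow tables, large reads can be consumed one table at a time rather than materialized at once. The following is a rough sketch under that assumption; count_rows is a hypothetical helper, and df stands for a DataFrame already opened for reading.

```python
def count_rows(df, **read_kwargs):
    """Count the rows returned by df.read() without holding all tables in memory.

    `df` is assumed to be a tiledbsoma.DataFrame opened for reading; each item
    yielded by the iterator is expected to be a pyarrow.Table.
    """
    total = 0
    for table in df.read(**read_kwargs):
        total += table.num_rows  # pyarrow.Table exposes its row count as num_rows
    return total


# Hypothetical usage:
#
#   n = count_rows(obs, coords=(slice(0, 999),), column_names=["soma_joinid"])
```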

Raises:
  • SOMAError – If value_filter cannot be parsed.

  • ValueError – If coords are malformed or do not index this DataFrame.

  • SOMAError – If the object is not open for reading.

Notes

The coords parameter supports, per dimension, a list of values whose type matches that of the indexed column.

Acceptable ways to index:

  • A sequence of coordinates is accepted, one per dimension.

  • Sequence length must be <= number of dimensions.

  • If the sequence contains missing coordinates (length less than number of dimensions), then slice(None) – i.e. no constraint – is assumed for the missing dimensions.

  • Per dimension, an explicitly specified coordinate can be one of: None, a single value, a sequence of values (list, numpy.ndarray, pyarrow.Array, etc.), or a slice.

  • Slices are doubly inclusive: slice(2,4) means [2,3,4] not [2,3]. Slice steps are not supported. Slices can be slice(None), meaning select all in that dimension, and may be half-specified, e.g. slice(2,None) or slice(None,4).

  • Negative indexing is unsupported.
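The indexing rules above can be sketched in plain Python. This is an illustration only, not the library's implementation: normalize_coords and inclusive_range are hypothetical helpers, and the sketch assumes a non-negative integer index domain such as soma_joinid.

```python
def normalize_coords(coords, ndim):
    """Pad coords to ndim entries; missing dimensions become slice(None)."""
    coords = tuple(coords)
    if len(coords) > ndim:
        raise ValueError(
            f"got {len(coords)} coordinates for {ndim} index dimensions"
        )
    return coords + (slice(None),) * (ndim - len(coords))


def inclusive_range(sl, lo, hi):
    """Expand a doubly inclusive integer slice over the domain [lo, hi].

    Assumes lo >= 0, so any negative bound is treated as unsupported
    negative indexing.
    """
    if sl.step is not None:
        raise ValueError("slice steps are not supported")
    start = lo if sl.start is None else sl.start
    stop = hi if sl.stop is None else sl.stop
    if start < 0 or stop < 0:
        raise ValueError("negative indexing is unsupported")
    # Both endpoints are included, unlike Python's built-in slicing.
    return list(range(start, stop + 1))


# One coordinate given for a two-dimensional index: the second dimension
# is left unconstrained.
padded = normalize_coords([[1, 2]], ndim=2)

# slice(2, 4) selects 2, 3 and 4 under the doubly inclusive rule.
selected = inclusive_range(slice(2, 4), lo=0, hi=9)
```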

Lifecycle

Maturing.