tiledbsoma.io.to_anndata

tiledbsoma.io.to_anndata(experiment: tiledbsoma._experiment.Experiment, measurement_name: str, *, X_layer_name: str | tiledbsoma._util.Sentinel | None = <tiledbsoma._util.Sentinel object>, extra_X_layer_names: typing.Sequence[str] | typing.KeysView[str] | None = None, obs_id_name: str | None = None, var_id_name: str | None = None, obsm_varm_width_hints: dict[str, dict[str, int]] | None = None, uns_keys: typing.Sequence[str] | None = None, dask: tiledbsoma._dask.util.SOMADaskConfig | None = None) → AnnData

Converts the experiment group to AnnData format.

The choice of matrix formats follows what is commonly seen in input .h5ad files:

X as scipy.sparse.csr_matrix
obs, var as pandas.DataFrame
obsm, varm arrays as numpy.ndarray
obsp, varp arrays as scipy.sparse.csr_matrix

X_layer_name names the layer in the TileDB-SOMA measurement’s X collection that is outgested to adata.X in the resulting AnnData object. If X_layer_name is unspecified and the measurement contains an X layer named “data”, that layer is used. If X_layer_name is None, adata.X will be None and adata.layers will be unpopulated. If X_layer_name is a string, adata.X is taken from the layer of that name in the input measurement; it is an error if the measurement’s X does not contain that layer.
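The three cases for X_layer_name can be modelled in plain Python. This is a minimal sketch of the documented behaviour, not the library's implementation: `resolve_x_layer` and `_UNSET` are hypothetical names, and returning None when X_layer_name is unspecified and no "data" layer exists is an assumption, since the text above does not cover that case.

```python
from typing import Collection, Optional

# Hypothetical stand-in for the library's private "argument not given" sentinel.
_UNSET = object()


def resolve_x_layer(x_keys: Collection[str], x_layer_name=_UNSET) -> Optional[str]:
    """Return the X layer name to use for adata.X, or None for no X."""
    if x_layer_name is _UNSET:
        # Unspecified: use the layer named "data" when the measurement has one.
        # Assumption: without a "data" layer, adata.X is left as None.
        return "data" if "data" in x_keys else None
    if x_layer_name is None:
        # Explicit None: the caller asked for no adata.X.
        return None
    if x_layer_name not in x_keys:
        # An explicit name must exist in the measurement's X collection.
        raise ValueError(f"X layer {x_layer_name!r} not found in measurement")
    return x_layer_name
```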

extra_X_layer_names specifies how the output object’s adata.layers is populated. By default (extra_X_layer_names is None), adata.layers is empty. If extra_X_layer_names is provided, the named layers are used to populate adata.layers. To outgest all layers without naming them individually, pass extra_X_layer_names=experiment.ms[measurement_name].X.keys(). To make this convenient, X_layer_name is ignored when populating adata.layers. For example, if the X keys are "a", "b", "c", "d" and you pass X_layer_name="b" and extra_X_layer_names=experiment.ms[measurement_name].X.keys(), "b" is not written to adata.layers.
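The filtering described above, for the case where X_layer_name is a string, can be illustrated without the library. In this sketch `layers_to_export` is a hypothetical helper, and a plain dict's keys stand in for experiment.ms[measurement_name].X.keys().

```python
from typing import Iterable, List, Optional


def layers_to_export(
    extra_x_layer_names: Optional[Iterable[str]], x_layer_name: str
) -> List[str]:
    """Return the layer names that would be written to adata.layers."""
    if extra_x_layer_names is None:
        # Default: adata.layers stays empty.
        return []
    # The layer already used for adata.X is skipped.
    return [name for name in extra_x_layer_names if name != x_layer_name]


# A dict's key view stands in for the measurement's X collection keys.
x_keys = {"a": 0, "b": 1, "c": 2, "d": 3}.keys()
selected = layers_to_export(x_keys, "b")
print(selected)  # ['a', 'c', 'd']
```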

obs_id_name and var_id_name are columns within the TileDB-SOMA experiment that become the index names of the resulting AnnData object’s obs and var dataframes. If they are not passed as arguments, the TileDB-SOMA dataframes are checked for an original-index-name key. If that is also unavailable, they default to "obs_id" and "var_id", respectively.
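The lookup order for the index names is a three-step fallback, sketched below. `resolve_index_name` is a hypothetical helper, and the metadata key string "original-index-name" is a placeholder; the actual key used by the library is not given in the text above.

```python
from typing import Mapping, Optional


def resolve_index_name(
    requested: Optional[str], metadata: Mapping[str, str], default: str
) -> str:
    """Pick the index name: argument first, then stored metadata, then default."""
    if requested is not None:
        # 1. An explicit obs_id_name / var_id_name argument wins.
        return requested
    # 2. Otherwise consult the dataframe's metadata (placeholder key name).
    stored = metadata.get("original-index-name")
    if stored is not None:
        return stored
    # 3. Fall back to "obs_id" or "var_id".
    return default
```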

obsm_varm_width_hints is optional. If provided, it should be a nested dict of the form {"obsm":{"X_tSNE":2}}, giving a width for a named obsm or varm array, to help work around export errors.

If uns_keys is provided, only the specified top-level uns keys are extracted. The default is to extract them all. Use uns_keys=[] to not outgest any uns keys.
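A call combining these options might look like the sketch below. The wrapper `export_selected`, the default measurement name "RNA", and the presence of an X layer named "data" are assumptions for illustration; only the to_anndata parameters come from the signature above.

```python
def export_selected(experiment_uri: str, measurement_name: str = "RNA"):
    """Export one measurement with all X layers and no uns entries.

    Hypothetical wrapper: "RNA" and the "data" layer are assumed to exist.
    """
    # Imported lazily so that defining this helper does not require tiledbsoma.
    import tiledbsoma
    import tiledbsoma.io

    with tiledbsoma.Experiment.open(experiment_uri) as experiment:
        return tiledbsoma.io.to_anndata(
            experiment,
            measurement_name,
            X_layer_name="data",  # becomes adata.X (assumed to exist)
            # Every X layer except the one used for adata.X goes to adata.layers.
            extra_X_layer_names=experiment.ms[measurement_name].X.keys(),
            uns_keys=[],  # skip all uns keys
        )
```

Passing uns_keys=None instead (the default) would extract every top-level uns key.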

If dask is provided, the X matrix is returned as a Dask array, and the Dask configuration applies to that conversion and to the resulting array (lifecycle: experimental).

Lifecycle

Maturing.