tiledbsoma.DataFrame.create
- classmethod DataFrame.create(uri: str, *, schema: Schema, index_column_names: Sequence[str] = ('soma_joinid',), domain: Sequence[None | Tuple[Any, Any] | List[Any]] | None = None, platform_config: Dict[str, Mapping[str, Any]] | object | None = None, context: SOMATileDBContext | None = None, tiledb_timestamp: int | datetime | None = None) -> DataFrame
Creates the data structure on disk/S3/cloud.
- Parameters:
schema – Arrow schema defining the per-column schema. This schema must define all columns, including columns to be named as index columns. If the schema includes types unsupported by the SOMA implementation, an error will be raised.
index_column_names – A list of column names to use as user-defined index columns (e.g., ['cell_type', 'tissue_type']). All named columns must exist in the schema, and at least one index column name is required.
domain – An optional sequence of tuples specifying the domain of each index column. Each tuple must be a pair consisting of the minimum and maximum values storable in the index column. For example, if there is a single int64-valued index column, then domain might be [(100, 200)] to indicate that values between 100 and 200, inclusive, can be stored in that column. If provided, this sequence must have the same length as index_column_names, and the index-column domain will be as specified. If omitted entirely, or if None in a given dimension, the corresponding index-column domain will use an empty range, and data writes after that will fail with “A range was set outside of the current domain”. Unless you have a particular reason not to, you should always provide the desired domain at create time: this is an optional but strongly recommended parameter. See also change_domain, which allows you to expand the domain after create.
platform_config – Platform-specific options used to create this array. This may be provided as settings in a dictionary, with options located in the {'tiledb': {'create': ...}} key, or as a TileDBCreateOptions object.
tiledb_timestamp – If specified, overrides the default timestamp used to open this object. If unset, uses the timestamp provided by the context.
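The domain rules above can be checked before calling create. The following is a minimal sketch, not part of tiledbsoma: check_domain and create_with_domain are hypothetical helper names, and the min <= max check is an assumption inferred from the "minimum and maximum" wording. tiledbsoma is imported lazily so the validation part runs without the package installed.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union

# One entry per index column: None, or a (min, max) pair.
DomainEntry = Optional[Union[Tuple[Any, Any], List[Any]]]


def check_domain(
    index_column_names: Sequence[str],
    domain: Optional[Sequence[DomainEntry]],
) -> None:
    """Hypothetical pre-flight check mirroring the documented domain rules."""
    if domain is None:
        # Allowed by the signature, but the docs warn that later writes fail.
        return
    if len(domain) != len(index_column_names):
        raise ValueError(
            f"domain has {len(domain)} entries but there are "
            f"{len(index_column_names)} index columns"
        )
    for name, entry in zip(index_column_names, domain):
        if entry is None:
            continue  # documented as an empty range for this column
        if len(entry) != 2:
            raise ValueError(f"domain for {name!r} must be a (min, max) pair")
        lo, hi = entry
        if lo > hi:  # assumption: min must not exceed max
            raise ValueError(f"domain for {name!r} has min {lo!r} > max {hi!r}")


def create_with_domain(uri, schema, index_column_names, domain):
    """Validate locally, then delegate to tiledbsoma.DataFrame.create."""
    import tiledbsoma  # lazy import: only needed when actually creating

    check_domain(index_column_names, domain)
    return tiledbsoma.DataFrame.create(
        uri,
        schema=schema,
        index_column_names=index_column_names,
        domain=domain,
    )
```

For a single int64 soma_joinid index column, a call such as create_with_domain("a_dataframe", schema, ["soma_joinid"], [(0, 1)]) would request a domain that admits join IDs 0 and 1.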
- Returns:
The DataFrame.
- Raises:
TypeError – If the schema parameter specifies an unsupported type, or if index_column_names specifies a non-indexable column.
ValueError – If index_column_names is malformed or specifies an undefined column name.
ValueError – If the schema specifies illegal column names.
tiledbsoma.AlreadyExistsError – If the underlying object already exists at the given URI.
tiledbsoma.NotCreateableError – If the URI is malformed for a particular storage backend.
TileDBError – If unable to create the underlying object.
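One way a caller might handle the AlreadyExistsError case listed above is to fall back to opening the existing object. This is a sketch under assumptions: create_or_open is a hypothetical helper, not a tiledbsoma function, and it assumes that reopening in write mode is the desired response and that the existing object has a compatible schema.

```python
def create_or_open(uri, schema, **create_kwargs):
    """Create a SOMA DataFrame at `uri`, or open it for writing if it exists.

    Hypothetical helper; extra keyword arguments (index_column_names,
    domain, ...) are forwarded to DataFrame.create unchanged.
    """
    import tiledbsoma  # lazy import: only needed when the helper is called

    try:
        return tiledbsoma.DataFrame.create(uri, schema=schema, **create_kwargs)
    except tiledbsoma.AlreadyExistsError:
        # Assumption: the existing object is a DataFrame with a compatible schema.
        return tiledbsoma.open(uri, mode="w")
```

The other documented exceptions (TypeError, ValueError, NotCreateableError, TileDBError) are deliberately left to propagate, since they signal a problem with the arguments or the storage backend, not a recoverable state.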
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> import tiledbsoma
>>> df = pd.DataFrame(data={"soma_joinid": [0, 1], "col1": ["a", "b"]})
>>> with tiledbsoma.DataFrame.create(
...     "a_dataframe", schema=pa.Schema.from_pandas(df)
... ) as soma_df:
...     soma_df.write(pa.Table.from_pandas(df, preserve_index=False))
...
>>> with tiledbsoma.open("a_dataframe") as soma_df:
...     a_df = soma_df.read().concat().to_pandas()
...
>>> a_df
   soma_joinid col1
0            0    a
1            1    b
Lifecycle
Maturing.