tiledbsoma.DataFrame
- class tiledbsoma.DataFrame(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
DataFrame is a multi-column table with a user-defined schema. The schema is expressed as an Arrow Schema, and defines the column names and value types.

Every DataFrame must contain a column called soma_joinid, of type int64, with negative values explicitly disallowed. The soma_joinid column contains a unique value for each row in the dataframe, and in some cases (e.g., as part of an Experiment), acts as a join key for other objects, such as SparseNDArray.

Lifecycle
Maturing.
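The soma_joinid constraints described above (int64, no negative values, unique per row) can be checked before writing. The following is a minimal standard-library sketch, not the library's own validation; the helper name check_soma_joinids is hypothetical and exists only for illustration.

```python
INT64_MAX = 2**63 - 1  # largest value an int64 column can hold


def check_soma_joinids(values):
    """Return a list of problems found in a proposed soma_joinid column.

    Hypothetical helper: it only mirrors the documented rules
    (integer, non-negative, fits in int64, unique per row).
    """
    problems = []
    seen = set()
    for row, value in enumerate(values):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"row {row}: {value!r} is not an integer")
            continue
        if value < 0:
            problems.append(f"row {row}: {value} is negative")
        elif value > INT64_MAX:
            problems.append(f"row {row}: {value} does not fit in int64")
        if value in seen:
            problems.append(f"row {row}: {value} is duplicated")
        seen.add(value)
    return problems


print(check_soma_joinids([0, 1, 2]))   # []
print(check_soma_joinids([0, -1, 0]))  # one negative value, one duplicate
```

An empty list means the values satisfy the three documented constraints; it says nothing about any further checks the library itself may apply.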
Examples
>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create("./test_dataframe", schema=schema) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
...
soma_joinid: int64
A: float
B: large_string
---
   soma_joinid       A    B
0            0  1.0000  one
1            1  2.7182    e
2            2  3.1214   pi

>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create(
...     "./test_dataframe_2",
...     schema=schema,
...     index_column_names=["A", "B"],
...     domain=[(0.0, 10.0), None],
... ) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe_2") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
soma_joinid: int64
---
        A    B  soma_joinid
0  1.0000  one            0
1  2.7182    e            1
2  3.1214   pi            2

Here the index-column names are specified. The domain is entirely optional: if it’s omitted, defaults will be applied yielding the largest possible domain for each index column’s datatype. If the domain is specified, it must be a tuple/list of equal length to index_column_names. It can be None in a given slot, meaning use the largest possible domain. For string/bytes types, it must be None.

- __init__(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
Internal-only common initializer steps.
This function is internal; users should open TileDB SOMA objects using the create() and open() factory class methods.
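The rules stated earlier for the domain argument of create() (same length as index_column_names, None allowed in any slot, None required for string/bytes columns) can be sketched as a small pre-flight check. This is an illustrative standard-library sketch only: check_domain is a hypothetical helper, the type labels are plain strings rather than pyarrow types, and the requirement that a non-None slot be a (low, high) pair is an assumption taken from the (0.0, 10.0) example.

```python
# Arrow type names treated as string/bytes-like in this sketch (assumption).
STRING_LIKE = {"string", "large_string", "binary", "large_binary"}


def check_domain(index_column_names, column_types, domain):
    """Raise ValueError if `domain` breaks the documented rules.

    column_types maps a column name to a plain-string type label,
    e.g. {"A": "float32", "B": "large_string"}.
    """
    if domain is None:
        return  # domain omitted: defaults apply to every index column
    if len(domain) != len(index_column_names):
        raise ValueError(
            f"domain has {len(domain)} slots but there are "
            f"{len(index_column_names)} index columns"
        )
    for name, slot in zip(index_column_names, domain):
        if slot is None:
            continue  # None means: use the largest possible domain
        if column_types[name] in STRING_LIKE:
            raise ValueError(f"domain slot for {name!r} must be None")
        if len(slot) != 2:
            # Assumption: a non-None slot is a (low, high) pair.
            raise ValueError(f"domain slot for {name!r} must be a (low, high) pair")


types = {"soma_joinid": "int64", "A": "float32", "B": "large_string"}
check_domain(["A", "B"], types, [(0.0, 10.0), None])  # passes silently
```

Running such a check before calling create() surfaces a malformed domain early; the library's own validation remains the authority on what is accepted.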
Methods
__init__(handle, *[, ...])
    Internal-only common initializer steps.
exists(uri[, context, tiledb_timestamp])
    Finds whether an object of this type exists at the given URI.
create(uri, *, schema[, index_column_names, ...])
    Creates the data structure on disk/S3/cloud.
open(uri[, mode, tiledb_timestamp, context, ...])
    Opens this specific type of SOMA object.
reopen(mode[, tiledb_timestamp])
    Returns a new copy of the SOMAObject with the given mode at the current Unix timestamp.
close()
    Releases any resources held while the object is open.
read([coords, column_names, result_order, ...])
    Reads a user-defined subset of data, addressed by the dataframe indexing columns and optionally filtered, and returns the results as one or more Arrow tables.
write(values[, platform_config])
    Writes an Arrow table to the persistent object. Raises an error if the object is not open for writing.
keys()
    Returns the names of the columns when read back as a dataframe.
tiledbsoma_upgrade_domain(newdomain[, ...])
    Allows you to set the domain of a SOMA DataFrame, when the DataFrame does not have a domain set yet.
change_domain(newdomain[, check_only])
    Allows you to enlarge the domain of a SOMA DataFrame, when the DataFrame already has a domain.
tiledbsoma_resize_soma_joinid_shape(newshape)
    Increases the shape of the dataframe on the soma_joinid index column, if it indeed is an index column, leaving all other index columns as-is.
tiledbsoma_upgrade_soma_joinid_shape(newshape)
    This is like upgrade_domain, but it only applies the specified domain update to the soma_joinid index column.

Retrieves the non-empty domain for each dimension, namely the smallest and largest indices in each dimension for which the array/dataframe contains data.
Returns metadata about the array that is not encompassed within the Arrow Schema, in the form of a PlatformConfig (deprecated).
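To make the non-empty domain entry above concrete: it is the per-index-column pair of smallest and largest values actually present in the data. The sketch below computes that over in-memory rows using only the standard library. The helper name observed_bounds is hypothetical, the rows reuse the values from the examples on this page, and returning None for an empty input is a choice made for this sketch, not documented library behavior.

```python
def observed_bounds(rows, index_column_names):
    """Return one (min, max) tuple per index column for the given rows.

    rows is a list of dicts, one dict per row. Returns None when there
    are no rows (a choice made for this sketch only).
    """
    if not rows:
        return None
    return tuple(
        (min(row[name] for row in rows), max(row[name] for row in rows))
        for name in index_column_names
    )


rows = [
    {"soma_joinid": 0, "A": 1.0, "B": "one"},
    {"soma_joinid": 1, "A": 2.7182, "B": "e"},
    {"soma_joinid": 2, "A": 3.1214, "B": "pi"},
]

print(observed_bounds(rows, ["soma_joinid"]))  # ((0, 2),)
print(observed_bounds(rows, ["A", "B"]))       # ((1.0, 3.1214), ('e', 'pi'))
```

The result has one tuple per index column, so it is always the same length as the list of index column names.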
Attributes
Accessor for the object's storage URI.
A string describing the SOMA type of this object.
Returns data schema, in the form of an Arrow Schema.
Returns index (dimension) column names.
Returns the number of rows in the dataframe.
Returns tuples of minimum and maximum values, one tuple per index column, currently storable on each index column of the dataframe.
Returns tuples of minimum and maximum values, one tuple per index column, to which the dataframe can have its domain resized.
Returns true if the array has the upgraded resizeable domain feature from TileDB-SOMA 1.15: the array was created with this support, or it has had tiledbsoma_upgrade_domain applied to it.
The mode this object was opened in, either r or w.
True if the object has been closed.
A value storing implementation-specific configuration information.
The time that this object was opened in UTC.
The time this object was opened, as millis since the Unix epoch.
The metadata of this SOMA object.
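Two of the attributes above describe the same open time in different forms: a UTC time, and milliseconds since the Unix epoch. The relationship between the two forms can be sketched with the standard library alone. The function names below are hypothetical and this is not the library's implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Dividing one timedelta by another with // yields an int.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


opened = datetime(2024, 1, 1, tzinfo=timezone.utc)
millis = to_epoch_millis(opened)
print(millis)                     # 1704067200000
print(from_epoch_millis(millis))  # 2024-01-01 00:00:00+00:00
```

Integer arithmetic on timedelta objects avoids the floating-point rounding that can occur when multiplying datetime.timestamp() by 1000.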