{ "cells": [ { "cell_type": "markdown", "id": "2b8e72a7-129c-422c-b955-350fb9ee0541", "metadata": { "tags": [] }, "source": [ "# Tutorial: SOMA Experiment queries" ] }, { "cell_type": "code", "execution_count": 1, "id": "3a5fd5d3", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tiledbsoma as soma" ] }, { "cell_type": "markdown", "id": "ccc8709a", "metadata": { "tags": [] }, "source": [ "In this notebook, we'll take a quick look at the SOMA experiment-query API. The dataset used is from Peripheral Blood Mononuclear Cells (PBMC), which is freely available from 10X Genomics.\n" ] }, { "cell_type": "markdown", "id": "2472cd1a-2d49-4268-9b9b-1bed49ccfa1b", "metadata": { "tags": [] }, "source": [ "First we'll unpack and open the experiment:" ] }, { "cell_type": "code", "execution_count": 2, "id": "c70b2d82-2012-481c-a7a6-5b574de69241", "metadata": { "tags": [] }, "outputs": [], "source": [ "import tarfile\n", "import tempfile\n", "\n", "sparse_uri = tempfile.mktemp()\n", "with tarfile.open(\"data/pbmc3k-sparse.tgz\") as handle:\n", " handle.extractall(sparse_uri)\n", "exp = soma.Experiment.open(sparse_uri)" ] }, { "cell_type": "markdown", "id": "fab7898c", "metadata": { "tags": [] }, "source": [ "Using the keys of the `obs` dataframe, we can see what fields are available to query on." ] }, { "cell_type": "code", "execution_count": 3, "id": "d67dfbc6-0382-4acc-8c56-3670549654f8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "('soma_joinid', 'obs_id', 'n_genes', 'percent_mito', 'n_counts', 'louvain')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp.obs.keys()" ] }, { "cell_type": "code", "execution_count": 4, "id": "9e4ede09-2303-4c21-92c1-bf42ed4e7dd1", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
louvain
0CD4 T cells
1B cells
2CD4 T cells
3CD14+ Monocytes
4NK cells
......
2633CD14+ Monocytes
2634B cells
2635B cells
2636B cells
2637CD4 T cells
\n", "

2638 rows × 1 columns

\n", "
" ], "text/plain": [ " louvain\n", "0 CD4 T cells\n", "1 B cells\n", "2 CD4 T cells\n", "3 CD14+ Monocytes\n", "4 NK cells\n", "... ...\n", "2633 CD14+ Monocytes\n", "2634 B cells\n", "2635 B cells\n", "2636 B cells\n", "2637 CD4 T cells\n", "\n", "[2638 rows x 1 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = exp.obs.read(column_names=['louvain']).concat().to_pandas()\n", "p" ] }, { "cell_type": "markdown", "id": "f305fb7c", "metadata": { "tags": [] }, "source": [ "Focusing on the `louvain` column, we can now find out what column values are present in the data." ] }, { "cell_type": "code", "execution_count": 5, "id": "00f1ccad-3ee2-4947-8961-8bf9642fbbba", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/7l/_wsjyk5d4p3dz3kbz7wxn7t00000gn/T/ipykernel_27669/1931588187.py:1: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n", " p.groupby('louvain').size().sort_values()\n" ] }, { "data": { "text/plain": [ "louvain\n", "Megakaryocytes 15\n", "Dendritic cells 37\n", "FCGR3A+ Monocytes 150\n", "NK cells 154\n", "CD8 T cells 316\n", "B cells 342\n", "CD14+ Monocytes 480\n", "CD4 T cells 1144\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p.groupby('louvain').size().sort_values()" ] }, { "cell_type": "markdown", "id": "fda99535", "metadata": { "tags": [] }, "source": [ "Now we can query the SOMA experiment -- here, by a few cell types." ] }, { "cell_type": "code", "execution_count": 6, "id": "e2ed76ca-5821-44c5-a220-ff96568686ec", "metadata": { "tags": [] }, "outputs": [], "source": [ "obs_query = soma.AxisQuery(value_filter='louvain in [\"B cells\", \"NK cells\"]')" ] }, { "cell_type": "code", "execution_count": 7, "id": "f3af70bc-3817-453c-a18c-56dc9aa874da", "metadata": { "tags": [] }, "outputs": [], "source": [ "query = exp.axis_query(\"RNA\", obs_query=obs_query)" ] }, { "cell_type": "markdown", "id": "fb94d898", "metadata": { "tags": [] }, "source": [ "Note that the query output is smaller than the original dataset's size -- since we've queried for only a particular pair of cell types." ] }, { "cell_type": "code", "execution_count": 8, "id": "2c60568b-0789-4dbf-aff9-4bea2860aef4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[2638, 1838]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[exp.obs.count, exp.ms[\"RNA\"].var.count]" ] }, { "cell_type": "code", "execution_count": 9, "id": "28ed8d40-36c5-4642-bd8f-53d35c3074f0", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[496, 1838]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[query.n_obs, query.n_vars]" ] }, { "cell_type": "markdown", "id": "c9625771", "metadata": { "tags": [] }, "source": [ "Here we can take a look at the X data." ] }, { "cell_type": "code", "execution_count": 10, "id": "65063167-5015-497a-9712-d72c0ecac2ed", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
soma_dim_0soma_dim_1soma_data
010-0.214582
111-0.372653
212-0.054804
313-0.683391
4140.633951
............
91164326361833-0.149789
91164426361834-0.325824
91164526361835-0.005918
91164626361836-0.135213
91164726361837-0.482111
\n", "

911648 rows × 3 columns

\n", "
" ], "text/plain": [ " soma_dim_0 soma_dim_1 soma_data\n", "0 1 0 -0.214582\n", "1 1 1 -0.372653\n", "2 1 2 -0.054804\n", "3 1 3 -0.683391\n", "4 1 4 0.633951\n", "... ... ... ...\n", "911643 2636 1833 -0.149789\n", "911644 2636 1834 -0.325824\n", "911645 2636 1835 -0.005918\n", "911646 2636 1836 -0.135213\n", "911647 2636 1837 -0.482111\n", "\n", "[911648 rows x 3 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.X(\"data\").tables().concat().to_pandas()" ] }, { "cell_type": "markdown", "id": "db7af8b8", "metadata": { "tags": [] }, "source": [ "To finish out this introductory look at the experiment-query API, we can convert our query outputs to AnnData format." ] }, { "cell_type": "code", "execution_count": 11, "id": "1ed8510b-343a-4f88-8aae-11a5c2069311", "metadata": { "tags": [] }, "outputs": [], "source": [ "adata = query.to_anndata(X_name=\"data\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "b3118504-8c92-48d4-9b83-87176960e4f1", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 496 × 1838\n", " obs: 'soma_joinid', 'obs_id', 'n_genes', 'percent_mito', 'n_counts', 'louvain'\n", " var: 'soma_joinid', 'var_id', 'n_cells'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata" ] }, { "cell_type": "code", "execution_count": null, "id": "ddd6e682-5c54-47cb-8c54-f0d03a5b6567", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 5 }