Multi-modal#
Warning
This is, for now, just a stub.
Here, we'll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects. ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
Setup#
!lamin init --storage ./test-multimodal --schema bionty
connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
bt.settings.organism = "human"
connected lamindb: testuser1/test-multimodal
ln.transform.stem_uid = "yMWSFirS6qv2"
ln.transform.version = "0"
ln.track()
notebook imports: bionty==0.42.3 lamindb==0.69.2
saved: Transform(uid='yMWSFirS6qv26K79', name='Multi-modal', key='multimodal', version='0', type=notebook, updated_at=2024-03-26 12:05:52 UTC, created_by_id=1)
saved: Run(uid='1FZGtXME9dGnVm5BkLh6', transform_id=1, created_by_id=1)
Papalexi21#
Let's use a MuData object:
Transform #
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    adt: 200 x 4
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    hto: 200 x 12
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    gdo: 200 x 111
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
MuData objects build on top of AnnData objects to store and serialize multimodal data. More information can be found in the MuData documentation.
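As a quick sanity check on the summary printed above, the global number of variables is the sum over the four modalities. The sketch below uses plain Python with the shapes copied from that summary; the `modality_shapes` dictionary is an illustrative name and not part of the MuData API.

```python
# Shapes (n_obs, n_vars) copied from the MuData summary printed above.
modality_shapes = {
    "rna": (200, 173),
    "adt": (200, 4),
    "hto": (200, 12),
    "gdo": (200, 111),
}

# Every modality in this subset describes the same number of cells.
n_obs_values = {shape[0] for shape in modality_shapes.values()}

# The global variable axis concatenates the variables of each modality.
total_n_vars = sum(shape[1] for shape in modality_shapes.values())

print(n_obs_values, total_n_vars)
```

The total matches the `200 × 300` reported for the whole object.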
First we register the artifact:
artifact = ln.Artifact(
    "papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
artifact.save()
Now let's validate and register three feature sets from this data:
RNA (gene expression)
ADT (antibody derived tags reflecting surface proteins)
obs (metadata)
For the two modalities rna and adt, we use bionty tables as the reference:
Validate #
mdata["rna"].var_names[:5]
Index(['RP5-827C21.6', 'XX-CR54.1', 'SH2D6', 'RP11-379B18.5', 'RP11-778D9.12'], dtype='object', name='index')
bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol);
173 terms (100.00%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, SH2D6, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, CTC-467M3.1, ARHGAP26-AS1, GABRA1, HIST1H4K, HLA-DQB1-AS1, RP11-524H19.2, SPACA1, VNN1, AC006042.7, AC002066.1, AC073934.6, ...
genes = bt.Gene.from_values(mdata["rna"].var_names, bt.Gene.symbol)
ln.save(genes)
ambiguous validation in Bionty for 6 records: 'HLA-DQB1-AS1', 'CTAGE15', 'CTRB2', 'LGALS9C', 'PCDHB11', 'TBC1D3G'
did not create Gene records for 84 non-validated symbols: 'AC002066.1', 'AC004019.13', 'AC005150.1', 'AC006042.7', 'AC011558.5', 'AC026471.6', 'AC073934.6', 'AC091132.1', 'AC092295.4', 'AC092687.5', 'AE000662.93', 'AL132989.1', 'AP000442.4', 'CTA-373H7.7', 'CTB-134F13.1', 'CTB-31O20.9', 'CTC-498J12.1', 'CTD-2562J17.2', 'CTD-3012A18.1', 'CTD-3065B20.2', ...
mdata["rna"].var_names = bt.Gene.standardize(mdata["rna"].var_names, bt.Gene.symbol)
validated = bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol)
84 terms (48.60%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, RP11-524H19.2, AC006042.7, AC002066.1, AC073934.6, RP11-268G12.1, U52111.14, RP11-235C23.5, RP11-12J10.3, RP11-324E6.9, RP11-187A9.3, RP11-365N19.2, RP11-346D14.1, ...
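The validate, standardize, validate sequence above can be pictured with a small stand-alone sketch. It is only a conceptual model: validation is reduced to membership in a reference set and standardization to a synonym lookup. `REFERENCE_SYMBOLS`, `SYNONYMS`, and the two helper functions are hypothetical names for illustration, not the bionty implementation. The `HIST1H4K` to `H4C12` mapping mirrors the renaming visible in the outputs on this page.

```python
# Toy stand-in for a gene registry; its contents are illustrative only.
REFERENCE_SYMBOLS = {"SH2D6", "GABRA1", "H4C12", "SPACA1", "VNN1"}
# Hypothetical synonym table: previous symbol -> current symbol.
SYNONYMS = {"HIST1H4K": "H4C12"}


def validate(values, reference):
    """Return one boolean per value: True if the value is in the reference."""
    return [value in reference for value in values]


def standardize(values, synonyms):
    """Replace known synonyms by their current symbol and keep the rest."""
    return [synonyms.get(value, value) for value in values]


var_names = ["RP5-827C21.6", "SH2D6", "GABRA1", "HIST1H4K"]

before = validate(var_names, REFERENCE_SYMBOLS)
var_names = standardize(var_names, SYNONYMS)
after = validate(var_names, REFERENCE_SYMBOLS)

# Names that still fail validation are candidates for new records.
not_validated = [name for name, ok in zip(var_names, after) if not ok]
print(before, after, not_validated)
```

Standardizing first keeps the number of newly created records small, because only names with no known synonym remain unvalidated.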
new_genes = [bt.Gene(symbol=symbol) for symbol in mdata["rna"].var_names[~validated]]
ln.save(new_genes)
bt.Gene.validate(mdata["rna"].var_names, bt.Gene.symbol);
feature_set_rna = ln.FeatureSet.from_values(
    mdata["rna"].var_names, field=bt.Gene.symbol
)
mdata["adt"].var_names
Index(['CD86', 'PDL1', 'PDL2', 'CD366'], dtype='object', name='index')
bt.CellMarker.validate(mdata["adt"].var_names);
4 terms (100.00%) are not validated for name: CD86, PDL1, PDL2, CD366
markers = bt.CellMarker.from_values(mdata["adt"].var_names)
ln.save(markers)
bt.CellMarker.validate(mdata["adt"].var_names);
Register #
feature_set_adt = ln.FeatureSet.from_values(
    mdata["adt"].var_names, field=bt.CellMarker.name
)
Link them to the artifact, one slot per modality:
artifact.features._add_feature_set(feature_set_rna, slot="rna")
artifact.features._add_feature_set(feature_set_adt, slot="adt")
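Conceptually, each slot name points to one feature set, and a feature set is identified by its members rather than by their order. The following is a minimal sketch of that idea in plain Python. `FeatureSetSketch` and its `content_hash` property are hypothetical, and the SHA-1 digest is an assumption made for illustration; the hashing scheme lamindb actually uses is not shown on this page.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSetSketch:
    """Hypothetical stand-in for a feature set: member names plus registry."""

    members: tuple
    registry: str

    @property
    def n(self):
        """Number of features in the set."""
        return len(self.members)

    @property
    def content_hash(self):
        """Order-independent digest of the members (assumption of this sketch)."""
        joined = "\n".join(sorted(self.members))
        return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:20]


adt_set = FeatureSetSketch(("CD86", "PDL1", "PDL2", "CD366"), "bionty.CellMarker")
same_set = FeatureSetSketch(("CD366", "PDL2", "PDL1", "CD86"), "bionty.CellMarker")

# One feature set per slot, keyed by the modality name.
slots = {"adt": adt_set}

print(slots["adt"].n, adt_set.content_hash == same_set.content_hash)
```

A content-based identifier of this kind lets two artifacts measuring the same features share one feature set record.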
The third feature set comes from the obs metadata:
obs = mdata["rna"].obs
We're only interested in a single metadata column:
ln.Feature(name="gene_target", type="category").save()
features = ln.Feature.from_df(obs)
ln.save(features)
feature_set_obs = ln.FeatureSet.from_df(obs)
artifact.features._add_feature_set(feature_set_obs, slot="obs")
gene_targets = bt.Gene.from_values(obs["gene_target"], bt.Gene.symbol)
ln.save(gene_targets)
features = ln.Feature.lookup()
artifact.labels.add(gene_targets, feature=features.gene_target)
ambiguous validation in Bionty for 4 records: 'MARCHF8', 'IRF7', 'IFNGR2', 'TNFRSF14'
did not create Gene record for 1 non-validated symbol: 'NT'
nt = ln.ULabel(name="NT", description="Non-targeting control of perturbations")
nt.save()
artifact.labels.add(nt, feature=features.gene_target)
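The `gene_target` column therefore ends up annotated from two registries: validated gene symbols and the `NT` control label. A minimal sketch of that split in plain Python, assuming a toy reference set; `KNOWN_GENES` and the column excerpt are hypothetical values for illustration.

```python
# Toy reference of validated gene symbols (illustrative subset).
KNOWN_GENES = {"IFNGR1", "CAV1", "ATF2", "NFKBIA", "STAT1", "SPI1"}

# Hypothetical excerpt of the obs["gene_target"] column.
gene_target = ["IFNGR1", "NT", "STAT1", "NT", "CAV1", "IFNGR1"]

# Unique values in order of first appearance.
unique_targets = list(dict.fromkeys(gene_target))

gene_labels = [t for t in unique_targets if t in KNOWN_GENES]
other_labels = [t for t in unique_targets if t not in KNOWN_GENES]

# Both groups annotate the same feature but come from different registries.
labels_by_registry = {"bionty.Gene": gene_labels, "core.ULabel": other_labels}
print(labels_by_registry)
```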
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels = [ln.ULabel(name=name) for name in obs[col].unique()]
    ln.save(labels)
loaded ULabel record with same name: 'NT' (disable via ln.settings.upon_create_search_names)
record with similar name exist! did you mean to load it?
name | uid | score
---|---|---
G1 | 7CioMifF | 90.0
record with similar name exist! did you mean to load it?
name | uid | score
---|---|---
S | V1NB1NLX | 90.0
records with similar names exist! did you mean to load one of them?
name | uid | score
---|---|---
G1 | 7CioMifF | 90.0
S | V1NB1NLX | 90.0
record with similar name exist! did you mean to load it?
name | uid | score
---|---|---
NT | AqYbkdTx | 90.0
records with similar names exist! did you mean to load one of them?
name | uid | score
---|---|---
G1 | 7CioMifF | 90.0
NT | AqYbkdTx | 90.0
Because none of these labels seem like something we'd want to track in the registry or validate, we don't link them to the artifact.
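The "loaded ULabel record with same name" message in the output above reflects a get-or-create pattern: constructing a label whose name already exists returns the existing record instead of a duplicate. A minimal in-memory sketch of that behaviour follows. `LabelRegistrySketch` is a hypothetical class, and the sketch only covers exact name matches, not the similar-name search that produces the other warnings.

```python
class LabelRegistrySketch:
    """Hypothetical in-memory registry keyed by label name."""

    def __init__(self):
        self._by_name = {}
        self._next_id = 1

    def get_or_create(self, name):
        """Return (record, created); reuse the record if the name exists."""
        if name in self._by_name:
            return self._by_name[name], False
        record = {"id": self._next_id, "name": name}
        self._by_name[name] = record
        self._next_id += 1
        return record, True


registry = LabelRegistrySketch()
nt_first, created_first = registry.get_or_create("NT")
# A later column that also contains "NT" gets the existing record back.
nt_again, created_again = registry.get_or_create("NT")

print(created_first, created_again, nt_first is nt_again)
```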
artifact.features
Features:
rna: FeatureSet(uid='EPnzPeiIeofPgY6etvSD', n=184, type='number', registry='bionty.Gene', hash='MkNEiIXppO5tY6LeG271', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
'SH2D6', 'MEF2C-AS2', 'ARHGAP26-AS1', 'GABRA1', 'H4C12', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', ...
adt: FeatureSet(uid='44C6FsnipfPMorO9jUos', n=4, type='number', registry='bionty.CellMarker', hash='o8EDT805HnP0Fmk4uZ9e', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
'CD86', 'PDL1', 'PDL2', 'CD366'
obs: FeatureSet(uid='85TazTs0jCfHZkli97p5', n=19, registry='core.Feature', hash='52pUCCmPSXllFegewLGI', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
gene_target (bionty.Gene|core.ULabel)
gene_target (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
gene_target (1, core.ULabel): 'NT'
orig.ident (category)
nCount_RNA (number)
nFeature_RNA (number)
nCount_HTO (number)
nFeature_HTO (number)
nCount_GDO (number)
nCount_ADT (number)
nFeature_ADT (number)
percent.mito (number)
MULTI_ID (category)
HTO_classification (category)
guide_ID (category)
NT (category)
perturbation (category)
replicate (category)
S.Score (number)
G2M.Score (number)
Phase (category)
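The obs listing above pairs each column with either `number` or `category`. One simple way to arrive at such a split is to treat purely numeric columns as numbers and everything else as categories. The sketch below assumes plain Python lists in place of DataFrame columns. `infer_feature_type` is a hypothetical helper, the column values are made up for illustration, and lamindb's own inference from pandas dtypes may differ.

```python
from numbers import Number


def infer_feature_type(values):
    """Return 'number' if every value is numeric (bools excluded), else 'category'."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "number" if is_numeric else "category"


# Hypothetical excerpt of obs columns; the values are made up for illustration.
obs_columns = {
    "nCount_RNA": [100.0, 250.0, 175.0],
    "percent.mito": [1.0, 2.5, 4.0],
    "Phase": ["G1", "S", "G1"],
    "replicate": ["a", "a", "b"],
}

feature_types = {name: infer_feature_type(col) for name, col in obs_columns.items()}
print(feature_types)
```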
artifact.describe()
Artifact(uid='cMXew46Mn1Jegqnv4nTb', suffix='.h5mu', description='Sub-sampled MuData from Papalexi21', size=606320, hash='RaivS3NesDOP-6kNIuaC3g', hash_type='md5', visibility=1, key_is_virtual=True, updated_at=2024-03-26 12:05:53 UTC)
Provenance:
storage: Storage(uid='zOIa84HQ', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal', type='local', updated_at=2024-03-26 12:05:49 UTC, created_by_id=1)
transform: Transform(uid='yMWSFirS6qv26K79', name='Multi-modal', key='multimodal', version='0', type=notebook, updated_at=2024-03-26 12:05:52 UTC, created_by_id=1)
run: Run(uid='1FZGtXME9dGnVm5BkLh6', started_at=2024-03-26 12:05:52 UTC, is_consecutive=True, transform_id=1, created_by_id=1)
created_by: User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-03-26 12:05:49 UTC)
Features:
rna: FeatureSet(uid='EPnzPeiIeofPgY6etvSD', n=184, type='number', registry='bionty.Gene', hash='MkNEiIXppO5tY6LeG271', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
'SH2D6', 'MEF2C-AS2', 'ARHGAP26-AS1', 'GABRA1', 'H4C12', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', ...
adt: FeatureSet(uid='44C6FsnipfPMorO9jUos', n=4, type='number', registry='bionty.CellMarker', hash='o8EDT805HnP0Fmk4uZ9e', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
'CD86', 'PDL1', 'PDL2', 'CD366'
obs: FeatureSet(uid='85TazTs0jCfHZkli97p5', n=19, registry='core.Feature', hash='52pUCCmPSXllFegewLGI', updated_at=2024-03-26 12:05:58 UTC, created_by_id=1)
gene_target (bionty.Gene|core.ULabel)
gene_target (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
gene_target (1, core.ULabel): 'NT'
orig.ident (category)
nCount_RNA (number)
nFeature_RNA (number)
nCount_HTO (number)
nFeature_HTO (number)
nCount_GDO (number)
nCount_ADT (number)
nFeature_ADT (number)
percent.mito (number)
MULTI_ID (category)
HTO_classification (category)
guide_ID (category)
NT (category)
perturbation (category)
replicate (category)
S.Score (number)
G2M.Score (number)
Phase (category)
Labels:
genes (28, bionty.Gene): 'MARCHF8', 'MARCHF8', 'IFNGR1', 'CAV1', 'IRF7', 'IRF7', 'ATF2', 'NFKBIA', 'STAT1', 'SPI1', ...
ulabels (1, core.ULabel): 'NT'
artifact.view_lineage()
# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal
deleting instance testuser1/test-multimodal
manually delete your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal