---
name: universal-single-cell-annotator
description: A unified interface for annotating single-cell RNA-seq data using marker genes, reference classifiers (CellTypist), or LLMs.
license: MIT
metadata:
  author: AI Group
  version: "1.0.0"
  category: Genomics
compatibility:
  - system: Python 3.9+
  - library: scanpy
  - library: celltypist (optional)
allowed-tools:
  - run_shell_command
  - read_file
keywords:
  - rna
  - automation
  - biomedical
measurable_outcome: execute task with >95% success rate.
---
Universal Single-Cell Annotator
This skill wraps three cell-type annotation strategies in a single Python class, letting agents choose between rule-based (marker genes), data-driven (CellTypist reference models), or reasoning-based (LLM) approaches depending on the prior knowledge and data at hand.
When to Use This Skill
- Initial Analysis: When annotating AnnData objects that carry no cell-type labels yet.
- Validation: When cross-referencing automated labels with known markers.
- Discovery: When identifying rare cell types using LLM reasoning on marker lists.
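For the Validation case, a simple first check is how often two annotation columns agree once their vocabularies are aligned. The following is a minimal, standard-library sketch, not part of `UniversalAnnotator`: the function `label_agreement`, the toy labels, and the mapping dict are hypothetical stand-ins, since different strategies rarely emit the same label names.

```python
from collections import Counter


def label_agreement(labels_a, labels_b, mapping=None):
    """Return the fraction of cells whose two annotations agree, plus pair counts.

    labels_a, labels_b: per-cell labels from two strategies, in the same cell order.
    mapping: optional dict translating labels_b into the vocabulary of labels_a
             (hypothetical; you supply it for your own label sets).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if len(labels_a) == 0:
        raise ValueError("label sequences must not be empty")
    mapping = mapping or {}
    # Translate the second label set; labels without a mapping pass through unchanged
    mapped_b = [mapping.get(b, b) for b in labels_b]
    # Count (label_a, label_b) pairs, i.e. a sparse confusion table
    pairs = Counter(zip(labels_a, mapped_b))
    agree = sum(n for (a, b), n in pairs.items() if a == b)
    return agree / len(labels_a), pairs


# Toy example with hypothetical labels standing in for two annotation columns
marker_calls = ["T-cell", "T-cell", "B-cell", "B-cell"]
reference_calls = ["CD4 T", "CD8 T", "B", "CD4 T"]
to_marker_vocab = {"CD4 T": "T-cell", "CD8 T": "T-cell", "B": "B-cell"}

frac, pairs = label_agreement(marker_calls, reference_calls, to_marker_vocab)
print(frac)  # 0.75
print(pairs[("B-cell", "T-cell")])  # 1: called B-cell by markers, T-cell by the reference
```

With AnnData, the inputs would be two columns of `adata.obs` converted to lists; the pair counts show which labels account for the disagreements.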
Core Capabilities
- Marker-Based Scoring: Scores cells based on provided gene lists (e.g., "T-cell": ["CD3D", "CD3E"]).
- Reference-Based Classification: Wraps `celltypist`, whose pre-trained logistic-regression models transfer labels from large reference atlases.
- LLM Reasoning: Extracts the top markers per cluster and constructs prompts for LLM interpretation.
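The marker-based capability can be sketched as follows, assuming a dense cells × genes matrix of log-normalized expression. Each panel's score is the mean z-score of its genes that are present, and each cell takes the highest-scoring label, or stays unassigned if no panel scores above zero. This is a simplified illustration, not necessarily what `annotate_marker_based` does; `score_markers` and the `"Unassigned"` label are hypothetical. Scanpy's `sc.tl.score_genes` is a more robust alternative because it subtracts the score of a randomly sampled reference gene set.

```python
import numpy as np


def score_markers(X, gene_names, markers, unassigned="Unassigned"):
    """Label each cell with the marker panel that scores highest.

    X: cells x genes array of log-normalized expression (assumed dense).
    gene_names: gene symbols matching the columns of X.
    markers: dict mapping cell type -> list of marker genes.
    Returns (labels per cell, cells x panels score matrix).
    """
    X = np.asarray(X, dtype=float)
    # z-score each gene across cells so genes on different scales contribute comparably
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant genes: avoid division by zero
    Z = (X - X.mean(axis=0)) / std

    index = {g: i for i, g in enumerate(gene_names)}
    labels, scores = [], []
    for cell_type, genes in markers.items():
        cols = [index[g] for g in genes if g in index]  # skip genes absent from X
        if cols:
            labels.append(cell_type)
            scores.append(Z[:, cols].mean(axis=1))
    if not scores:
        return [unassigned] * X.shape[0], np.empty((X.shape[0], 0))

    S = np.column_stack(scores)  # cells x panels
    best = S.argmax(axis=1)
    # A cell keeps its best label only if that panel's score is positive
    calls = [labels[j] if S[i, j] > 0 else unassigned for i, j in enumerate(best)]
    return calls, S


# Toy data: two T-like cells and two B-like cells
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
])
genes = ["CD3D", "CD3E", "CD79A", "MS4A1"]
markers = {"T-cell": ["CD3D", "CD3E", "CD8A"], "B-cell": ["CD79A", "MS4A1"]}
calls, S = score_markers(X, genes, markers)
print(calls)  # ['T-cell', 'T-cell', 'B-cell', 'B-cell']
```

Genes missing from the matrix, such as `CD8A` in the toy data, are silently skipped, so it is worth logging which panel genes were actually found.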
Workflow
- Load Data: Ensure the data is in `AnnData` format (the standard for Scanpy).
- Choose Strategy:
  - Use Markers if you have a known gene panel.
  - Use CellTypist for broad immune or tissue profiling.
  - Use the LLM for novel clusters.
- Annotate: Run the corresponding method.
- Inspect: Check `adata.obs` for the new annotation columns.
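Before the Annotate step, it helps to confirm the matrix is in the form the chosen strategy expects. CellTypist, for example, expects log1p-normalized expression scaled to 10,000 counts per cell. The check below is a heuristic sketch assuming a dense NumPy matrix; `looks_log1p_cp10k` is a hypothetical helper, not a CellTypist function.

```python
import numpy as np


def looks_log1p_cp10k(X, target_sum=1e4, rtol=1e-3):
    """Heuristically check that each row of X is log1p(counts scaled to target_sum).

    X: dense cells x genes matrix. Undoes log1p with expm1 and checks that every
    cell's total is close to target_sum.
    """
    X = np.asarray(X, dtype=float)
    if X.size == 0 or X.min() < 0:
        return False  # log1p of non-negative scaled counts is never negative
    totals = np.expm1(X).sum(axis=1)
    return bool(np.allclose(totals, target_sum, rtol=rtol))


counts = np.array([[10, 30, 60], [5, 5, 0]], dtype=float)
# Scale each cell to 10,000 counts, then apply log1p
normalized = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

print(looks_log1p_cp10k(normalized))  # True
print(looks_log1p_cp10k(counts))      # False: raw counts
```

In Scanpy, `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` produces this layout, after which a call such as `celltypist.annotate(adata, model='Immune_All_Low.pkl', majority_voting=True)` can run.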
Example Usage
User: "Annotate this dataset looking for T-cells and B-cells."
Agent Action:
```python
import scanpy as sc
from universal_annotator import UniversalAnnotator  # this skill's module

# Load the dataset as an AnnData object
adata = sc.read_h5ad('data.h5ad')
annotator = UniversalAnnotator(adata)

# Map each cell type to its marker genes
markers = {
    'T-cell': ['CD3D', 'CD3E', 'CD8A'],
    'B-cell': ['CD79A', 'MS4A1']
}
annotator.annotate_marker_based(markers)
# Results in adata.obs['predicted_cell_type']
```
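The example above uses the marker strategy. For the LLM strategy, which extracts the top markers per cluster and builds a prompt, a minimal sketch follows. Ranking genes by the log2 ratio of mean expression against all other cells is a crude stand-in for a proper test such as Scanpy's `sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon')`; the function names, prompt wording, and toy data are hypothetical, and sending the prompt to a model is left out.

```python
import numpy as np


def top_markers_per_cluster(X, gene_names, clusters, n_top=5, pseudocount=1.0):
    """Return {cluster: top genes}, ranked by the log2 ratio of mean expression
    in the cluster versus all other cells (a crude stand-in for a DE test)."""
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    result = {}
    for c in np.unique(clusters):
        in_c = clusters == c
        mean_in = X[in_c].mean(axis=0)
        # With a single cluster there is no "rest"; compare against zero
        mean_out = X[~in_c].mean(axis=0) if (~in_c).any() else np.zeros(X.shape[1])
        lfc = np.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        order = np.argsort(lfc)[::-1][:n_top]  # highest ratio first
        result[str(c)] = [gene_names[i] for i in order]
    return result


def build_annotation_prompt(markers_by_cluster, tissue="an unspecified tissue"):
    """Format per-cluster marker lists into a plain-text prompt for an LLM."""
    lines = [
        f"The clusters below come from single-cell RNA-seq of {tissue}.",
        "For each cluster, name the most likely cell type based on its top marker genes,",
        "and answer 'unknown' when the markers are inconclusive.",
        "",
    ]
    for cluster, genes in markers_by_cluster.items():
        lines.append(f"Cluster {cluster}: {', '.join(genes)}")
    return "\n".join(lines)


# Toy data: cluster "0" is T-like, cluster "1" is B-like
X = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]])
genes = ["CD3D", "CD3E", "CD79A", "MS4A1"]
top = top_markers_per_cluster(X, genes, ["0", "0", "1", "1"], n_top=2)
print(build_annotation_prompt(top, tissue="human blood"))
```

Keeping the prompt to a short marker list per cluster makes the model's answer easy to map back onto `adata.obs` by cluster id.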