name: cdr3aaphyschem
description: Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).
CDR3AAPhyschem Process Configuration
Purpose
Analyzes physicochemical properties of CDR3 amino acid sequences to understand biochemical characteristics of T-cell receptor repertoires. Performs regression analysis between two cell groups at different CDR3 lengths for each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.).
When to Use
- To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
- For feature engineering in TCR machine learning models
- To identify sequence features that distinguish cell subsets
- After `ScRepCombiningExpression` (requires combined TCR + RNA data)
- When investigating T cell fate determination (regulatory vs conventional T cells)
Configuration Structure
Process Enablement
```toml
[CDR3AAPhyschem]
cache = true
```
Input Specification
```toml
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
```
- `scrfile`: Output from `ScRepCombiningExpression` (RDS or qs/qs2 format)
  - Must contain both TRA and TRB chains
  - Generated by `scRepertoire::combineExpression()`
Environment Variables
```toml
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = "Treg", Tconv = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"]}
target = "Treg"
each = "Sample"
# Chain selection
chain = "TRB"
```
Key Parameters:
- `group`: Metadata column defining the groups to compare (e.g., `CellType`, `seurat_clusters`)
- `comparison`: Two-group specification for the regression analysis
  - Format 1 (dict): `{Group1 = ["cell1", "cell2"], Group2 = "cell3"}`, mapping each group name to one or more values of the `group` column
  - Format 2 (list): `["Group1", "Group2"]`, when the group names already exist as values in the `group` column
- `target`: Group labeled as 1 in the regression (default: the first group in `comparison`)
- `each`: Column(s) used to split the data into separate analyses
  - Single column: `"Sample"`
  - Multiple columns: `["Sample", "Patient"]`
  - Comma-separated: `"Sample,Patient"`
  - If not provided, all cells are analyzed together
- `chain`: TCR chain to analyze (`TRA` or `TRB`)
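The parameter formats above can be illustrated with a few small Python helpers. This is a hypothetical sketch, not the process's implementation: `normalize_each`, `normalize_comparison`, and `label_cell` are illustrative names, and the labelling rule (target group = 1, any other listed group = 0, unlisted cells dropped) is an assumption based on the regression description in this document.

```python
from typing import Dict, List, Optional, Sequence, Union

# Hypothetical helpers illustrating the `each`, `comparison`, and `target`
# formats described above. Not the process's actual implementation.

Comparison = Union[Dict[str, Union[str, Sequence[str]]], Sequence[str]]


def normalize_each(each: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Return the columns to split on; an empty list means 'analyze all cells together'."""
    if each is None:
        return []
    if isinstance(each, str):
        # Covers both "Sample" and the comma-separated form "Sample,Patient".
        return [col.strip() for col in each.split(",") if col.strip()]
    return [str(col).strip() for col in each if str(col).strip()]


def normalize_comparison(comparison: Comparison) -> Dict[str, List[str]]:
    """Map each group name to the list of `group` column values it covers."""
    if isinstance(comparison, dict):
        # Dict form: a single value or a list of values per group.
        return {
            name: [members] if isinstance(members, str) else list(members)
            for name, members in comparison.items()
        }
    # List form: each group name is also the column value it matches.
    return {name: [name] for name in comparison}


def label_cell(
    value: str, groups: Dict[str, List[str]], target: Optional[str] = None
) -> Optional[int]:
    """Return 1 for the target group, 0 for another listed group, None otherwise."""
    if target is None:
        target = next(iter(groups))  # default: first group in `comparison`
    for name, members in groups.items():
        if value in members:
            return 1 if name == target else 0
    return None


groups = normalize_comparison({"Treg": "Treg", "Tconv": ["CD4 Naive", "CD4 TCM"]})
print(normalize_each("Sample,Patient"), label_cell("CD4 TCM", groups, target="Treg"))
# prints ['Sample', 'Patient'] 0
```

Under this sketch, `each = ""` (as in Pattern 1 below) normalizes to an empty list, i.e. all cells are analyzed together.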
Configuration Examples
Minimal Configuration
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
```
Standard Treg vs Tconv Analysis
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"
```
Multi-Sample Analysis
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"
```
Custom Group Definition
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare (TOML 1.0 inline tables must stay on one line)
comparison = {HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"]}
target = "HighQuality"
chain = "TRB"
```
Physicochemical Properties
Available Properties
The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:
| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand Average of Hydrophobicity (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
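The count-based rows of the table (length, acidic, aromatic) reduce to simple residue counting. A minimal Python sketch, assuming each proportion is the number of matching residues divided by the CDR3 length (the process may normalize differently); `count_properties` is a hypothetical name:

```python
# Minimal sketch of the count-based CDR3 properties (length, acidic, aromatic).
# Assumption: proportion = matching residues / CDR3 length.

ACIDIC = frozenset("DE")     # aspartate, glutamate
AROMATIC = frozenset("FWY")  # phenylalanine, tryptophan, tyrosine


def residue_fraction(cdr3: str, residues: frozenset) -> float:
    """Fraction of positions in `cdr3` occupied by any residue in `residues`."""
    if not cdr3:
        raise ValueError("CDR3 sequence is empty")
    return sum(aa in residues for aa in cdr3) / len(cdr3)


def count_properties(cdr3: str) -> dict:
    """Length plus acidic and aromatic proportions for one CDR3 sequence."""
    seq = cdr3.upper()
    return {
        "length": len(seq),
        "acidic": residue_fraction(seq, ACIDIC),
        "aromatic": residue_fraction(seq, AROMATIC),
    }


# 13 residues, one acidic (E), three aromatic (Y, Y, F)
props = count_properties("CASSLGQAYEQYF")
print(props["length"])  # prints 13
```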
Property Calculation Methods
- Default scales: Standard biophysical scales from peer-reviewed literature
- GRAVY: Kyte & Doolittle (1982) hydropathy scale
- Bulkiness: Zimmerman et al. (1968) bulkiness parameters
- Polarity: Grantham (1974) amino acid difference index
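- Aliphatic index: Ikai (1980) aliphatic index, the relative volume occupied by aliphatic side chains (A, V, I, L), used as an indicator of thermostability
- Charge: Net charge computed from pK values in the EMBOSS database, then normalized
- Acidic/Aromatic: Direct residue-counting proportions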
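The scale-based methods listed above can be sketched in Python as follows. GRAVY uses the published Kyte & Doolittle values, and the aliphatic index uses Ikai's formula (mole percent of A, plus 2.9 × V, plus 3.9 × (I + L)). The charge function applies the Henderson-Hasselbalch equation with side-chain pK values as commonly tabulated for EMBOSS; treat those pK values, the pH of 7.4, ignoring terminal groups (CDR3 is an internal segment), and dividing by length as assumptions rather than the process's exact settings.

```python
# Sketch of the scale-based CDR3 properties. The pK table and the
# normalization choices are assumptions, not the process's exact settings.

# Kyte & Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed EMBOSS-style side-chain pK values; verify against your EMBOSS data files.
POSITIVE_PK = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PK = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def _clean(cdr3: str) -> str:
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = cdr3.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError(f"not a standard amino acid sequence: {cdr3!r}")
    return seq


def gravy(cdr3: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    seq = _clean(cdr3)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(cdr3: str) -> float:
    """Ikai (1980): x(A) + 2.9 x(V) + 3.9 (x(I) + x(L)), with x in mole percent."""
    seq = _clean(cdr3)
    mole_pct = {aa: 100 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


def net_charge(cdr3: str, ph: float = 7.4, normalize: bool = True) -> float:
    """Henderson-Hasselbalch net side-chain charge; termini ignored (CDR3 is internal)."""
    seq = _clean(cdr3)
    positive = sum(1 / (1 + 10 ** (ph - POSITIVE_PK[aa])) for aa in seq if aa in POSITIVE_PK)
    negative = sum(1 / (1 + 10 ** (NEGATIVE_PK[aa] - ph)) for aa in seq if aa in NEGATIVE_PK)
    charge = positive - negative
    # Assumption: "normalized" means divided by sequence length.
    return charge / len(seq) if normalize else charge


print(round(gravy("CASSLGQAYEQYF"), 3))  # prints -0.185
```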
Regression Analysis
- Performed for each physicochemical property independently
- Performed at each CDR3 length, so that the groups are compared among sequences of the same length
- Binary classification: target group (1) vs non-target (0)
- Output: Statistical significance of property differences
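The regression step can be sketched as a per-length logistic regression of the binary group label on one property at a time. The model form is an assumption, written in plain Python with Newton-Raphson so it has no dependencies; the process itself may use a different model, covariates, or significance test, and `fit_logistic_1d`, `regress_by_length`, and `min_per_class` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x: List[float], y: List[int], iters: int = 100) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w              # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # information matrix is (near) singular; stop
        # Newton step: beta += I^{-1} * score, using the 2x2 inverse
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def regress_by_length(
    records: Iterable[Tuple[int, float, int]], min_per_class: int = 2
) -> Dict[int, float]:
    """Slope of label ~ standardized property, fitted separately at each CDR3 length.

    `records` holds (cdr3_length, property_value, label) with label 1 = target group.
    """
    strata: Dict[int, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for length, value, label in records:
        xs, ys = strata[length]
        xs.append(value)
        ys.append(label)

    slopes: Dict[int, float] = {}
    for length, (xs, ys) in sorted(strata.items()):
        n_target = sum(ys)
        if n_target < min_per_class or len(ys) - n_target < min_per_class:
            continue  # too few cells of one class at this length
        mean = sum(xs) / len(xs)
        sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
        if sd == 0:
            continue  # property is constant at this length
        z = [(v - mean) / sd for v in xs]
        slopes[length] = fit_logistic_1d(z, ys)[1]
    return slopes
```

A positive slope at a given length means that higher values of the property (e.g. GRAVY) are associated with the target group among CDR3s of that length. The sketch returns only point estimates; standard errors and p-values, which the significance output requires, are omitted.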
Common Patterns
Pattern 1: Treg vs Tconv (TRB Chain)
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = "" # Analyze all samples together
```
Pattern 2: Selected Properties Only
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# All properties are computed; focus interpretation on hydrophobicity (gravy), a key Treg feature
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"
```
Pattern 3: Multi-Chain Analysis
Run separate processes for different chains:
```toml
# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]
# Note: Create a separate config for TRA analysis if needed
```
Pattern 4: Multi-Group Comparisons
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
# TOML 1.0 inline tables must stay on one line
comparison = {Naive = ["CD4 Naive", "CD8 Naive"], Memory = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM"], Effector = ["CD4 CTL", "CD8 CTL"]}
target = "Naive"
chain = "TRB"
```
Dependencies
- Upstream: `ScRepCombiningExpression` (required)
- Downstream: Feature analysis, ML model training, publication figures
- Required data: Both TRA and TRB chains in the combined object
Validation Rules
- CDR3 sequence requirements: Must be valid amino acid sequences (no missing/NA values; note that `N` is asparagine, a valid residue)
- Chain requirement: Data must contain specified chain (TRA or TRB)
- Group specification: Groups must exist in metadata
- Minimum cells: Sufficient cells per group for statistical regression
- Length distribution: CDR3 length range must be adequate for regression
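The validation rules above can be expressed as simple pre-flight checks. A hedged Python sketch: the 20-letter alphabet is standard, but `validate`, `min_cells`, and `min_distinct_lengths` are hypothetical, since this document does not state what counts as sufficient.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# The 20 standard amino acids; "N" is asparagine, so it is a valid residue.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_cdr3(seq: object) -> bool:
    """True for a non-empty string made only of standard residues (case-insensitive)."""
    return isinstance(seq, str) and len(seq) > 0 and set(seq.upper()) <= STANDARD_AA


def validate(
    cells: Iterable[Tuple[Optional[str], Optional[int]]],
    min_cells: int = 10,
    min_distinct_lengths: int = 3,
) -> List[str]:
    """Return a list of problems; `cells` holds (cdr3, label) with label 1, 0, or None."""
    labelled = [(seq, label) for seq, label in cells if label is not None]
    kept = [(seq.upper(), label) for seq, label in labelled if is_valid_cdr3(seq)]
    problems = []
    n_invalid = len(labelled) - len(kept)
    if n_invalid:
        problems.append(f"{n_invalid} labelled cells have invalid CDR3 sequences")
    counts = Counter(label for _, label in kept)
    for label, name in ((1, "target"), (0, "non-target")):
        if counts[label] < min_cells:
            problems.append(f"{name} group has {counts[label]} valid cells (< {min_cells})")
    n_lengths = len({len(seq) for seq, _ in kept})
    if n_lengths < min_distinct_lengths:
        problems.append(f"only {n_lengths} distinct CDR3 lengths (< {min_distinct_lengths})")
    return problems
```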
Troubleshooting
Issue: "Missing chain in data"
Cause: The specified chain (TRA/TRB) is not present in the combined object
Solution:
```toml
# Change to available chain
[CDR3AAPhyschem.envs]
chain = "TRA" # or "TRB"
```
Issue: "Group not found in metadata"
Cause: The `group` column or the `comparison` values do not exist in the metadata
Solution:
- Check the metadata columns available in the `ScRepCombiningExpression` output
- Verify that group names match exactly (matching is case-sensitive)
```toml
[CDR3AAPhyschem.envs]
group = "seurat_clusters" # If CellType is not available
comparison = ["0", "1"] # Use cluster IDs
```
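Because matching is exact, one way to diagnose this error is to compare the requested `comparison` values against the values actually present in the `group` column. A hypothetical Python helper (`missing_group_values` is an illustrative name) that also flags values differing only in case:

```python
from typing import Dict, Iterable, List


def missing_group_values(groups: Dict[str, List[str]], observed: Iterable[object]) -> Dict[str, str]:
    """Map each requested value absent from `observed` to a hint for fixing it."""
    present = {str(v) for v in observed}  # cluster IDs such as 0, 1 compare as "0", "1"
    by_lower: Dict[str, str] = {}
    for value in sorted(present):
        by_lower.setdefault(value.lower(), value)
    report: Dict[str, str] = {}
    for members in groups.values():
        for value in members:
            if value not in present:
                alt = by_lower.get(value.lower())
                report[value] = f"did you mean {alt!r}?" if alt else "not present"
    return report


print(missing_group_values({"Treg": ["treg"], "Tconv": ["Tconv"]}, ["Treg", "Tconv", "B"]))
# prints {'treg': "did you mean 'Treg'?"}
```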
Issue: "Insufficient cells for regression"
Cause: Too few cells in one or more groups
Solution:
- If `each` is set, pool samples (leave `each` unset) so that each regression has more cells
- Combine similar cell types in `comparison`
```toml
[CDR3AAPhyschem.envs]
# Combine rare subtypes
comparison = {HighExpander = ["Treg", "Tconv"], LowExpander = ["Tfh"]}
```
Issue: "No significant property differences"
Cause: The groups may not differ in physicochemical properties
Solution:
- Check whether the `comparison` groups are biologically distinct
- Consider a different `group` column (e.g., gene expression clusters)
- Verify that CDR3 sequences are high-quality
Scientific Context
Key Publications
- Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
- Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
- Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research
Interpretation Guidelines
- High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
- High charge: Electrostatic potential may affect binding affinity
- High aromaticity: Increased π-π interactions, structural stability
- Length distribution: Longer CDR3s may provide broader specificity
Feature Engineering Applications
Use properties as features for:
- TCR specificity prediction models
- T cell fate classification (Treg vs Tconv)
- Antigen binding affinity estimation
- Cross-reactivity assessment
Output Format
- Directory: `{{in.scrfile | stem}}.cdr3aaphyschem/`
- Files:
  - Regression plots per property (hydrophobicity, volume, pI)
  - Statistical tables comparing groups
  - CDR3 length distributions
  - Property correlation matrices
- Visualizations:
  - Property vs length scatter plots
  - Group-wise property boxplots
  - Regression curves with confidence intervals
Advanced Usage
Custom Property Scales
If using non-default scales (requires modifying underlying R script):
```toml
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"
```
Length-Based Stratification
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
```
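The `CDR3_Length_Bin` column is assumed to already exist in the metadata. If it does not, one hypothetical way to derive such labels from CDR3 sequences beforehand is shown below; `length_bin` and its cut points in `edges` are illustrative assumptions, not part of the process.

```python
from bisect import bisect_right
from typing import Sequence


def length_bin(cdr3: str, edges: Sequence[int] = (12, 15, 18)) -> str:
    """Label a CDR3 by length using ascending cut points, e.g. '<12', '12-14', '15-17', '>=18'."""
    labels = [f"<{edges[0]}"]
    labels += [f"{lo}-{hi - 1}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]}")
    # bisect_right counts how many cut points are <= the length, i.e. the bin index
    return labels[bisect_right(edges, len(cdr3))]


print([length_bin(s) for s in ("CASSF", "CASSLGQAYEQYF", "C" * 18)])
# prints ['<12', '12-14', '>=18']
```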
Publication-Ready Plots
```toml
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"
```