AGENTS.md — Speech Translation Project

Guidelines for AI coding agents working on this Vietnamese-English Speech Translation repository.

## Project Overview

This repository contains training and inference code for Speech-to-Text Translation (ST) models using SeamlessM4T for the Vietnamese ↔ English language pair.

## Build / Test / Lint Commands

### Code Quality (via seamless_communication submodule)

```bash
cd src/seamless_communication

# Format code with Black
black src/ tests/

# Type checking with mypy
mypy src/

# Pre-commit hooks
pre-commit run --all-files
```

### Installation

```bash
# Install SeamlessM4T dependencies
```

## Code Style Guidelines

### Imports
- **Standard library** first, **third-party** second, **local** third
- Group imports with a blank line between groups
- Use absolute imports over relative imports
- Example:
```python
import json
import os
from pathlib import Path
from typing import List, Optional

import torch
import torchaudio
from tqdm import tqdm

from src.metrics import compute_wer
```

### Formatting

- Format code with Black (line length: 88 characters)
- Sort imports with isort using the "black" profile
- Use double quotes for strings consistently
- Use trailing commas in multi-line structures

### Type Hints

- Use Python 3.8+ typing syntax
- Annotate function parameters and return types
- Use Optional[Type] for nullable values
- Use List[Type], Dict[Key, Value] from the typing module
- Example:

```python
def compute_cer(reference: str, hypothesis: str, normalize: bool = True) -> float:
    ...
```

### Naming Conventions

- snake_case for functions, variables, and methods
- PascalCase for classes
- SCREAMING_SNAKE_CASE for constants
- Prefix private methods and functions with an underscore: _helper()
- Example:

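```python
from typing import Tuple

import torch

MAX_SAMPLES = 50000
TARGET_SAMPLE_RATE = 16_000


# Tuple from typing keeps the annotation valid on Python 3.8.
def load_audio(filepath: str) -> Tuple[torch.Tensor, int]:
    ...


class MetricsEvaluator:
    def _normalize_text(self, text: str) -> str:
        ...
```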

### Docstrings

- Use Google-style docstrings
- Document all public functions and classes
- Include Args, Returns, and Raises sections
- Example:

```python
def evaluate(self, references: List[str], hypotheses: List[str]) -> MetricsResult:
    """
    Evaluate all metrics on the full corpus.

    Args:
        references: List of ground-truth strings.
        hypotheses: List of model output strings (same length).

    Returns:
        MetricsResult with aggregated CER, WER and BLEU.

    Raises:
        ValueError: If references and hypotheses differ in length.
    """
```

### Error Handling

- Use specific exceptions (ValueError, RuntimeError, etc.)
- Raise exceptions with descriptive messages
- Handle expected errors gracefully in inference code
- Use try/except blocks sparingly, only for expected failures
- Example:

```python
if len(ref_chars) == 0:
    if len(hyp_chars) == 0:
        return 0.0
    raise ValueError("Reference is empty but hypothesis is not.")
```
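To show where the empty-reference check above could sit, here is a minimal sketch of a character error rate function that matches the `compute_cer` signature from the Type Hints example. The `_edit_distance` helper and the behavior of `normalize` (NFC plus stripping outer whitespace) are assumptions made for illustration; the repository's `src/metrics.py` may do this differently.

```python
import unicodedata
from typing import List


def _edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two character sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        current = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def compute_cer(reference: str, hypothesis: str, normalize: bool = True) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: ``normalize`` applies Unicode NFC and strips outer whitespace;
    the repository's actual normalization may differ.
    """
    if normalize:
        reference = unicodedata.normalize("NFC", reference).strip()
        hypothesis = unicodedata.normalize("NFC", hypothesis).strip()
    ref_chars = list(reference)
    hyp_chars = list(hypothesis)
    if len(ref_chars) == 0:
        if len(hyp_chars) == 0:
            return 0.0
        raise ValueError("Reference is empty but hypothesis is not.")
    return _edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Raising on an empty reference with a non-empty hypothesis avoids a division by zero and makes the bad input visible to the caller instead of returning a misleading score.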

### Comments

- Write docstrings for modules, classes, and public functions
- Use inline comments sparingly, only for non-obvious logic
- Start each comment with a hash followed by a space
- Use section dividers for long files:

```python
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
```

## Project Structure

```text
SpeechTranslation/
├── src/
│   ├── metrics.py              # CER/WER/BLEU evaluation metrics
│   ├── llm.py                  # Gemini LLM wrapper
│   ├── early_stopping.py       # Training utilities
│   └── seamless_communication/ # SeamlessM4T model code
├── scripts/
│   ├── prepare_data.py         # JSONL → TSV conversion
│   ├── train_spm.py            # SentencePiece training
│   └── compute_gcmvn.py        # GCMVN statistics
├── inference/
│   ├── seamless_infer.py       # SeamlessM4T inference
│   ├── single_infer.py         # Single audio inference
│   └── batch_infer.py          # Multi-GPU batch inference
├── configs/                    # Training configs
├── datasets/                   # JSONL datasets (metadata)
└── data/                       # TSV manifests
```

## Key Technologies

- PyTorch / fairseq2: deep learning framework
- torchaudio: audio processing
- sentencepiece: text tokenization
- sacrebleu: BLEU score computation
- hydra: configuration management
- unittest: testing framework
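Since unittest is the listed testing framework, a test module might look like the following minimal sketch. The helper `check_same_length` and the test class name are hypothetical and exist only to make the example self-contained; they are not taken from the repository.

```python
import unittest
from typing import List


def check_same_length(references: List[str], hypotheses: List[str]) -> None:
    """Hypothetical helper: reject corpora of different sizes."""
    if len(references) != len(hypotheses):
        raise ValueError(
            f"Got {len(references)} references but {len(hypotheses)} hypotheses."
        )


class CheckSameLengthTest(unittest.TestCase):
    def test_accepts_equal_lengths(self) -> None:
        self.assertIsNone(check_same_length(["xin ch\u00e0o"], ["xin ch\u00e0o"]))

    def test_rejects_length_mismatch(self) -> None:
        with self.assertRaises(ValueError):
            check_same_length(["a", "b"], ["a"])
```

Saved as a test module, a file like this can be run with `python -m unittest`, which discovers `TestCase` subclasses without needing a `__main__` guard.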

## Language Considerations

This project handles both English and Vietnamese text:

- Use Unicode NFC normalization for text comparison
- Handle Vietnamese diacritics carefully: the same letter can be encoded as one precomposed code point or as a base letter plus combining marks
- Use sacrebleu's "char" tokenizer for Vietnamese BLEU scores
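
The effect of NFC normalization on Vietnamese text can be illustrated with the standard library alone. This is a minimal sketch; the helper name `normalize_text` is an assumption for illustration and may not match the repository's own function.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Return the NFC form of ``text`` so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


# The same Vietnamese word, once with a precomposed letter and once with
# a base letter followed by two combining marks.
precomposed = "ti\u1ebfng"
decomposed = "tie\u0302\u0301ng"

print(precomposed == decomposed)  # False: different code points
print(normalize_text(precomposed) == normalize_text(decomposed))  # True
print(len(decomposed), len(normalize_text(decomposed)))  # 7 5
```

Without normalization, character-level metrics such as CER would count the two spellings as different even though they render identically. For BLEU, sacrebleu's `corpus_bleu` accepts `tokenize="char"`; it is left out of the block so the example needs no third-party packages.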
