name: cross-reference-entry
description: Guidelines for adding and maintaining cross-references between dictionary entries. Covers reference types, format requirements, and extraction from notes.
Cross-Reference Entry Guidelines
When creating or revising entries, add cross-references to related vocabulary. This improves navigation and helps learners understand word relationships.
Cross-Reference Types
1. pair - Transitivity Pairs (HIGH PRIORITY)
Use for verb transitivity pairs ({自動詞|じどうし}/{他動詞|たどうし}).
{
"type": "pair",
"reading": "しまる",
"headword": "{閉|し}まる",
"label": "intransitive"
}
Labels: intransitive or transitive
Common pairs:
- 開く/開ける, 閉まる/閉める, 始まる/始める
- 出る/出す, 入る/入れる, 付く/付ける
- 決まる/決める, 変わる/変える, 上がる/上げる
2. antonym - Opposites (HIGH PRIORITY)
Use for direct opposites.
{
"type": "antonym",
"reading": "あける",
"headword": "{開|あ}ける",
"label": "to open"
}
Label: Brief gloss of target word
3. keigo - Honorific/Humble Forms (HIGH PRIORITY)
Use for formal speech equivalents.
{
"type": "keigo",
"reading": "めしあがる",
"headword": "{召|め}し{上|あ}がる",
"label": "honorific"
}
Labels: honorific or humble
Common keigo links:
- 食べる → 召し上がる (hon.), いただく (hum.)
- 行く → いらっしゃる (hon.), 参る (hum.)
- 言う → おっしゃる (hon.), 申す (hum.)
- 見る → ご覧になる (hon.), 拝見する (hum.)
4. synonym - Similar Meaning (MEDIUM PRIORITY)
Use for words with similar meaning but different nuance.
{
"type": "synonym",
"reading": "りかいする",
"headword": "{理解|りかい}する",
"label": "formal"
}
Label: Distinguishing characteristic (e.g., "formal", "written", "casual")
5. contrast - Easily Confused (MEDIUM PRIORITY)
Use for words learners often confuse.
{
"type": "contrast",
"reading": "が",
"headword": "が",
"label": "subject marking"
}
Especially important for:
- Particles: は vs が, に vs で, に vs へ
- Similar verbs: 聞く vs 聴く, 見る vs 見える vs 見せる
6. related - Semantically Connected (LOW PRIORITY)
Use for derived words, compounds, or thematically related vocabulary.
{
"type": "related",
"reading": "たべもの",
"headword": "{食|た}べ{物|もの}",
"label": "food (noun)"
}
7. see_also - General Reference (LOW PRIORITY)
Use for general cross-references that don't fit other categories.
{
"type": "see_also",
"reading": "しょくじ",
"headword": "{食事|しょくじ}",
"label": null
}
Format Requirements
Each cross-reference object requires:
| Field | Required | Description |
|---|---|---|
| type | Yes | One of: pair, synonym, antonym, keigo, related, see_also, contrast, homophone |
| target_id | No | Hard-coded entry ID for direct resolution (takes priority over reading/headword) |
| reading | Yes | Hiragana reading (fallback lookup key when no target_id) |
| headword | Yes* | Display form with furigana (required for homonym disambiguation) |
| label | No | Short descriptor |
*Headword is required for proper resolution. Without it, cross-references cannot be disambiguated between homonyms.
Note: Valid cross-reference types are defined centrally in build/constants.py and shared across the schema, validation, and build scripts.
Hybrid Cross-Reference System
The dictionary uses a hybrid system that supports both:
- Hard-coded target_id - Direct reference to an entry ID (unambiguous)
- Forward references - References by reading/headword to entries that may not exist yet
Resolution Priority
When resolving a cross-reference:
- If target_id present AND entry exists → resolved (use ID directly)
- If target_id present AND entry missing → ERROR (stale reference)
- If no target_id → resolve by reading/headword (may be pending if target doesn't exist)
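The priority rules above can be sketched in a few lines of Python. This is only an illustration, not the code in the build scripts: the function name `resolve_reference`, the shape of the `entries` mapping, and the ID `09999_missing` are assumptions made for the example.

```python
def resolve_reference(ref, entries):
    """Classify one cross-reference as resolved, error (stale), or pending.

    `entries` maps entry ID -> {"reading": ..., "headword": ...}.
    That layout is an assumption made for this sketch.
    """
    target_id = ref.get("target_id")
    if target_id is not None:
        # A hard-coded ID always wins over reading/headword.
        if target_id in entries:
            return "resolved", target_id
        return "error", None  # stale reference

    # No target_id: fall back to reading/headword lookup.
    for entry_id, entry in entries.items():
        if (entry["reading"] == ref["reading"]
                and entry["headword"] == ref.get("headword")):
            return "resolved", entry_id
    return "pending", None  # forward reference, target not created yet


entries = {
    "00754_shimaru": {"reading": "しまる", "headword": "{閉|し}まる"},
}
hard = {"type": "pair", "target_id": "00754_shimaru",
        "reading": "しまる", "headword": "{閉|し}まる"}
# "09999_missing" is a made-up ID used to show the stale case.
stale = {"type": "pair", "target_id": "09999_missing",
         "reading": "しまる", "headword": "{閉|し}まる"}
forward = {"type": "antonym", "reading": "ひらく", "headword": "{開|ひら}く"}

for ref in (hard, stale, forward):
    print(resolve_reference(ref, entries))
```

Checking target_id first means a stale ID is reported as an error and never silently falls back to a reading/headword match.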
When to Use target_id
Use target_id when:
- The target entry exists in the dictionary
- You want guaranteed, unambiguous resolution
- The reference was previously validated
Don't manually add target_id when:
- Creating forward references to entries that don't exist yet
- You're unsure which homonym is correct
Instead, use the harden_references.py script to automatically add target_id to resolvable references.
Example with target_id
{
"type": "pair",
"target_id": "00754_shimaru",
"reading": "しまる",
"headword": "{閉|し}まる",
"label": "intransitive"
}
Example forward reference (no target_id)
{
"type": "antonym",
"reading": "ひらく",
"headword": "{開|ひら}く",
"label": "to open"
}
Homonym Disambiguation
CRITICAL: Many Japanese words share the same reading but have different kanji (homonyms). The headword field is essential for correct resolution.
Example: The reading かんじょう has multiple entries:
- {感情|かんじょう} - emotion, feeling
- {勘定|かんじょう} - bill, calculation
If you reference かんじょう without specifying the headword, the system cannot determine which entry you mean.
Always include the headword to ensure cross-references link to the correct entry.
// CORRECT - specifies headword for disambiguation
{
"type": "synonym",
"reading": "かんじょう",
"headword": "{勘定|かんじょう}",
"label": "bill, calculation"
}
// INCORRECT - no headword, may link to wrong homonym
{
"type": "synonym",
"reading": "かんじょう",
"label": "bill, calculation"
}
Validation detects homonym mismatches: When you specify a headword that doesn't match any existing entry with that reading (e.g., 勘定 when only 感情 exists), the validator will warn you. This indicates either:
- The target entry doesn't exist yet (forward reference - OK)
- The headword is incorrect (fix it)
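As a rough sketch of how a homonym check might work, the snippet below compares the written form of the requested headword against existing entries that share the reading. The helper names, the status strings, and the list-of-dicts entry layout are assumptions for illustration; the real validator may be organized differently.

```python
import re

# Matches one furigana group such as {勘定|かんじょう}.
FURIGANA_GROUP = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def strip_furigana(headword):
    """Return the plain written form: '{閉|し}まる' -> '閉まる'."""
    return FURIGANA_GROUP.sub(r"\1", headword)


def check_homonym(ref, entries):
    """Return 'ok', 'mismatch', 'ambiguous', or 'pending' for one reference.

    `entries` is a list of {"reading", "headword"} dicts (assumed shape).
    """
    same_reading = [e for e in entries if e["reading"] == ref["reading"]]
    if not same_reading:
        return "pending"  # nothing with this reading exists yet
    if "headword" not in ref:
        # Without a headword, several homonyms cannot be told apart.
        return "ambiguous" if len(same_reading) > 1 else "ok"
    wanted = strip_furigana(ref["headword"])
    if any(strip_furigana(e["headword"]) == wanted for e in same_reading):
        return "ok"
    return "mismatch"  # the case the validator warns about


emotion = {"reading": "かんじょう", "headword": "{感情|かんじょう}"}
bill = {"reading": "かんじょう", "headword": "{勘定|かんじょう}"}
ref = {"type": "synonym", "reading": "かんじょう",
       "headword": "{勘定|かんじょう}", "label": "bill, calculation"}

print(check_homonym(ref, [emotion]))        # only 感情 exists
print(check_homonym(ref, [emotion, bill]))  # both homonyms exist
```

With only 感情 in the dictionary the reference is reported as a mismatch, which matches the two possibilities listed above: either 勘定 has not been created yet, or the headword is wrong.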
Priority Order
When adding references to entries, prioritize:
- HIGH - Always add if applicable:
  - Transitivity pairs (pair)
  - Keigo equivalents (keigo)
  - Direct antonyms (antonym)
- MEDIUM - Add when natural:
  - Close synonyms with clear distinction (synonym)
  - Particle contrasts (contrast)
  - Related compounds (related)
- LOW - Add sparingly:
  - Thematic groupings
  - General see_also references
Extracting from Notes
The notes field often contains vocabulary that should be cross-referenced. Look for:
Patterns to Extract
- Pair verbs:
  - "Pair: {閉|し}まる" or "PAIR VERB: ..."
  - "The intransitive counterpart is ..."
- Antonyms:
  - "Opposite: {開|あ}ける"
  - "Antonym: ..."
- Keigo:
  - "{召|め}し{上|あ}がる (honorific)"
  - "Humble form: いただく"
- Related words:
  - Words in furigana notation within COMMON PATTERNS
  - Nouns derived from verbs: 食べる → 食べ物
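A few of the patterns above are regular enough to match with regular expressions. The sketch below is a hypothetical illustration and does not reproduce build/extract_references.py: the names `PATTERNS` and `extract_candidates` are invented here, "Honorific form:" is assumed by analogy with "Humble form:", and the free-text and related-word patterns are left out.

```python
import re

# A word is a run of furigana groups and kana/kanji, e.g. {召|め}し{上|あ}がる.
WORD = r"((?:\{[^{}|]+\|[^{}|]+\}|[\u3040-\u30ff\u4e00-\u9fff])+)"
GROUP = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")

PATTERNS = [
    ("pair", re.compile(r"(?:Pair|PAIR VERB):\s*" + WORD)),
    ("antonym", re.compile(r"(?:Opposite|Antonym):\s*" + WORD)),
    # "Honorific form:" is an assumed variant, not taken from the guidelines.
    ("keigo", re.compile(r"(?:Honorific|Humble) form:\s*" + WORD)),
    ("keigo", re.compile(WORD + r"\s*\((?:honorific|humble)\)")),
]


def extract_candidates(notes):
    """Return candidate cross-references found in a notes string."""
    found = []
    for ref_type, pattern in PATTERNS:
        for match in pattern.finditer(notes):
            headword = match.group(1)
            # Replace each {kanji|kana} group with its kana to get the reading.
            # Kanji written without furigana stays as-is and needs manual review.
            reading = GROUP.sub(r"\2", headword)
            found.append({"type": ref_type, "reading": reading,
                          "headword": headword})
    return found


notes = ("Pair: {閉|し}まる. Opposite: {開|あ}ける. "
         "{召|め}し{上|あ}がる (honorific). Humble form: いただく")
for candidate in extract_candidates(notes):
    print(candidate)
```

Labels are deliberately not generated here. Choosing "intransitive" versus "transitive", or writing a gloss for an antonym, still needs an editor's judgment.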
Automated Extraction
Run the extraction script to find potential references:
# Dry run - see proposed changes
python3 build/extract_references.py
# Apply changes
python3 build/extract_references.py --apply
# Single entry
python3 build/extract_references.py --id 00396_taberu
Note: The extraction script now performs immediate resolution. When a target entry exists, the extracted reference automatically includes target_id.
Hardening References
The harden_references.py script scans entries and adds target_id to resolvable cross-references. This "hardens" forward references into direct ID-based references once the target entry exists.
# Dry run - see what would change
python3 build/harden_references.py
# Apply changes
python3 build/harden_references.py --apply
# Single entry
python3 build/harden_references.py --id 00485_shimeru
When to run:
- After adding new entries that are targets of existing forward references
- Periodically to ensure all resolvable references have target_id
- Before releases to maximize resolution coverage
The script will:
- Add target_id to unambiguously resolvable references
- WARN about ambiguous references (multiple candidates, need headword)
- ERROR on stale target_id references (target no longer exists)
- Skip forward references (legitimate refs to non-existent entries)
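The four outcomes listed above could be modelled roughly as follows. This is a sketch under assumptions and not the contents of harden_references.py: the function name `harden`, the status strings, the `entries` layout, and the IDs starting with `9` are all made up for the example.

```python
def harden(refs, entries):
    """Add target_id to references that resolve to exactly one entry.

    Returns a list of status strings, one per reference; `refs` is updated
    in place. `entries` maps entry ID -> {"reading", "headword"} (assumed).
    """
    report = []
    for ref in refs:
        if "target_id" in ref:
            ok = ref["target_id"] in entries
            report.append("ok" if ok else "ERROR: stale target_id")
            continue
        # Match on reading; if a headword is given it must match as well.
        matches = [
            entry_id for entry_id, entry in entries.items()
            if entry["reading"] == ref["reading"]
            and ref.get("headword") in (None, entry["headword"])
        ]
        if len(matches) == 1:
            ref["target_id"] = matches[0]
            report.append("hardened")
        elif matches:
            report.append("WARN: ambiguous")
        else:
            report.append("skipped: forward reference")
    return report


entries = {
    "00754_shimaru": {"reading": "しまる", "headword": "{閉|し}まる"},
    # The two IDs below are invented for the example.
    "90001_kanjou": {"reading": "かんじょう", "headword": "{感情|かんじょう}"},
    "90002_kanjou": {"reading": "かんじょう", "headword": "{勘定|かんじょう}"},
}
refs = [
    {"type": "pair", "reading": "しまる", "headword": "{閉|し}まる"},
    {"type": "synonym", "reading": "かんじょう"},
    {"type": "antonym", "reading": "ひらく", "headword": "{開|ひら}く"},
    {"type": "pair", "target_id": "99999_gone", "reading": "しまる"},
]
print(harden(refs, entries))
```

Only the first reference gains a target_id. The second stays untouched until an editor adds a headword to pick one of the two かんじょう entries.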
Handling Non-Existent Entries
Important: You can add references to entries that don't exist yet.
- Use reading as the primary key (required)
- Include headword for display and for homonym disambiguation
- The link will be marked as "pending" in the web interface
- When the target entry is created, the link automatically becomes active
This allows you to:
- Plan future entries
- Track vocabulary relationships before full coverage
- Show learners related vocabulary even if not yet in dictionary
Validation
After adding references, validate:
python3 build/validate.py --id {entry_id}
The validator checks:
- Required fields present (type, reading)
- Valid type values
- Reading is valid hiragana
- No self-references
- Homonym mismatches - warns when a headword is specified but doesn't match any existing entry with that reading
- Stale target_id - ERRORS when target_id points to a non-existent entry
- Hardenable references - warns when a reference could be hardened (target exists but no target_id)
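The first four checks in this list are simple enough to sketch directly. The code below is illustrative only and is not build/validate.py: `VALID_TYPES` is copied from the format table (the authoritative list lives in build/constants.py), the accepted hiragana range is an assumption, and the reading and headword given for 00485_shimeru are inferred from its ID.

```python
import re

# Copied from the format table; build/constants.py is the real source.
VALID_TYPES = {"pair", "synonym", "antonym", "keigo",
               "related", "see_also", "contrast", "homophone"}

# Hiragana letters plus the prolonged sound mark. The exact character set
# the real validator accepts is an assumption.
HIRAGANA = re.compile(r"[\u3041-\u3096ー]+")


def validate_reference(ref, entry):
    """Return a list of problems for one cross-reference of `entry`."""
    problems = []
    for field in ("type", "reading"):
        if not ref.get(field):
            problems.append(f"missing required field: {field}")
    if ref.get("type") and ref["type"] not in VALID_TYPES:
        problems.append(f"invalid type: {ref['type']}")
    reading = ref.get("reading")
    if reading and not HIRAGANA.fullmatch(reading):
        problems.append(f"reading is not hiragana: {reading}")
    # A reference points at its own entry if the ID or the word matches.
    same_id = ref.get("target_id") == entry["id"]
    same_word = (reading == entry["reading"]
                 and ref.get("headword") == entry["headword"])
    if same_id or same_word:
        problems.append("self-reference")
    return problems


# Reading and headword are assumed from the entry ID.
entry = {"id": "00485_shimeru", "reading": "しめる", "headword": "{閉|し}める"}
good = {"type": "pair", "reading": "しまる",
        "headword": "{閉|し}まる", "label": "intransitive"}
bad = {"type": "opposite", "reading": "shimaru"}

print(validate_reference(good, entry))
print(validate_reference(bad, entry))
```

Returning a list of problems, not raising on the first one, lets a caller report every issue with a reference in a single pass.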
Validation Messages
| Type | Meaning | Action |
|---|---|---|
| ERROR: Stale target_id | target_id points to deleted entry | Remove or update target_id |
| WARNING: Hardenable | Reference resolvable but missing target_id | Run harden_references.py --apply |
| WARNING: Homonym mismatch | Headword doesn't match any entry with that reading | Verify correct homonym or wait for entry creation |
Example Entry
Before:
{
"id": "00485_shimeru",
"cross_references": []
}
After:
{
"id": "00485_shimeru",
"cross_references": [
{
"type": "pair",
"reading": "しまる",
"headword": "{閉|し}まる",
"label": "intransitive"
},
{
"type": "antonym",
"reading": "あける",
"headword": "{開|あ}ける",
"label": "to open"
}
]
}
Quality Checklist
- Transitivity pair linked (for verbs)
- Keigo forms linked (for common verbs)
- Antonyms linked (if obvious opposite exists)
- References in notes are also in cross_references
- Each reference has correct type
- Reading is valid hiragana
- Headword uses proper furigana notation
- Labels are concise and consistent