Kanji Index Maintenance
The kanji index allows users to click on any kanji in a dictionary headword to find all other entries containing that same kanji.
How It Works
- Headword kanji are linked to kanji index pages
- Kanji index pages list all entries containing that kanji
- Entry lists are sorted by reading (hiragana order)
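Together, these three points describe an inverted index from kanji to entries. The sketch below illustrates the idea and is not the code in `build/update_kanji_index.py`. It makes three assumptions. Entries are dicts with the `id`, `headword`, and `reading` fields used in the individual kanji JSON (see File Formats). Headwords use that format's `{漢字|かな}` ruby markup. Only the CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as kanji. `build_kanji_index` is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumption: a "kanji" is any character in the CJK Unified Ideographs block.
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def build_kanji_index(entries):
    """Map each kanji to the entries whose headword contains it, sorted by reading."""
    index = defaultdict(list)
    for entry in entries:
        # Ruby readings inside {漢字|かな} are kana, so only the base kanji match.
        # set() lists an entry once even if a kanji occurs twice in its headword.
        for kanji in set(KANJI_RE.findall(entry["headword"])):
            index[kanji].append(entry)
    for kanji_entries in index.values():
        # Hiragana code points closely follow gojuon order (が sorts right after か);
        # the entry id breaks ties between identical readings.
        kanji_entries.sort(key=lambda e: (e["reading"], e["id"]))
    return dict(index)
```

Plain code-point order only approximates a dictionary's gojūon collation. For example, the long-vowel mark ー sorts after every hiragana character. It is close enough for readings written in hiragana.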
Directory Structure
```
kanji/
├── kanji_list.json # Master list: kanji → kanji_id mapping
├── kanji_extracted.json # Temporary: extracted kanji needing IDs
├── 00001_jin_hito_person.json # Entry list for 人
├── 00002_nichi_hi_day.json # Entry list for 日
└── ...
docs/kanji/
├── 00001_jin_hito_person.html # HTML page for 人
├── 00002_nichi_hi_day.html # HTML page for 日
└── ...
```
Kanji ID Format
Format: {5-digit}_{onyomi}_{kunyomi}_{gloss}
- 5-digit: Sequential number (00001, 00002, ...)
- onyomi: Most common on'yomi in romaji (or "none")
- kunyomi: Most common kun'yomi in romaji without okurigana (or "none")
- gloss: Single English word for primary meaning
Examples
| Kanji | Kanji ID |
|---|---|
| 人 | 00001_jin_hito_person |
| 日 | 00002_nichi_hi_day |
| 大 | 00003_dai_oo_big |
| 畑 | 00004_none_hatake_field |
| 茶 | 00005_cha_none_tea |
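The format is regular enough to build and check mechanically. Here is a minimal sketch with hypothetical helpers, `make_kanji_id` and `parse_kanji_id`. It assumes every field is a lowercase ASCII word, as in all the examples above, and it represents a missing reading as `None` in Python and as `"none"` in the ID.

```python
import re

# Assumption: every ID field is a lowercase ASCII word, as in the examples above.
WORD_RE = re.compile(r"[a-z]+")
KANJI_ID_RE = re.compile(r"(\d{5})_([a-z]+)_([a-z]+)_([a-z]+)")


def make_kanji_id(seq, onyomi, kunyomi, gloss):
    """Build e.g. "00001_jin_hito_person"; a missing reading (None) becomes "none"."""
    if not 1 <= seq <= 99999:
        raise ValueError(f"sequence number {seq} does not fit in 5 digits")
    fields = [onyomi or "none", kunyomi or "none", gloss]
    for field in fields:
        if not WORD_RE.fullmatch(field):
            raise ValueError(f"not a lowercase ASCII word: {field!r}")
    return f"{seq:05d}_" + "_".join(fields)


def parse_kanji_id(kanji_id):
    """Split an ID into (seq, onyomi, kunyomi, gloss), mapping "none" back to None."""
    match = KANJI_ID_RE.fullmatch(kanji_id)
    if match is None:
        raise ValueError(f"malformed kanji ID: {kanji_id!r}")
    seq, onyomi, kunyomi, gloss = match.groups()
    return (int(seq),
            None if onyomi == "none" else onyomi,
            None if kunyomi == "none" else kunyomi,
            gloss)
```

The zero-padded number comes first, so sorting file names as plain strings also sorts them by sequence number.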
Romaji Rules
- Long vowels: "ou" not "ō" (e.g., 高 → "kou")
- Voiced: "ga", "za", "da", "ba" (e.g., 学 → "gaku")
- No okurigana in kun'yomi (e.g., 高い → "taka", not "takai")
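When a reading is only available in kana, these rules can be applied mechanically. The sketch below uses a hypothetical `to_romaji` helper that is not part of the `build/` scripts. It assumes hiragana input and uses Hepburn spellings (し → shi, ち → chi, じ → ji), which match the examples above. Because it transliterates one kana at a time, long vowels come out as "ou" or "oo" with no special case. It does not strip okurigana, so pass only the stem (たか for 高い). Katakana and the apostrophe sometimes written after ん are not handled.

```python
# Hypothetical helper: hiragana reading -> romaji for kanji IDs (Hepburn, no macrons).
ROWS = {
    "": "あいうえお",
    "k": "かきくけこ", "g": "がぎぐげご",
    "s": "さしすせそ", "z": "ざじずぜぞ",
    "t": "たちつてと", "d": "だぢづでど",
    "n": "なにぬねの",
    "h": "はひふへほ", "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ",
    "m": "まみむめも",
    "r": "らりるれろ",
}
# Hepburn spellings that break the consonant + vowel pattern.
EXCEPTIONS = {"し": "shi", "じ": "ji", "ち": "chi", "ぢ": "ji",
              "つ": "tsu", "づ": "zu", "ふ": "fu"}

KANA = {kana: EXCEPTIONS.get(kana, consonant + vowel)
        for consonant, row in ROWS.items()
        for kana, vowel in zip(row, "aiueo")}
KANA.update({"や": "ya", "ゆ": "yu", "よ": "yo", "わ": "wa", "を": "o", "ん": "n"})

SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}


def to_romaji(hiragana: str) -> str:
    syllables: list[str] = []
    for ch in hiragana:
        if ch in SMALL_Y and syllables and syllables[-1].endswith("i"):
            # Yoon: き+ゃ -> kya, し+ゃ -> sha, じ+ょ -> jo
            base = syllables.pop()[:-1]
            glide = "" if base in ("sh", "ch", "j") else "y"
            syllables.append(base + glide + SMALL_Y[ch])
        elif ch == "っ":
            syllables.append(ch)  # sokuon placeholder, resolved below
        elif ch in KANA:
            syllables.append(KANA[ch])
        else:
            raise ValueError(f"unsupported character: {ch!r}")

    out = []
    for i, syl in enumerate(syllables):
        if syl == "っ":
            nxt = syllables[i + 1] if i + 1 < len(syllables) else ""
            # Sokuon doubles the next consonant; Hepburn writes っち as "tchi".
            if nxt and nxt != "っ" and nxt[0] not in "aiueo":
                out.append("t" if nxt.startswith("ch") else nxt[0])
        else:
            out.append(syl)
    # Long vowels need no special case: こう -> "kou", おお -> "oo".
    return "".join(out)
```

Deciding which reading is the *most common* one is still a human (or dictionary) decision. The helper only spells whichever reading you pick.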
Assigning New Kanji IDs
When new entries introduce kanji not in `kanji_list.json`:
- Detect new kanji:
  ```bash
  python3 build/update_kanji_index.py --check-new
  ```
- Assign readings and a gloss using your knowledge:
  - Most common on'yomi
  - Most common kun'yomi (without okurigana)
  - Single-word English gloss
- Add each kanji to the `"kanji"` object in `kanji/kanji_list.json` with the next unused sequential number (other entries omitted here):
  ```json
  {
    "kanji": {
      "新": {
        "kanji_id": "00123_shin_atara_new",
        "onyomi": "shin",
        "kunyomi": "atara",
        "gloss": "new"
      }
    }
  }
  ```
- Rebuild:
  ```bash
  python3 build/build_flat.py
  ```
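The detection and numbering steps can be sketched as follows. This is not how `--check-new` is actually implemented. `find_new_kanji`, `next_sequence`, and `add_kanji` are hypothetical helpers, and the kanji test covers only the CJK Unified Ideographs block. `add_kanji` also refreshes `metadata.total_kanji`, on the assumption that this field simply counts the records.

```python
import json
import re
from pathlib import Path

KANJI_RE = re.compile(r"[\u4e00-\u9fff]")  # assumption: CJK Unified Ideographs only


def find_new_kanji(headwords, kanji_list):
    """Kanji used in headwords but missing from kanji_list["kanji"], in first-seen order."""
    known = kanji_list["kanji"]
    new = {}  # dict keeps insertion order and drops duplicates
    for headword in headwords:
        for ch in KANJI_RE.findall(headword):
            if ch not in known:
                new.setdefault(ch, None)
    return list(new)


def next_sequence(kanji_list):
    """Next unused sequence number: one past the highest existing 5-digit prefix."""
    numbers = [int(info["kanji_id"][:5]) for info in kanji_list["kanji"].values()]
    return max(numbers, default=0) + 1


def add_kanji(path, kanji, kanji_id, onyomi, kunyomi, gloss):
    """Insert one kanji record into kanji_list.json and write the file back."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    if kanji in data["kanji"]:
        raise ValueError(f"{kanji} already has ID {data['kanji'][kanji]['kanji_id']}")
    data["kanji"][kanji] = {"kanji_id": kanji_id, "onyomi": onyomi,
                            "kunyomi": kunyomi, "gloss": gloss}
    if "metadata" in data:
        # Assumption: total_kanji is simply the number of records.
        data["metadata"]["total_kanji"] = len(data["kanji"])
    # ensure_ascii=False keeps 新 readable instead of writing \u65b0.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
```

Editing the file by hand works just as well. The helper only guards against assigning an ID twice.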
Common Tasks
Check for New Kanji
```bash
python3 build/update_kanji_index.py --check-new
```
Rebuild All Kanji JSON Files
```bash
python3 build/update_kanji_index.py --rebuild-all
```
Rebuild Kanji HTML Pages
```bash
python3 build/build_kanji_html.py
```
Full Site Build (includes kanji)
```bash
python3 build/build_flat.py
```
Troubleshooting
"Warning: X kanji need IDs assigned"
New kanji were found in entries. Assign IDs manually:
- Run `python3 build/update_kanji_index.py --check-new` to see the full list
- For each kanji, determine its on'yomi, kun'yomi, and gloss
- Add it to `kanji/kanji_list.json`
- Rebuild with `python3 build/build_flat.py`
Missing kanji index page
Check that:
- The kanji is in `kanji/kanji_list.json`
- The JSON file `kanji/{kanji_id}.json` exists
- `python3 build/build_kanji_html.py` has been run since that JSON file was created
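The file checks can be scripted. Below is a minimal sketch that assumes the layout from Directory Structure and takes the repository root as its argument. `find_missing_pages` is a hypothetical helper, not one of the `build/` scripts. It reports both missing JSON files and missing HTML pages.

```python
import json
from pathlib import Path


def find_missing_pages(root="."):
    """List (kanji, path) pairs for kanji in kanji_list.json whose files are missing."""
    root = Path(root)
    kanji_list = json.loads((root / "kanji" / "kanji_list.json").read_text(encoding="utf-8"))
    problems = []
    for kanji, info in kanji_list["kanji"].items():
        kanji_id = info["kanji_id"]
        # Entry list consumed by the HTML build.
        if not (root / "kanji" / f"{kanji_id}.json").exists():
            problems.append((kanji, f"kanji/{kanji_id}.json"))
        # Generated page served to users.
        if not (root / "docs" / "kanji" / f"{kanji_id}.html").exists():
            problems.append((kanji, f"docs/kanji/{kanji_id}.html"))
    return problems


if __name__ == "__main__":
    for kanji, path in find_missing_pages():
        print(f"{kanji}: missing {path}")
```

A missing JSON file points to the index rebuild (`--rebuild-all`). A JSON file without a matching page points to `build/build_kanji_html.py`.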
Kanji link not appearing in headword
Check that:
- The kanji is in `kanji/kanji_list.json`
- The entry's HTML was rebuilt after the kanji was added
Entry count wrong on kanji page
Rebuild the kanji JSON files, then the HTML pages:
```bash
python3 build/update_kanji_index.py --rebuild-all
python3 build/build_kanji_html.py
```
File Formats
kanji_list.json
```json
{
  "metadata": {
    "description": "Index mapping kanji characters to their kanji index IDs",
    "generated": "2026-01-22T10:30:00Z",
    "total_kanji": 1500
  },
  "kanji": {
    "人": {
      "kanji_id": "00001_jin_hito_person",
      "onyomi": "jin",
      "kunyomi": "hito",
      "gloss": "person"
    }
  }
}
```
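A quick consistency check for this file is sketched below. `validate_kanji_list` is hypothetical. It assumes `metadata.total_kanji` should equal the number of records under `"kanji"`, and it flags malformed IDs and reused sequence numbers. To run it on the real file, pass it `json.loads(Path("kanji/kanji_list.json").read_text(encoding="utf-8"))`.

```python
import re

# Same shape as the Kanji ID Format section: 5 digits plus three lowercase fields.
KANJI_ID_RE = re.compile(r"\d{5}_[a-z]+_[a-z]+_[a-z]+")


def validate_kanji_list(data):
    """Return human-readable problems found in a parsed kanji_list.json document."""
    problems = []
    records = data.get("kanji", {})
    declared = data.get("metadata", {}).get("total_kanji")
    if declared is not None and declared != len(records):
        problems.append(f"metadata.total_kanji is {declared}, but {len(records)} kanji are listed")
    owner_of_seq = {}
    for kanji, info in records.items():
        kanji_id = info.get("kanji_id", "")
        if not KANJI_ID_RE.fullmatch(kanji_id):
            problems.append(f"{kanji}: malformed kanji_id {kanji_id!r}")
            continue
        seq = kanji_id[:5]
        if seq in owner_of_seq:
            problems.append(f"{kanji}: sequence {seq} already used by {owner_of_seq[seq]}")
        else:
            owner_of_seq[seq] = kanji
    return problems
```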
Individual kanji JSON
```json
{
  "metadata": {
    "kanji": "人",
    "kanji_id": "00001_jin_hito_person",
    "onyomi": "jin",
    "kunyomi": "hito",
    "gloss": "person",
    "entry_count": 245,
    "generated": "2026-01-22T10:30:00Z"
  },
  "entries": [
    {
      "id": "01234_akunin",
      "headword": "{悪|あく}{人|にん}",
      "reading": "あくにん",
      "gloss": "villain, bad person"
    }
  ]
}
```
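Below is a sketch of how one of these files could be assembled from a `kanji_list.json` record and the matching entries. `kanji_document` and `write_kanji_document` are hypothetical names. Deriving `entry_count` from the entry list itself is one way to avoid the wrong-count problem described under Troubleshooting. Sorting by reading with the entry `id` as a tiebreaker is an assumption.

```python
import json
from datetime import datetime, timezone


def kanji_document(kanji, info, entries):
    """Build the per-kanji structure: metadata plus entries sorted by reading."""
    ordered = sorted(entries, key=lambda e: (e["reading"], e["id"]))
    return {
        "metadata": {
            "kanji": kanji,
            "kanji_id": info["kanji_id"],
            "onyomi": info["onyomi"],
            "kunyomi": info["kunyomi"],
            "gloss": info["gloss"],
            "entry_count": len(ordered),  # derived, so it cannot drift from the list
            "generated": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        # Keep only the four fields shown in the format above.
        "entries": [{k: e[k] for k in ("id", "headword", "reading", "gloss")} for e in ordered],
    }


def write_kanji_document(path, doc):
    # ensure_ascii=False writes 人 as-is instead of \u4eba escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, ensure_ascii=False, indent=2)
        f.write("\n")
```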
Design Decisions
Why invisible links?
- Preserves clean headword appearance
- Users discover feature through tooltip
- No visual clutter
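Here is one way to produce invisible links, sketched under several assumptions. Headwords use the `{base|reading}` ruby markup from the entry JSON. Each indexed kanji is wrapped in an `<a>` with a hypothetical `kanji-link` class, whose CSS would reset color and underline (for example `color: inherit; text-decoration: none;`). The `title` attribute supplies the tooltip. The tooltip wording and the relative `base_url` are placeholders, since real entry pages may sit at a different depth. Kanji missing from the `"kanji"` object stay unlinked, which matches the "Kanji link not appearing in headword" case above.

```python
import html
import re

# {base|reading} ruby groups; text outside braces (e.g. okurigana) is plain.
RUBY_RE = re.compile(r"\{([^|{}]+)\|([^{}]+)\}")
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")


def link_kanji(text, kanji_map, base_url):
    """Wrap each indexed kanji in an unstyled link carrying a tooltip."""
    def one(match):
        ch = match.group(0)
        info = kanji_map.get(ch)
        if info is None:
            return ch  # no kanji ID yet: leave as plain text
        return (f'<a class="kanji-link" href="{base_url}/{info["kanji_id"]}.html" '
                f'title="Entries containing {ch}">{ch}</a>')
    return KANJI_RE.sub(one, html.escape(text))


def render_headword(headword, kanji_map, base_url="../kanji"):
    """Render ruby markup to HTML; kanji_map is the "kanji" object of kanji_list.json."""
    out, pos = [], 0
    for m in RUBY_RE.finditer(headword):
        out.append(link_kanji(headword[pos:m.start()], kanji_map, base_url))
        base, reading = m.groups()
        out.append(f"<ruby>{link_kanji(base, kanji_map, base_url)}"
                   f"<rt>{html.escape(reading)}</rt></ruby>")
        pos = m.end()
    out.append(link_kanji(headword[pos:], kanji_map, base_url))
    return "".join(out)
```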
Why romaji in kanji IDs?
- ASCII-safe file names
- Human-readable
- Easy to search and sort
Why sort by reading?
- Natural Japanese ordering (gojuon)
- Consistent with how dictionaries organize entries
- Helps users find related words