I am trying to find a configuration that will allow Word to render Chinese characters contained in a UTF-16 text file.
The text file may contain any combination of characters from:

- CJK Unified Ideographs (4E00–9FFF)
- CJK Unified Ideographs Extension A (3400–4DBF)
- CJK Unified Ideographs Extension B (20000–2A6DF)
- Latin (English) characters from the BMP

The plain text file will not have any information indicating which block the text is in. I have been running tests on a text file containing these few characters:
Traditional Chinese: 義 禮
Simplified Chinese: 义 礼
CJK Extension A: 㡛 㬐
CJK Extension B: ઘƠঞՍ
I tried editing font linking in the registry at [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink]
Word correctly detects and displays the first 6 characters. However, Word sets the font of the last two characters to SimSun, a font that does not contain the necessary glyphs. I should point out that those two characters are encoded as surrogate pairs in UTF-16. However, Word does not display them correctly even when I use UTF-8 instead.
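To rule out a problem with the file itself, a short Python sketch along these lines can report which block each character falls in and which UTF-16 code units encode it. The function names (`classify`, `utf16_units`, `report`) are made up for illustration, and the block ranges are simply the ones listed above; this says nothing about how Word itself picks a font.

```python
# Sketch: classify characters by Unicode block and show their UTF-16 code units.
# Block ranges are the three CJK ranges from the question; names are illustrative.
BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Extension A"),
    (0x20000, 0x2A6DF, "CJK Extension B"),
]


def classify(ch):
    """Return the name of the block containing ch, or a generic label."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other BMP" if cp <= 0xFFFF else "Other supplementary"


def utf16_units(ch):
    """Return the 16-bit code units that encode ch in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def report(text):
    """Return (code point, block, UTF-16 units) for each non-space character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        units = " ".join(f"{u:04X}" for u in utf16_units(ch))
        rows.append((f"U+{ord(ch):04X}", classify(ch), units))
    return rows


if __name__ == "__main__":
    # U+7FA9 (BMP), U+385B (Extension A), U+20000 (first Extension B code point)
    for row in report("\u7fa9 \u385b \U00020000"):
        print(*row, sep="\t")
```

To check a real file, the text could be read with `open(path, encoding="utf-16").read()` and passed to `report`. Any character reported as CJK Extension B should show two code units, the first in D800–DBFF and the second in DC00–DFFF; if it does not, the file was damaged before Word ever saw it.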
My edit to the registry was to add these fonts to each base font (including Courier New):

Sun-ExtB.ttf,Sun-ExtB
SimSun18030.ttc,SimSun-18030
Sun-ExtA.ttf,Sun-ExtA
HAN NOM B.ttf,HAN NOM B
All four fonts are installed on my system. I also made sure to edit:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
"SURROGATE"=dword:00000002
Surrogate fallback for planes 1 and 2 is set to HAN NOM B, a font that covers CJK Extension B.