Omni Systems, Inc.

  

Mif2Go User's Guide, Version 55

  


9 Generating Microsoft HTML Help > 9.2 Understanding why Unicode is not the answer


9.2 Understanding why Unicode is not the answer

Microsoft HTML Help does not use Unicode; instead it uses Windows code pages. This means that characters whose glyphs are not present in the default code page (for Western languages this is ANSI code page 1252) might not display correctly, and will interfere with use of the TOC, index, and search functions.

People often think they can get away with using Unicode encoding instead of code-page encoding, because the HTML Help viewer uses Internet Explorer to display the topic pane, and Internet Explorer does understand Unicode. However, if you use any non-ASCII characters (above U+007F), search will not work right, and if any of those characters appear in titles or in index terms, the TOC and index will not work right, either. If you are processing a language with accented characters, such as German, you cannot get away with Unicode in the topic pane. For example, UTF-8 encodes the code points from hexadecimal A0 to FF as two-byte sequences, whereas code page 1252 encodes each of them as a single byte. So even though the code points are the same, and the display looks fine, search fails because the single byte in the search string does not match the two bytes in the UTF-8 encoding.
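The byte mismatch described above can be reproduced in a few lines of Python. This is only an illustration, not how the HTML Help viewer is implemented: the function `naive_byte_search` is a hypothetical stand-in for a byte-oriented search, and the sample word is an assumption chosen for its accented characters.

```python
# Illustration only: the same characters, two different byte encodings.
text = "Übergröße"  # German word with three accented characters

cp1252_bytes = text.encode("cp1252")  # one byte per character
utf8_bytes = text.encode("utf-8")     # each accented character takes two bytes


def naive_byte_search(needle: bytes, haystack: bytes) -> bool:
    """Hypothetical stand-in for a search that compares raw bytes."""
    return needle in haystack


# A search string typed under code page 1252 holds a single byte for "ö".
query = "ö".encode("cp1252")

found_in_cp1252 = naive_byte_search(query, cp1252_bytes)
found_in_utf8 = naive_byte_search(query, utf8_bytes)

print(len(cp1252_bytes), len(utf8_bytes))
print(found_in_cp1252, found_in_utf8)
```

The code-page version of the topic text contains the query byte, so the search succeeds; the UTF-8 version stores the same character as two different bytes, so the same query finds nothing.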

With a few isolated symbols, you might get away with Unicode content, but it is not good practice. Mif2Go goes to considerable lengths to convert from Unicode to code-page encoding for HTML Help. The conversion is not trivial; for Asian languages, Mif2Go uses enormous look-up tables and dozens of lines of C++ code. It is a Bad Idea to blow it off and use Unicode in any form (including numeric character references) instead.
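To give a rough feel for what a Unicode-to-code-page conversion involves, here is a minimal Python sketch. It relies on Python's built-in `cp1252` and `cp932` codecs and has nothing to do with Mif2Go's own C++ look-up tables; the helper names `to_code_page` and `_encodable` are hypothetical.

```python
def _encodable(ch: str, code_page: str) -> bool:
    """Return True if the character has a representation in the code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def to_code_page(text: str, code_page: str = "cp1252"):
    """Encode text in a Windows code page.

    Returns the encoded bytes plus a sorted list of the characters that
    the code page cannot represent (those are replaced with "?").
    """
    missing = sorted({ch for ch in text if not _encodable(ch, code_page)})
    encoded = text.encode(code_page, errors="replace")
    return encoded, missing


german_bytes, german_missing = to_code_page("Größe", "cp1252")
japanese_bytes, japanese_missing = to_code_page("日本語", "cp932")
mixed_bytes, mixed_missing = to_code_page("Größe 日本", "cp1252")

print(german_bytes, german_missing)
print(len(japanese_bytes), japanese_missing)
print(mixed_bytes, mixed_missing)
```

The mixed case shows why the target code page has to match the language: Japanese characters have no place in code page 1252, so they are flagged instead of being silently passed through.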

It might be easy to dismiss all this when your language is English, but the rest of the world feels differently.

See also:

§9.13 Generating HTML Help in non-Western languages

§21.5 Assigning properties to text formats


