14.3.3 Specifying character encoding for generic XML

Character encoding determines how characters with values greater than 0x7F (decimal 127) are represented. These characters constitute the “high ASCII” set. FrameMaker characters, except in the Japanese version, are all single-byte, whereas UTF-8 represents each high-ASCII character as a sequence of two or more bytes. The default for XML output is UTF-8:

[HTMLOptions]
; Encoding = UTF-8 (XML default), ISO-8859-1 (HTML default, numeric
; refs), or None (write 0x80-0xFF as single characters)
Encoding=UTF-8

; XMLEncoding default is "UTF-8", entities are used for ANSI chars
XMLEncoding=UTF-8

; NumericCharRefs = Yes (default, always use &#nnn;)
; or No (use UTF-8 for XML)
NumericCharRefs=No
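The fragment above uses INI syntax, with a semicolon starting each comment line. If you want to check such settings from a script of your own, Python's standard configparser module can read them. This is only an illustration under that assumption. Mif2Go's own parser is not shown here and may treat edge cases differently.

```python
import configparser

# The [HTMLOptions] fragment shown above, as an INI string.
INI_TEXT = """
[HTMLOptions]
; Encoding = UTF-8 (XML default), ISO-8859-1 (HTML default, numeric
; refs), or None (write 0x80-0xFF as single characters)
Encoding=UTF-8
; XMLEncoding default is "UTF-8", entities are used for ANSI chars
XMLEncoding=UTF-8
; NumericCharRefs = Yes (default, always use &#nnn;)
; or No (use UTF-8 for XML)
NumericCharRefs=No
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key case as written (default lowercases keys)
parser.read_string(INI_TEXT)  # lines starting with ';' are skipped as comments

opts = parser["HTMLOptions"]
encoding = opts.get("Encoding")
xml_encoding = opts.get("XMLEncoding")
numeric_refs = opts.getboolean("NumericCharRefs")  # "No" -> False

print(encoding, xml_encoding, numeric_refs)  # UTF-8 UTF-8 False
```

getboolean accepts Yes/No, True/False, On/Off, and 1/0 case-insensitively, which covers the Yes/No values used in this file.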

Entity references for browsers

If your XML output is to be rendered by Web browsers, be aware that even though UTF-8 is the standard XML encoding, many browsers do not support it. By default, Mif2Go declares UTF-8 as the encoding but uses numeric character references of the form &#nnn; for all characters that would otherwise have to be encoded, and every browser can interpret these references. That is, with default settings Mif2Go does not actually write any characters with values greater than 127 in UTF-8. Instead, it writes numeric character references for them. These references consist entirely of ASCII characters, so they are readable under any eight-bit encoding scheme.
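Python's built-in xmlcharrefreplace error handler performs a comparable substitution. The sketch below uses it as an analogy for the behavior described above, not as Mif2Go's code. The sample string is arbitrary.

```python
import html

# Replace every character above 0x7F with a decimal numeric character
# reference (&#nnn;), leaving only 7-bit ASCII bytes in the output.
text = "Caf\u00e9 na\u00efve"
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))  # Caf&#233; na&#239;ve

# Every byte is below 0x80, so any ASCII-compatible 8-bit decoder can
# read it. Resolving the references restores the original text.
assert all(b < 0x80 for b in ascii_bytes)
assert html.unescape(ascii_bytes.decode("ascii")) == text
```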

The XMLEncoding setting controls the value of the encoding attribute in the XML declaration. If you set Encoding=UTF-8, you get real UTF-8 encoding (multibyte sequences) in place of the numeric character references. However, you can still force the use of numeric references by also setting NumericCharRefs=Yes.
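Under either choice, a conforming XML parser recovers the same text. The encoding attribute tells the parser how to decode raw bytes, and it resolves numeric character references independently of that attribute. The following sketch checks this with Python's standard xml.etree.ElementTree module. It is an illustration only, and the <p> wrapper element is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same text, both declaring encoding="UTF-8":
# one with real UTF-8 bytes, one with ASCII-only numeric references.
text = "na\u00efve"
decl = b'<?xml version="1.0" encoding="UTF-8"?>'

real_utf8 = decl + b"<p>" + text.encode("utf-8") + b"</p>"
numeric_refs = decl + b"<p>" + text.encode("ascii", "xmlcharrefreplace") + b"</p>"

for doc in (real_utf8, numeric_refs):
    root = ET.fromstring(doc)  # parse bytes; encoding comes from the declaration
    print(len(doc), root.text)
```

The UTF-8 version is shorter. Here the character ï takes two bytes, while the reference &#239; takes six.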

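While Encoding=None is not strictly compliant, this setting can be useful for languages such as Russian, where almost the entire text would otherwise consist of numeric character references. Because Encoding=None writes each such character as a single byte instead of a six-character &#nnn; reference, it provides a 6:1 reduction in the space those characters occupy.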
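The 6:1 figure follows from simple arithmetic. The Python sketch below assumes each reference uses the three-digit decimal &#nnn; form for byte values 0x80 to 0xFF. References to higher Unicode code points would be longer, so actual savings could differ.

```python
# Every byte value from 0x80 to 0xFF (128 to 255) has three decimal
# digits, so its reference "&#nnn;" is six characters long. Writing the
# byte itself, as Encoding=None does, takes one.
high_bytes = bytes(range(0x80, 0x100))

as_refs = "".join(f"&#{b};" for b in high_bytes)
as_raw = high_bytes

ratio = len(as_refs) / len(as_raw)
print(len(as_refs), len(as_raw), ratio)  # 768 128 6.0
```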

See also:

§13.3 Including starting code and entity references

§13.4.3 Specifying character encoding for HTML

§13.16.2 Replacing high ASCII characters for W3C validation

§21.5 Assigning properties to text formats