Omni Systems, Inc.

  

Mif2Go User's Guide, Version 55

  



14.3.3 Specifying character encoding for generic XML

Character encoding determines how characters with values greater than 0x7F (decimal 127) are represented. Such characters constitute the “high ASCII” set. FrameMaker characters, except in the Japanese version, are all single-byte, whereas UTF-8 represents each high-ASCII character with a sequence of two or more bytes. The default for XML output is UTF-8:

[HTMLOptions]
; Encoding = UTF-8 (XML default), ISO-8859-1 (HTML default, numeric
; refs), or None (write 0x80-0xFF as single characters)
Encoding=UTF-8
; XMLEncoding default is "UTF-8", entities are used for ANSI chars
XMLEncoding=UTF-8
; NumericCharRefs = Yes (default, always use &#nnn;)
; or No (use UTF-8 for XML)
NumericCharRefs=No
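The following Python sketch (an illustration only, not part of Mif2Go) shows the difference described in the first paragraph of this section: the character é, code point 233 (0xE9), occupies one byte in a single-byte encoding such as ISO-8859-1 but two bytes in UTF-8.

```python
# Compare how one high-ASCII character is stored under a single-byte
# encoding (ISO-8859-1) and under UTF-8.
ch = "\u00e9"  # é, code point 233 (0xE9), above 0x7F

latin1_bytes = ch.encode("iso-8859-1")  # one byte: b'\xe9'
utf8_bytes = ch.encode("utf-8")         # two bytes: b'\xc3\xa9'

print(ord(ch), latin1_bytes, utf8_bytes)
print(len(latin1_bytes), len(utf8_bytes))
```

The byte counts are the point: a single-byte scheme writes the value directly, while UTF-8 needs a multibyte sequence for anything above 0x7F.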

Entity references for browsers

If your XML output is to be rendered by Web browsers, be aware that even though UTF-8 is the standard XML encoding, many browsers do not support it. The Mif2Go default is to declare UTF-8 as the encoding but to write every character that would have to be encoded as a numeric character reference of the form &#nnn;, which all browsers can handle. That is, with default settings Mif2Go does not actually write any character with a value greater than 127 in UTF-8; instead, it uses numeric character references, which consist entirely of ASCII characters and are therefore readable under any eight-bit encoding scheme.
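A minimal Python sketch of the default behavior described above, using Python's built-in `xmlcharrefreplace` error handler as a stand-in for whatever Mif2Go does internally. The declaration claims UTF-8, yet the body contains only ASCII, so it decodes identically under ISO-8859-1, Windows-1252, or UTF-8. The helper name `to_ascii_with_refs` is hypothetical.

```python
# Emulate the default: declare UTF-8, but write every character above 127
# as a decimal numeric character reference (&#nnn;).
def to_ascii_with_refs(text: str) -> str:
    # "xmlcharrefreplace" replaces each non-ASCII character with &#nnn;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


body = to_ascii_with_refs("Café crème")  # "Caf&#233; cr&#232;me"
document = '<?xml version="1.0" encoding="UTF-8"?>\n<p>' + body + "</p>\n"
print(document)

# The output is pure ASCII, so ASCII-compatible eight-bit decodings agree.
raw = document.encode("ascii")
assert raw.decode("iso-8859-1") == raw.decode("cp1252") == raw.decode("utf-8")
```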

The XMLEncoding setting determines the value of the encoding attribute in the XML declaration. If you set Encoding=UTF-8, you get real UTF-8 encoding (a multibyte sequence for each such character) in place of the numeric character references. However, you can still force the use of numeric references by also setting NumericCharRefs=Yes.
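One reading of how the three settings combine can be sketched in Python. This is a hypothetical model built only from the description in this section, not Mif2Go's implementation; the function `write_xml` and its parameters are invented for illustration, and the ISO-8859-1 and None values of Encoding are deliberately left out.

```python
# Hypothetical model of XMLEncoding, Encoding, and NumericCharRefs.
# Defaults follow the comments in the sample [HTMLOptions] settings.
def write_xml(text: str, xml_encoding: str = "UTF-8",
              encoding: str = "UTF-8", numeric_char_refs: bool = True) -> bytes:
    # XMLEncoding: what the XML declaration claims.
    decl = f'<?xml version="1.0" encoding="{xml_encoding}"?>\n'
    if numeric_char_refs:
        # NumericCharRefs=Yes: characters above 127 become &#nnn;
        body = text.encode("ascii", "xmlcharrefreplace")
    elif encoding.upper() == "UTF-8":
        # Encoding=UTF-8 with NumericCharRefs=No: real UTF-8 byte sequences.
        body = text.encode("utf-8")
    else:
        raise ValueError(f"encoding {encoding!r} is not modeled in this sketch")
    return decl.encode("ascii") + body


print(write_xml("é"))                           # body ends with b'&#233;'
print(write_xml("é", numeric_char_refs=False))  # body ends with b'\xc3\xa9'
```

In this model XMLEncoding changes only the declaration, while NumericCharRefs takes precedence over Encoding when deciding how the characters themselves are written.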

Although Encoding=None is not strictly compliant, it can be useful in places like Russia, where almost the entire text would otherwise consist of numeric character references. Because each six-character reference of the form &#nnn; is replaced by a single byte, Encoding=None reduces the space such characters occupy by about 6:1.
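The 6:1 figure follows from simple arithmetic, sketched here in Python. Latin accented letters stand in for any text in the 0x80-0xFF range; that choice is an assumption for illustration, and the references are three-digit &#nnn; forms, as elsewhere in this section.

```python
# Size comparison for text made entirely of characters in the 0x80-0xFF
# range: numeric character references versus one byte per character.
sample = "\u00e0\u00e8\u00ec\u00f2\u00f9"  # àèìòù, code points 224-249

as_refs = sample.encode("ascii", "xmlcharrefreplace")  # b"&#224;&#232;..."
as_single_bytes = sample.encode("iso-8859-1")          # one byte each

ratio = len(as_refs) / len(as_single_bytes)
print(len(as_refs), len(as_single_bytes), ratio)  # 30 5 6.0
```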

See also:

§13.3 Including starting code and entity references

§13.4.3 Specifying character encoding for HTML

§13.16.2 Replacing high ASCII characters for W3C validation

§21.5 Assigning properties to text formats


