Omni Systems, Inc. Mif2Go User's Guide, Version 55
14.3.3 Specifying character encoding for generic XML
Character encoding determines the method used to represent character values greater than 0x7F (decimal 127). Such characters constitute the “high ASCII” set and require more than one byte each in UTF-8, whereas FrameMaker characters, except in the Japanese version, are all single-byte. The default for XML output is UTF-8:
; Encoding = UTF-8 (XML default), ISO-8859-1 (HTML default, numeric
; refs), or None (write 0x80-0xFF as single characters)
; XMLEncoding default is "UTF-8", entities are used for ANSI chars
; NumericCharRefs = Yes (default, always use &#nnn;)
Entity references for browsers
If your XML output is to be rendered by Web browsers, be aware that even though UTF-8 is the XML standard encoding, many browsers do not support it. The Mif2Go default is to claim UTF-8 as the encoding, but to use numeric references of the form &#nnn; for all characters that would have to be encoded; this satisfies all browsers. That is, with default settings, Mif2Go does not actually produce any characters with values greater than 127 using the UTF-8 encoding; instead, Mif2Go uses numeric character references for such characters, which are readable under any eight-bit encoding scheme.
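The effect of writing numeric references in place of encoded characters can be illustrated outside Mif2Go. The following is a minimal Python sketch, not Mif2Go code; the function name to_numeric_refs is hypothetical, and the sketch relies only on the standard Python "xmlcharrefreplace" error handler.

```python
def to_numeric_refs(text: str) -> str:
    """Replace every character above 0x7F with a decimal reference &#nnn;."""
    # 'xmlcharrefreplace' is a standard Python codec error handler; it
    # substitutes &#nnn; for any character the ASCII codec cannot encode.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "caf\u00e9 \u00a9"          # e-acute (233) and copyright sign (169)
print(to_numeric_refs(sample))       # caf&#233; &#169;
```

The result contains only seven-bit ASCII characters, which is why it reads the same under any eight-bit encoding scheme.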
The setting for XMLEncoding controls the content of the encoding attribute of the XML declaration. If you set Encoding=UTF-8, you get real UTF-8 encoding (two or more bytes per character) in place of the numeric character references. However, you can still force use of numeric references by also setting NumericCharRefs=Yes.
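To make the difference between these representations concrete, here is a small Python sketch (an illustration only, not Mif2Go output) that shows one character, e-acute, as real UTF-8, as a single ISO-8859-1 byte, and as a numeric character reference. The variable names are hypothetical.

```python
ch = "\u00e9"  # e with acute accent, value 0xE9 (decimal 233)

utf8_bytes = ch.encode("utf-8")          # real UTF-8: two bytes, 0xC3 0xA9
latin1_bytes = ch.encode("iso-8859-1")   # one byte, 0xE9
numeric_ref = f"&#{ord(ch)};"            # six ASCII characters

print(utf8_bytes.hex(), latin1_bytes.hex(), numeric_ref)  # c3a9 e9 &#233;
```

A viewer that assumes an eight-bit encoding would display the two UTF-8 bytes as two separate characters, whereas the numeric reference stays unambiguous.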
While Encoding=None is not strictly compliant, this setting can be useful in places like Russia, where almost the entire text would otherwise consist of numeric character references. Encoding=None provides a 6:1 reduction in the size of such text, because each six-character reference of the form &#nnn; is written as a single character instead.
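The 6:1 figure follows from simple arithmetic: every value from 0x80 through 0xFF has three decimal digits, so its reference &#nnn; occupies six characters where the raw value occupies one. The Python sketch below checks this for a short Russian word. It is an illustration under stated assumptions, not Mif2Go behavior: the cp1251 code page and the helper refs_for_high_bytes are assumptions chosen for the example, and the references carry code-page byte values rather than Unicode code points.

```python
def refs_for_high_bytes(data: bytes) -> str:
    """Write each byte above 0x7F as &#nnn; and pass ASCII bytes through."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in data)


word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian "Privet" (hello)
raw = word.encode("cp1251")                    # assumed single-byte code page
as_refs = refs_for_high_bytes(raw)

# Six single-byte characters become thirty-six characters of references.
print(len(raw), len(as_refs), len(as_refs) // len(raw))  # 6 36 6
```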
§13.3 Including starting code and entity references
§13.4.3 Specifying character encoding for HTML
§13.16.2 Replacing high ASCII characters for W3C validation
§21.5 Assigning properties to text formats