Omni Systems, Inc.

  

Mif2Go User's Guide, Version 55

  



13.4.3.3 Specifying encoding for double-byte characters

Character encoding determines how double-byte characters are represented in the <body> section of HTML output. To specify an encoding or, alternatively, numeric character references:

[HTMLOptions]
; Encoding = ISO-8859-1 (HTML default, numeric refs),
;  or None (write 0x80-0xFF as single characters)
Encoding=ISO-8859-1

; QuotedEncoding = No (default, W3C usage, required for JavaHelp),
;  or Yes (put encoding in meta tag in single quotes, needed by some
;  older browsers)
QuotedEncoding=No

; NumericCharRefs = Yes (default, always use &#nnn;)
;  or No (use UTF-8 for XML)
NumericCharRefs=Yes

For XHTML, the Mif2Go default is to declare UTF-8 as the encoding, but to write numeric character references of the form &#nnn; for all characters that would otherwise have to be encoded; this satisfies all browsers. That is, Mif2Go does not actually write any character with a value greater than 127 in UTF-8; instead, it writes a numeric character reference for each such character, which is readable under any eight-bit encoding scheme.

For XHTML, you can specify a value for XMLEncoding (see §14.3.3 Specifying character encoding for generic XML) other than the default UTF-8. If you set Encoding=UTF-8, you get real UTF-8 encoding (two or more bytes for each character above 127) in place of the numeric character references. However, you can still force the use of numeric references by also setting NumericCharRefs=Yes.
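As a minimal sketch of the UTF-8 case described above, the following settings ask for real UTF-8 output. Setting NumericCharRefs=No explicitly is an assumption, based only on the listed default of Yes for that option; check the result in your own output before relying on it.

```ini
[HTMLOptions]
; Ask for real UTF-8 output in place of numeric character references.
Encoding=UTF-8
; Assumption: set to No explicitly, because the listed default is Yes,
; and Yes forces numeric references.
NumericCharRefs=No
```

To return to numeric references while still declaring UTF-8, set NumericCharRefs=Yes.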

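Although Encoding=None is not strictly compliant, this setting can be useful for text in languages such as Russian, where almost the entire text would otherwise consist of numeric character references. Because each character in the range 0x80-0xFF is written as a single character rather than as a reference of the form &#nnn;, Encoding=None provides a 6:1 reduction in size for such characters.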
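The setting discussed above can be sketched as follows. This fragment uses only the Encoding option documented in this section; how Encoding=None interacts with NumericCharRefs is not stated here, so that option is left at its default in this sketch.

```ini
[HTMLOptions]
; Write characters 0x80-0xFF as single characters instead of &#nnn;.
; Not strictly compliant; intended for text that consists mostly of
; such characters, for example Russian.
Encoding=None
```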

To direct Mif2Go to put single quotes around the charset value in the meta tag, specify QuotedEncoding=Yes:

<meta http-equiv="Content-type" content="text/html; charset='ISO-8859-1'">

The default, QuotedEncoding=No, is not to enclose the value in quotes, so that the content attribute reads "text/html; charset=ISO-8859-1".

See also:

§13.16.2 Replacing high ASCII characters for W3C validation

§14.3.3 Specifying character encoding for generic XML

§21.5 Assigning properties to text formats


