Nothing's quite as fun as staying up until 2am hacking an entity encoding problem. We're committed to keeping this excitement all to ourselves (so you don't have to).
Technically, XML only predefines five entities: &amp; &lt; &gt; &quot; &apos;. All the others (&mdash;, &ldquo;, and friends) are XHTML. Flash also supports &nbsp;. Flash doesn't natively understand anything else, but you can add the support yourself. You're on the wrong side of the spec, though, unless you add a DTD to declare these new-fangled entities. Flash can't read those declarations either, but they'll keep browsers (and other parsers that do) from choking. And hey, interoperability is supposed to be the whole point of XML, right?
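To find out which references in a file fall outside XML's built-in five, and would therefore need a declaration, here's a minimal sketch in TypeScript. Its String and RegExp APIs are close enough to ActionScript 3's that it should port with minor changes. The function name findUndeclaredEntities is hypothetical and not part of XmlUtil. The name pattern is also a simplified ASCII subset of XML's Name rule.

```typescript
// The five entities every XML parser recognizes without a DTD.
const XML_PREDEFINED: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: "\"",
  apos: "'",
};

// Returns the distinct named entity references in `markup` that are NOT
// one of the five XML built-ins, i.e. the ones a DTD would have to declare.
// Numeric references such as &#8212; are ignored: XML handles those natively.
function findUndeclaredEntities(markup: string): string[] {
  const found: string[] = [];
  const pattern = /&([A-Za-z_][A-Za-z0-9._-]*);/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    const name = match[1];
    const isBuiltIn = Object.prototype.hasOwnProperty.call(XML_PREDEFINED, name);
    if (!isBuiltIn && found.indexOf(name) === -1) {
      found.push(name);
    }
  }
  return found;
}

// Usage: returns ["mdash", "ldquo", "rdquo"]
console.log(findUndeclaredEntities("Fish &amp; chips &mdash; &ldquo;fresh&rdquo; &#8212;"));
```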
Before we dive in, two thoughts:
- I hate CDATA. I can never remember the sequence, and that means I always screw it up. You have to go to all the trouble to stick it in there, and then parse for it on the other side. No thanks.
- Numeric entities are supported by XML, but who can remember those? I just want to type em-dashes and curly quotes and be able to recognize them in my highly readable, non-CDATA-infested markup.
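Numeric references are just spelled-out code points. The hypothetical TypeScript helper below, decodeNumericReferences, decodes the decimal (&#8212;) and hex (&#x2014;) forms that XML allows. It shows why they work everywhere yet read so badly: you have to know that 8212 is an em-dash and 8220/8221 are curly double quotes. It's a sketch for illustration, since any real XML parser already does this decoding for you.

```typescript
// Decodes XML numeric character references (decimal &#8212; or hex &#x2014;;
// XML requires the lowercase "x") into the characters they stand for.
function decodeNumericReferences(text: string): string {
  return text.replace(/&#(x[0-9A-Fa-f]+|[0-9]+);/g, (ref: string, body: string): string => {
    const codePoint = body.charAt(0) === "x"
      ? parseInt(body.substring(1), 16)
      : parseInt(body, 10);
    // Leave anything outside the Unicode range untouched.
    if (isNaN(codePoint) || codePoint > 0x10ffff) {
      return ref;
    }
    if (codePoint <= 0xffff) {
      return String.fromCharCode(codePoint);
    }
    // Code points above U+FFFF need a UTF-16 surrogate pair.
    const offset = codePoint - 0x10000;
    return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
  });
}

// Usage: returns "Wait \u2014 \u201Ccurly\u201D quotes"
console.log(decodeNumericReferences("Wait &#8212; &#8220;curly&#8221; quotes"));
```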
If you're doing transformations on your HTML (a future article will explore an example), it's nice to be able to leverage the XML parser so you have access to E4X, XMLList, prettyPrinting, etc. Wrapping HTML in CDATA lets you load it and apply it as-is, but it doesn't get you around Flash's re-encoding of "orphaned" ampersands.
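Here's a rough TypeScript model of that re-encoding. It assumes, as this article describes, that Flash keeps an unrecognized reference like &mdash; as literal text in the node. Serializing that text then escapes its ampersand like any other. escapeXmlText is a hypothetical stand-in for what toXMLString() does to text content, not Flash's actual implementation.

```typescript
// Escapes a text node the way an XML serializer does on output.
// "&" must be handled first, or the "&" inside "&lt;" would be escaped again.
function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Assumption: the parser left the unknown reference "&mdash;" in the node as
// literal characters, so serializing it "orphans" the ampersand.
console.log(escapeXmlText("Fish &mdash; chips")); // "Fish &amp;mdash; chips"
```

Undoing this means turning each &amp; back into & after serialization.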
Download XmlUtil, CharacterEntity, and StringUtils
It isn't required for Flash, but you'll want to add a DTD, or an inline definition of your entities like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE site [
<!ENTITY mdash "—">
]>
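If you have more than a handful of entities, you can generate the internal subset instead of writing it by hand. Here's a minimal TypeScript sketch. buildDoctype is a hypothetical helper, and it writes each value as a numeric character reference (the style XHTML's own entity files use, e.g. "&#8212;" for mdash) rather than as a literal character.

```typescript
// Builds an XML prolog plus an internal DTD subset declaring each entity,
// e.g. { mdash: 8212 } produces <!ENTITY mdash "&#8212;">.
function buildDoctype(rootName: string, entities: Record<string, number>): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<!DOCTYPE " + rootName + " [",
  ];
  for (const name of Object.keys(entities)) {
    lines.push("<!ENTITY " + name + ' "&#' + entities[name] + ';">');
  }
  lines.push("]>");
  return lines.join("\n");
}

// Usage: declares mdash, ldquo and rdquo for a document rooted at <site>.
console.log(buildDoctype("site", { mdash: 8212, ldquo: 8220, rdquo: 8221 }));
```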
"site" is the top-level node of your document (this one happens to be from a Gaia Framework project I've been working on). In this example, we've declared "mdash", so now we can use &mdash; in our content.
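A DTD-aware parser reads those declarations and substitutes the declared value wherever &mdash; appears. Flash doesn't do this for you. The toy TypeScript version below models that substitution. parseEntityDeclarations and expandEntities are hypothetical names. The sketch only handles double-quoted internal entities and copies values verbatim, whereas a real parser would also resolve character references inside them.

```typescript
// Collects simple internal entity declarations: <!ENTITY name "value">.
function parseEntityDeclarations(doctype: string): Record<string, string> {
  const entities: Record<string, string> = {};
  const pattern = /<!ENTITY\s+([A-Za-z_][A-Za-z0-9._-]*)\s+"([^"]*)"\s*>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(doctype)) !== null) {
    entities[match[1]] = match[2];
  }
  return entities;
}

// Replaces each declared &name; with its value; unknown references stay as-is.
function expandEntities(content: string, entities: Record<string, string>): string {
  return content.replace(/&([A-Za-z_][A-Za-z0-9._-]*);/g, (ref: string, name: string): string =>
    Object.prototype.hasOwnProperty.call(entities, name) ? entities[name] : ref);
}

// Usage: the declaration maps mdash to U+2014, so the reference is expanded.
const declared = parseEntityDeclarations('<!DOCTYPE site [\n<!ENTITY mdash "\u2014">\n]>');
console.log(expandEntities("Fast &mdash; and readable", declared));
```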
How do we use this entity-encoded content in Flash?
var copyContent = XmlUtil.getHTMLContent(myXML.description);
copy_tf.htmlText = "<body>" + copyContent + "</body>";
The body tag's on there because I'm using a stylesheet (with body rules). I could have put it directly in my XML if I'd wanted to. This example assumes my document has a node <description> as a direct child of the document root.
As I encountered in this post, ActionScript will reENcode the ampersands of entities it doesn't understand when you get XML content via toXMLString(), so &mdash; comes back as &amp;mdash;. The getHTMLContent() function DEcodes every &amp; back to & before it goes on to replace the XHTML entity set. Not exactly elegant, but it's better than the first thing I had, and I'm totally open to suggestions.
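The XmlUtil source isn't reproduced here, so what follows is only a TypeScript sketch of the two steps just described, not the actual getHTMLContent() code. getHTMLContentSketch and its XHTML_ENTITIES table are hypothetical. The table covers only a few entities, and mapping each name to its literal character (rather than a numeric reference) is an assumption.

```typescript
// A small, hypothetical slice of the XHTML entity table (name -> character).
const XHTML_ENTITIES: Record<string, string> = {
  nbsp: "\u00A0",
  ndash: "\u2013",
  mdash: "\u2014",
  lsquo: "\u2018",
  rsquo: "\u2019",
  ldquo: "\u201C",
  rdquo: "\u201D",
  hellip: "\u2026",
  copy: "\u00A9",
};

function getHTMLContentSketch(serialized: string): string {
  // Step 1: undo the serializer's re-encoding ("&amp;mdash;" -> "&mdash;").
  const decoded = serialized.replace(/&amp;/g, "&");
  // Step 2: swap each known XHTML entity for its character. The five XML
  // built-ins are not in the table, so &lt; and &gt; stay escaped for htmlText.
  return decoded.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (ref: string, name: string): string =>
    Object.prototype.hasOwnProperty.call(XHTML_ENTITIES, name) ? XHTML_ENTITIES[name] : ref);
}

// Usage: simulated toXMLString() output with re-encoded entities.
console.log(getHTMLContentSketch("<p>Wait &amp;mdash; &amp;ldquo;really&amp;rdquo; &lt;b&gt;</p>"));
```

The blanket first step is the inelegant part. It also turns a genuine &amp; in your copy ("Fish &amp; chips") into a bare &.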