Load HTML from XML – Part II

Update: had to take another look at this due to Flash busting my character entities!

In the last installment, I showed you a function that walks through an XML node's (multi-part) children and returns an HTML string. This approach is unfortunately flawed -- the whitespace collapsing is a bit over-eager, which results in XML such as the following:
This is a <a href="page.html">link</a>

converting to:
This is alink
(Note the missing space.)

Fortunately, I have a solution, and since it uses regex, it ought to be a good bit more efficient:

package {

	import StringUtils;
	import CharacterEntity;

	public class XmlUtil {

		/**
		 * Returns the children of an XML node as a single HTML string,
		 * with entities decoded and runs of whitespace collapsed.
		 */
		static public function getHTMLContent (xml:*):String {
			var html:String = ""

			// Save the global XML settings so they can be restored afterwards.
			var prettyPrint:Boolean = XML.prettyPrinting
			var ignoreWhite:Boolean = XML.ignoreWhitespace
			XML.prettyPrinting = false
			XML.ignoreWhitespace = false

			// Parse strings only after ignoreWhitespace is off; the setting is applied at parse time.
			if (typeof (xml) == 'string') xml = new XML(xml)

			var children:XMLList = xml.children()
			var len:int = children.length()
			if (len)
			{
				// Mixed content: serialize and decode each child, then collapse whitespace once.
				for ( var i:int=0; i<len; i++ )
				{
					html += CharacterEntity.decodeXHTML(children[i].toXMLString() , true)
				}
				html = StringUtils.removeExtraWhitespace( html )
			}
			else
			{
				// Simple content: decode and collapse the node's own text.
				html += StringUtils.removeExtraWhitespace( CharacterEntity.decodeXHTML(xml.toXMLString(), true) )
			}

			// Restore the global XML settings.
			XML.prettyPrinting = prettyPrint
			XML.ignoreWhitespace = ignoreWhite

			return html
		}
	}
}

You'll need two libraries: StringUtils, from the worship-worthy studio of Grant Skinner, and CharacterEntity, originally written for AS2 by Jim Cheng and kindly converted to AS3 by Thirdparty Labs.

The code is a lot simpler now, but for completeness, I'll give you a quick run-down. If you pass in a String (accessing an attribute or text node could actually cause this), we convert it to XML. We turn ignoreWhitespace off, since it's the source of the issue above. Then we walk through the children (if they exist), decoding the entities, and remove any extra whitespace. The "true" parameter on the decodeXHTML method is explained in this post.
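
For the curious, the whitespace collapsing boils down to a regex replace. Here's a minimal sketch of the idea, assuming removeExtraWhitespace does roughly this. The class and method names (WhitespaceSketch, collapseWhitespace) are made up for illustration and aren't part of StringUtils:

```actionscript
package {

	public class WhitespaceSketch {

		// Hypothetical helper: trims both ends, then collapses every run of
		// whitespace (spaces, tabs, newlines) into a single space.
		public static function collapseWhitespace (input:String):String {
			if (input == null) return ""
			var trimmed:String = input.replace(/^\s+|\s+$/g, "")
			return trimmed.replace(/\s+/g, " ")
		}
	}
}
```

The point is that a run of whitespace gets reduced to one space rather than dropped, so the space between "a" and the link in the example at the top of this post survives.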

Load HTML from XML source – Part I

UPDATE:
I managed to forget TextField.condenseWhite = true!

Now, although that will handle 90% of your XML -> HTML whitespace issues, there are a few scenarios where the hints in this series come in handy:

  • You want more control over XML -> HTML tag handling. For instance, you want to avoid an extraneous wrapping tag.
  • The CharacterEntity class is still relevant and useful.
  • You're using TextField.styleSheet. (If you're not using a stylesheet, you can set the htmlText and then read the htmlText property back to get the HTML with collapsed whitespace. If you do use a stylesheet, sniffing TextField.htmlText won't show collapsed whitespace.)

    We often pull our dynamic content into Flash via XML. A lot. As in pretty much exclusively. Sometimes, we'll have honest-to-goodness information architecture to carve up that data semantically. We end up with a structured tree of simple text nodes. The XML parsing routine figures out how to turn this into a view. Great for all kinds of reasons. But sometimes, we just want to treat a node as a block of HTML and pass it into the htmlText property of a TextField.

    This is unnecessarily hard. Neither XML.toString() nor XML.toXMLString() does what you want for this case. But the following function will:

    private function getHTMLContent (xmlString:String):String
    {
      var html = ""
      var prettyPrint = XML.prettyPrinting
      XML.prettyPrinting = false
      var ignoreWhite = XML.ignoreWhitespace
      XML.ignoreWhitespace = true

      var xml:XML = new XML (xmlString)
      var children = xml.children()

      if (children.length())
      {
        for each (var i in children)
        {
          html += i.toXMLString()
        }
      }
      else
      {
        html += xml
      }

      XML.prettyPrinting = prettyPrint
      XML.ignoreWhitespace = ignoreWhite

      return html
    }

    To use it, pass the XML node containing the HTML, and the function will return the appropriate HTML String. Let's say your XML looks like:

    <services>
    <description>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<br /><br />Aliquam in lectus quis nisl lacinia dignissim.
    <ul>
      <li>Proin viverra.</li>
      <li>Phasellus tristique.</li>
    </ul>
    </description>
    </services>

    Then, after loading the XML document, you'd do something like:

    var desc = getHTMLContent(myXML.description)
    content_tf.condenseWhite = true
    content_tf.htmlText = desc

    If you skip this step, you'll end up including the description node (which isn't what you want) or mangling the mixed-content nature (omitting the br tags or the parent-less "Lorem ipsum" opening sentence or something equally wrong). The script should also work if your content is a straightforward text node.

    The prettyPrint and ignoreWhitespace stuff keeps your whitespace from wreaking havoc with Flash's HTML format (which is different from a browser in that it'll happily add extra newlines when they appear in the source).

    Next time, I'll incorporate Rob's entity decoding script since Flash's built-in entity support is pretty lacking.

    FDT trace w/o Debug Revisited

    My last post on this subject outlined a pretty good solution to this problem. I had one minor annoyance, and the further I chased it, the further from my original solution I drifted. Tail-ing the Flash debug log works great. It's fast, simple, easy to set up -- but I wasn't quite satisfied. I wanted to clear the log at compile so that Eclipse's console window would only contain text from the current run. Clearing the log file is easy enough, but getting Eclipse's console window to refresh turned out to be next to impossible -- at least via an Ant workflow (comments very welcome).

    I dropped a note on the FDT message board, and one of the moderators pointed me to SosMax -- their free XML-based logging server. I was immediately resistant. That's just way way too much overhead for something as simple as trace output. But I gave it a chance, and am I ever glad I did. After struggling with the initial setup (as I always do with these packages), I quickly became a convert. SosMax really is a sweet little system, and after a bit of hackery, I was able to get it incorporated into my project without converting all my trace statements to myLogger.debug calls.

    In addition to the great filtering, color-coding, etc. features you'd expect with a custom logger, SosMax will also pull the trace log file, and it has an option for clearing the console automatically when a new connection is established. Ah ha!

    Rather than write a log connector class from scratch, I downloaded one from Sönke Rohde. His class was almost perfect for my needs: by default, it only connects to the logging server if there's a message to be sent (i.e., the first time you attempt to send a message). I only wanted to connect for the purpose of clearing the console, so I moved the connection logic into its own public method. Then, in my Main class:


    import com.soenkerohde.logging.SOSLoggingTarget;

    import mx.logging.Log;
    import mx.logging.ILogger;
    import mx.logging.LogEventLevel;

    public class Main {

      private static const logger:ILogger = Log.getLogger("Main");

      public function Main()
      {
        var sosLoggerTarget:SOSLoggingTarget = new SOSLoggingTarget();
        //sosLoggerTarget.includeCategory = true;
        sosLoggerTarget.includeLevel = true;

        Log.addTarget(sosLoggerTarget);

        // connect() is the connection logic I moved into its own public method (see above).
        // Calling it right away lets SosMax clear its console on each new run.
        sosLoggerTarget.connect();
      }
    }

    Trace in FDT without Debug Perspective

    I recently switched to OS X, so I can't use my beloved FlashDevelop. FDT is really the most viable option despite the price tag. It takes a while to get used to Eclipse and get all the settings dialed in, but it's worth it.

    The last major issue I had was getting trace output from the debug player to appear in the Eclipse console without launching a Debug task (way too much additional overhead just to see trace messages).

    I'm on OS X, which has 'tail' -- if you're on Windows (why aren't you using FlashDevelop?), you'll have to use cygwin or get fancy with the batch scripting. There's also something called tail.exe from MS...

    Found this: http://flash-focus.blogspot.com/2007/06/creating-bare-bones-output-window-in.html which outlines how to configure the Debug Player to drop trace messages into a log file.

    First, I followed those instructions ... then in Eclipse added an External tool configuration:

    Run >> External Tools >> External Tools Configurations
    New Program
    Location: /usr/bin/tail
    Arguments: -f "/Users/MYUSER/Library/Preferences/Macromedia/Flash Player/Logs/flashlog.txt"

    I wasn't able to point to the user Library via ~/Library -- maybe that's an Eclipse shortcoming...

    Launch that, and it appears as one of the running console logs. Then you can run your build normally, Ant or what-have-you, and you'll still get the trace output without all the debug perspective stuff, and it's about a zillion times faster. Yay!

    Seems like there ought to be console level configuration somewhere, but I'm no Eclipse expert.

    Hope this helps someone else.

    Weird note of the day. If you place a multiline input TextField on the stage (as opposed to creating it with the class constructor), it will contain a newline by default rather than the expected null. Simply reset it to proceed normally:
    myTextField.text = ""

    AS3 package names vs property names

    Tattoo it on your arm, whatever it takes to remember:

    Package names can't match property names in your classes. Try to plan accordingly. If you accidentally create a conflict, the compiler might not be as helpful as you'd hope.

    Let this be a warning to you.

    AS3 scrollRect vs height & getBounds()

    The new scrollRect property in AS3 DisplayObjects is pretty cool. If you haven't run into it yet, it's a fairly simple way to set a rectangular window on a larger piece of content without the hassle of drawing a rectangle and using a mask and inverse positioning the content... For the most common masking operations it comes in really handy.

    And then reality sets in. You want to check the height of your content so you can set up boundary conditions for the scrolling behavior. But! When you get the height property, it's been modified to reflect the fact that the content is now masked. You don't get the "native" height anymore. What's even worse, this update to the property doesn't take effect until the next frame (following a change to scrollRect -- and you'll need another frame if you're doing this the first frame the asset is on the stage -- YMMV), so you've got to add (and remove) a temporary enterframe event listener ... it's a total mess. Thinking about using getBounds instead? Save yourself the trip. It works (as in "doesn't work") the same way.

    Now suppose you suck it up and decide to deal with waiting a frame after your content fires onResize. You can handle a little add/remove listener juggling. No biggie. Alas, you've got no way to get access to your original height once a scrollRect has been set. You could temporarily remove the scrollRect, but that change won't take effect until the next frame. Gah!
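
For reference, the add/remove listener juggling looks something like the sketch below. The class and function names are hypothetical, and it assumes one frame is enough (as noted above, you may need a second frame if the asset has only just landed on the stage). Keep in mind that the height it reports is the clipped one, not the original:

```actionscript
package {

	import flash.display.DisplayObject;
	import flash.events.Event;
	import flash.geom.Rectangle;

	public class ScrollRectSketch {

		// Hypothetical helper: applies a scrollRect, waits one frame,
		// then hands the updated height to the onMeasured callback.
		public static function setScrollRectThenMeasure (content:DisplayObject, rect:Rectangle, onMeasured:Function):void {
			content.scrollRect = rect;

			// Temporary listener that removes itself the first time it fires.
			function handleEnterFrame (event:Event):void {
				content.removeEventListener(Event.ENTER_FRAME, handleEnterFrame);
				onMeasured(content.height);
			}

			content.addEventListener(Event.ENTER_FRAME, handleEnterFrame);
		}
	}
}
```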

    So, I went digging through the docs to find an alternative. The only thing that looked even remotely promising was the transform property. I went through my code and replaced references to content.height with content.transform.pixelBounds.height and I was in business. As far as I can tell, this property responds appropriately to scale, rotation, etc. while omitting any scrollRect clipping. However, if you were to grab content.parent.transform.pixelBounds.height, it would take the clipping into account.

    Hope this helps someone else out there...
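
In code, the swap looks roughly like this. It's a sketch, not gospel: the helper names are made up, maxScrollY assumes you're scrolling vertically by changing scrollRect.y, and it assumes neither the content nor its parents are scaled (pixelBounds is measured on the stage, scrollRect in the object's own coordinates). The off-stage caveat in the update below applies as well:

```actionscript
package {

	import flash.display.DisplayObject;
	import flash.geom.Rectangle;

	public class PixelBoundsSketch {

		// Hypothetical helper: reads the content height from pixelBounds
		// instead of the height property.
		public static function contentHeight (content:DisplayObject):Number {
			return content.transform.pixelBounds.height;
		}

		// Hypothetical helper: the largest scrollRect.y that still keeps
		// content in the window (0 if everything already fits).
		public static function maxScrollY (content:DisplayObject):Number {
			var rect:Rectangle = content.scrollRect;
			if (rect == null) return 0; // no scrollRect set, so nothing to scroll
			return Math.max(0, contentHeight(content) - rect.height);
		}
	}
}
```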

    -- UPDATE --

    You know what's awesome? If the DisplayObject isn't on the stage, pixelBounds values are the "native" x, y, width, and height you'd expect looking at the Flash IDE property inspector. Times 5. Don't look at me like that. Try it. You'll get values five times bigger than you ought to. Maybe it's something to do with twips...

    The whole idea of this exercise was to find a way to grab the height of a DisplayObject without a bunch of if/else hassle. But if you've got to sniff for a stage object, you might as well sniff for the scrollRect object itself -- and the value you read won't be subject to the frame delay. That's only a problem when reading it via the height property. Hope that clears up any confusion.
