2009-10-05

Safe for XML files

I've been working more and more with creating and parsing XML files in ExtendScript, and came across a problem when trying to write a string that was enclosed in < and >.
When asked to convert the file's contents to an XML reference, ExtendScript complained that there was a missing tag. And it was correct, at least by its way of thinking.
So below is a quick little function that parses a string and converts four characters (&, >, < and %) into the entity references needed to write them safely.

Note: publishing this post required escaping a bunch of things even further, so I've got no idea how it will copy back out.


function entityReference ( str ) {
//-------------------------------------------------------------------------
//-- E N T I T Y R E F E R E N C E
//-------------------------------------------------------------------------
//-- Generic: Yes. Works for ExtendScript and likely for all ECMAScript
//-- and even JavaScript providing the Regular Expression converts.
//-------------------------------------------------------------------------
//-- Purpose: To replace the XML Reserved Characters in a passed string
//-- with their Entity References
//-------------------------------------------------------------------------
//-- Arguments: A string to clean up.
//-------------------------------------------------------------------------
//-- Calls: Nothing.
//-------------------------------------------------------------------------
//-- Returns: The string with the reserved characters replaced with their
//-- entity references
//-------------------------------------------------------------------------
//-- Sample Use:
//~ var unfitForXML = '< open & ampersand > close % percent' ;
//~ var safeForXML = entityReference ( unfitForXML ) ;
//-------------------------------------------------------------------------
//-- Notes:
//-- 1) You can add any other cleanups you need for your particular XML
//-- 2) Be wary of using something to escape the entity references
//-- prior to sending to this function, as you will escape all the
//-- ampersands again and have a real mess in the XML.
//-------------------------------------------------------------------------
//-- Written: 2009.10.04 by Jon S. Winters of electronic publishing support
//-- eps@electronicpublishingsupport.com
//-------------------------------------------------------------------------
//-- replacing each individually to make it more portable and easier to read
str = str.replace ( new RegExp ('&' , 'gm' ) , '&amp;' ) ;
str = str.replace ( new RegExp ('>' , 'gm' ) , '&gt;' ) ;
str = str.replace ( new RegExp ('<' , 'gm' ) , '&lt;' ) ;
str = str.replace ( new RegExp ('%' , 'gm' ) , '&#37;' ) ;
//-- return to caller
return str ;
}
//
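
To make the sample use and Note 2 concrete, here is a minimal sketch of what the function returns. It repeats a compact copy of the function so the snippet runs on its own, and the variable name escapedTwice is only for illustration. It assumes plain ECMAScript string handling, so it should behave the same in ExtendScript and in a browser or Node.

```javascript
// Compact copy of the function above so this snippet is self-contained.
function entityReference ( str ) {
    // The ampersand must be replaced first; otherwise the ampersands
    // introduced by the other replacements would be escaped as well.
    str = str.replace ( new RegExp ( '&' , 'gm' ) , '&amp;' ) ;
    str = str.replace ( new RegExp ( '>' , 'gm' ) , '&gt;' ) ;
    str = str.replace ( new RegExp ( '<' , 'gm' ) , '&lt;' ) ;
    str = str.replace ( new RegExp ( '%' , 'gm' ) , '&#37;' ) ;
    return str ;
}

var unfitForXML = '< open & ampersand > close % percent' ;
var safeForXML = entityReference ( unfitForXML ) ;
// safeForXML is '&lt; open &amp; ampersand &gt; close &#37; percent'

// Note 2 in action (escapedTwice is a hypothetical name): running an
// already escaped string through the function escapes every ampersand again.
var escapedTwice = entityReference ( safeForXML ) ;
// escapedTwice is '&amp;lt; open &amp;amp; ampersand &amp;gt; close &amp;#37; percent'
```

The order of the replacements matters: if the ampersand were handled last, the freshly written &gt; and &lt; references would themselves be turned into &amp;gt; and &amp;lt;.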
