jQuery: Parse/Manipulate HTML without executing scripts

By : noah
Source: Stackoverflow.com
Question!

I'm loading some HTML via Ajax with this format:

<div id="div1">
  ... some content ...
</div>
<div id="div2">
  ...some content...
</div>
... etc.

I need to iterate over each div in the response and handle it separately. Having a separate string for the HTML content of each div mapped to the id would satisfy my requirements. However, the divs may contain script tags, which I need to preserve but not execute (they'll execute later when I stick the HTML into the document, so executing during parsing would be bad). My first thought was to do something like this:

// data being the result from $.get
var clean = data.replace(/<script[\s\S]*?<\/script>/gi, function() {
    // insert some unique token, save the tag, put it back while I'm processing
}); 

$('<div/>').html(clean).children().each( /* ... process here ... */);
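For illustration, the placeholder idea described in the comment above could be sketched roughly as follows. The helper names (stashScripts, restoreScripts) and the @@SCRIPT_n@@ token format are assumptions made for this example, not part of the original code, and the token must never occur in the real content.

```javascript
// Hypothetical helpers: swap each script block for a token, then put it back later.
function stashScripts(html) {
  const saved = [];
  const stripped = html.replace(/<script\b[\s\S]*?<\/script\s*>/gi, function (tag) {
    saved.push(tag);
    // Assumed token format; it must not appear in the real content.
    return '@@SCRIPT_' + (saved.length - 1) + '@@';
  });
  return { html: stripped, scripts: saved };
}

function restoreScripts(html, saved) {
  // A function replacement avoids "$" in the saved script being treated as a pattern.
  return html.replace(/@@SCRIPT_(\d+)@@/g, function (match, index) {
    return saved[Number(index)];
  });
}

const data = '<div id="div1">a<script>var x = 1;</script></div>' +
             '<div id="div2">b</div>';
const stash = stashScripts(data);
const roundTrip = restoreScripts(stash.html, stash.scripts);

console.log(stash.html);
console.log(roundTrip === data);
```

The regular expression ends each match at the first closing script tag it sees, so this sketch is only as reliable as that pattern.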

But I worry that some stupid dev is going to come along and put something like this in one of the divs:

<script> var foo = '</script>'; // ... </script>

Which would screw it all up. Not to mention, the whole thing feels like a hack to begin with. Does anyone know a better way?

EDIT: Here's the solution I've come up with:

var divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g,
    idReplacement = preDelimiter + '$1' + postDelimiter;
// strip the final closing tag, then turn each opening tag into delimiters
var r = data.replace(/<\/div>\s*$/, '').
    replace(divSplitRegex, idReplacement).split(preDelimiter);
$.each(r, function(i, chunk) {
    // test the string itself: `this` is a wrapped object here and always truthy
    if (chunk) {
        callback.apply(null, chunk.split(postDelimiter));
    }
});

Where preDelimiter and postDelimiter are just unique strings like "###I'd have to be an idiot to embed this string in my content unescaped because it would break everything###", and callback is a function expecting the div id and the div content. This only works because I know that the divs will have only an id attribute, and the id will have a special prefix. I suppose someone could put a div in their content with an id having the same prefix and it would screw things up too.
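As a rough, self-contained sketch of the same splitting idea without jQuery: the delimiter values, the splitDivs wrapper, and the sample data below are assumptions chosen for the example, and the "prefix-" id convention is taken from the code above.

```javascript
// Hypothetical delimiters; any strings that never occur in the content would do.
const PRE = '###PRE###';
const POST = '###POST###';

function splitDivs(data, callback) {
  const divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g;
  const chunks = data
    .replace(/<\/div>\s*$/, '')                // drop the final closing tag
    .replace(divSplitRegex, PRE + '$1' + POST) // opening tag -> PRE id POST
    .split(PRE);
  chunks.forEach(function (chunk) {
    if (chunk) {                               // the first chunk is empty
      const parts = chunk.split(POST);
      callback(parts[0], parts[1]);            // (id, content)
    }
  });
}

// Assumed sample response: two top-level divs, one containing a script.
const sample =
  '<div id="prefix-a">first <script>var a = 1;</script></div>\n' +
  '<div id="prefix-b">second</div>\n';

const found = {};
splitDivs(sample, function (id, content) {
  found[id] = content;
});

console.log(found);
```

The script text is never handed to the DOM here, so nothing executes while the response is being split.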

So, I still don't love this solution. Anyone have a better one?

By : noah


Answers

An alternative approach may be useful here. You can use the following function to prevent JavaScript from running:

function preventJS(html) {
   return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

This preserves the script tags inside the DOM, so the scripts can be used later.

I described this approach on my blog: JavaScript: How to prevent execution of JavaScript within a html being added to the DOM.
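A possible usage sketch of this idea, using the global flag so that every script tag is covered. The names disableScripts and enableScripts are hypothetical, and the inverse function is an assumption added for illustration: it would also re-enable any script that was originally written with exactly that type attribute.

```javascript
// Variant of the function above; the "g" flag makes it cover every script tag.
function disableScripts(html) {
  return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

// Hypothetical inverse: strips the marker attribute that disableScripts added.
function enableScripts(html) {
  return html.replace(/<script type="text\/xml" /g, '<script');
}

const input = '<div><script>a()</script><script src="x.js"></script></div>';
const disabled = disableScripts(input);
const restored = enableScripts(disabled);

console.log(disabled);
console.log(restored === input);
```

Note that the replacement writes the tag name in lower case, so an upper-case SCRIPT tag would not round-trip character for character.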

By : perpetus


In some cases removing script tags results in invalid HTML, because the script itself writes part of the markup:

 <html>
    <head>
    </head>
    <body>
        <p>This should be
        <script type="text/javascript">
            document.writeln("<b");
        </script>>bolded</b>.
    </body>
 </html>
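
To see the effect concretely, here is a small sketch that strips the script block from the paragraph above with a deliberately naive pattern. The stripScripts name and the regular expression are assumptions made for this example.

```javascript
// Naive removal of script blocks (a deliberately simple pattern for illustration).
function stripScripts(html) {
  return html.replace(/<script\b[\s\S]*?<\/script\s*>/gi, '');
}

const fragment =
  '<p>This should be ' +
  '<script type="text/javascript">document.writeln("<b");</script>' +
  '>bolded</b>.';

const stripped = stripScripts(fragment);
console.log(stripped); // <p>This should be >bolded</b>.
```

The opening of the b tag came from the script, so after stripping only a stray ">" and an unmatched closing tag remain.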
By : Shannon


FYI, using an unescaped </script> inside any JavaScript string breaks the page in a browser anyway. Developers have to escape it regardless, so there is no excuse. You can therefore "trust" that such content would already be broken in any case.

<body>
 <div>
   <script>
     alert('<script> tags </script> are not '+
         'valid in regular old HTML without being escaped.');
   </script>
</body>

See http://jsbin.com/itevu to see it break. :)
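
For reference, the usual escape is to write the closing tag as <\/script> inside the string, which JavaScript reads as the same characters. Below is a minimal sketch of a helper that applies that escape to script source text. The name escapeScriptClose is hypothetical, and the helper is intentionally simple rather than a full sanitizer.

```javascript
// Hypothetical helper: rewrites "</script" as "<\/script" so an HTML parser
// no longer sees a closing tag inside inline script text.
function escapeScriptClose(source) {
  return source.replace(/<\/(script)/gi, '<\\/$1');
}

const risky = "alert('<script> tags </script> here');";
const safe = escapeScriptClose(risky);

console.log(safe);

// Inside a JavaScript string literal, "\/" is just "/", so the value is unchanged.
const sameValue = '<\/script>' === '</' + 'script>';
console.log(sameValue);
```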


