jQuery: Parse/Manipulate HTML without executing scripts

By : noah
Source: Stackoverflow.com

I'm loading some HTML via Ajax with this format:

<div id="div1">
  ... some content ...
</div>
<div id="div2">
  ...some content...
</div>
... etc.

I need to iterate over each div in the response and handle it separately. Having a separate string for the HTML content of each div mapped to the id would satisfy my requirements. However, the divs may contain script tags, which I need to preserve but not execute (they'll execute later when I stick the HTML into the document, so executing during parsing would be bad). My first thought was to do something like this:

// data being the result from $.get
var clean = data.replace(/<script.*?<\/script>/g, function() {
    // insert some unique token, save the tag, put it back while I'm processing
});

$('<div/>').html(clean).children().each( /* ... process here ... */);

But I worry that some stupid dev is going to come along and put something like this in one of the divs:

<script> var foo = '</script>'; // ... </script>

Which would screw it all up. Not to mention, the whole thing feels like a hack to begin with. Does anyone know a better way?
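
To make the failure concrete, here is a minimal sketch that runs the naive pattern against a fragment like the one above. The variable names and the sample string are made up for illustration, and it uses plain string methods only, no jQuery.

```javascript
// Naive pattern: match from "<script" to the first "</script>".
var scriptPattern = /<script.*?<\/script>/g;

// Hypothetical response fragment with "</script>" inside a string literal.
var tricky = "<div id=\"div1\"><script> var foo = '</script>'; // ... </script></div>";

// The lazy match stops at the "</script>" inside the string literal,
// so only part of the script is captured.
var matches = tricky.match(scriptPattern);

// Whatever follows the first "</script>" is left behind in the markup.
var leftover = tricky.replace(scriptPattern, '');

console.log(matches);
console.log(leftover);
```

The captured "script" ends in the middle of the string literal, and the rest of the code leaks into the surrounding HTML.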

EDIT: Here's the solution I've come up with:

var divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g,
    idReplacement = preDelimeter+'$1'+postDelimeter;
var r = data.replace(/<\/div>\s*$/,'').
$.each(r,function() {
    var content;
    if(this) {

Where preDelimeter and postDelimeter are just unique strings like "###I'd have to be an idiot to embed this string in my content unescaped because it would break everything###", and callback is a function expecting the div id and the div content. This only works because I know that the divs will have only an id attribute, and the id will have a special prefix. I suppose someone could put a div in their content with an id having the same prefix, and that would screw things up too.
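
As a rough illustration of the delimiter approach described above, the following sketch shows one way the pieces might fit together. It is not the original code: the delimiter values, the splitDivs helper, and the sample data are all assumptions, and it uses plain array methods instead of $.each so it runs without jQuery.

```javascript
// Hypothetical delimiters; any strings that cannot occur in the content will do.
var preDelimeter = '###PRE###';
var postDelimeter = '###POST###';

var divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g;
var idReplacement = preDelimeter + '$1' + postDelimeter;

// Hypothetical helper: returns an array of [id, content] pairs.
function splitDivs(data) {
  return data
    .replace(/<\/div>\s*$/, '')                 // drop the final closing tag
    .replace(divSplitRegex, idReplacement)      // opening tag -> PRE + id + POST
    .split(preDelimeter)
    .filter(function (chunk) { return chunk; }) // skip the empty chunk before the first delimiter
    .map(function (chunk) { return chunk.split(postDelimeter); });
}

// Made-up sample response; the script tag is carried along as plain text.
var sample = '<div id="prefix-a">one<script>var x = 1;</script></div>\n' +
             '<div id="prefix-b">two</div>';
var parts = splitDivs(sample);
console.log(parts);
```

Each pair could then be handed to a callback, for example parts.forEach(function (p) { callback(p[0], p[1]); }). The same caveats apply: the sketch relies on the top-level divs carrying only a prefixed id attribute.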

So, I still don't love this solution. Anyone have a better one?

By : noah


An alternative approach may be useful here. You can use the following function to prevent the JavaScript from running:

function preventJS(html) {
   // the g flag rewrites every script tag, not just the first one
   return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

Browsers do not execute a script whose type is not a JavaScript type, but the script tags stay in the DOM, so the scripts can still be used later.

I described this approach on my blog: JavaScript: How to prevent execution of JavaScript within a html being added to the DOM.
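
The preventJS idea can be tried on plain strings. In the sketch below, restoreJS is a hypothetical inverse that is not part of the answer, the sample markup is made up, and the g flag is used so that every script tag is rewritten.

```javascript
// Marks every opening script tag with a non-JavaScript type.
function preventJS(html) {
  return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

// Hypothetical inverse (an assumption): strips the marker that preventJS added.
function restoreJS(html) {
  return html.replace(/<script type="text\/xml" /gi, '<script');
}

// Made-up sample with an inline script and an external one.
var original = '<div id="div1"><script>alert(1);</script>' +
               '<script src="a.js"></script></div>';

var inert = preventJS(original);    // safe to parse: scripts are treated as data
var restored = restoreJS(inert);    // back to the original markup

console.log(inert);
console.log(restored === original);
```

One limitation to keep in mind: if the content already contains a script tag that starts with exactly that type="text/xml" marker, restoreJS would strip it as well.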

By : perpetus

In some cases, removing script tags results in invalid HTML:

        <p>This should be
        <script type="text/javascript">
By : Shannon

FYI, using an unescaped </script> in any inline JavaScript causes this issue in a browser. Developers have to escape it anyway, so there is no excuse. So you can "trust" that it would break in any case.

     alert('<script> tags </script> are not '+
         'valid in regular old HTML without being escaped.');



Try it in an inline script block to see it break. :)
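
For comparison, here is a small sketch of two common ways to escape the closing tag inside a string literal. The variable names are made up; both forms produce the same string value while keeping the literal text "</script>" out of the source.

```javascript
// Escaping the slash hides the closing tag from the HTML parser;
// in JavaScript, "\/" is just "/".
var escaped = '<script> tags <\/script> are fine once escaped.';

// Splitting the literal achieves the same thing.
var split = '<script> tags </scr' + 'ipt> are fine once escaped.';

console.log(escaped === split); // true
```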
