How browsers are parsing HTML script tags


The project that prompted this article uses React and a headless architecture, with many API calls returning JSON.
We had a problem: sometimes we got a blank white page because of a flood of JavaScript errors.

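Do you know how browsers parse HTML, and script tags in particular?

They do it sequentially, character by character. The HTML parser knows nothing about JavaScript: once it is inside a script element, it treats everything as raw text until it meets the first closing script tag, even if that tag sits inside a JavaScript string.
Look at the HTML below. What do you think will happen if you run this code?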

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</script><script>alert(1)</script>"};
            </script>
        </body>
    </html>

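If you answered that it won’t run the alert(), you are wrong.
The browser reads the closing script tag inside the JSON string and closes the first script element there, which leaves that script with an unterminated string and a syntax error. It then opens a new script element and executes the alert(). The leftover `"};` is rendered as plain text, and the final closing tag is a stray end tag that the parser ignores 🙂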
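
To make the behaviour concrete, here is a minimal sketch of how the parser decides where a script ends. It is a simplified model, not the real tokenizer: the function name `splitAtFirstScriptEnd` is made up for this example, and it ignores attributes on the opening tag as well as the special handling of `<!--` inside scripts.

```javascript
// Simplified model of how an HTML parser finds the end of a <script> element.
// Assumption: only a bare "<script>" opening tag, no "<!--" escape handling.
function splitAtFirstScriptEnd(html) {
  const open = "<script>";
  const start = html.toLowerCase().indexOf(open);
  if (start === -1) return null;
  const bodyStart = start + open.length;

  // The parser is not aware of JavaScript strings: the first "</script"
  // followed by whitespace, "/" or ">" ends the element.
  const endTag = /<\/script[\s\/>]/i;
  const match = endTag.exec(html.slice(bodyStart));
  if (match === null) {
    return { scriptBody: html.slice(bodyStart), rest: "" };
  }
  const end = bodyStart + match.index;
  return { scriptBody: html.slice(bodyStart, end), rest: html.slice(end) };
}

const page =
  '<script>let r = {"source": "</script><script>alert(1)</script>"};</script>';
const { scriptBody, rest } = splitAtFirstScriptEnd(page);

console.log(scriptBody); // let r = {"source": "
console.log(rest);       // </script><script>alert(1)</script>"};</script>
```

The first script body is cut off in the middle of the string literal, which is why the browser reports a syntax error for it and then happily runs the second script.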

To fix it we have two options: remove the closing tag from the response, or split it up however you like so that it is no longer parsed as a closing tag:

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</" + "script><script>alert(1)</" + "script>"};
            </script>
        </body>
    </html>
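
Splitting the string by hand is not practical when the data comes from an API. A common alternative, sketched below, is to escape every `<` as `\u003c` when serializing the response on the server, so the HTML parser never sees a closing tag while JavaScript still reads the same value. The helper name `serializeForInlineScript` is hypothetical, and the sketch assumes the value is JSON-serializable.

```javascript
// Hypothetical helper: serialize a value so it can be embedded in an inline <script>.
// Assumption: `value` is JSON-serializable (JSON.stringify returns a string).
function serializeForInlineScript(value) {
  // In JSON output a raw "<" can only appear inside string literals,
  // so replacing it with the \u003c escape keeps the data equivalent.
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const backendResponse = { source: "</script><script>alert(1)</script>" };
const safeLiteral = serializeForInlineScript(backendResponse);

console.log(safeLiteral);
// {"source":"\u003c/script>\u003cscript>alert(1)\u003c/script>"}

// The escaped text still parses back to the original value.
console.log(JSON.parse(safeLiteral).source === backendResponse.source); // true
```

The template that builds the page would then write `safeLiteral` after `let myJsonBackendResponse = ` instead of the raw response.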

In our case, one of the APIs was returning a closing script tag in its response (which it probably should not do), and that broke everything.

Hope this is helpful to someone!

3 thoughts on “How browsers are parsing HTML script tags”

  1. Is <script> treated as a special case, where the browser would scan to the </script> tag (rather than looking for another element like …)? From an XML point of view, shouldn’t it allow nested elements (even though it wouldn’t ‘make sense’ in a <script> element)?

    Also, strictly speaking, shouldn’t the ‘<’ be replaced with an &lt; escape sequence or something?

    Is this just a convenience for developers to be able to put (almost) anything in a <script> element?

  2. Relevant snippet from w3schools.com:

    In XHTML, the content inside scripts is declared as #PCDATA (instead of CDATA), which means that entities will be parsed.

    This means that in XHTML, all special characters should be encoded, or all content should be wrapped inside a CDATA section:

    //<![CDATA[
    var i = 10;
    if (i
