How browsers are parsing HTML script tags

Lyubomir, September 21, 2019

The project that prompted this article uses React with a headless architecture, so a lot of API calls return JSON.
We had a problem: sometimes we got a white page because of a lot of JS errors.

Do you know how browsers parse HTML, and especially script tags?

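Sequentially: line by line, character by character. Inside a script element the HTML parser does not understand JavaScript at all. It simply ends the element at the first </script (in any letter case) that is followed by whitespace, "/" or ">", even when that text sits inside a JavaScript string.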
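A simplified sketch of the rule the HTML tokenizer applies inside a script element, written in JavaScript. It looks only for the text </script (ASCII case-insensitive) followed by whitespace, "/" or ">", and never looks at JavaScript strings or comments. The function name findScriptEnd is hypothetical, and the sketch ignores the spec's extra "escaped" states that kick in when a script contains <!--.

```javascript
// Simplified model of the HTML "script data" state: the raw text of a
// <script> element ends at the first "</script" (ASCII case-insensitive)
// followed by tab, LF, FF, CR, space, "/" or ">".
// JavaScript syntax (strings, comments) is never considered.
// Not modelled: the spec's "script data escaped" states for "<!--".
function findScriptEnd(html, start = 0) {
  const endTag = /<\/script[\t\n\f\r />]/gi;
  endTag.lastIndex = start; // begin searching right after the opening tag
  const match = endTag.exec(html);
  return match === null ? -1 : match.index;
}

const page = '<script>let x = "</script>";</script>';
const bodyStart = page.indexOf(">") + 1;
const bodyEnd = findScriptEnd(page, bodyStart);
console.log(page.slice(bodyStart, bodyEnd)); // let x = "
```

The double quote in front of </script> does not protect it: the script body the browser hands to the JavaScript engine is just let x = ".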
Look at the HTML below. What do you think will happen if you run this code?

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</script><script>alert(1)</script>"};
            </script>
        </body>
    </html>

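If you answered that it won’t run the alert(), you are wrong.
The browser reads the </script> inside the JSON string as the closing tag of the first script element, so that script contains only let myJsonBackendResponse = {"source": " and fails with a SyntaxError. Next it parses <script>alert(1)</script> as a new, complete script element and runs the alert(). The leftover "}; is rendered as text on the page, and the last </script> is ignored as a stray end tag 🙂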
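Why a broken page and not just an alert? A small sketch that compiles, without running, the two script bodies the browser ends up with after the HTML parser splits the example above (leading whitespace trimmed). new Function() throws a SyntaxError when its source does not parse; the helper name parses is hypothetical.

```javascript
// Hypothetical helper: true if `source` compiles as a JavaScript function body.
// new Function() only compiles the code here; the function is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a parse failure is unexpected
  }
}

// The two script bodies produced by the HTML parser for the example page:
const firstScript = 'let myJsonBackendResponse = {"source": "';
const secondScript = "alert(1)";

console.log(parses(firstScript));  // false: the string literal is never closed
console.log(parses(secondScript)); // true: a complete program
```

The first script never defines myJsonBackendResponse, so any later code that relies on it fails too, which is how one bad string can turn into a white page.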

To fix it, we have two options: remove the closing tag from the response, OR split it however you want so that it is no longer parsed as a closing tag:

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</" + "script><script>alert(1)</" + "script>"};
            </script>
        </body>
    </html>
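Splitting the string by hand works for literals you write yourself, but when the server writes API data straight into an inline script, the escaping has to happen during serialization. A common approach is to replace every < in the JSON output with the escape \u003c, which is valid in both JSON and JavaScript strings (Go's encoding/json, for example, does this by default). A minimal sketch, assuming the page is rendered with Node; serializeForScript is a hypothetical name:

```javascript
// Sketch: serialize a value as JSON that is safe inside an inline <script>.
// JSON.stringify only emits "<" inside string literals, and \u003c decodes
// back to "<" in both JSON and JavaScript, so the data is unchanged while
// the HTML parser never sees "</script" or "<!--" in the output.
// Assumes `value` is JSON-serializable (JSON.stringify(undefined) is undefined).
function serializeForScript(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const apiResponse = { source: "</script><script>alert(1)</script>" };
const safe = serializeForScript(apiResponse);
console.log(safe);
// {"source":"\u003c/script>\u003cscript>alert(1)\u003c/script>"}
console.log(JSON.parse(safe).source === apiResponse.source); // true

// Hypothetical server-side template for the example page:
const html = `<script>let myJsonBackendResponse = ${safe};</script>`;
console.log(html.indexOf("</script") === html.lastIndexOf("</script")); // true
```

Unlike removing the tag, this keeps the API data intact, and unlike splitting by hand, it works for any response.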

In our case, one of the APIs returned a closing script tag (which it probably should not) and that broke everything.

Hope it is helpful for someone!

3 thoughts on “How browsers are parsing HTML script tags”

Bob Flavin says:

October 20, 2019 at 4:19 am

Is <script> treated as a special case, where the browser would scan ahead to the </script> tag? (Rather than looking for another element like …) From an XML point of view, shouldn’t it allow nested elements (even though they wouldn’t ‘make sense’ in a <script> element)?

Also, strictly speaking, shouldn’t the ‘<’ be replaced with an &lt; escape sequence or something?

Is this just a convenience so developers can put (almost) anything in a <script> element?

Bob Flavin says:

October 20, 2019 at 4:23 am

Relevant snippet from w3schools.com:

In XHTML, the content inside scripts is declared as #PCDATA (instead of CDATA), which means that entities will be parsed.

This means that in XHTML, all special characters should be encoded, or all content should be wrapped inside a CDATA section:

    //<![CDATA[
    var i = 10;
    if (i

superfly says:

October 21, 2019 at 10:04 pm

It should not be a special case.
Surrounding it with CDATA won’t fix this, but escaping the < and > will.


