How browsers parse HTML script tags

This article came out of a React project with a headless architecture, where many API calls return JSON.
We had a problem: sometimes the page came up blank (a white page) because of a lot of JS errors.

Do you know how browsers parse HTML, and script tags in particular?

They parse it sequentially, character by character, and the HTML parser knows nothing about JavaScript strings.
Look at the HTML below. What do you think will happen if you run this code?

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</script><script>alert(1)</script>"};
            </script>
        </body>
    </html>

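If you answered that it won’t run the alert(), you are wrong.
The browser reads the closing script tag inside the JSON string and closes the first script element right there, so that script is cut off mid-string and fails with a syntax error. It then opens the new script element and executes the alert(). The leftover "}; after it is rendered as plain text on the page, and the final closing tag is ignored because no script is open at that point 🙂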
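To make this concrete, here is a minimal sketch of how the end of a script element is found. It is a simplified model, not the real tokenizer: it ignores the special handling of <!-- inside scripts, and the function name findScriptEnd and the sample page string are made up for illustration. The point is that the search is a plain text match with no awareness of JavaScript quotes.

```javascript
// Simplified model of the HTML tokenizer inside a <script> element:
// the content ends at the first "</script" (case-insensitive) followed by
// whitespace, "/" or ">". JavaScript string quoting plays no role here.
// findScriptEnd is a hypothetical helper, not a browser API.
function findScriptEnd(html, from = 0) {
  const match = /<\/script[\t\n\f\r \/>]/i.exec(html.slice(from));
  return match ? from + match.index : -1;
}

// Same shape as the example above, flattened to one line.
const page =
  '<script>let r = {"source": "</script><script>alert(1)</script>"};</script>';

const start = page.indexOf('<script>') + '<script>'.length;
const end = findScriptEnd(page, start);
const firstScriptBody = page.slice(start, end);

// What the JavaScript engine actually receives for the first script:
console.log(firstScriptBody); // let r = {"source": "
```

The first script body stops in the middle of the string literal, which is why the browser reports a syntax error for it.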

To fix it we have two options: remove the closing tag from the response, or split it up however you want so that it won’t be parsed as a closing tag:

    <html>
        <head>
        </head>
        <body>
            <script>
                let myJsonBackendResponse = {"source": "</" + "script><script>alert(1)</" + "script>"};
            </script>
        </body>
    </html>

In our case, one of the APIs was returning a closing script tag in its response (it probably should not contain one), and that broke everything.
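When the string comes from an API you cannot split it by hand, so a common alternative is to escape it wherever the JSON is written into the page. The sketch below assumes the inline script is generated by JavaScript code (for example during server-side rendering). The helper name serializeForInlineScript is hypothetical and not from our project. It swaps this technique in for the manual split: every < becomes the JSON escape \u003c, which the JavaScript engine decodes back to < but which the HTML parser never sees as a tag.

```javascript
// Hypothetical helper: serialize a value so it can be placed inside an
// inline <script> without the HTML parser seeing a closing tag.
// In JSON.stringify output, "<" can only occur inside string values,
// so replacing it with the \u003c escape keeps the JSON valid.
function serializeForInlineScript(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const response = { source: '</script><script>alert(1)</script>' };
const safe = serializeForInlineScript(response);

console.log(safe);
// {"source":"\u003c/script>\u003cscript>alert(1)\u003c/script>"}

// The data is unchanged once it is parsed again.
console.log(JSON.parse(safe).source === response.source); // true
```

Unlike splitting the string in one place, this covers any response, because no < character is left in the output at all.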

Hope it is helpful for someone!
