Correctly filtering XSS from input is pretty hard. Most XSS filters can be bypassed by modifying the payload. For example, the following HTML will pass through your XSS filter:
<h1 onmouseover =alert`XSS`>Test</h1>
Because there is a space between onmouseover and =, your regex doesn't match. Another possible bypass is this:
<h1 oonmouseover=nmouseover=alert`XSS`>Test</h1>
The filter removes the occurrence of onmouseover= only once, and the characters left on either side of it join up into a new handler, leaving the following:
<h1 onmouseover=alert`XSS`>Test</h1>
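Both bypasses can be reproduced with a small sketch. Your actual filter isn't shown here, so naive_filter below is a hypothetical stand-in that simply deletes the literal text onmouseover= in a single pass; the names are illustrative only.

```python
import re

# Hypothetical stand-in for the filter being bypassed: it deletes the literal
# text "onmouseover=" in one pass and does not allow whitespace before "=".
NAIVE_PATTERN = re.compile(r"onmouseover=", re.IGNORECASE)


def naive_filter(html: str) -> str:
    """Remove each non-overlapping match once; the result is not re-scanned."""
    return NAIVE_PATTERN.sub("", html)


spaced = "<h1 onmouseover =alert`XSS`>Test</h1>"
nested = "<h1 oonmouseover=nmouseover=alert`XSS`>Test</h1>"

# The space before "=" means the pattern never matches, so nothing is removed.
print(naive_filter(spaced))  # <h1 onmouseover =alert`XSS`>Test</h1>

# The inner match is removed and the leftovers "o" + "nmouseover=" rejoin.
print(naive_filter(nested))  # <h1 onmouseover=alert`XSS`>Test</h1>
```

Tightening the pattern (allowing whitespace, looping until nothing changes) closes these two holes, but it is the same cat-and-mouse game with the next payload variation.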
One possibility is to use HTML Purifier, a library that is pretty good at cleaning up XSS. But the real solution against XSS is output encoding: when a user enters <script>alert(1), the < and > are written to the page as &lt; and &gt;, so it just appears on the webpage as the text <script>alert(1) instead of being parsed as HTML.
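As a minimal sketch of output encoding, here is the idea in Python using the standard library's html.escape. The render_comment function and the surrounding <p> markup are assumptions made up for illustration; in practice a template engine with auto-escaping usually does this for you.

```python
from html import escape


def render_comment(user_input: str) -> str:
    """Encode user input at output time, then place it in the HTML body.

    render_comment is a hypothetical helper; only html.escape is a real API.
    """
    # escape() turns &, <, > and (with quote=True) " and ' into entities,
    # so the browser displays the input as text instead of parsing it.
    return "<p>" + escape(user_input, quote=True) + "</p>"


print(render_comment("<script>alert(1)"))
# <p>&lt;script&gt;alert(1)</p>

print(render_comment("<h1 onmouseover =alert`XSS`>Test</h1>"))
# <p>&lt;h1 onmouseover =alert`XSS`&gt;Test&lt;/h1&gt;</p>
```

Note that this encoding is meant for the HTML body and quoted attribute values. Other contexts, such as inline JavaScript or URLs, need their own encoding rules.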