URL Validation in WordPress


If your WordPress website/theme/plugin allows users to submit URLs, and you’re not sanitizing them properly, you could have a whole host of security problems. On the flip side, if you’re removing too much, you might be rejecting valid URLs.
This issue is pretty complex, and there’s quite a bit of confusion surrounding it, but it actually has a really simple solution in WordPress.

tl;dr takeaway

If your WordPress website/theme/plugin accepts user input of a URL, you need to either:

  • sanitize it using esc_url_raw($url) before saving it, or
  • validate it using esc_url_raw($url) === $url, and reject it if it fails validation

Don’t use filter_var($url, FILTER_VALIDATE_URL)!

A Problematic Issue

Apparently, determining what’s a valid URL in PHP is a struggle. And it has been for a while.

“Just use filter_var” It’s So Easy…

The “PHP approved” way of doing it is to use PHP’s built-in function filter_var($url, FILTER_VALIDATE_URL). That’s the most commonly accepted answer on Stack Overflow. So, problem solved, right?
Actually, using filter_var has a number of problems:

  • it rejects URLs with underscores, e.g. http://my_site.com
  • it accepts URLs that could lead to cross-site scripting attacks, e.g. http://example.com/">
  • it rejects URLs with multi-byte characters, e.g. http://스타벅스코리아.com

The problems with filter_var are explained better in this article, and discussed extensively on php.net’s documentation page.
Some have argued that filter_var is technically correct about what’s a valid or invalid URL. But really what we want is a **safe** URL, not just a technically valid one. And filter_var doesn’t do much to verify the URL is safe.
So that’s no good.
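To see these behaviors for yourself, here is a minimal sketch that runs filter_var over the three example URLs from the list above. The helper name probe_filter_var is made up for this illustration. Results can differ between PHP versions, so the sketch prints whatever your installation reports rather than assuming an outcome.

```php
<?php
/**
 * Hypothetical helper (not part of PHP or WordPress): returns a map of
 * URL => bool, where true means filter_var() accepted the URL.
 */
function probe_filter_var(array $urls): array
{
    $results = [];
    foreach ($urls as $url) {
        // filter_var() returns the URL string on success and false on failure.
        $results[$url] = filter_var($url, FILTER_VALIDATE_URL) !== false;
    }
    return $results;
}

// The example URLs from the list above, copied as-is.
$urls = [
    'http://my_site.com',
    'http://example.com/">',
    'http://스타벅스코리아.com',
];

foreach (probe_filter_var($urls) as $url => $accepted) {
    echo ($accepted ? 'accepted' : 'rejected') . "  $url\n";
}
```

Running this on the PHP version you deploy to is a quick way to check whether the criticisms above still apply there.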

Just Make a Regex…

How hard is it to make a Regular Expression to validate URLs? Some poor soul asked that question once on Stack Overflow, and received a barrage of 19 answers. There was really no universally accepted answer (the most popular saying to use filter_var, and the accepted answer had other issues). Besides, I personally find regexes impossible to understand.

Just Find a Library to Do That…

Mika Epstein blogged about her struggles to validate a URL. She found a PHP library that mostly did it, but it still required some tweaking.
If you’re not using WordPress, you’re right to look for a pre-made library to do this, because it’s not straightforward…
I personally was quite unsatisfied that such a common task had no well-documented, good solution.

WordPress’ built-in Solution to Validating URLs

It turns out WordPress has a good option that’s super simple: esc_url_raw(). As documented here on wordpress.org, the function is meant to sanitize a URL before saving it to the database (not to prepare it for output on the screen; that’s what esc_url() is for).
Technically, the function is for sanitizing URLs (i.e., removing bad stuff from them), not validating them (asserting whether or not they’re valid). But you can use it for validation like so:


function isValid($url)
{
    // Valid only if sanitizing leaves the URL unchanged.
    return esc_url_raw($url) === $url;
}

If sanitizing leaves the URL unchanged, it had nothing invalid in it, so it’s valid. Pretty simple, eh?
And it works well too. None of the criticisms of filter_var apply to it. We ran it through some unit tests, and I have yet to see any problems with it.
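Here is a rough sketch of how the two strategies (sanitize before saving, or validate and reject) might sit side by side. The function names and the $sanitizer parameter are assumptions made for this illustration so that the snippet runs outside WordPress; inside WordPress you would pass 'esc_url_raw' as the sanitizer. The stand-in sanitizer at the bottom is for demonstration only and is not equivalent to esc_url_raw().

```php
<?php
/**
 * Hypothetical helper: a URL counts as valid only if the sanitizer
 * leaves it unchanged. $sanitizer is an assumption; in WordPress you
 * would pass 'esc_url_raw'.
 */
function url_survives_sanitizing(string $url, callable $sanitizer): bool
{
    // An empty string survives sanitizing unchanged, so rule it out explicitly.
    return $url !== '' && $sanitizer($url) === $url;
}

/**
 * Hypothetical helper: returns the value to store, or null to reject.
 */
function prepare_url_for_saving(string $url, callable $sanitizer, bool $strict): ?string
{
    if ($strict) {
        // Strict mode: reject anything the sanitizer would alter.
        return url_survives_sanitizing($url, $sanitizer) ? $url : null;
    }
    // Lenient mode: store the cleaned-up version instead.
    return $sanitizer($url);
}

// Stand-in sanitizer for demonstration only; NOT equivalent to esc_url_raw().
$demo_sanitizer = function (string $url): string {
    return str_replace(['"', '<', '>'], '', $url);
};

$rejected = prepare_url_for_saving('http://example.com/"<b>', $demo_sanitizer, true);  // null
$cleaned  = prepare_url_for_saving('http://example.com/"<b>', $demo_sanitizer, false); // 'http://example.com/b'
```

The explicit empty-string check is a design choice of this sketch: a plain "sanitized equals original" comparison treats an empty submission as valid, which is usually not what a form handler wants.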

Invalid URLs According to esc_url_raw():

http://example.com/"<script>alert("xss")<script>
php://filter/read=convert.base64-encode/resource=/etc/passw
foo://bar
javascript://test%0Aalert(321)

Valid URLs According to esc_url_raw():

http://foo.bar?foo=bar&other=thing
http://스타벅스코리아.com
http://localhost

A Better-Sounding, but Inferior, Alternative

There’s also a better-sounding function, wp_http_validate_url(). But in my testing, it found http://localhost invalid, when it should be valid. And it found URLs like http://example.com/"<script>alert("xss")<script> to be valid.
The light documentation says this function is primarily meant for validating a URL for use in the WP HTTP API, not for storing a user-submitted URL. So although its name sounds better, it’s probably not what you’re looking for, unless you’re using the WP HTTP API.

Conclusion

The esc_url_raw() function is used to ensure that commenters’ website URLs on WordPress sites are safe. That is, it sanitizes input from public users on the software that runs over 30% of the web, so it’s pretty battle-tested. If there were a security problem with it, or if it were rejecting valid URLs, I’m pretty sure it would have been discovered already.
So is a URL valid? Just check esc_url_raw($url) === $url.
Thoughts on this?
