Handling Polluted JSON in AJAX



The bane of all JavaScript code using jQuery.ajax() is polluted JSON responses: when some HTML or other junk gets into your JSON response, making it unparsable and bringing the whole system to a screeching halt. In this post, I’ll show what you can do with polluted JSON to keep everything working.

How It’s Supposed to Work

For example, let’s say you have the following JavaScript code:

jQuery.ajax(
    'https://mysite.com/users/123',
    {
        success: function(parsed_json) {
            alert('Hi ' + parsed_json.name);
        }
    }
);

and it expects a response like:

{
    "name": "Mike"
}

So, when it works properly, the request goes to the server, the server responds with that JSON, jQuery parses it, and the success callback says “Hi Mike”. 🎉

When Things Go Wrong

Now what if some other code echoes something before your JSON response? For example:

<!-- Here is an HTML comment. Oops! Should I not put HTML comments in a JSON response? My bad!! -->
{
    "name": "Mike"
}

This is what I’m calling “polluted JSON”: the response is no longer parseable JSON, so jQuery reports a “parsererror” to your error callback (if you have one) and the success callback never gets called. That little bit of extra text in the response just killed your code.
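You can reproduce the failure without jQuery at all: jQuery’s default text-to-JSON step ends up calling the browser’s native JSON.parse(), which rejects anything that isn’t JSON from the very first character. A minimal sketch, runnable in Node or a browser console (the variable and function names are just for illustration):

```javascript
// A clean response, and the same response with an HTML comment echoed in front of it.
const cleanResponse = '{"name": "Mike"}';
const pollutedResponse = '<!-- Here is an HTML comment. -->\n' + cleanResponse;

// Returns the parsed value, or the error's name if JSON.parse() rejects the text.
function tryParse(text) {
    try {
        return JSON.parse(text);
    } catch (error) {
        return error.name;
    }
}

console.log(tryParse(cleanResponse).name); // prints Mike
console.log(tryParse(pollutedResponse));   // prints SyntaxError
```
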

So what to do? The first and most obvious solution is to remove that pollution server-side. It’s probably there by mistake, so fix that mistake.

But if you don’t control all of the server code, that might not be an option. For example, if your WordPress plugin or theme sends an AJAX request and expects a JSON response, another plugin or theme might be outputting that pollution. And your users won’t really care whose fault it is; to them, it looks like your code doesn’t work.

Custom Converter to the Rescue

There’s a way to recover. The response in this case isn’t all bad. It’s just that there’s some junk before the JSON that we need to remove.

The way to do that is to provide jQuery.ajax() with a custom converter (see the “converters” setting in the jQuery.ajax() API documentation) that removes all that pollution, like so:

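jQuery.ajax(
    'https://mysite.com/users/123',
    {
        success: function(parsed_json) {
            alert('Hi ' + parsed_json.name);
        },
        // Use a custom converter to remove the pollution from the JSON
        converters: {
            'text json': function(result) {
                // Find the beginning of the JSON object or array...
                // indexOf() returns -1 for a character that never appears, and Math.min()
                // would pick that -1, so keep only the positions that were actually found.
                const positions = [result.indexOf('{'), result.indexOf('[')]
                    .filter(function(pos) { return pos !== -1; });
                // If there is no '{' or '[' at all, parse the whole response (and let it fail normally).
                const start_of_json = positions.length ? Math.min(...positions) : 0;
                // ...and only parse that, skipping everything before it.
                return jQuery.parseJSON(result.substring(start_of_json));
            }
        }
    }
);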

Converters have the job of converting the response from one data type to another. jQuery has several built in (for example “text html”, “text json”, and “text xml”), but the one we’re interested in is the one that takes the original plain-text response and interprets it as JSON. Here’s how the code works:

  • converters: { tells jQuery we want to register custom converters
  • 'text json': function(result) { says we want to replace the text-to-JSON converter with our own
  • the start_of_json lines look for the beginning of a JSON object or array (ie, the first { or [), so everything before it can be skipped
  • return jQuery.parseJSON(result.substring(start_of_json)); invokes jQuery’s ordinary JSON parsing function (deprecated since jQuery 3.0 in favor of the native JSON.parse(), which behaves the same here), but only on the JSON part of the response, ignoring all the pollution that came before it
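Because a converter is just a function from text to a value, you can pull its logic out and try it without jQuery or a server. Here is a minimal sketch of the same first-{-or-[ idea as a plain function; parseFromFirstBracket is a hypothetical name, and it uses the native JSON.parse() in place of jQuery.parseJSON() so it runs in Node:

```javascript
// Hypothetical helper: the converter's logic as a plain function, no jQuery needed.
function parseFromFirstBracket(text) {
    // indexOf() returns -1 for a missing character, so keep only positions that were found.
    const positions = [text.indexOf('{'), text.indexOf('[')]
        .filter(function (pos) { return pos !== -1; });
    // If there is no '{' or '[' at all, fall back to parsing the whole text.
    const start = positions.length ? Math.min(...positions) : 0;
    return JSON.parse(text.substring(start));
}

const polluted = '<!-- Oops, an HTML comment -->\n{"name": "Mike"}';
console.log(parseFromFirstBracket(polluted).name); // prints Mike

// Junk that contains its own brace trips it up: parsing starts at "body {".
try {
    parseFromFirstBracket('<style>body { color: red; }</style>{"name": "Mike"}');
} catch (error) {
    console.log(error.name); // prints SyntaxError
}
```

That last call is the weak spot of this approach: if the junk itself contains a { or [, such as an inline style block, parsing starts at the wrong character and fails.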

So, even with a response like

<!-- Here is an HTML comment. Oops! Should I not put HTML comments in a JSON response? My bad!! -->
{
    "name": "Mike"
}

your code will still work. ✨

This converter has two limitations: it assumes the JSON response is always an object or array, and it assumes the “pollution” before it never contains the characters { or [. To overcome both, a slightly more complex solution is needed: repeatedly try each spot that could be the start of JSON until a substring parses successfully. Here is my code that does that:

jQuery.ajax(
    'https://mysite.com/users/123',
    {
        success: function (parsed_json) {
            alert('Hi ' + parsed_json.name);
        },
        // Use a custom converter to remove the pollution from the JSON
        converters: {
            'text json': function (result) {
                let new_result = result;
                // Sometimes other plugins echo out junk before the start of the real JSON response.
                // So we need to chop off all that extra stuff.
                while (true) {
                    // Find the first spot that could be the beginning of valid JSON.
                    // indexOf() returns -1 for tokens that don't appear, so ignore those.
                    const positions = ['{', '[', 'true', 'false', '"']
                        .map(function (token) { return new_result.indexOf(token); })
                        .filter(function (pos) { return pos !== -1; });
                    if (positions.length === 0) {
                        // Never found any valid JSON. Throw the error.
                        throw new Error('No JSON found in AJAX response using custom JSON parser.');
                    }
                    // Remove everything before it...
                    new_result = new_result.substring(Math.min(...positions));
                    try {
                        // Try to parse it. If that didn't throw, great: we found valid JSON!
                        return jQuery.parseJSON(new_result);
                    } catch (error) {
                        // There was an error parsing that substring. Chop off the character that made
                        // this look like it could be valid JSON, then keep hunting.
                        new_result = new_result.substring(1);
                    }
                }
            }
        }
    }
);
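The loop is easier to check in isolation. Below is a minimal sketch of the same search as a standalone function, findJsonInPollutedText (a hypothetical name), using native JSON.parse() so it runs in Node without jQuery. Like the converter above, it only treats {, [, ", true, and false as possible starting points, so a response that is a bare number or null would not be found.

```javascript
// Hypothetical helper mirroring the converter's loop, with no jQuery dependency.
function findJsonInPollutedText(text) {
    const tokens = ['{', '[', '"', 'true', 'false'];
    let remaining = text;
    while (true) {
        // Positions of every candidate token that still appears in the remaining text.
        const positions = tokens
            .map(function (token) { return remaining.indexOf(token); })
            .filter(function (pos) { return pos !== -1; });
        if (positions.length === 0) {
            throw new Error('No JSON found in AJAX response.');
        }
        // Skip ahead to the earliest candidate and try to parse from there.
        remaining = remaining.substring(Math.min(...positions));
        try {
            return JSON.parse(remaining);
        } catch (error) {
            // Not valid JSON from this point: drop one character and keep looking.
            remaining = remaining.substring(1);
        }
    }
}

// Junk containing both a brace and a quoted attribute, followed by the real JSON.
const junk = '<style>body { color: red; }</style><p class="notice">Hi</p>\n';
console.log(findJsonInPollutedText(junk + '{"name": "Mike"}').name); // prints Mike
```

One design note: every failed attempt re-parses the rest of the response, so in the worst case the work can grow roughly with the square of the response length. For a few stray warnings in front of a normal-sized response that is negligible.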

Making the WordPress Backbone JS Client Handle Polluted JSON

I had this exact problem recently with my WordPress plugin Print My Blog. It loads a regular-ish page, then uses the WordPress REST API Backbone client to fetch all the posts over AJAX and place them on the same page, so the user can easily print their blog to paper, PDF, etc. It works surprisingly well, unless some other plugin pollutes the JSON response.

In that case, I had to jump through one or two more hoops: Backbone makes the jQuery.ajax() calls itself, so I needed to tell Backbone to handle the polluted JSON. Basically, I needed to customize Backbone.sync. Here’s a copy-and-pasteable fix:

var original_backbone_sync;
jQuery(document).ready(function () {
    // Override Backbone's jQuery AJAX calls to be tolerant of erroneous text before the start of the JSON.
    original_backbone_sync = Backbone.sync;
    Backbone.sync = function (method, model, options) {
        // Backbone normally passes an options object, but guard against it being missing.
        options = options || {};
        // Change the jQuery AJAX "converters" text-to-json method.
        options.converters = {
            'text json': function (result) {
                let new_result = result;
                // Sometimes other plugins echo out junk before the start of the real JSON response.
                // So we need to chop off all that extra stuff.
                while (true) {
                    // Find the first spot that could be the beginning of valid JSON.
                    // indexOf() returns -1 for tokens that don't appear, so ignore those.
                    const positions = ['{', '[', 'true', 'false', '"']
                        .map(function (token) { return new_result.indexOf(token); })
                        .filter(function (pos) { return pos !== -1; });
                    if (positions.length === 0) {
                        // Never found any valid JSON. Throw the error.
                        throw new Error('No JSON found in AJAX response using custom JSON parser.');
                    }
                    // Remove everything before it...
                    new_result = new_result.substring(Math.min(...positions));
                    try {
                        // Try to parse it. If that didn't throw, great: we found valid JSON!
                        return jQuery.parseJSON(new_result);
                    } catch (error) {
                        // There was an error parsing that substring. Chop off the character that made
                        // this look like it could be valid JSON, then keep hunting.
                        new_result = new_result.substring(1);
                    }
                }
            }
        };
        return original_backbone_sync(method, model, options);
    };
});

You can see a lot of the same code reused from before. Here are some more explanations:

  • var original_backbone_sync; creates a variable to store Backbone’s original sync method
  • original_backbone_sync = Backbone.sync; stores the original Backbone.sync so we can call it later
  • Backbone.sync = function(method,model,options){ replaces Backbone.sync with our own function, which Backbone will now use instead of its default
  • options.converters = { sets the converters option, which Backbone passes along to jQuery.ajax(). The converter is the same as in the earlier code snippet.
  • return original_backbone_sync(method,model,options); now that we’ve swapped in our text-to-JSON converter, we call the original Backbone.sync method so everything else proceeds as normal.
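If you want the converter in more than one place, one option is to wrap the sync function instead of rewriting it inline. The sketch below is a variation, not the code above: withJsonConverter is a hypothetical helper, it merges our converter into any converters the caller already set rather than replacing them, and it forwards this with call(). A fake sync function stands in for Backbone.sync so the wrapper can be run in Node:

```javascript
// Hypothetical helper: wraps any Backbone.sync-style function (method, model, options)
// so every call gets our 'text json' converter before being passed along.
function withJsonConverter(originalSync, textToJson) {
    return function (method, model, options) {
        options = options || {};
        // Merge rather than overwrite, in case the caller already set other converters.
        options.converters = Object.assign({}, options.converters, { 'text json': textToJson });
        return originalSync.call(this, method, model, options);
    };
}

// In the browser this would be something like (textToJsonConverter is hypothetical):
//   Backbone.sync = withJsonConverter(Backbone.sync, textToJsonConverter);
// Here a fake sync that just returns its options stands in for Backbone.sync.
const fakeSync = function (method, model, options) { return options; };
const wrappedSync = withJsonConverter(fakeSync, JSON.parse);
const seenOptions = wrappedSync('read', {}, { converters: { 'text xml': String } });
console.log(Object.keys(seenOptions.converters)); // ['text xml', 'text json']
```
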

I’m using this Backbone.sync override in my plugin’s JavaScript, so I don’t need to worry about other plugins’ warnings or HTML breaking my JSON responses. We all just get along fine. ✌️

Questions or comments accepted!
