Innerdvations

20 Aug 2011

YouTube's Zero Width No-Break Spaces (aka BOM, aka 65279, aka 0xFEFF)

I use Opera as my primary web browser. I also use YouTube. They go together pretty well, except for one little thing: The byte-order mark (BOM) that YouTube sticks, seemingly randomly, in the middle of nearly every single comment.

Now, I realize that even assuming someone would want to actually read YouTube comments is a stretch. That's probably why nobody has ever noticed this bug before: I'm likely the first person to actually read a comment on YouTube. So that you don't have to read them yourself, I've taken a screenshot of the issue:

[Screenshot: YouTube comments as displayed by Opera]

If you want to see how your own browser renders an inline BOM, I've put them in brackets here:
[] (using an HTML numeric character reference)
[] (the actual character itself)
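For reference, every name in the title refers to one and the same code point. The quick check below (runnable in a browser console or in Node.js, both of which provide `TextEncoder`) shows that the JavaScript escape `\uFEFF`, the hex value 0xFEFF, and the decimal 65279 used by an HTML reference such as `&#65279;` are all the same character, and that a page served as UTF-8 stores it as the three bytes EF BB BF.

```javascript
// U+FEFF written three ways: a JavaScript escape, a hex number, and the
// decimal 65279 used by the HTML numeric character reference &#65279;.
var BOM = '\uFEFF';

console.log(BOM.charCodeAt(0) === 0xFEFF);       // true
console.log(0xFEFF === 65279);                   // true
console.log(String.fromCharCode(65279) === BOM); // true

// Encoded as UTF-8, the same character takes three bytes.
var utf8Hex = Array.from(new TextEncoder().encode(BOM), function (b) {
  return b.toString(16);
});
console.log(utf8Hex.join(' ')); // ef bb bf
```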

If you use Firefox or Chrome, you probably don't see anything between the brackets. In fact, the second one is probably missing entirely when you view source. I'm guessing that rather than treating it as a zero-width no-break space, they assume it's an out-of-place BOM and throw it away altogether.

[Screenshot: YouTube comments as displayed by Lynx, proving I'm not imagining things]

So, whose fault is this? As I understand it, it's technically Opera's fault: a BOM anywhere other than the very start of a document is supposed to be treated as a zero-width no-break space, which renders as nothing at all. But YouTube deserves a good share of the blame for sticking BOMs, which are only supposed to appear as the first character of a document, in the middle of its content. I can't think of a single reason for any of them to be there. They're not joining words, and YouTube isn't doing any sort of strange character set handling with them. Additionally, Opera could make a security argument for rendering the character visibly, since it's one of those invisible characters phishers like to use to make something look legitimate when it isn't.
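Whether U+FEFF counts as a byte-order mark or as an inline zero-width no-break space depends only on where it sits, which makes the distinction easy to check. The function below is a small illustrative sketch; `classifyFeff` is a hypothetical name, not anything YouTube or Opera provides. It treats an occurrence at index 0 as a BOM and records the position of every later one as an inline character, the kind that ends up scattered through the comments.

```javascript
// Hypothetical helper: report where U+FEFF occurs in a string.
// At index 0 it acts as a byte-order mark; anywhere else it is a zero
// width no-break space (a use deprecated since Unicode 3.2 in favour of
// U+2060 WORD JOINER).
function classifyFeff(text) {
  var result = { leadingBom: false, inline: [] };
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0xFEFF) {
      if (i === 0) {
        result.leadingBom = true; // legitimate BOM position
      } else {
        result.inline.push(i);    // stray mid-text occurrence
      }
    }
  }
  return result;
}

console.log(classifyFeff('great\uFEFF video')); // { leadingBom: false, inline: [ 5 ] }
```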

Anyway, here's how to fix it (if you're like me and you use Opera and are driven crazy by insignificant little details). All you need to do is go to Site Preferences, Scripts, set a user-script directory, and then either put this file in that directory or create a .js file there with this in it:

// U+FEFF, written as an escape: the literal character is invisible and
// easily lost when copying this script around
var BOM_PATTERN = /\uFEFF/g;
var last_clean = new Date().getTime();

function clean_youtube_comments() {
	var view = document.getElementById('comments-view');
	// if the element we're cleaning doesn't exist, don't try cleaning it
	if (!view) {
		return;
	}

	var original = view.innerHTML;
	var cleaned = original.replace(BOM_PATTERN, '');
	// make sure we should actually replace it: something was removed,
	// and there is still content left
	if (original !== cleaned && cleaned.length) {
		// update the timestamp before touching innerHTML, because the
		// assignment itself fires DOMNodeInserted, which could otherwise
		// trigger a redundant second cleaning pass
		last_clean = new Date().getTime();
		view.innerHTML = cleaned;
	}
}

function yt_DOMNodeInserted() {
	// if we cleaned less than 50ms ago, skip it. this prevents cleaning
	// multiple times on paging, because DOMNodeInserted gets called for
	// each node inserted, which amounts to about 5 times in a row in
	// this case.
	var now = new Date().getTime();
	if (now - last_clean < 50) {
		return;
	}
	clean_youtube_comments();
}

window.addEventListener('load', function () {
	// clean on page load
	clean_youtube_comments();
	// clean on DOMNodeInserted (listener for the ajax paging of comments)
	var discussion = document.getElementById('watch-discussion');
	if (discussion) {
		discussion.addEventListener('DOMNodeInserted', yt_DOMNodeInserted, false);
	}
}, false);
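Rewriting `innerHTML` replaces every element inside `comments-view` with freshly parsed copies, so any listeners YouTube attached to those elements with `addEventListener` are lost along with the old nodes. A gentler variation, sketched below under the assumption that the stray BOMs only ever appear in text content (not in attributes), edits the text nodes in place instead. `stripBomsFromTextNodes` is a hypothetical helper, not part of the script above; it relies only on the standard `nodeType`, `nodeValue`, and `childNodes` properties, so it also works on plain objects shaped like DOM nodes.

```javascript
var TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM spec

// Hypothetical helper: recursively remove U+FEFF from every text node under
// `root`, changing text in place rather than re-serializing innerHTML.
// Returns the number of characters removed.
function stripBomsFromTextNodes(root) {
  if (root.nodeType === TEXT_NODE) {
    var before = root.nodeValue;
    var after = before.replace(/\uFEFF/g, '');
    if (after !== before) {
      root.nodeValue = after; // only write when something changed
    }
    return before.length - after.length;
  }
  var removed = 0;
  var children = root.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    removed += stripBomsFromTextNodes(children[i]);
  }
  return removed;
}

// Example with plain objects shaped like an element holding one text node:
var demo = { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'nice\uFEFF video' }] };
console.log(stripBomsFromTextNodes(demo)); // 1
```

In the user script, `clean_youtube_comments` could call `stripBomsFromTextNodes(view)` instead of doing the `innerHTML` round trip. Changing a text node's value fires `DOMCharacterDataModified` rather than `DOMNodeInserted`, so this approach also wouldn't retrigger the paging listener.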

Updated 2011-08-27: It took a week before I had the urge to view a second page of comments and realized the script didn't clean those, so I've had to add a listener for YouTube's AJAX comment paging.
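The 50 ms check in `yt_DOMNodeInserted` cleans on the *first* insertion of a burst and ignores the rest, so if YouTube happened to insert comment text in a later node of that same burst, it could slip through uncleaned. A common alternative is a trailing-edge debounce, which waits until insertions have gone quiet and then cleans once. This is a generic sketch, not the original script's approach; the name `debounce` and the 50 ms window are illustrative assumptions.

```javascript
// Hypothetical helper: wraps `fn` so that a burst of calls results in a
// single call to `fn`, made once `waitMs` milliseconds pass with no new call.
function debounce(fn, waitMs) {
  var timer = null;
  return function () {
    var self = this;
    var args = arguments;
    if (timer !== null) {
      clearTimeout(timer); // a newer call arrived: restart the wait
    }
    timer = setTimeout(function () {
      timer = null;
      fn.apply(self, args);
    }, waitMs);
  };
}

// Example: five rapid "insertions" produce one cleaning pass.
var passes = 0;
var onInsert = debounce(function () { passes++; }, 50);
for (var i = 0; i < 5; i++) {
  onInsert();
}
setTimeout(function () {
  console.log(passes); // 1
}, 200);
```

Hooked up as `discussion.addEventListener('DOMNodeInserted', debounce(clean_youtube_comments, 50), false)`, the `last_clean` bookkeeping would no longer be needed. When `clean_youtube_comments` rewrites `innerHTML`, that schedules one more pass, which finds no BOMs left and leaves the page alone.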