I've thought about creating my own syntax highlighter. I've been using Chili, but there are some odd bugs that pop up here and there and it doesn't seem to play well with Chrome. And it hasn't been updated in 2 years.

One thing I did want was line numbering, but that's been a bugaboo of syntax highlighters for a long time: you want the numbers, but you do not want them copied when code is selected. Firefox copies the numbers when using <li> elements, and tables or inserted text will also copy everything. The answer seems to be using :before to insert the line numbers, since that text won't be copied in any modern browser (IE 7 and below don't support :before, but we won't worry about that).

The issue then is how to tell CSS about the lines. We want to wrap them in <span>s, like so:

<pre>
<span class=line>This is a <em>text</em></span>
<span class=line>This is the second line</span>
</pre>

And number everything with CSS:

pre.test1 {
	counter-reset: linecounter;
}
pre.test1 span.line{
	counter-increment: linecounter;
}
pre.test1 span.line:before{
	content: counter(linecounter);
	width: 2em;
	display: inline-block;
	border-right: 1px solid black;
}

And this is the result, exactly as desired.

This is a text
This is the second line

The keys in the CSS are the counter-reset and counter-increment declarations, which set up the counter. To start the numbering at 5, change the first to counter-reset: linecounter 4, because counter-increment increments before the number is displayed. The name linecounter can be anything you want as long as it's consistent. The content declaration displays the value of the counter in the :before pseudo-element, and the width, display and border-right declarations are just old-fashioned styling to make it prettier. You would of course want to add some padding, margin, odd/even backgrounds, etc., but that's old hat.
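As a rough sketch of the off-by-one in counter-reset, here is a small helper that builds the numbering rules for a chosen first line number. The function name lineNumberCss and its parameters are hypothetical, made up for this illustration; only the CSS properties come from the rules above.

```javascript
// Hypothetical helper (not part of the original code): builds the numbering
// rules for `selector` so that the first number displayed is `start`.
function lineNumberCss(selector, start) {
  // counter-increment runs before the counter is displayed, so the counter
  // has to be reset to one less than the first number we want to see.
  var initial = start - 1;
  return [
    selector + ' { counter-reset: linecounter ' + initial + '; }',
    selector + ' span.line { counter-increment: linecounter; }',
    selector + ' span.line:before { content: counter(linecounter); }'
  ].join('\n');
}

// Rules for a block whose first line should be numbered 5.
console.log(lineNumberCss('pre.test1', 5));
```

The generated text could be placed in a <style> element; the styling declarations (width, display, border-right) are left out to keep the sketch short.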

But how do we get the <span>s to wrap the lines? We could just take the text, split it on '\n' and use string processing to wrap each line: element.innerHTML = element.textContent.replace(/.+/g, '<span class=line>$&</span>'), but that loses all internal markup. Luckily, browsers that implement contentEditable know how to insert content without messing up the structure by using ranges, and we know how to manipulate ranges.
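To make the markup loss concrete, here is a minimal sketch of the string-only approach as a pure function. The names escapeHtml and wrapLinesAsText are hypothetical. Unlike the one-line regular expression, this version also wraps empty lines and escapes special characters.

```javascript
// Hypothetical helper: escapes the characters that matter inside HTML text.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps every line of plain text in <span class=line>.
// The input is plain text (what textContent returns), so any <em> or other
// markup that was in the element is already gone before we start.
function wrapLinesAsText(text) {
  return text.split('\n').map(function (line) {
    return '<span class=line>' + escapeHtml(line) + '</span>';
  }).join('\n');
}

// textContent of the two-line sample: the <em> around "text" has been dropped.
console.log(wrapLinesAsText('This is a text\nThis is the second line'));
```

Assigning the result to innerHTML would give numbered lines, but the emphasis in the first line could not be recovered, which is why a range-based approach is worth the extra code.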

Rather than including the whole bililiteRange class, since I know we're only going to be dealing with standards-compliant browsers, I can just take out the relevant code:

function wrapLines (el){
	var lines = el.textContent.split('\n'); // one entry per line of text
	var range = document.createRange();
	var pointer = 0; // start of text
	lines.forEach(function(line){
		var len = line.length;
		setBounds (pointer, pointer+len); // sets range to the characters of the line
		var wrapper = document.createElement('span');
		wrapper.setAttribute('class', 'line');
		wrapper.appendChild(range.extractContents()); // pulls the contents of the range out of the document and into wrapper
		range.insertNode(wrapper); // and put back the wrapped line
		pointer += len+1; // skip the newline
	});
	// now, we're left with a bunch of empty spans/other elements that were split across lines and the browser divided them into three parts (first line, newline character, second line)
	// those mess up the odd/even calculations. Replace them with plain text.
	for (var node = el.firstChild; node; node = node.nextSibling){
		if (node.nodeType == 1 && node.getAttribute('class') != 'line'){ // elements only; text and comment nodes have no getAttribute
			var replacement = document.createTextNode(node.textContent);
			el.replaceChild(replacement, node);
			node = replacement;
		}
	}
	
	function setBounds (start, end){
		// since the browser throws an error if we try to move the beginning past the end (unlike IE, which just adjusts the end)
		// we have to reset the range to cover the entire element, then move the start, then move the end to the start, then move the end
		range.selectNodeContents(el);
		moveBoundary (start, 'start');
		range.collapse (true);
		moveBoundary (end-start, 'end');
	}
	function moveBoundary (n, start){
		// move the boundary n characters forward, up to the end of the element. Forward only!
		// start is 'start' or 'end', and is used to create the appropriate property and method names ('startContainer', 'endOffset', 'setStart' etc.)
		// if the start is moved after the end, then an exception is raised
		if (n <= 0) return;
		var startNode = range[start+'Container'];
		// we may be starting somewhere into the text
		if (startNode.nodeType == 3) n += range[start+'Offset'];
		// nodeIterators from http://www.w3.org/TR/DOM-Level-2-Traversal-Range/traversal.html
		var iter = document.createNodeIterator(el, 4 /* SHOW_TEXT */), node;
		while (node = iter.nextNode()){
			if (startNode.compareDocumentPosition(node) & 2 /* DOCUMENT_POSITION_PRECEDING */ ) continue;
			if (n <= node.nodeValue.length){
				// we found the last character!
				range[start == 'start' ? 'setStart' :'setEnd'](node, n);
				return;
			}else{
				n -= node.nodeValue.length; // eat these characters
			}
		}
	}
}
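The pointer arithmetic in wrapLines can be checked without a browser. The following sketch uses a hypothetical function, lineBounds, that computes the same start and end character offsets that wrapLines passes to setBounds for each line.

```javascript
// Hypothetical helper: returns the character offsets of each line in `text`,
// using the same bookkeeping as the pointer variable in wrapLines.
function lineBounds(text) {
  var pointer = 0; // start of text
  return text.split('\n').map(function (line) {
    var bounds = { start: pointer, end: pointer + line.length };
    pointer += line.length + 1; // skip the newline
    return bounds;
  });
}

var sample = 'This is a text\nThis is the second line';
lineBounds(sample).forEach(function (b) {
  // Each pair of offsets slices out exactly one line, without its newline.
  console.log(b.start, b.end, JSON.stringify(sample.slice(b.start, b.end)));
});
```

Because the offsets count characters of textContent, they are independent of how the text is divided among text nodes and elements; moveBoundary is what translates them into a node and an offset within that node.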

And now it works (note the original markup has no line-wrapping spans; those are added with JavaScript):

<pre class="test1 numbered">
This is a <em>text</em>
This is the second line
This is the third line; this text should have line numbers.</pre>
wrapLines($('.numbered')[0])
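The call above uses jQuery and numbers only the first matching element. As a sketch of one way to handle every matching block without jQuery, the hypothetical helper below takes the root to search and the wrapping function as arguments; the name wrapAllNumbered and the selector pre.numbered are assumptions for this example.

```javascript
// Hypothetical helper: applies `wrap` (normally wrapLines) to every
// <pre class="numbered"> found under `root` (normally document).
// Returns the number of blocks that were processed.
function wrapAllNumbered(root, wrap) {
  var blocks = root.querySelectorAll('pre.numbered');
  // Borrow Array's forEach so this also works on array-like node lists.
  Array.prototype.forEach.call(blocks, function (pre) {
    wrap(pre);
  });
  return blocks.length;
}

// In a page, this could run once the DOM is ready:
// document.addEventListener('DOMContentLoaded', function () {
//   wrapAllNumbered(document, wrapLines);
// });
```

Passing the root and the wrapping function in, rather than reading them from globals, keeps the helper easy to try with a stand-in object.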

14 Comments

  1. Mike says:

    Can you add some code to change background colour for alternate lines?

  2. Danny says:

    @Mike:
    Use CSS:

    span.line:nth-child(odd){
      background:#ff0000;
    }

    –Danny

  3. Mohammad Hamza says:

    I think the existing syntax highlighter scripts are a good deal better than this.

  4. Ezeh says:

    How can one install this code on blogger blog please.

  5. Danny says:

    @Ezeh:
    I don’t know anything about blogger. Can you include javascript?
    –Danny

  6. Danny says:

    @Mohammad:
    I agree. This was more a proof-of-concept. I like my Prism plugin, though I have to admit that Lea Verou (the creator of Prism) doesn’t like wrapping lines to set line numbers.
    –Danny

  7. Ezeh says:

    @Danny:
    Yes, on my template I can do that. But I need steps that will guide me.

  8. Danny says:

    I don’t know anything about Blogger to tell you how to include javascript and CSS files. Sorry

  9. David says:

    I’m seeing odd behavior with an embedded span that crosses multiple lines, e.g,

    blah blah
    blah blah blah
    blah blah

    Is the middle line of samples like this formatted correctly for everyone else (i.e., have I done something wrong), or does the code not retrieve the context except in cases where the range logic has to patch up something that isn’t well formed?

  10. David says:

    Sorry; new to the way WordPress seems to eat markup. What I meant was: I’m seeing odd behavior with an embedded span that crosses multiple lines, e.g,

    blah <span style="color:red;">blah
    blah blah blah
    blah blah

    Is the middle line of samples like this formatted correctly for everyone else (i.e., have I done something wrong), or does the code not retrieve the context except in cases where the range logic has to patch up something that isn’t well formed?

  11. David says:

    Sorry; messed up again. What I *really* meant was: I’m seeing odd behavior with an embedded span that crosses multiple lines, e.g,

    blah <span style="color:red;">blah
    blah blah blah
    blah</span> blah

    Is the middle line of samples like this formatted correctly for everyone else (i.e., have I done something wrong), or does the code not retrieve the context except in cases where the range logic has to patch up something that isn’t well formed?

  12. Danny says:

    @David:
    Unfortunately, what happens in that case is out of my control. My code does range.extractContents() and then inserts that extracted element into a new <span>. If range.extractContents() doesn’t preserve the formatting then you will have to split the span by hand.
    When I have time, I will look at it. If you inspect the results, what do you get?

    –Danny

  13. David says:

    @Danny, after sleeping on it, I think I understand the cause of the problem and have a solution; I’ll try to get it out in the next day or so, and I’ll drop a note here and post the code in an accessible space when that happens. Thank you for the quick response (and feel free to delete my first few malformed postings, which apparently the blog site won’t let me delete myself).

  14. David says:

    The problem was that the range strategy fixes stranded start and end tags so that the range contents will be well-formed, but it doesn’t walk up the tree. If after the automatic range fix-up the start and end points of the range are both contained in an element that starts before and ends after the range, any styling associated with that element doesn’t get applied. I’ve addressed this by walking up the tree; code is at https://github.com/djbpitt/numberlines. My version is heavily dependent on the one here (credited in the initial comments), but because I’m not a very adept JavaScript coder, I deviated in places where I didn’t understand how the original worked. I haven’t tested this extensively, but it seems to maintain formatting that starts anywhere inside the <pre> wrapper element.
