Michael Tyson had a cool idea: instead of the search results page showing an excerpt of the first words of the post, show an excerpt that contains the search terms and highlight them (say, by making them bold). I thought his method was too complex—it requires replacing your theme's search.php with a custom page, and it shows the context of every occurrence of the search terms. I thought it would be more straightforward to use the existing search page, which should be using the_excerpt to show an extract of the found page, and use the existing filters to change the text. Also, there's no need to show every occurrence of the search terms; the first ones should be fine.

There are two parts to displaying the excerpt. The first is generating the text of the excerpt, with (possibly) a suffix that indicates that it was truncated. This is done by the function get_the_excerpt. That uses the excerpt_length filter to determine the length of the excerpt in words, and the excerpt_more filter to determine the suffix. It then calls the get_the_excerpt filter to return the final result. We'll hook that filter to return our own text, one that contains the search terms.

The second part is displaying the excerpt, done by the_excerpt. All that does is embed the text in a p element, call the the_excerpt filter, and echo the result. We'll hook that to highlight the search terms.

First, we create a regular expression that looks for the search terms (basically, same as that used by Michael Tyson):

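function query_re(){
	global $wp_query;
	// search_terms is only populated for search queries; elsewhere it may be missing
	$terms = $wp_query->get('search_terms');
	// no terms: return a pattern that never matches, so nothing gets highlighted
	if (!is_array($terms) || !$terms) return '/(?!)/';
	foreach ($terms as &$term) $term = preg_quote($term, '/');
	unset($term); // break the reference left behind by the foreach
	return '/'.implode('|', $terms).'/iu';
}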
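
To see what that pattern looks like, here is a minimal stand-alone sketch that runs without WordPress. The function name build_terms_re is hypothetical and takes the terms as an argument instead of reading them from $wp_query; the sample terms are made up for illustration.

```php
<?php
// Hypothetical stand-alone variant of query_re(): the terms are passed in
// rather than read from $wp_query, so this runs outside WordPress.
function build_terms_re(array $terms): string {
    // Escape regex metacharacters (and the '/' delimiter) in every term.
    $quoted = array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms);
    // Alternation of all terms, case-insensitive (i) and UTF-8 aware (u).
    return '/' . implode('|', $quoted) . '/iu';
}

$re = build_terms_re(['C++', 'a.b']);
// $re is now the string /C\+\+|a\.b/iu
```

Without preg_quote, a term like C++ would produce an invalid pattern, and a.b would also match "axb".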

Then, we use my preg_replace_text to highlight the terms when the excerpt is displayed (the the_excerpt filter):

add_filter('the_excerpt', function ($text){
	return preg_replace_text (query_re(), '<span class="searchterm">$0</span>', $text);
});
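
As a rough illustration of the highlighting step, the sketch below uses plain preg_replace as a stand-in for preg_replace_text, which is my own function and is not reproduced here. The name highlight_terms and the sample text are hypothetical. Note the difference: plain preg_replace would also match inside HTML tags and attributes, so treat this only as a demonstration of the replacement string.

```php
<?php
// Hypothetical helper: wrap every match of $re in a span.searchterm.
// Assumption: plain preg_replace stands in for preg_replace_text here,
// so matches inside tag names or attributes are NOT protected.
function highlight_terms(string $re, string $text): string {
    // $0 in the replacement is the whole matched term, original case preserved.
    return preg_replace($re, '<span class="searchterm">$0</span>', $text);
}

$highlighted = highlight_terms('/quick|fox/iu', '<p>The Quick brown fox</p>');
// <p>The <span class="searchterm">Quick</span> brown <span class="searchterm">fox</span></p>
```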

And now make sure our stylesheet does something pretty with span.searchterm.

Selecting the correct excerpt is slightly more complicated. First choice is the author-composed excerpt (if it matches the search terms), which is passed to the filter directly. Second choice is the default excerpt (if it matches the search terms), which is the first excerpt_length words. Third choice is the excerpt_length words surrounding the first search term; I'll arbitrarily take one-third of the length before and two-thirds after. Fourth choice, which means that neither the excerpt nor the whole text contains the search term, is the original excerpt (this can happen if the search term matched only the title). Fifth choice is the first excerpt_length words.

WordPress has a function wp_trim_words that we can use to limit the excerpt size, but it only trims at the end. To trim the start of the text, we use a little hack: reverse the text, trim that, and reverse it back. This works because word boundaries fall in the same places whichever direction the text is read. Since we may well have Unicode text and strrev works on bytes, not characters, we use this cute function: function strrev_utf8($str) { return join("", array_reverse(preg_split("##u", $str))); }. With that defined, the filter is:

remove_all_filters('get_the_excerpt'); // we want to take over handling the excerpt
add_filter( 'get_the_excerpt', function($excerpt){
	global $post;
	$excerpt_length = apply_filters('excerpt_length', 55);
	$excerpt_more = apply_filters('excerpt_more', '…');
	$query_re =  query_re();
	
	// First choice: the author-composed excerpt
	if ($excerpt && preg_match($query_re, $excerpt)) return $excerpt;

	// Second choice: the start of the text
	// get the actual text of the post
	$text = wp_strip_all_tags(apply_filters('the_content', $post->post_content));
	// Create the default excerpt
	$excerpted_text = wp_trim_words($text, $excerpt_length, $excerpt_more);
	if (preg_match($query_re, $excerpted_text)) return $excerpted_text;
	
	// Third choice: context of the search term
	$text_matched = preg_match ($query_re, $text, $matches, PREG_OFFSET_CAPTURE); // save the matched terms with their offsets
	if ($text_matched){
		$offset = $matches[0][1]+strlen($matches[0][0]); // byte offset just past the first matched term
		// hack: we want to add context for where the term was found, but we want it to use whole words. wp_trim_words will trim the end,
		// but we want a fixed number of words (arbitrarily, one third of the excerpt length) before the term. So we reverse the text and trim that.
		$len = (int) ($excerpt_length/3);
		// need to use a single character to indicate truncation since we are reversing the text
		$reversetext = strrev_utf8(wp_trim_words(strrev_utf8(substr($text, 0, $offset)), $len, '…'));
		$context = $reversetext.substr($text, $offset); // rebuild it
		return wp_trim_words($context, $excerpt_length, $excerpt_more);
	}
	
	// No matches. Just use the usual excerpt 
	return $excerpt ? $excerpt : $excerpted_text;
});
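
The third choice is the trickiest part, so here is a stand-alone sketch of it that runs without WordPress. Everything except strrev_utf8 is hypothetical: trim_words_stub is a simplified stand-in for wp_trim_words (it only splits on whitespace), and last_words and context_excerpt are names I made up for illustration. It relies on the fact that PREG_OFFSET_CAPTURE reports byte offsets, which is what substr and strlen also use, so the three agree even for multibyte text.

```php
<?php
// Reverse a UTF-8 string character by character (strrev reverses bytes).
function strrev_utf8(string $str): string {
    return implode('', array_reverse(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY)));
}

// Hypothetical, simplified stand-in for wp_trim_words:
// keep the first $n whitespace-separated words, append $more if truncated.
function trim_words_stub(string $text, int $n, string $more): string {
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $n) return implode(' ', $words);
    return implode(' ', array_slice($words, 0, $n)) . $more;
}

// Keep the LAST $n words by trimming the reversed text and reversing back.
// The marker must be a single character, since it gets reversed too.
function last_words(string $text, int $n): string {
    return strrev_utf8(trim_words_stub(strrev_utf8($text), $n, '…'));
}

// Excerpt of about $length words around the first match of $re in $text.
function context_excerpt(string $text, string $re, int $length): string {
    if (!preg_match($re, $text, $m, PREG_OFFSET_CAPTURE)) {
        return trim_words_stub($text, $length, '…');
    }
    $end = $m[0][1] + strlen($m[0][0]); // byte offset just past the match
    // One third of the words come from before (and including) the match.
    $before = last_words(substr($text, 0, $end), intdiv($length, 3));
    return trim_words_stub($before . substr($text, $end), $length, '…');
}

$text = 'one two three héllo wörld five six seven eight nine';
$excerpt = context_excerpt($text, '/wörld/iu', 6);
// …héllo wörld five six seven eight…
```

The leading … counts as part of the first word, so the result is still at most the requested number of words.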

This seems to work well, and should work with any theme that uses the_excerpt() to display search results. One note: the wp_trim_excerpt function, which this replaces (it is the original get_the_excerpt filter), does a strip_shortcodes on the content, which I specifically left out because I want to include the text of my shortcodes. It also does $text = str_replace(']]>', ']]&gt;', $text);, for reasons I don't understand. Where would ]]> come from? So I left that out too.

2 Comments

  1. supernaut says:

    Hi Danny, I’ve been using this and noticed it threw some php errors, which can be fixed by wrapping the foreach in query_re like so:


    if (is_array($terms)) {
    …two lines here
    }

  2. Danny says:

    @supernaut:
    I haven’t analyzed the WP code. Shouldn’t $wp_query->query_vars['search_terms'] always be an array? I haven’t gotten any errors. What is the type of $wp_query->query_vars['search_terms'] if not an array?
    –Danny
