Working with Parsedown, I want to do string manipulation, but only on certain parts of the text: for instance, only on text that is not inside HTML tags or quotes. The right way to do that is with a real parser. The easy way is to remove the unwanted strings, replacing each one with a marker that won't come up in normal text, do the manipulation, then replace the markers with the original strings. (It is that last step that requires "a marker that won't come up in normal text": you don't want to replace text that was present in the original.)
I wanted a marker that can't be typed but is still legal HTML, and it turns out that U+FFFC (OBJECT REPLACEMENT CHARACTER, ￼) is perfect for that. So I wrote a pair of functions, `StringReplace\remove` and `StringReplace\restore`, to make this easy.
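Because the marker is invisible, it is easy to lose when copying code, and an empty marker would make the restore pattern match any number in the text. A minimal sketch, assuming PHP 7.0 or later for the `\u{FFFC}` escape, of writing the character portably and checking that the input doesn't already contain it; `contains_marker` is a hypothetical helper, not part of `StringReplace`.

```php
<?php
// U+FFFC OBJECT REPLACEMENT CHARACTER; "\u{...}" escapes need PHP 7.0+.
$marker = "\u{FFFC}";

// In UTF-8 the character is the three bytes EF BF BC.
echo bin2hex($marker), "\n"; // efbfbc

// Hypothetical guard (not part of StringReplace): the technique is only
// safe if the input does not already contain the marker.
function contains_marker(string $text): bool {
    return strpos($text, "\u{FFFC}") !== false;
}

var_dump(contains_marker('<p>plain text</p>'));   // bool(false)
var_dump(contains_marker("before\u{FFFC}after")); // bool(true)
```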
`StringReplace\remove ($re, $target)`
- Replaces every string in `$target` that matches the regular expression `$re` with a numbered marker, "￼{number}￼", and returns the new string. For instance, `$rawtext = StringReplace\remove ('#</?[^>]*>#', $html);` will replace all the tags with markers.

`StringReplace\restore ($target)`
- Returns `$target` with the markers replaced by the original strings they stand for.
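To see exactly what the tag pattern in the `remove` example catches, here is a small self-contained check using only `preg_match_all` and `preg_replace`; the sample HTML is made up for illustration.

```php
<?php
$html = '<p>Say <em>hello</em><br/> to Parsedown</p>';

// Same pattern as the remove() example: "<", an optional "/",
// then everything up to the next ">".
$tagPattern = '#</?[^>]*>#';

preg_match_all($tagPattern, $html, $matches);
// These are the strings remove() would stash and replace with markers.
echo implode(' ', $matches[0]), "\n"; // <p> <em> </em> <br/> </p>

// What is left for the string manipulation to work on.
echo preg_replace($tagPattern, '', $html), "\n"; // Say hello to Parsedown
```

Note that `[^>]*` stops at the first `>`, so a tag whose attribute value contains `>` (for instance `<a title="x>y">`) would be split; that is one of the cases where a real parser does better.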
The code
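```php
<?php
namespace StringReplace;

// U+FFFC OBJECT REPLACEMENT CHARACTER. Written as an escape (PHP 7.0+)
// so it can't get lost in copy/paste; an empty marker would make
// RE_REPLACEMENT match every number in the text.
const OBJECT_REPLACEMENT_CHARACTER = "\u{FFFC}";
// A marker: U+FFFC, the string's number, U+FFFC.
const RE_REPLACEMENT = '/' . OBJECT_REPLACEMENT_CHARACTER . '(\d+)' . OBJECT_REPLACEMENT_CHARACTER . '/';

// Storage for the removed strings, shared by remove() and restore().
// A static variable returned by reference replaces the original
// globals, which break if this file is included inside a function.
function &strings() {
    static $strings = array();
    return $strings;
}

// Replace each match of $re in $target with a numbered marker
// (numbering starts at 1) and remember the original string.
function remove($re, $target) {
    return preg_replace_callback($re, function ($matches) {
        $strings = &strings();
        $strings[] = $matches[0];
        return OBJECT_REPLACEMENT_CHARACTER . count($strings) . OBJECT_REPLACEMENT_CHARACTER;
    }, $target);
}

// Replace each marker with the string it stands for.
function restore($target) {
    return preg_replace_callback(RE_REPLACEMENT, function ($matches) {
        $strings = &strings();
        return $strings[$matches[1] - 1];
    }, $target);
}
```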
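The whole remove, manipulate, restore cycle can also be wrapped in a single call. The sketch below is a hypothetical variant, `replace_outside`, not the `StringReplace` API: it keeps the saved strings local to one call instead of sharing them between `remove` and `restore`, and numbers markers from 0.

```php
<?php
// Hypothetical helper (not part of the original StringReplace code):
// protect every match of $re, run $fn on the remaining text, then put
// the protected strings back. Storage is local to this call.
function replace_outside(string $re, string $target, callable $fn): string {
    $marker = "\u{FFFC}";
    $saved = [];

    // Step 1: swap each protected string for marker + index + marker.
    $masked = preg_replace_callback($re, function ($m) use (&$saved, $marker) {
        $saved[] = $m[0];
        return $marker . (count($saved) - 1) . $marker;
    }, $target);

    // Step 2: do the actual string manipulation on the masked text.
    $changed = $fn($masked);

    // Step 3: swap every marker back for the string it stood for.
    return preg_replace_callback('/' . $marker . '(\d+)' . $marker . '/',
        function ($m) use ($saved) {
            return $saved[(int) $m[1]];
        }, $changed);
}

// Upper-case the text but leave the HTML tags (and their attributes) alone.
echo replace_outside('#</?[^>]*>#', '<em class="x">hi</em> there', 'strtoupper'), "\n";
// <em class="x">HI</em> THERE
```

The trade-off: the original pair lets you call `remove` several times with different patterns (say, tags and then quoted strings) before a single `restore`, while this variant protects one pattern per call. Either way, the manipulation in the middle must leave the markers intact: anything that strips or rewrites non-ASCII characters, or edits digits, would break the restore step.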