
Extending Parsedown: attributes

Markdown Extra (including Parsedown Extra) allows for attributes to be applied to certain elements: headers, fenced code blocks, links, and images. I'd like to be able to apply them to any element. I'm going to use the syntax of Python Markdown attributes, but the attribute lists go before the elements. For block level elements, they go on their own line before the element.

My attribute lists start with {: (not just {) and end with }. Anything that would be legal inside an HTML tag is fine, since I didn't write my own parser; I just used DOMDocument. There are three special cases:

  • .foo is changed to class="foo". Note that this is different from the Python code, which appends class names that start with a dot (.). Repeated attribute names in actual HTML are ignored, so to use two classes, use class="foo bar", not .foo .bar.
  • #foo is changed to id="foo".
  • Two letters alone are changed to lang=xx, since I use that attribute so much.

So {:fr}*Bonjour* becomes <em lang=fr>Bonjour</em> and

{:style="border: 1px solid black;" #table1}
| Name | Color |
| Joe  | Red   |


becomes a table whose opening tag is

<table style="border: 1px solid black;" id="table1">
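The note about repeated attribute names can be checked directly with DOMDocument. This is a minimal sketch, not part of the extension: parseTagAttributes is a hypothetical helper name, and the <div> is just a convenient element to hang the attributes on.

```php
<?php
// Hypothetical helper: let PHP's HTML parser read an attribute string
// by placing it inside a throwaway <div> tag.
function parseTagAttributes(string $attrString): array
{
    $dom = new DOMDocument();
    $dom->loadHTML(
        "<div $attrString></div>",
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOWARNING | LIBXML_NOERROR
    );
    $ret = [];
    foreach ($dom->documentElement->attributes as $attr) {
        $ret[$attr->nodeName] = $attr->nodeValue;
    }
    return $ret;
}

// Two separate class attributes: the parser keeps only one of them.
var_dump(parseTagAttributes('class=foo class=bar'));
// One class attribute with two names: both names survive.
var_dump(parseTagAttributes('class="foo bar"'));
```

Since .foo .bar would expand to two separate class attributes, it runs into the first case; class="foo bar" is the form that keeps both names.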

The code

There are two parts to the code: first, create an <attr> element with the appropriate attributes. Second, find the next element, apply those attributes to it, and delete the <attr> element.

For the first, we adjust the attributes (pulling out the quoted strings first, with StringReplace), then create an element with those attributes and use PHP's HTML parser to figure them out.

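function __construct(){
  $this->InlineTypes['{'] []= 'Attributes';
  $this->inlineMarkerList .= '{'; // Parsedown only looks for inline elements at the characters in this list
  // rest of the constructor
}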

protected function inlineAttributes($excerpt){
  if (preg_match('#^{:(.+?)}\s*#', $excerpt['text'], $matches)) {
    return array(
      'extent' => strlen($matches[0]),
      'element' => array(
        'name' => 'attr',
        'attributes' => $this->parseAttributes($matches[1])
      )
    );
  }
}

// To allow an attribute list on a line by itself (so it can apply to block-level elements), we need to stop the paragraph from continuing
protected function paragraphContinue($Line, array $Block){
  if (preg_match ('#^{:(.+?)}\s*#',  $Block['element']['handler']['argument'])){
    // the previous block was an attribute list in a paragraph
    $Block['interrupted'] = 1;
  }
  return parent::paragraphContinue($Line, $Block);
}

protected function parseAttributes ($attrString){
  $attrString = " $attrString "; // make the regular expressions simpler by prepending and appending spaces
  $attrString = StringReplace\remove ('/("[^"]*")|(\'[^\']*\')/', $attrString); // pull out quoted strings
  $attrString = preg_replace ('/ #(\w+)(?= )/', ' id=$1 ', $attrString); // #foo
  $attrString = preg_replace ('/ \.(\w+)(?= )/', ' class=$1 ', $attrString); // .foo
  $attrString = preg_replace ('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $attrString); // two letter language names
  $attrString = StringReplace\restore ($attrString); // put the quotes back

  // the loadHTML is so complicated because we want to ignore errors and not add DTDs, <html>, <body>, etc.
  // and we want it to use UTF-8, which isn't automatic
  // (loadHTML has to be called on an instance; calling it statically fails in PHP 8)
  $dom = new DOMDocument();
  // any element name works here; only its attributes are read
  $dom->loadHTML("<?xml encoding='UTF-8'><div $attrString></div>",
    LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);
  // now, $dom->firstChild is the XML encoding declaration and nextSibling is the element carrying the attributes
  $ret = [];
  foreach ($dom->firstChild->nextSibling->attributes as $attr){
    $ret [$attr->nodeName] = $attr->nodeValue;
  }
  return $ret;
}
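StringReplace is not shown here. As a rough sketch of what the stash-and-restore step could look like together with the three shorthand rules, assuming placeholders built from a control character: expandShorthand and the placeholder format are hypothetical, not the actual StringReplace API.

```php
<?php
// Hypothetical stand-in for the StringReplace step: each quoted string is
// swapped for a placeholder so the shorthand regexes cannot touch it.
function expandShorthand(string $attrString): string
{
    $stash = [];
    $attrString = " $attrString ";
    // Replace "..." and '...' with numbered control-character placeholders.
    $attrString = preg_replace_callback(
        '/"[^"]*"|\'[^\']*\'/',
        function (array $m) use (&$stash): string {
            $stash[] = $m[0];
            return "\x1A" . (count($stash) - 1) . "\x1A";
        },
        $attrString
    );
    $attrString = preg_replace('/ #(\w+)(?= )/', ' id=$1 ', $attrString);           // #foo
    $attrString = preg_replace('/ \.(\w+)(?= )/', ' class=$1 ', $attrString);       // .foo
    $attrString = preg_replace('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $attrString);  // two letters
    // Tidy the spacing while the quoted strings are still stashed away.
    $attrString = preg_replace('/\s+/', ' ', $attrString);
    // Put the quoted strings back, untouched.
    $attrString = preg_replace_callback(
        '/\x1A(\d+)\x1A/',
        function (array $m) use (&$stash): string {
            return $stash[(int) $m[1]];
        },
        $attrString
    );
    return trim($attrString);
}

echo expandShorthand('fr .note #intro title="a .b #c de"'), "\n";
// prints: lang=fr class=note id=intro title="a .b #c de"
```

The .b, #c and de inside the quoted title are left alone because they are hidden behind a placeholder while the shorthand rules run.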

Now we will have invalid HTML (there's no such thing as an <attr> element). So we use DOMDocument again to manipulate the newly created DOM by overriding the basic method of Parsedown, text():

function text($text){
  $text = parent::text($text);
  // wrap the output in a <div> so there is a single root element, which LIBXML_HTML_NOIMPLIED needs
  $dom = new DOMDocument();
  $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $text . '</div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);
  $xpath = new DOMXpath($dom);
  foreach ($xpath->query("//attr") as $attr) self::moveAttributes ($attr);
  // do any other manipulation of the DOM
  // turn it back into HTML
  $text = $dom->saveHTML($dom->documentElement);
  // do any other manipulation of the text
  return $text;
}

static protected function moveAttributes ($attrNode){
  // apply the attributes of $attrNode to the next node, then delete it.
  // Special case: an attribute on a line by itself will be enclosed in a <p>. Delete that <p>
  if ($attrNode->parentNode->childNodes->length == 1 && $attrNode->parentNode->nodeName == 'p'){
    self::copyAttributes ($attrNode->parentNode, $attrNode);
    $attrNode->parentNode->parentNode->replaceChild ($attrNode, $attrNode->parentNode);
  }
  // Find the next element. Skip text nodes or anything that isn't an element
  for ($target = $attrNode->nextSibling; $target && $target->nodeType != XML_ELEMENT_NODE; $target = $target->nextSibling); // empty loop body
  if ($target) self::copyAttributes ($attrNode, $target);
  $attrNode->parentNode->removeChild($attrNode);
}

static protected function copyAttributes ($from, $to){
  try{
    foreach ($from->attributes as $attr) $to->setAttribute($attr->nodeName, $attr->nodeValue);
  }catch(Exception $e){
    // ignore errors; generally are illegal characters in the attribute name
  }
}
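To see the second pass in isolation, here is a minimal, self-contained sketch that runs outside Parsedown. applyAttrElements is a hypothetical name, the input is hand-written HTML of the kind the inline handler would produce, and the UTF-8 handling and the <p> special case are left out to keep it short.

```php
<?php
// Hypothetical helper: move the attributes of every <attr> element onto the
// next element sibling, then drop the <attr> element itself.
function applyAttrElements(string $html): string
{
    $dom = new DOMDocument();
    // A wrapping <div> gives the parser a single root element.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOWARNING | LIBXML_NOERROR
    );
    $xpath = new DOMXPath($dom);
    // Copy the node list first, since the loop removes nodes from the document.
    foreach (iterator_to_array($xpath->query('//attr')) as $attr) {
        // Skip text nodes and anything else that is not an element.
        $target = $attr->nextSibling;
        while ($target && $target->nodeType !== XML_ELEMENT_NODE) {
            $target = $target->nextSibling;
        }
        if ($target) {
            foreach ($attr->attributes as $a) {
                $target->setAttribute($a->nodeName, $a->nodeValue);
            }
        }
        $attr->parentNode->removeChild($attr);
    }
    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo applyAttrElements('<p><attr lang="fr"></attr><em>Bonjour</em></p>'), "\n";
```

In this input the <em> is the first element after the <attr>, so it should come out carrying lang="fr" with no <attr> left behind.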
