Hacking at 0300

Extending Parsedown: attributes

Markdown Extra (including Parsedown Extra) allows attributes to be applied to certain elements: headers, fenced code blocks, links, and images. I'd like to be able to apply them to any element. I'm going to use the syntax of Python Markdown attribute lists, except that my attribute lists go before the elements. For block-level elements, they go on their own line before the element.

My attribute lists start with {: (not just {) and end with }. Anything that would be legal as an attribute in an HTML tag is fine, since I didn't write my own parser; I just used DOMDocument. There are three special cases:

.foo is changed to class="foo". Note that this is different from the Python code, which appends class names that start with a period (.). Repeated attribute names in actual HTML are ignored, so to use two classes, use class="foo bar", not .foo .bar.
#foo is changed to id="foo".
Two letters alone are changed to lang=xx, since I use that attribute so much.

So {:fr}*Bonjour* becomes <em lang=fr>Bonjour</em> and


{:style="border: 1px solid black;" #table1}
| Name | Color |
|------|-------|
| Joe  | Red   |

becomes

<table style="border: 1px solid black;" id="table1">
<thead>
<tr>
<th>Name</th>
<th>Color</th>
</tr>
</thead>
<tbody>
<tr>
<td>Joe</td>
<td>Red</td>
</tr>
</tbody>
</table>

The code

There are two parts to the code: first, create an <attr> element with the appropriate attributes. Second, find the next element, apply those attributes to it, and delete the <attr> element.

For the first, we adjust the attributes (pulling out the quoted strings first, with StringReplace), then create an element with those attributes and use PHP's HTML parser to figure them out.

function __construct(){
  $this->InlineTypes['{'] []= 'Attributes';
  $this->inlineMarkerList .= '{'; // Parsedown only tries inline handlers at characters in this list
  // rest of the constructor
}

protected function inlineAttributes($excerpt){
  if (preg_match('#^{:(.+?)}\s*#', $excerpt['text'], $matches)) {
    return array(
      'extent' => strlen($matches[0]), 
      'element' => array(
        'name' => 'attr',
        'attributes' => $this->parseAttributes($matches[1])
      )
    );
  }
}

// To allow an attribute list on a line by itself (applying to the block-level element that follows), we need to stop the paragraph from continuing
protected function paragraphContinue($Line, array $Block){
  if (preg_match ('#^{:(.+?)}\s*#',  $Block['element']['handler']['argument'])){
    // the previous block was an attribute list in a paragraph
    $Block['interrupted'] = 1;
  }
  return parent::paragraphContinue($Line, $Block);
}

protected function parseAttributes ($attrString){
  $attrString = " $attrString "; // make the regular expressions simpler by prepending and appending spaces
  $attrString = StringReplace\remove ('/("[^"]*")|(\'[^\']*\')/', $attrString); // pull out quoted strings
  $attrString = preg_replace ('/ #(\w+)(?= )/', ' id=$1 ', $attrString); // #foo
  $attrString = preg_replace ('/ \.(\w+)(?= )/', ' class=$1 ', $attrString); // .foo
  $attrString = preg_replace ('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $attrString); // two letter language names
  $attrString = StringReplace\restore ($attrString); // put the quotes back

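  // the loadHTML call is so complicated because we want to ignore errors and not add DTDs, <html>, <body>, etc.
  // and we want it to use UTF-8, which isn't automatic
  $dom = new DOMDocument(); // loadHTML is not a static method, so create an instance first
  $dom->loadHTML("<?xml encoding=\"UTF-8\"><attr $attrString>",
    LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);
  // now, $dom->firstChild is the XML encoding declaration and nextSibling is the <attr> element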
  $ret = [];
  foreach ($dom->firstChild->nextSibling->attributes as $attr){
    $ret [$attr->nodeName] = $attr->nodeValue;
  }
  return $ret;
}
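
The listing above depends on my StringReplace helper, which isn't shown here. As a rough, self-contained sketch of the same idea, the version below hides quoted strings behind NUL-delimited placeholders, applies the three shorthand rules, restores the quotes, and lets DOMDocument parse the result. The function name parseAttributeList and the placeholder scheme are assumptions made for illustration; they are not part of Parsedown or of StringReplace.

```php
<?php
// Sketch only: parseAttributeList() and its placeholder scheme are illustrative,
// standing in for parseAttributes() and the StringReplace helper.
function parseAttributeList(string $attrString): array
{
    $quoted = [];
    // Swap each quoted string for a NUL-delimited placeholder so the
    // shorthand rules below cannot rewrite anything inside quotes.
    $s = preg_replace_callback(
        '/"[^"]*"|\'[^\']*\'/',
        function (array $m) use (&$quoted): string {
            $quoted[] = $m[0];
            return "\x00" . (count($quoted) - 1) . "\x00";
        },
        " $attrString " // pad with spaces so every token has a space on both sides
    );

    $s = preg_replace('/ #(\w+)(?= )/', ' id=$1 ', $s);           // #foo
    $s = preg_replace('/ \.(\w+)(?= )/', ' class=$1 ', $s);       // .foo
    $s = preg_replace('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $s);  // two-letter language code

    // Put the quoted strings back in place of their placeholders.
    $s = preg_replace_callback(
        '/\x00(\d+)\x00/',
        function (array $m) use ($quoted): string {
            return $quoted[(int) $m[1]];
        },
        $s
    );

    // Let the HTML parser do the real attribute parsing.
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<?xml encoding="UTF-8"><attr ' . $s . '>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOWARNING | LIBXML_NOERROR
    );

    $attributes = [];
    $element = $dom->documentElement; // the <attr> element, if parsing produced one
    if ($element !== null) {
        foreach ($element->attributes as $attr) {
            $attributes[$attr->nodeName] = $attr->nodeValue;
        }
    }
    return $attributes;
}

// Example call: the #b inside the quoted title should not be treated as an id.
print_r(parseAttributeList('.note fr title="a #b c"'));
```

The lookahead (?= ) in each rule leaves the trailing space unconsumed, so two shorthands in a row (such as .note fr) can both match.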

Now we will have invalid HTML (there's no such thing as an <attr> element). So we use DOMDocument again to manipulate the newly created DOM by overriding the basic method of Parsedown, text():

function text($text){
  $text = parent::text($text);
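  $dom = new DOMDocument(); // loadHTML is not a static method, so create an instance first
  // wrap the fragment in a single root element so documentElement contains all of it
  $dom->loadHTML('<?xml encoding="UTF-8"><div>'.$text.'</div>',
   LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);
  $xpath = new DOMXPath($dom);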

  foreach ($xpath->query("//attr") as $attr) self::moveAttributes ($attr);
  // do any other manipulation of the DOM

  // turn it back into HTML
  $text = $dom->saveHTML($dom->documentElement);

  // do any other manipulation of the text

  return $text;
}

static protected function moveAttributes ($attrNode){
  // apply the attributes of $attrNode to the next node, then delete it.
  // Special case: an attribute list on a line by itself will be enclosed in a <p>. Delete that <p>
  if ($attrNode->parentNode->childNodes->length == 1 && $attrNode->parentNode->nodeName == 'p'){
    self::copyAttributes ($attrNode->parentNode, $attrNode);
    $attrNode->parentNode->parentNode->replaceChild ($attrNode, $attrNode->parentNode);
  }
  // Find the next element. Skip text nodes or anything that isn't an element
  for ($target = $attrNode->nextSibling; $target && $target->nodeType != XML_ELEMENT_NODE; $target = $target->nextSibling); // empty loop body
  if ($target) self::copyAttributes ($attrNode, $target);
  $attrNode->parentNode->removeChild($attrNode);
}

static protected function copyAttributes ($from, $to){
  try{
    foreach ($from->attributes as $attr) $to->setAttribute($attr->nodeName, $attr->nodeValue);
  }catch(Exception $e){
    // ignore errors; they are generally caused by illegal characters in the attribute name
  }
}
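
To see the second step in isolation, here is a minimal sketch that runs the same move-and-delete logic on a hand-written fragment, outside of any Parsedown class. The input string is an assumption standing in for what Parsedown might emit; the <attr> element is written with an explicit closing tag so the HTML parser does not have to interpret a self-closing unknown element.

```php
<?php
// Assumed input: a fragment containing a hypothetical <attr> marker element.
$html = '<div><attr lang="fr"></attr><em>Bonjour</em></div>';

$dom = new DOMDocument();
$dom->loadHTML(
    $html,
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOWARNING | LIBXML_NOERROR
);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//attr') as $attrNode) {
    // Walk forward to the next element sibling, skipping text nodes and comments.
    $target = $attrNode->nextSibling;
    while ($target && $target->nodeType !== XML_ELEMENT_NODE) {
        $target = $target->nextSibling;
    }
    // Copy every attribute onto the target, if there is one.
    if ($target) {
        foreach ($attrNode->attributes as $a) {
            $target->setAttribute($a->nodeName, $a->nodeValue);
        }
    }
    // The marker has done its job; remove it from the tree.
    $attrNode->parentNode->removeChild($attrNode);
}

// Serialize only the root element, so no doctype or <html> wrapper is added.
$result = $dom->saveHTML($dom->documentElement);
echo $result, "\n";
```

The query result is a snapshot of the matching nodes, so removing each <attr> inside the loop does not disturb the iteration.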

This entry was posted by Danny on May 27, 2020 at 12:22 pm under Parsedown, PHP.
