{"id":3639,"date":"2020-05-27T12:22:36","date_gmt":"2020-05-27T18:22:36","guid":{"rendered":"http:\/\/bililite.com\/blog\/?p=3639"},"modified":"2022-07-11T11:04:28","modified_gmt":"2022-07-11T17:04:28","slug":"extending-parsedown-attributes","status":"publish","type":"post","link":"https:\/\/bililite.com\/blog\/2020\/05\/27\/extending-parsedown-attributes\/","title":{"rendered":"Extending Parsedown: attributes"},"content":{"rendered":"<p><a href=\"https:\/\/michelf.ca\/projects\/php-markdown\/extra\/\">Markdown Extra<\/a> (including <a href=\"https:\/\/github.com\/erusev\/parsedown-extra\">Parsedown Extra<\/a>) allows for attributes to be applied to certain elements: headers, fenced code blocks, links, and images. I'd like to be able to apply them to any element. I'm going to use the syntax of <a href=\"https:\/\/python-markdown.github.io\/extensions\/attr_list\/\">Python Markdown attributes<\/a>, but the attribute lists go <em>before<\/em> the elements. For block-level elements, they go on their own line <em>before<\/em> the element.<\/p>\n<p>My attribute lists start with <code>{:<\/code> (<em>not<\/em> just <code>{<\/code>) and end with <code>}<\/code>. Anything that would be legal in HTML (as in <code class=\"language-html\" >&lt;element title=\"hello, world\" style='color: blue' class=foo \/&gt;<\/code>) is fine, since I didn't write my own parser. I just used <a href=\"https:\/\/www.php.net\/manual\/en\/class.domdocument.php\"><code class=\"language-html\" >DOMDocument<\/code><\/a>. There are three special cases:<\/p>\n<ul>\n<li><code class=\"language-css\" >.foo<\/code> is changed to <code class=\"language-css\" >class=\"foo\"<\/code>. Note that this is different from the Python code, which <em>appends<\/em> class names that start with <code class=\"language-css\" >.<\/code>. 
Repeated attribute names in actual HTML are ignored, so to use two classes, use <code class=\"language-css\" >class=\"foo bar\"<\/code>, not <code class=\"language-css\" >.foo .bar<\/code>.<\/li>\n<li><code class=\"language-css\" >#foo<\/code> is changed to <code class=\"language-css\" >id=\"foo\"<\/code>.<\/li>\n<li>Two letters alone are changed to <code class=\"language-css\" >lang=xx<\/code>, since I use that attribute so much.<\/li>\n<\/ul>\n<p><!--more--><\/p>\n<p>So <code class=\"language-md\" >{:fr}*Bonjour*<\/code> becomes <code class=\"language-html\" >&lt;em lang=fr&gt;Bonjour&lt;\/em&gt;<\/code> and <\/p>\n<pre><code class=\"language-markdown\" >\r\n{:style=\"border: 1px solid black;\" #table1}\r\n| Name | Color |\r\n|------|-------|\r\n| Joe  | Red   |\r\n<\/code><\/pre>\n<p>becomes<\/p>\n<pre><code class=\"language-html\" >&lt;table style=\"border: 1px solid black;\" id=\"table1\"&gt;\r\n&lt;thead&gt;\r\n&lt;tr&gt;\r\n&lt;th&gt;Name&lt;\/th&gt;\r\n&lt;th&gt;Color&lt;\/th&gt;\r\n&lt;\/tr&gt;\r\n&lt;\/thead&gt;\r\n&lt;tbody&gt;\r\n&lt;tr&gt;\r\n&lt;td&gt;Joe&lt;\/td&gt;\r\n&lt;td&gt;Red&lt;\/td&gt;\r\n&lt;\/tr&gt;\r\n&lt;\/tbody&gt;\r\n&lt;\/table&gt;<\/code><\/pre>\n<h2>The code<\/h2>\n<p>There are two parts to the code: first, create an <code class=\"language-html\" >&lt;attr&gt;<\/code> element with the appropriate attributes. 
Second, find the next element, apply those attributes to it, and delete the <code class=\"language-html\" >&lt;attr&gt;<\/code> element.<\/p>\n<p>For the first, we adjust the attributes (pulling out the quoted strings first, with <a href=\"http:\/\/bililite.com\/blog\/2020\/05\/26\/string-replacement-in-php\/\">StringReplace<\/a>), then create an element with those attributes and use PHP's HTML parser to figure them out.<\/p>\n<pre><code class=\"language-php\" >function __construct(){\r\n  $this->InlineTypes['{'] []= 'Attributes';\r\n  \/\/ rest of the constructor\r\n}\r\n\r\nprotected function inlineAttributes($excerpt){\r\n  if (preg_match('#^{:(.+?)}\\s*#', $excerpt['text'], $matches)) {\r\n    return array(\r\n      'extent' => strlen($matches[0]), \r\n      'element' => array(\r\n        'name' => 'attr',\r\n        'attributes' => $this->parseAttributes($matches[1])\r\n      )\r\n    );\r\n  }\r\n}\r\n\r\n\/\/ to allow an attribute list on a line by itself (so it applies to the following block-level element), we need to stop the paragraph from continuing\r\nprotected function paragraphContinue($Line, array $Block){\r\n  if (preg_match ('#^{:(.+?)}\\s*#',  $Block['element']['handler']['argument'])){\r\n    \/\/ the previous block was an attribute list in a paragraph\r\n    $Block['interrupted'] = 1;\r\n  }\r\n  return parent::paragraphContinue($Line, $Block);\r\n}\r\n\r\nprotected function parseAttributes ($attrString){\r\n  $attrString = \" $attrString \"; \/\/ make the regular expressions simpler by prepending and appending spaces\r\n  $attrString = StringReplace\\remove ('\/(\"[^\"]*\")|(\\'[^\\']*\\')\/', $attrString); \/\/ pull out quoted strings\r\n  $attrString = preg_replace ('\/ #(\\w+)(?= )\/', ' id=$1 ', $attrString); \/\/ #foo\r\n  $attrString = preg_replace ('\/ \\.(\\w+)(?= )\/', ' class=$1 ', $attrString); \/\/ .foo\r\n  $attrString = preg_replace ('\/ ([a-zA-Z]{2})(?= )\/', ' lang=$1 ', $attrString); \/\/ two-letter language names\r\n  $attrString = StringReplace\\restore 
($attrString); \/\/ put the quotes back\r\n\r\n  \/\/ the loadHTML is so complicated because we want to ignore errors and not add DTD's, &lt;html&gt;, &lt;body&gt; etc.\r\n  \/\/ and we want it to use UTF-8, which isn't automatic\r\n  \/\/ loadHTML() is an instance method; calling it statically throws an Error in PHP 8\r\n  $dom = new DOMDocument();\r\n  $dom->loadHTML(\"&lt;?xml encoding='UTF-8'&gt;&lt;element $attrString \/&gt;\",\r\n    LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);\r\n  \/\/ now, $dom->firstChild is the &lt;?xml&gt; and nextSibling is the &lt;element&gt;\r\n  $ret = [];\r\n  foreach ($dom->firstChild->nextSibling->attributes as $attr){\r\n    $ret [$attr->nodeName] = $attr->nodeValue;\r\n  }\r\n  return $ret;\r\n} <\/code><\/pre>\n<p>Now we will have invalid HTML (there's no such thing as an <code class=\"language-html\" >&lt;attr&gt;<\/code> element). So we use <code>DOMDocument<\/code> again to manipulate the newly created DOM by overriding the basic method of Parsedown, <code class=\"language-php\" >text()<\/code>:<\/p>\n<pre><code class=\"language-php\" >function text($text){\r\n  $text = parent::text($text);\r\n  $dom = new DOMDocument();\r\n  $dom->loadHTML('&lt;?xml encoding=\"UTF-8\"&gt;&lt;div&gt;'.$text.'&lt;\/div&gt;',\r\n   LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);\r\n  $xpath = new DOMXPath($dom);\r\n\r\n  foreach ($xpath->query(\"\/\/attr\") as $attr) self::moveAttributes ($attr);\r\n  \/\/ do any other manipulation of the DOM\r\n\r\n  \/\/ turn it back into HTML\r\n  $text = $dom->saveHTML($dom->documentElement);\r\n\r\n  \/\/ do any other manipulation of the text\r\n\r\n  return $text;\r\n}\r\n\r\nstatic protected function moveAttributes ($attrNode){\r\n  \/\/ apply the attributes of $attrNode to the next node, then delete it.\r\n  \/\/ Special case: an attribute on a line by itself will be enclosed in a &lt;p&gt;. 
Delete that &lt;p&gt;\r\n  if ($attrNode->parentNode->childNodes->length == 1 && $attrNode->parentNode->nodeName == 'p'){\r\n    self::copyAttributes ($attrNode->parentNode, $attrNode);\r\n    $attrNode->parentNode->parentNode->replaceChild ($attrNode, $attrNode->parentNode);\r\n  }\r\n  \/\/ Find the next element. Skip text nodes or anything that isn't an element\r\n  for ($target = $attrNode->nextSibling; $target && $target->nodeType != XML_ELEMENT_NODE; $target = $target->nextSibling); \/\/ empty loop body\r\n  if ($target) self::copyAttributes ($attrNode, $target);\r\n  $attrNode->parentNode->removeChild($attrNode);\r\n}\r\n\r\nstatic protected function copyAttributes ($from, $to){\r\n  try{\r\n    foreach ($from->attributes as $attr) $to->setAttribute($attr->nodeName, $attr->nodeValue);\r\n  }catch(Exception $e){\r\n    \/\/ ignore errors; they are generally caused by illegal characters in the attribute name\r\n  }\r\n}<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Markdown Extra (including Parsedown Extra) allows for attributes to be applied to certain elements: headers, fenced code blocks, links, and images. I'd like to be able to apply them to any element. I'm going to use the syntax of Python Markdown attributes, but the attribute lists go before the elements. 
For block level elements, they [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22,9],"tags":[],"_links":{"self":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3639"}],"collection":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/comments?post=3639"}],"version-history":[{"count":8,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3639\/revisions"}],"predecessor-version":[{"id":3647,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3639\/revisions\/3647"}],"wp:attachment":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/media?parent=3639"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/categories?post=3639"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/tags?post=3639"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}