Markdown Extra (including Parsedown Extra) allows for attributes to be applied to certain elements: headers, fenced code blocks, links, and images. I'd like to be able to apply them to any element. I'm going to use the syntax of Python Markdown attributes, but the attribute lists go before the elements. For block level elements, they go on their own line before the element.
My attribute lists start with `{:` (not just `{`) and end with `}`. Anything that would be legal as an attribute inside an HTML tag is fine, since I didn't write my own parser; I just used `DOMDocument`. There are three special cases:
- `.foo` is changed to `class="foo"`. Note that this is different from the Python code, which appends class names that start with `.`. Repeated attribute names in actual HTML are ignored, so to use two classes, use `class="foo bar"`, not `.foo .bar`.
- `#foo` is changed to `id="foo"`.
- Two letters alone are changed to `lang=xx`, since I use that attribute so much.
So `{:fr}*Bonjour*` becomes `<em lang=fr>Bonjour</em>` and
{:style="border: 1px solid black;" #table1}
| Name | Color |
|------|-------|
| Joe | Red |
becomes
<table style="border: 1px solid black;" id="table1">
<thead>
<tr>
<th>Name</th>
<th>Color</th>
</tr>
</thead>
<tbody>
<tr>
<td>Joe</td>
<td>Red</td>
</tr>
</tbody>
</table>
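The point about repeated attribute names is easy to check with `DOMDocument` itself. This is only a small sketch, not part of the extension: `parsedAttributes` is a name I made up for illustration, and it simply reports which attributes survive parsing.

```php
<?php
// Silence libxml warnings (a redefined attribute is reported as a parse error).
libxml_use_internal_errors(true);

// Hypothetical helper: parse a single <div ...> tag and return its attributes
// as an associative array, the way the HTML parser sees them.
function parsedAttributes(string $tag): array {
    $dom = new DOMDocument();
    $dom->loadHTML($tag);
    $element = $dom->getElementsByTagName('div')->item(0);
    $result = [];
    foreach ($element->attributes as $attr) {
        $result[$attr->nodeName] = $attr->nodeValue;
    }
    return $result;
}

// Two class attributes: only the first one is kept by the parser.
var_dump(parsedAttributes('<div class="foo" class="bar"></div>'));
// A single class attribute with two names keeps both.
var_dump(parsedAttributes('<div class="foo bar"></div>'));
```

This is why `.foo .bar`, which expands to two separate `class` attributes, ends up applying only `foo`.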
The code
There are two parts to the code. First, create an `<attr>` element with the appropriate attributes. Second, find the next element, apply those attributes to it, and delete the `<attr>` element.
For the first, we adjust the attributes (pulling out the quoted strings first, with StringReplace), then create an element with those attributes and use PHP's HTML parser to figure them out.
function __construct(){
$this->InlineTypes['{'] []= 'Attributes';
// rest of the constructor
}
protected function inlineAttributes($excerpt){
if (preg_match('#^{:(.+?)}\s*#', $excerpt['text'], $matches)) {
return array(
'extent' => strlen($matches[0]),
'element' => array(
'name' => 'attr',
'attributes' => $this->parseAttributes($matches[1])
)
);
}
}
// and to allow for an attribute list on a line by itself, to apply to block-level elements, we need to stop the paragraph from continuing
protected function paragraphContinue($Line, array $Block){
if (preg_match ('#^{:(.+?)}\s*#', $Block['element']['handler']['argument'])){
// the previous block was an attribute list in a paragraph
$Block['interrupted'] = 1;
}
return parent::paragraphContinue($Line, $Block);
}
protected function parseAttributes ($attrString){
$attrString = " $attrString "; // make the regular expressions simpler by prepending and appending spaces
$attrString = StringReplace\remove ('/("[^"]*")|(\'[^\']*\')/', $attrString); // pull out quoted strings
$attrString = preg_replace ('/ #(\w+)(?= )/', ' id=$1 ', $attrString); // #foo
$attrString = preg_replace ('/ \.(\w+)(?= )/', ' class=$1 ', $attrString); // .foo
$attrString = preg_replace ('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $attrString); // two letter language names
$attrString = StringReplace\restore ($attrString); // put the quotes back
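// the loadHTML is so complicated because we want to ignore errors and not add DTDs, implied wrapper elements, etc.
// and we want it to use UTF-8, which isn't automatic
// NOTE: the markup string was lost from this listing; the XML encoding declaration and the <div> carrier element are a reconstruction
$dom = new DOMDocument(); // loadHTML is an instance method; calling it statically fails in PHP 8
$dom->loadHTML("<?xml encoding=\"UTF-8\"><div $attrString></div>",
LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);
// now, $dom->firstChild is the XML declaration and nextSibling is the <div>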
$ret = [];
foreach ($dom->firstChild->nextSibling->attributes as $attr){
$ret [$attr->nodeName] = $attr->nodeValue;
}
return $ret;
}
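`parseAttributes` depends on `StringReplace\remove` and `StringReplace\restore`, which aren't shown here. As a rough sketch of the idea only, the following uses two stand-in functions of my own (`protectQuoted` and `restoreQuoted`, both hypothetical, as is the `\x1A` placeholder format) to hide quoted strings while the three shorthand regular expressions run, then puts them back. `expandShorthand` is likewise an illustrative name; it returns the expanded attribute string rather than a parsed array.

```php
<?php
// Hypothetical stand-in for StringReplace\remove: swap each quoted string
// for a numbered placeholder and remember the original text.
function protectQuoted(string $s, array &$saved): string {
    return preg_replace_callback('/("[^"]*")|(\'[^\']*\')/', function ($m) use (&$saved) {
        $saved[] = $m[0];
        return "\x1A" . (count($saved) - 1) . "\x1A";
    }, $s);
}

// Hypothetical stand-in for StringReplace\restore: put the quoted strings back.
function restoreQuoted(string $s, array $saved): string {
    return preg_replace_callback('/\x1A(\d+)\x1A/', function ($m) use ($saved) {
        return $saved[(int) $m[1]];
    }, $s);
}

// Expand #foo, .foo and two-letter language codes, leaving quoted values alone.
function expandShorthand(string $attrString): string {
    $saved = [];
    $s = protectQuoted(" $attrString ", $saved);                    // pad with spaces, hide quotes
    $s = preg_replace('/ #(\w+)(?= )/', ' id=$1 ', $s);             // #foo
    $s = preg_replace('/ \.(\w+)(?= )/', ' class=$1 ', $s);         // .foo
    $s = preg_replace('/ ([a-zA-Z]{2})(?= )/', ' lang=$1 ', $s);    // two-letter language
    $s = trim(preg_replace('/\s+/', ' ', $s));                      // tidy spacing before restoring
    return restoreQuoted($s, $saved);
}

echo expandShorthand('.note #intro fr title="a .b #c"'), "\n";
echo expandShorthand('style="border: 1px solid black;" #table1'), "\n";
```

The `.b` and `#c` inside the quoted `title` value are untouched because they are hidden behind a placeholder while the shorthand patterns run.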
Now we will have invalid HTML (there's no such thing as an `<attr>` element). So we use `DOMDocument` again to manipulate the newly created DOM by overriding the basic method of Parsedown, `text()`:
function text($text){
$text = parent::text($text);
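// NOTE: the wrapper markup was lost from this listing; the XML encoding declaration and enclosing <div> are a reconstruction
$dom = new DOMDocument(); // loadHTML is an instance method; calling it statically fails in PHP 8
$dom->loadHTML('<?xml encoding="UTF-8"><div>'.$text.'</div>',
LIBXML_HTML_NOIMPLIED | LIBXML_NOWARNING | LIBXML_NOERROR | LIBXML_HTML_NODEFDTD);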
$xpath = new DOMXpath($dom);
foreach ($xpath->query("//attr") as $attr) self::moveAttributes ($attr);
// do any other manipulation of the DOM
// turn it back into HTML
$text = $dom->saveHTML($dom->documentElement);
// do any other manipulation of the text
return $text;
}
static protected function moveAttributes ($attrNode){
// apply the attributes of $attrNode to the next node, then delete it.
// Special case: an attribute on a line by itself will be enclosed in a <p>. Delete that
if ($attrNode->parentNode->childNodes->length == 1 && $attrNode->parentNode->nodeName == 'p'){
self::copyAttributes ($attrNode->parentNode, $attrNode);
$attrNode->parentNode->parentNode->replaceChild ($attrNode, $attrNode->parentNode);
}
// Find the next element. Skip text nodes or anything that isn't an element
for ($target = $attrNode->nextSibling; $target && $target->nodeType != XML_ELEMENT_NODE; $target = $target->nextSibling); // empty loop body
if ($target) self::copyAttributes ($attrNode, $target);
$attrNode->parentNode->removeChild($attrNode);
}
static protected function copyAttributes ($from, $to){
try{
foreach ($from->attributes as $attr) $to->setAttribute($attr->nodeName, $attr->nodeValue);
}catch(Exception $e){
// ignore errors; generally are illegal characters in the attribute name
}
}
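To see the second half working without Parsedown in the loop, here is a standalone sketch that restates `moveAttributes` and `copyAttributes` as plain functions and runs them on hand-written HTML. The names `copyAttrs` and `applyAttrPlaceholders` are mine, and the input strings are only my guess at the kind of markup the inline handler produces.

```php
<?php
// Silence libxml warnings about the non-standard <attr> tag.
libxml_use_internal_errors(true);

// Copy every attribute of $from onto $to.
function copyAttrs(DOMElement $from, DOMElement $to): void {
    foreach ($from->attributes as $attr) {
        $to->setAttribute($attr->nodeName, $attr->nodeValue);
    }
}

// Move the attributes of each <attr> onto the next element, then drop the <attr>.
function applyAttrPlaceholders(string $html): DOMDocument {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Take a snapshot of the matches, since the loop edits the tree.
    foreach (iterator_to_array($xpath->query('//attr')) as $attrNode) {
        $parent = $attrNode->parentNode;
        // An attribute list on its own line is wrapped in a <p>; unwrap it.
        if ($parent->nodeName === 'p' && $parent->childNodes->length === 1) {
            copyAttrs($parent, $attrNode);
            $parent->parentNode->replaceChild($attrNode, $parent);
        }
        // Skip text nodes and anything else that is not an element.
        $target = $attrNode->nextSibling;
        while ($target && $target->nodeType !== XML_ELEMENT_NODE) {
            $target = $target->nextSibling;
        }
        if ($target) {
            copyAttrs($attrNode, $target);
        }
        $attrNode->parentNode->removeChild($attrNode);
    }
    return $dom;
}

// Inline case: the attributes land on the <em> that follows.
$inline = applyAttrPlaceholders('<p><attr lang="fr"></attr><em>Bonjour</em></p>');
var_dump($inline->getElementsByTagName('em')->item(0)->getAttribute('lang'));

// Block case: the lone <p> wrapper disappears and the <table> gets the id.
$block = applyAttrPlaceholders('<p><attr id="table1"></attr></p><table><tr><td>Joe</td></tr></table>');
var_dump($block->getElementsByTagName('table')->item(0)->getAttribute('id'));
```

Unlike the `text()` override, this version lets `loadHTML` add the implied `<html>` and `<body>` wrappers, which keeps the example short but means the result would need trimming before being returned as a fragment.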