{"id":3591,"date":"2020-05-22T17:15:23","date_gmt":"2020-05-22T23:15:23","guid":{"rendered":"http:\/\/bililite.com\/blog\/?p=3591"},"modified":"2022-07-11T11:05:03","modified_gmt":"2022-07-11T17:05:03","slug":"extending-parsedown","status":"publish","type":"post","link":"https:\/\/bililite.com\/blog\/2020\/05\/22\/extending-parsedown\/","title":{"rendered":"Extending Parsedown"},"content":{"rendered":"<p>I've been spending all my intellectual free time on working on <a href=\"http:\/\/kavanot.name\">my Kavanot site<\/a>, so I haven't been doing any independent programming. But that site uses raw HTML, which is a pain to type. So I decided to start using <a href=\"https:\/\/daringfireball.net\/projects\/markdown\/syntax\">Markdown<\/a> to make writing easier. After a little trial and error, I decided to use <a href=\"https:\/\/github.com\/erusev\/parsedown\">Parsedown<\/a> with <a href=\"https:\/\/github.com\/erusev\/parsedown-extra\">Parsedown Extra<\/a>.<\/p>\n<p><a href=https:\/\/github.com\/dwachss\/kavanotparsedown>See the code<\/a>.<\/p>\n<p><!--more--><\/p>\n<p>This gives me tables and blockquotes along with simple URL's and <code class=\"language-html\" >&lt;em&gt;<\/code> and <code class=\"language-html\" >&lt;strong&gt;<\/code>. But it's not perfect.<\/p>\n<p>(As an aside, tables were a bit of work to figure out. They have to start with <code class=language-markdown>| whatever | whatever<\/code> and the next line has to be the divider, <code class=language-markdown>|---|---|<\/code>, with <em>exactly<\/em> the same number of cells. 
Only that number of cells will display, so<\/p>\n<pre><code class=language-markdown >| first header | second header \r\n|--------------|--------------\r\n| first element| second element| third element\r\n<\/code><\/pre>\n<p>will only produce<\/p>\n<pre><code class=\"language-html\" >&lt;table&gt;\r\n&lt;thead&gt;\r\n&lt;tr&gt;\r\n&lt;th&gt;first header&lt;\/th&gt;\r\n&lt;th&gt;second header&lt;\/th&gt;\r\n&lt;\/tr&gt;\r\n&lt;\/thead&gt;\r\n&lt;tbody&gt;\r\n&lt;tr&gt;\r\n&lt;td&gt;first element&lt;\/td&gt;\r\n&lt;td&gt;second element&lt;\/td&gt;\r\n&lt;\/tr&gt;\r\n&lt;\/tbody&gt;\r\n&lt;\/table&gt;<\/code><\/pre>\n<p>losing that third column. Also, there's no way to eliminate the header entirely, but if the header cells are blank, then the empty &lt;thead&gt;<br \/>\nwill take minimal space.)<\/p>\n<h3>Under the Hood<\/h3>\n<p>I wanted to add things that would make my life easier, such as adding language attributes (since I go between English and Hebrew text, with a smattering of Greek and even some <a href=\"http:\/\/kavanot.name\/prST+SmvT+TSAH\/\">Hieroglyphics<\/a>) and easily entering <code class=\"language-html\" >&lt;cite&gt;<\/code> and <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTML\/Element\/i\"><code class=\"language-html\" >&lt;i&gt;<\/code><\/a> elements.<\/p>\n<p>So that meant <a href=\"https:\/\/www.urbandictionary.com\/define.php?term=Use%20the%20Source%2C%20Luke\">looking at the source code<\/a>. There is a <a href=\"https:\/\/github.com\/erusev\/parsedown\/wiki\/Tutorial:-Create-Extensions\">tutorial<\/a> for creating extensions, but it is not based on the most recent version (which as of this writing is 1.8.0-beta-7), so it's incomplete. <\/p>\n<p>Parsedown has only one useful public method, <code class=\"language-php\" >Parsedown::text($text)<\/code>. 
It works by breaking the text into lines, then calling <code class=\"language-html\" >linesElements($lines)<\/code>, which iterates over each line with <code class=\"language-html\" >lineElements($line)<\/code> (yes, it's confusing to have the only difference being an 's' in the middle of the name) to parse the lines into an array of \"element\"s, each of which is an array of the form:<\/p>\n<pre><code class=\"language-php\" >array(\r\n  'name' => 'tag name',\r\n  'attributes' =&gt; array ('attribute name' =&gt; 'attribute value'),\r\n  'rawHTML' =&gt; 'a string of HTML that can optionally be escaped as unsafe',\r\n  \/\/ OR\r\n  'text' =&gt; 'a string of text that will not be further parsed',\r\n  \/\/ OR\r\n  'element' =&gt; array('a single \"element\" array that represents the child of this element'),\r\n  \/\/ OR\r\n  'elements' =&gt; array (array('an array of \"element\" arrays that represent all the children of this element')),\r\n  \/\/ OR\r\n  'handler' =&gt; array ('an array that tells Parsedown that further processing is needed')\r\n);<\/code><\/pre>\n<p>and the <code class=\"language-php\" >'handler'<\/code> array is:<\/p>\n<pre><code class=\"language-php\" >array(\r\n  'handler' =&gt; 'name of method that will parse the text into markup, which will be either the \"lineElements\" or \"linesElements\" methods',\r\n  'argument' =&gt; 'the text to be passed to \"handler\", which is either a string for \"lineElements\" or an array of strings for \"linesElements\"',\r\n  'destination' =&gt; 'index to insert the parsed text, which will be one of \"rawHTML\", \"text\", \"element\", or \"elements\"'\r\n);<\/code><\/pre>\n<p>The method <code class=\"language-php\" >elements(array $Elements)<\/code> then recursively processes the elements to produce a string of markup.<\/p>\n<h3>The Details: Block level elements<\/h3>\n<p>Parsing a line consists of looking for a marker of a \"block element\" as the first character:<\/p>\n<pre><code class=\"language-php\" > 
protected $BlockTypes = array(\r\n        '#' => array('Header'),\r\n        '*' => array('Rule', 'List'),\r\n        '+' => array('List'),\r\n        '-' => array('SetextHeader', 'Table', 'Rule', 'List'),\r\n        '0' => array('List'),\r\n        '1' => array('List'),\r\n        '2' => array('List'),\r\n        '3' => array('List'),\r\n        '4' => array('List'),\r\n        '5' => array('List'),\r\n        '6' => array('List'),\r\n        '7' => array('List'),\r\n        '8' => array('List'),\r\n        '9' => array('List'),\r\n        ':' => array('Table'),\r\n        '<' => array('Comment', 'Markup'),\r\n        '=' => array('SetextHeader'),\r\n        '>' => array('Quote'),\r\n        '[' => array('Reference'),\r\n        '_' => array('Rule'),\r\n        '`' => array('FencedCode'),\r\n        '|' => array('Table'),\r\n        '~' => array('FencedCode'),\r\n    );<\/code><\/pre>\n<p>or no marker, in which case the line is either a <code class=\"language-html\" >&lt;p&gt;<\/code> or, if it is indented, a <code class=\"language-html\" >&lt;pre&gt;&lt;code&gt;<\/code> element. Parsedown then creates a method name of <code class=\"language-php\" >'block'.$blockType<\/code> (for instance <code class=\"language-php\" >blockQuote<\/code>) and calls that with the line to be parsed and the current state of the parser, which is called a \"Block\" and is an array: <\/p>\n<pre><code class=\"language-php\" >array(\r\n  'type' => 'the name from the array above',\r\n  'element' => array ('element array as defined above, for the most recently defined element'),\r\n  'interrupted' => NULL, \/\/ or the number of blank lines before the current line. Blank lines separate blocks. 
It's not clear why he counts them; the only thing that matters is whether it is set or not\r\n  'continuable' => TRUE or FALSE, \/\/ TRUE if this block automatically continues on the next line, like a &lt;table&gt;, or FALSE if it only spans one line, like an &lt;h1&gt;\r\n  'identified' => TRUE or FALSE, \/\/ TRUE if the function is returning the same block or FALSE if a whole new one \r\n  \/\/ and other aspects of the state.\r\n);\r\n<\/code><\/pre>\n<p>The function returns <code class=\"language-php\" >NULL<\/code> if it <em>cannot<\/em> handle the text, returns the original \"Block\" array (modified as necessary), or returns a new \"Block\" array (in that case, the last \"Block\" is processed to produce an array of \"element\"s).<br \/>\nIf the \"Block\" is marked 'continuable', then the method <code class=\"language-php\" >'block'.$blockType.'Continue'<\/code> (for instance <code class=\"language-php\" >blockQuoteContinue<\/code>) is called with the next line. When a \"Block\" is processed, the method <code class=\"language-php\" >'block'.$blockType.'Complete'<\/code> (for instance <code class=\"language-php\" >blockQuoteComplete<\/code>) is called, if such a method is defined.<\/p>\n<p>If the handling function returns <code class=\"language-php\" >NULL<\/code>, the next handler in <code class=\"language-php\" >$BlockTypes[$marker]<\/code> is called, until the \"Block\" is handled or the <code class=\"language-php\">paragraph<\/code> handler is called.<\/p>\n<p>Block-level handlers generally create \"elements\" that have <code class=\"language-php\" >\"handler\" == \"linesElements\"<\/code>, and the continuation handlers append the line to the <code class=\"language-php\" >\"argument\"<\/code>, so processing will continue recursively and elements can nest.<\/p>\n<h3>The Details: inline elements<\/h3>\n<p>Once there are no more markers for block elements, each line is scanned for markers for inline elements. 
For some reason, the program lists these in two places:<\/p>\n<pre><code class=\"language-php\" >$inlineMarkerList = '!*_&[:<`~\\\\';\r\n\/\/ AND\r\n$InlineTypes = array(\r\n  '!' => array('Image'),\r\n  '&' => array('SpecialCharacter'),\r\n  '*' => array('Emphasis'),\r\n  ':' => array('Url'),\r\n  '<' => array('UrlTag', 'EmailTag', 'Markup'),\r\n  '[' => array('Link'),\r\n  '_' => array('Emphasis'),\r\n  '`' => array('Code'),\r\n  '~' => array('Strikethrough'),\r\n  '\\\\' => array('EscapeSequence'),\r\n);\r\n<\/code><\/pre>\n<p>where he could have just done <\/p>\n<pre><code class=\"language-php\" >\r\n$this->inlineMarkerList = implode ('', array_keys($this->InlineTypes));\r\n<\/code><\/pre>\n<p>in the constructor. I would do that for any Parsedown extension.<\/p>\n<p>The handling is similar to that for block elements. For each line, Parsedown scans for any of the characters in <code class=\"language-php\" >$inlineMarkerList<\/code>, then for each of the strings for that marker in <code class=\"language-php\" >$InlineTypes<\/code>, creates a method name <code class=\"language-php\" >'inline'.$inlineType<\/code> (for instance <code class=\"language-php\" >inlineEmphasis<\/code>) and calls that with the string to be parsed (starting from the marker, ending at the newline). The handler decides whether it wants to handle the line. If not, it returns <code class=\"language-php\" >NULL<\/code>. If so, it returns an array with two values:<\/p>\n<pre><code class=\"language-php\" >array(\r\n  'extent' => number of characters that the handler is consuming,\r\n  'element' => array (element array as defined above)\r\n);\r\n<\/code><\/pre>\n<p>Processing then continues with the rest of the line. Any text not handled is left untouched.<\/p>\n<p>Now I know enough to create some extensions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I've been spending all my intellectual free time on working on my Kavanot site, so I haven't been doing any independent programming. 
But that site uses raw HTML, which is a pain to type. So I decided to start using Markdown to make writing easier. After a little trial and error, I decided to use [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22,9],"tags":[],"_links":{"self":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3591"}],"collection":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/comments?post=3591"}],"version-history":[{"count":24,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3591\/revisions"}],"predecessor-version":[{"id":3615,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/posts\/3591\/revisions\/3615"}],"wp:attachment":[{"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/media?parent=3591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/categories?post=3591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bililite.com\/blog\/wp-json\/wp\/v2\/tags?post=3591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}