Hacking at 0300

Archive for May 26th, 2020

Extending Parsedown: Block elements

Continuing work on extending Parsedown.

Adding block-level elements is not much different from adding inline elements. Kavanot.name originally used <footer> elements to indicate the source of a block quote:

<blockquote>
  Do or do not. There is no try.
  <footer>Yoda</footer>
</blockquote>

While marking blockquotes this way is now acceptable, for a long time it wasn't, and the recommended way was with <figure> and <figcaption>. KavanotParsedown uses the latter model:

<figure>
  <blockquote>
    Do or do not. There is no try.
  </blockquote>
  <figcaption>Yoda</figcaption>
</figure>

But I start with creating a <footer> and then modifying the DOM. So I want to have a block element that I will indicate with "--" at the start of the line.

function __construct(){
  $this->BlockTypes['-'][] = 'Source'; // only line needed to indicate a block level element
  // ... rest of the constructor
}

protected function blockSource($Line, $Block = null){
  if (preg_match('/^--[ ]*(.+)/', $Line['text'], $matches)) {
    return array(
      'element' => array(
        'name' => 'footer',
        'handler' => array(
          'function' => 'lineElements',
          'argument' => $matches[1],
          'destination' => 'elements'
        ),
        'attributes' => array('class' => 'source') // for styling, add a class automatically
      )
    );
  }
}

>Do or do not. There is no try.
--Yoda

becomes

<blockquote>
    Do or do not. There is no try.
  <footer class="source" >Yoda</footer>
</blockquote>
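
How the one-line registration and the handler fit together can be sketched without Parsedown itself. The class below is a toy stand-in, not Parsedown's actual code: `MiniBlocks`, `parseLine` and the simplified return arrays are made up for illustration. It assumes only that handlers are looked up by the first character of the line and are named `block` plus the type.

```php
<?php
// Toy stand-in for block dispatch; MiniBlocks and parseLine are hypothetical names.
class MiniBlocks {
    // Maps the first character of a line to the block types to try, in order.
    protected $BlockTypes = array('>' => array('Quote'));

    function __construct() {
        $this->BlockTypes['-'][] = 'Source'; // same registration as in the post
    }

    // Returns a simplified element array, or null if no handler claims the line.
    public function parseLine($text) {
        $marker = $text === '' ? '' : $text[0];
        $types = isset($this->BlockTypes[$marker]) ? $this->BlockTypes[$marker] : array();
        foreach ($types as $type) {
            $block = $this->{'block' . $type}(array('text' => $text));
            if ($block !== null) {
                return $block;
            }
        }
        return null;
    }

    protected function blockQuote($Line) {
        if (preg_match('/^>[ ]?(.*)/', $Line['text'], $matches)) {
            return array('name' => 'blockquote', 'text' => $matches[1]);
        }
        return null;
    }

    protected function blockSource($Line) {
        // Same regular expression as the real handler above.
        if (preg_match('/^--[ ]*(.+)/', $Line['text'], $matches)) {
            return array('name' => 'footer', 'text' => $matches[1]);
        }
        return null;
    }
}

$parser = new MiniBlocks();
$source = $parser->parseLine('--Yoda');
echo $source['name'], ': ', $source['text'], "\n"; // footer: Yoda
var_dump($parser->parseLine('- list item'));       // NULL: a single "-" does not match
```

The second call shows why the regular expression insists on two dashes: an ordinary list item starts with the same marker character but is not claimed by the handler.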

I realized that I might want to add an attribution to an image as well, without it being in a <blockquote>, as

<p>
  <img src=/blog/blogfiles/pdf/smiley.png alt="Smile!"/>
  <footer class="source" >Some file I found on the web</footer>
</p>

But as it stands,

![Smile!](/blog/blogfiles/pdf/smiley.png)
--Some file I found on the web

doesn't work; the <p> ends before the <footer> starts:

<p>
  <img src=/blog/blogfiles/pdf/smiley.png alt="Smile!"/>
</p>
<footer class="source" >Some file I found on the web</footer>

so we need to check whether the previous block was a paragraph. If it was, then parse this line and add it to the paragraph as an internal element:

protected function blockSource($Line, $Block = null){
  if (preg_match('/^--[ ]*(.+)/', $Line['text'], $matches)) {
    if ($Block && $Block['type'] === 'Paragraph'){
      $Block['element']['handler']['argument'] .= "\n".$this->element($this->blockSource($Line)['element']);
      return $Block;
    }
    return array(
      'element' => array(
        'name' => 'footer',
        'handler' => array(
          'function' => 'lineElements',
          'argument' => $matches[1],
          'destination' => 'elements'
        ),
        'attributes' => array('class' => 'source') // for styling, add a class automatically
      )
    );
  }
}

and now it works, except that the footer is a child of the <p> instead of the <blockquote>. We'll have to fix that.
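
One possible shape for that fix, and for the DOM step mentioned earlier, is sketched below. It is not the KavanotParsedown code: `sourcesToFigures` is a hypothetical helper, and the sample is loaded with `loadXML` from well-formed markup only to keep the example deterministic (real output would more likely go through `loadHTML`). It finds each source footer inside a blockquote, even one nested in a <p>, and rebuilds the pair as <figure> and <figcaption>.

```php
<?php
// Hypothetical helper: turn <blockquote>...<footer class="source"> into
// <figure><blockquote>...</blockquote><figcaption>...</figcaption></figure>.
function sourcesToFigures(DOMDocument $doc): void {
    $xpath = new DOMXPath($doc);
    // Copy the matches into an array first; the loop below changes the tree.
    $footers = iterator_to_array($xpath->query('//blockquote//footer[@class="source"]'));
    foreach ($footers as $footer) {
        // Walk up to the enclosing <blockquote>, skipping any <p> in between.
        $quote = $footer->parentNode;
        while ($quote->nodeName !== 'blockquote') {
            $quote = $quote->parentNode;
        }
        $figure = $doc->createElement('figure');
        $caption = $doc->createElement('figcaption');
        // Move the footer's contents into the caption, then drop the footer.
        while ($footer->firstChild) {
            $caption->appendChild($footer->firstChild);
        }
        $footer->parentNode->removeChild($footer);
        // Put the figure where the blockquote was, then move the blockquote into it.
        $quote->parentNode->replaceChild($figure, $quote);
        $figure->appendChild($quote);
        $figure->appendChild($caption);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div><blockquote><p>Do or do not. There is no try.'
    . '<footer class="source">Yoda</footer></p></blockquote></div>');
sourcesToFigures($doc);
echo $doc->saveXML($doc->documentElement), "\n";
// <div><figure><blockquote><p>Do or do not. There is no try.</p></blockquote><figcaption>Yoda</figcaption></figure></div>
```

Blockquotes without a source footer are left alone, and the `class="source"` attribute is dropped along with the footer; whether the caption should keep it is a styling decision.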

Posted by Danny on May 26, 2020 at 3:54 pm under Parsedown, PHP.

Extending Parsedown: Inline elements

Extending Parsedown involves adding elements to the $InlineTypes and $BlockTypes arrays, then adding methods to handle them.

See the actual code.

Italics

I use <i> a lot, to indicate transliterated words. So I could use "/" to indicate that:
/Shabbat/ is a Hebrew word becomes <i>Shabbat</i> is a Hebrew word. To do that:

class myParsedown extends Parsedown{
  function __construct(){
    $this->InlineTypes['/'] []= 'Italic';
    // after adding all the new inline types, create the list of characters
    $this->inlineMarkerList = implode ('', array_keys($this->InlineTypes));
    // allow the character to be escaped by '\'
    $this->specialCharacters []= '/';
  }

  protected function inlineItalic($excerpt){
    if (preg_match('#^/(.+?)/#', $excerpt['text'], $matches)) {
      return array(
        'extent' => strlen($matches[0]), 
        'element' => array(
          'name' => 'i',
          'handler' => array(
            'function' => 'lineElements',
            'argument' => $matches[1],
            'destination' => 'elements'
          )
        )
      );
    }
  }
}

Now, my transliterated words are almost always Hebrew, so I can automatically add the lang=he attribute:


  protected function inlineItalic($excerpt){
    if (preg_match('#^/(.+?)/#', $excerpt['text'], $matches)) {
      return array(
        'extent' => strlen($matches[0]), 
        'element' => array(
          'name' => 'i',
          'handler' => array(
            'function' => 'lineElements',
            'argument' => $matches[1],
            'destination' => 'elements'
          ),
          'attributes' => array('lang' => 'he') // Add attributes
        )
      );
    }
  }

and now /Shabbat/ is a Hebrew word becomes <i lang=he>Shabbat</i> is a Hebrew word.
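
The role of 'extent' is easier to see outside the class. The function below is a hypothetical standalone sketch (`matchItalic` is not part of Parsedown) that reuses the same regular expression and builds the HTML directly instead of returning an element array.

```php
<?php
// Hypothetical standalone version of the inlineItalic matching step.
function matchItalic($text) {
    // $text is assumed to start at the "/" marker, like $excerpt['text'].
    if (preg_match('#^/(.+?)/#', $text, $matches)) {
        return array(
            'extent' => strlen($matches[0]), // bytes consumed, both slashes included
            'html' => '<i lang="he">' . htmlspecialchars($matches[1]) . '</i>',
        );
    }
    return null;
}

$result = matchItalic('/Shabbat/ is a Hebrew word');
echo $result['extent'], "\n"; // 9
echo $result['html'], "\n";   // <i lang="he">Shabbat</i>

// An unclosed marker does not match at all.
var_dump(matchItalic('/Shabbat is a Hebrew word')); // NULL
```

Since the handler sees the text from each "/" onward, a plain path such as a/b/c would presumably be italicized in the middle as well, which is where the "\" escape comes in.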

Cite

I also use the <cite> element. I'm running out of single characters to indicate elements, so I'm going to redefine "_". I don't need two different markers for <em>.

  function __construct(){
    $this->InlineTypes['_'] = ['Cite']; // redefinition; I am replacing the old array (which was ['Emphasis'])
    // ... rest of the constructor as above
  }

  protected function inlineCite($excerpt){
    if (preg_match('#^_(.+?)_#', $excerpt['text'], $matches)) {
      return array(
        'extent' => strlen($matches[0]), 
        'element' => array(
          'name' => 'cite',
          'handler' => array(
            'function' => 'lineElements',
            'argument' => $matches[1],
            'destination' => 'elements'
          )
        )
      );
    }
  }

And now _A Tale of Two Cities_ becomes <cite>A Tale of Two Cities</cite>
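
The difference between appending with `[]=` and replacing with `=` is what makes the redefinition work. The sketch below uses a simplified two-entry copy of the array rather than Parsedown's real one, and assumes that handlers for a marker are tried in array order, so an appended 'Cite' would sit behind 'Emphasis' and never get a chance at _text_.

```php
<?php
// Simplified stand-in for $InlineTypes; only two markers are shown.
$InlineTypes = array(
    '*' => array('Emphasis'),
    '_' => array('Emphasis'),
);

// Appending keeps the old handler in front of the new one.
$appended = $InlineTypes;
$appended['_'][] = 'Cite';

// Assigning a new array drops the old handler entirely.
$replaced = $InlineTypes;
$replaced['_'] = array('Cite');

echo implode(', ', $appended['_']), "\n"; // Emphasis, Cite
echo implode(', ', $replaced['_']), "\n"; // Cite
echo implode(', ', $replaced['*']), "\n"; // Emphasis
```

The "*" marker is untouched, so emphasis stays available through that one.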

Posted by Danny on May 26, 2020 at 3:06 pm under Parsedown, PHP.

String Replacement in PHP

Working with Parsedown, I want to do string manipulation, but only on certain parts of the text: for instance, only on text that is not inside HTML tags or not in quotes. The right way to do that is with a real parser. The easy way is to remove the unwanted strings, replace them with a marker that won't come up in normal text, do the manipulation, then replace the markers with the original strings. (It is the restore step that requires "a marker that won't come up in normal text"; you don't want to replace text that was present in the original.)

I wanted a marker that can't be typed but is still legal HTML; it turns out that U+FFFC (OBJECT REPLACEMENT CHARACTER) is perfect for that. So I made a pair of functions, `StringReplace\remove` and `StringReplace\restore`, to make that easy.

StringReplace\remove ($re, $target)

Any string that matches the regular expression $re in $target is replaced by a numbered marker: the number wrapped in a pair of U+FFFC characters. The new string is returned. So for instance,

$rawtext = StringReplace\remove ('#</?[^>]*>#', $html);

will remove tags.

StringReplace\restore ($target)

Returns a string with the markers replaced by their original versions.

The code

namespace StringReplace;

define ('OBJECT_REPLACEMENT_CHARACTER', "\u{FFFC}"); // U+FFFC; the \u{...} escape needs PHP 7+
define ('RE_REPLACEMENT', '/'.OBJECT_REPLACEMENT_CHARACTER.'(\d+)'.OBJECT_REPLACEMENT_CHARACTER.'/');

$strings = array();

$remover = function ($matches){
  global $strings;
  $strings []= $matches[0];
  return OBJECT_REPLACEMENT_CHARACTER.count($strings).OBJECT_REPLACEMENT_CHARACTER;
};

$replacer = function ($matches){
  global $strings;
  return $strings[$matches[1]-1];
};

function remove ($re, $target){
  global $remover;
  return preg_replace_callback ($re, $remover, $target);
}

function restore ($target){
  global $replacer;
  return preg_replace_callback (RE_REPLACEMENT, $replacer, $target);
}
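
The code above keeps its saved strings in one global array, so every call shares the same list. A class-based variant of the same remove/restore technique is sketched below as an alternative, not as the original implementation: `Masker` and `MARK` are hypothetical names, and each instance keeps its own list.

```php
<?php
// Hypothetical class-based variant of remove/restore; not the code from the post.
class Masker {
    const MARK = "\u{FFFC}"; // OBJECT REPLACEMENT CHARACTER (escape needs PHP 7+)
    private $strings = array();

    // Replace every match of $re in $target with a numbered marker.
    public function remove($re, $target) {
        return preg_replace_callback($re, function ($matches) {
            $this->strings[] = $matches[0];
            return self::MARK . count($this->strings) . self::MARK;
        }, $target);
    }

    // Put the saved strings back in place of their markers.
    public function restore($target) {
        $re = '/' . self::MARK . '(\d+)' . self::MARK . '/u';
        return preg_replace_callback($re, function ($matches) {
            return $this->strings[$matches[1] - 1]; // markers are numbered from 1
        }, $target);
    }
}

$masker = new Masker();
$html = 'a <b class="x">bold</b> move';
$text = $masker->remove('#</?[^>]*>#', $html); // tags are now markers
$text = strtoupper($text);                     // only touches text outside the tags
echo $masker->restore($text), "\n";            // A <b class="x">BOLD</b> MOVE
```

The attribute value stays lowercase because the whole tag was hidden behind a marker while `strtoupper` ran.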

Posted by Danny on May 26, 2020 at 2:38 pm under PHP.
