From Dotclear to Eleventy 3


The task

Now that I have an Eleventy site that looks the way I want, I need to populate it with the real content. Currently, each post is stored in a database in a wiki syntax that resembles Markdown but is not Markdown. It may also contain raw HTML, hotlinked images or hidden notes. Each post may also have comments, saved in another table of the database, and I want to keep them too.

Content migration

Since Dotclear is written in PHP, I used the same language to:

Get the content

connect to the database and read posts one by one
find a folder name for the new file structure based on year and month

Parse the content

parse the text to find references to images
parse the text to change the wiki syntax to Markdown
build a front matter with metadata from the database
note if any errors are detected

Save files in the new structure

save the new text with a front matter in a new file structure
find referenced images in Dotclear media folders
copy over images to new file structure

If the post has comments

find comments from the database
parse the text of the comments to change the wiki syntax to Markdown
save each comment in a subfolder in the new file structure

Connect to the database

Even though Dotclear provides a way to do this, I connected to the database directly from the PHP script, so that I could run it as a standalone page.

$conn = new mysqli($servername, $username, $password, $dbname);
if ($conn->connect_error) { die("Connection failed: " . $conn->connect_error); }

First, I developed the script by loading a single post, chosen because it contained a formatting difficulty I had to solve:

$sql = "SELECT *  FROM dc_post WHERE post_id = 476";

Then, once I was more confident, I processed posts in batches, which I again had to check visually. Once a batch looked good, I could launch the same script on the next one, until the end (the blog has more than 700 posts):

$sql = "SELECT *  FROM dc_post WHERE post_id BETWEEN 501 AND 800";
$result = $conn->query($sql);
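Choosing each BETWEEN range by hand works, but the ranges can also be computed. The following is a minimal sketch, not part of the original script: batchRanges() is a hypothetical helper, and the batch size is whatever feels comfortable to review visually. The commented lines show how one range could be bound to a mysqli prepared statement.

```php
<?php
// Hypothetical helper (not in the original script): split a post_id range
// into consecutive batches of at most $size ids.
function batchRanges(int $firstId, int $lastId, int $size): array {
    $ranges = [];
    for ($from = $firstId; $from <= $lastId; $from += $size) {
        $ranges[] = [$from, min($from + $size - 1, $lastId)];
    }
    return $ranges;
}

// Each [$from, $to] pair can then be bound to a prepared statement
// instead of being edited by hand in the SQL string:
//   $stmt = $conn->prepare("SELECT * FROM dc_post WHERE post_id BETWEEN ? AND ?");
//   $stmt->bind_param("ii", $from, $to);
//   $stmt->execute();
//   $result = $stmt->get_result(); // get_result() needs the mysqlnd driver

foreach (batchRanges(501, 800, 100) as [$from, $to]) {
    echo "posts $from to $to\n";
}
```

Running one batch per page load keeps the same visual check between batches; only the bounds stop being typed by hand.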

File structure

The file structure is a classic one for a blog: year folders contain month folders, which contain the post files along with their related images. If a post has comments, a folder named after the post holds them.

Figure: tree of files

The year and month of each post come from the database, so folders are created as we go, and only when they do not already exist (if (!file_exists(dirname(__FILE__) . "/pages/" . $year))).

After 2013, I wrote much less than before, so from 2014 on, year folders directly contain the post files (if ($year > 2013)).

    // post_dt is a "YYYY-MM-DD HH:MM:SS" string
    $year = substr($row['post_dt'], 0, 4);
    if (!file_exists(dirname(__FILE__) . "/pages/" . $year)) {
      mkdir(dirname(__FILE__) . "/pages/" . $year, 0777, true);
      echo "<pre style='color: #226600'>";
      echo "FOLDER " . $year . " created";
      echo "</pre>";
    } else {
      echo "<pre style='color: #cc6600'>";
      echo "FOLDER " . $year . " already there";
      echo "</pre>";
    }

    if ($year > 2013) {
      // from 2014 on, posts go directly into the year folder
      echo "<pre style='color: #0000ff'>";
      echo "recent " . $year . ": no month folder";
      echo "</pre>";
      $folder = $year;
    } else {
      $month = substr($row['post_dt'], 5, 2);
      if (!file_exists(dirname(__FILE__) . "/pages/" . $year . "/" . $month)) {
        mkdir(dirname(__FILE__) . "/pages/" . $year . "/" . $month, 0777, true);
        echo "<pre style='color: #226600'>";
        echo "FOLDER " . $month . " created";
        echo "</pre>";
      } else {
        echo "<pre style='color: #cc6600'>";
        echo "FOLDER " . $month . " already there";
        echo "</pre>";
      }
      $folder = $year . "/" . $month;
    }
    $folder = "pages/" . $folder;
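The same year/month rule can be isolated in a small pure function, which makes it easy to test before touching the disk. This is a sketch with hypothetical helper names (postFolder(), ensureFolder()), assuming post_dt is a "YYYY-MM-DD HH:MM:SS" string as the substr() calls above imply. is_dir() replaces file_exists() so that a stray file with the same name is not mistaken for a folder.

```php
<?php
// Hypothetical pure helper reproducing the rule above:
// posts up to 2013 go to pages/YYYY/MM, later posts go to pages/YYYY.
function postFolder(string $postDt): string {
    $year = substr($postDt, 0, 4);
    if ((int)$year > 2013) {
        return "pages/" . $year;
    }
    $month = substr($postDt, 5, 2);
    return "pages/" . $year . "/" . $month;
}

// Hypothetical helper: create the folder (and missing parents) only if needed.
function ensureFolder(string $baseDir, string $folder): string {
    $path = $baseDir . "/" . $folder;
    if (!is_dir($path)) {
        mkdir($path, 0777, true);
    }
    return $path;
}
```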

Find images

In Dotclear, all images are stored under /public/images/, so it was easy to find them with a pattern matching any file in this folder. From there, I had to capture each file name with its complete path ($matches[1][$i]) as well as the bare file name (basename($matches[1][$i])), which is used to save it in the new file structure.

The image file is copied from its source (complete path) to the destination folder which is the $folder for the post as described above.

  $origin_file = realpath($_SERVER["DOCUMENT_ROOT"]).$matches[1][$i];
  $destination_file = dirname(__FILE__) . "/" . $folder . "/" . basename($matches[1][$i]);

I also stored the name of the first image found in the post (the if ($i == 0) part), together with its alt text, in variables that end up in the front matter as the post's meta image.

    // reset for each post, so a post without images does not inherit the previous post's image
    $image = "";
    $imagealt = "";
    $docroot = realpath($_SERVER["DOCUMENT_ROOT"]);

    // Dotclear image tag: ((/public/images/folder/file.jpg|alt text|L))
    $match_image = "/\(\((\/public\/images\/[0-9a-zA-Z_.\/-]*\/[0-9a-zA-Z_.-]*)(?:\|([^|\]]*))?(?:\|([A-Z]{0,1}))?\)\)/";
    preg_match_all($match_image, $content, $matches);

    for($i = 0; $i < count($matches[1]); $i++) {
      if (file_exists($docroot . $matches[1][$i])) {
        $origin_file = $docroot . $matches[1][$i];
        $destination_file = dirname(__FILE__) . "/" . $folder . "/" . basename($matches[1][$i]);
        if (!copy($origin_file, $destination_file)) {
          echo "<pre style='color: #aa0000'>";
          echo "failed to copy $origin_file...\n";
          echo "</pre>";
        } else {
          echo "<pre style='color: #669900'>";
          echo $i . "· " . basename($matches[1][$i]) . " copied \o/ ";
          // the first image becomes the post's meta image
          if ($i == 0) {
            $image = pathinfo($matches[1][0], PATHINFO_BASENAME);
            $imagealt = $matches[2][0];
          }
          echo "</pre>";
        }
      } else {
        echo "<pre style='color: #aa0000'>";
        echo $i . " " . $matches[1][$i] . " doesn't exist: " . $docroot . $matches[1][$i];
        echo "</pre>";
      }
    }
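Separating the pattern matching from the copying makes it possible to check what the regex captures without copying anything. The sketch below uses a hypothetical findImages() helper and the same pattern, with one change: the hyphen is moved to the end of each character class so it is unambiguously a literal. PREG_SET_ORDER groups each match's captures together, and a missing alt text comes back as an empty string.

```php
<?php
// Hypothetical helper: list the Dotclear image tags ((/public/images/...|alt|L))
// in a post, with each image's full path, bare file name and alt text.
function findImages(string $content): array {
    $pattern = '/\(\((\/public\/images\/[0-9a-zA-Z_.\/-]*\/[0-9a-zA-Z_.-]*)(?:\|([^|\]]*))?(?:\|([A-Z]?))?\)\)/';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
    $images = [];
    foreach ($matches as $m) {
        $images[] = [
            "path" => $m[1],           // full path, used as the copy source
            "file" => basename($m[1]), // file name, used in the new folder
            "alt"  => $m[2] ?? "",     // unset when the tag has no alt part
        ];
    }
    return $images;
}

$post = 'Intro ((/public/images/2009/photo.jpg|A cat|L)) and ((/public/images/2009/03/.map_m.jpg))';
foreach (findImages($post) as $img) {
    echo $img["file"] . " (alt: " . $img["alt"] . ")\n";
}
```

The first element of the returned list is the candidate meta image, so the if ($i == 0) logic becomes a simple lookup.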

Content formatting

Metadata such as the title, category and tags populates the front matter. Most of it comes from the database, the rest from the previous steps. The front matter is enclosed between two --- lines and followed by the main content of the post.

    $fullcontent =   "---";
    $fullcontent .= "\nlayout: base";
    $fullcontent .= "\ntitle: " . entreQuote($title);
    $fullcontent .= "\ndescription: ". entreQuote($shortdescription);
    $fullcontent .= "\ncategorie: ". $categories[$row['cat_id']];
    $fullcontent .= "\ntags: [". $taglist . "]";
    $fullcontent .= "\nisMarkdown: true";
    $fullcontent .= "\nthumbnail: ". $image;
    $fullcontent .= "\nimage_alt: ". entreQuote($imagealt);
    $fullcontent .= "\npermalink: ". $row['post_url']."/";
    $fullcontent .= "\ndate: ". substr($row['post_dt'], 0, 10);
    $fullcontent .= "\nupdate: ". substr($row['post_upddt'], 0, 10);
    if ($todo) { $fullcontent .=  "\nTODO: ". substr($todo, 2); }
    $fullcontent .= "\n---";
    $fullcontent .= "\n\n". $cleancontent;
    // blank line first: a --- directly under a paragraph would turn it into a heading
    $fullcontent .= "\n\n---";

Unprotected quotes or colons in front matter text would break the YAML. The function entreQuote replaces double quotes with single quotes, then wraps the text in double quotes if it contains a colon or a single quote.

function entreQuote($string) {
  $string = str_replace("\"", "'", $string);
  if (mb_strpos($string, ":") !== false || mb_strpos($string, "'") !== false) {
    $string = '"' . $string . '"';
  }
  return $string;
}
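When the front matter grows, building it from an array keeps each key in one place. Here is a minimal sketch with a hypothetical buildFrontMatter() helper. It includes a strpos()-based copy of entreQuote() (equivalent for the ASCII colon and quote characters in UTF-8 text, and guarded so it does not clash with the original). As a design choice, it skips empty values rather than writing lines like thumbnail: with nothing after them.

```php
<?php
// strpos() copy of entreQuote(), only defined if the original is not loaded.
if (!function_exists('entreQuote')) {
    function entreQuote($string) {
        $string = str_replace("\"", "'", $string);
        if (strpos($string, ":") !== false || strpos($string, "'") !== false) {
            $string = '"' . $string . '"';
        }
        return $string;
    }
}

// Hypothetical helper: $fields is an ordered key => string value list,
// $quoted names the keys whose values go through entreQuote().
function buildFrontMatter(array $fields, array $quoted): string {
    $lines = ["---"];
    foreach ($fields as $key => $value) {
        if ($value === null || $value === "") {
            continue; // skip empty values instead of writing "key: "
        }
        $lines[] = $key . ": " . (in_array($key, $quoted, true) ? entreQuote($value) : $value);
    }
    $lines[] = "---";
    return implode("\n", $lines);
}

echo buildFrontMatter(["layout" => "base", "title" => "Paris: day 1", "thumbnail" => ""], ["title"]);
```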

Footnotes

In this blog, I used and abused footnotes, as they were easy to write in Dotclear. The Dotclear syntax ($$note text$$) had to be converted into Markdown footnote references such as [^1] and [^2], which the script takes care of:

  // footnotes
  preg_match_all('/\$\$([^\$]*)\$\$/', $cleancontent, $contentnotes);
  foreach ($contentnotes[1] as $key=>$value) {
    $count = (int)$key + 1;
    $cleancontent = str_replace('$$'.$value.'$$', '[^'.$count.']', $cleancontent);
  } // [^1] ref for footnotes
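One caveat with str_replace(): if two footnotes have exactly the same text, both are replaced with the first number, and the second definition is left without a reference. An alternative, not the author's code, is preg_replace_callback(), which numbers each occurrence as it is found. convertFootnotes() is a hypothetical name; it returns the converted text and the footnote definitions to append after the body.

```php
<?php
// Alternative sketch: number every $$note$$ in reading order with
// preg_replace_callback(), so identical notes still get distinct numbers.
function convertFootnotes(string $content): array {
    $notes = [];
    $converted = preg_replace_callback('/\$\$([^\$]*)\$\$/', function ($m) use (&$notes) {
        $notes[] = $m[1];
        return '[^' . count($notes) . ']';
    }, $content);
    // Markdown footnote definitions, to append after the post body.
    $definitions = "";
    foreach ($notes as $i => $text) {
        $definitions .= "\n[^" . ($i + 1) . "]: " . $text;
    }
    return [$converted, $definitions];
}

[$body, $defs] = convertFootnotes('Cats$$Felis catus$$ and dogs$$Canis$$.');
echo $body . "\n" . $defs;
```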

Article notes

Dotclear lets editors save notes along with articles. As I didn't want to lose these notes, I added them at the bottom of the content inside an HTML comment. This way, they remain available when editing the file, and since HTML comments are removed from the static files generated by my Eleventy build, there is no risk of publishing them. The same step also appends the footnote definitions ([^1]: ...) collected previously.

    foreach ($contentnotes[1] as $key=>$value) { // footnote definitions
      $count = (int)$key + 1;
      $fullcontent .= "\n[^".$count."]: ".$value;
    }
    // post notes are saved as an HTML comment below the content
    if ($row['post_notes']) {
      $fullcontent .= "\n<!-- post notes:\n";
      // remove empty lines so Markdown sees the comment as a single block
      $postnotes = str_replace("\r\n\r\n", " \r\n", $row['post_notes']);
      $fullcontent .= $postnotes;
      $fullcontent .= "\n-->\n";
    }
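The str_replace() above only catches blank lines written exactly as \r\n\r\n, which may explain some of the "html comments containing empty lines" listed at the end of this post. Here is a sketch of a more tolerant version, using a hypothetical notesAsHtmlComment() helper: it normalizes line endings first, drops any empty lines, and breaks up a literal --> in the notes so it cannot close the comment early.

```php
<?php
// Hypothetical helper: wrap Dotclear post notes in an HTML comment.
function notesAsHtmlComment(string $notes): string {
    $notes = str_replace("\r\n", "\n", $notes);       // normalize line endings
    $notes = preg_replace('/\n\s*\n/', "\n", $notes); // drop empty lines
    $notes = str_replace("-->", "- ->", $notes);      // keep the comment open
    return "\n<!-- post notes:\n" . trim($notes) . "\n-->\n";
}

echo notesAsHtmlComment("check the photo credits\r\n\r\nask Paul --> later");
```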

I had to go over all this several times, with many different posts, to make sure the script generated proper Markdown. At this stage, I relied on echo $fullcontent and a lot of trial and error.

Save the file

Once $fullcontent is complete, all that remains is to save it to a file with the right name, in the right folder.

The right folder is $folder (see the step about years and months), and the file name is the post's slug, $row['post_url'], already used for the permalink in the front matter. So saving the file is quite straightforward.

   $filename = dirname(__FILE__) . "/" . $folder . "/" . $row['post_url'] . ".md";
   file_put_contents($filename, $fullcontent);
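file_put_contents() returns false when it cannot write, and the two lines above ignore that. A hypothetical savePost() wrapper (not from the original script) could create the folder if it is missing and return the written path, or null on failure, so the caller can print an error like the other steps do.

```php
<?php
// Hypothetical wrapper: returns the path of the written file, or null on failure.
function savePost(string $baseDir, string $folder, string $slug, string $content): ?string {
    $dir = $baseDir . "/" . $folder;
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return null;
    }
    $filename = $dir . "/" . $slug . ".md";
    return file_put_contents($filename, $content) === false ? null : $filename;
}

// Usage in the migration loop, assuming $folder, $row and $fullcontent as above:
//   if (savePost(dirname(__FILE__), $folder, $row['post_url'], $fullcontent) === null) {
//     echo "<pre style='color: #aa0000'>failed to save " . $row['post_url'] . "</pre>";
//   }
```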

Save comments

Comments are stored in a dedicated table of the database. To find out whether a post has comments, we look in this table for the post's id ($row['post_id'] from the previous query).

    $sql_com = "SELECT *  FROM dc_comment WHERE post_id = " . (int)$row['post_id'];
    $result_com = $conn->query($sql_com);

    if ($result_com->num_rows > 0) {
        echo "there be comments<br>";

A post can have more than one comment. In that case, we have to iterate through each result row to get each comment.

while($row_com = $result_com->fetch_assoc()) {

There is no need to parse the content: comments are stored as HTML, so they can be saved as is and will be rendered as is.

However, comment metadata such as the author's name, website and email must be saved too; it goes into the front matter. Only published comments (comment_status > 0) are kept.

      while($row_com = $result_com->fetch_assoc()) {
        if ($row_com['comment_status'] > 0) { // published comments only

          $comcontent =   "---";
          $comcontent .= "\ndate: ". substr($row_com['comment_dt'], 0, 10);
          // author names are free text and may contain colons or quotes
          $comcontent .= "\nauthor: ". entreQuote($row_com['comment_author']);
          $comcontent .= "\nemail: ". $row_com['comment_email'];
          $comcontent .= "\nsite: ". $row_com['comment_site'];
          $comcontent .= "\ntags: ". "comment";
          $comcontent .= "\npermalink: ". "false";
          $comcontent .= "\n---";
          $comcontent .= "\n\n". $row_com['comment_content'];
          $comcontent .= "\n---";

Note that the front matter also contains permalink: false, which tells Eleventy not to generate a page for each comment. Comments are only meant to be included at the end of a post.
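Comment authors type their own names, so a colon or a quote in one can break the YAML just like a post title. The sketch below is a hypothetical buildCommentFile() helper, not the original code: it applies the entreQuote() rule (a guarded strpos() copy) to the author and site, keeps only published comments (comment_status > 0, as in the loop above), and returns null otherwise.

```php
<?php
// strpos() copy of entreQuote(), only defined if the original is not loaded.
if (!function_exists('entreQuote')) {
    function entreQuote($string) {
        $string = str_replace("\"", "'", $string);
        if (strpos($string, ":") !== false || strpos($string, "'") !== false) {
            $string = '"' . $string . '"';
        }
        return $string;
    }
}

// Hypothetical helper: build a comment file from one dc_comment row.
function buildCommentFile(array $rowCom): ?string {
    if ((int)$rowCom['comment_status'] <= 0) {
        return null; // unpublished, pending or junk comments are skipped
    }
    return "---"
        . "\ndate: " . substr($rowCom['comment_dt'], 0, 10)
        . "\nauthor: " . entreQuote($rowCom['comment_author'])
        . "\nemail: " . $rowCom['comment_email']
        . "\nsite: " . entreQuote($rowCom['comment_site'])
        . "\ntags: comment"
        . "\npermalink: false"
        . "\n---"
        . "\n\n" . $rowCom['comment_content'] . "\n";
}
```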

Comment files are saved the same way as posts, except that they go into a dedicated folder named after the post.

First, we check whether the folder exists and create it if needed.

    if (!file_exists(dirname(__FILE__) . "/" . $folder . "/" . $row['post_url'])) {
      mkdir(dirname(__FILE__) . "/" . $folder . "/" . $row['post_url'], 0777, true);
      echo "<pre style='color: #226600'>";
      echo "FOLDER " . $row['post_url'] . " created";
      echo "</pre>";
    } else {
      echo "<pre style='color: #cc6600'>";
      echo "FOLDER " . $row['post_url'] . " already there";
      echo "</pre>";
    }

Then the comment content, $comcontent, is saved in a file named after the comment id ($row_com['comment_id']) inside that folder.

    $filename = dirname(__FILE__) . "/" . $folder . "/" . $row['post_url'] . "/comment-" . $row_com['comment_id'] . ".md";
    file_put_contents($filename, $comcontent);

In the end

Once the script has finished, don't forget to close the database connection.

$conn->close();

Considerations post migration

After this content migration, I had to go through every post and check that it displayed correctly. Of course, there were mistakes, which I fixed manually.

Now that it is all done, I can admit that I should have spent a bit more time on the script, especially on content formatting and image links; it would have saved me a lot of time manually fixing the remaining mistakes, such as:

descriptions containing links
html comments containing empty lines
preformatted text
images with links
images whose names start with a dot (generated by Dotclear)

I also forgot to mark draft articles automatically, so I had to do it by hand, adding the following lines to their front matter:

permalink: false
eleventyExcludeFromCollections: true
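Drafts could have been flagged by the script itself. Assuming Dotclear's post_status is 1 for published posts (other values meaning unpublished, scheduled or pending), a sketch could derive the publication fields from it. publicationFields() is a hypothetical helper meant to replace the permalink line of the front matter, since a YAML mapping should not repeat a key.

```php
<?php
// Hypothetical helper: front matter lines depending on the post's status.
// Assumes post_status = 1 means published in Dotclear.
function publicationFields(array $row): string {
    if ((int)$row['post_status'] === 1) {
        return "\npermalink: " . $row['post_url'] . "/";
    }
    // Drafts: no generated page, and kept out of every collection.
    return "\npermalink: false\neleventyExcludeFromCollections: true";
}

echo publicationFields(['post_status' => 0, 'post_url' => 'unfinished-post']);
```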

Some latent problems could not be fixed without manual intervention, which is why I still had to review every article by hand:

images without alt text (needed to be created)
hotlinked images where the source was deleted (needed to be replaced or removed)

All this content editing took time, but since I think my blog deserves to remain online, it was worth the effort.

Now I need to make sure pages are well linked together and nicely sorted with tags and categories.

Next: Rebuilding navigation and pages