Five Minute Breaks

About two weeks ago, I posted this in a few places:

Dear Lazy Plus,

For a few days now I’ve started to take regular, 5-minute breaks during work. They are literally 5-minute breaks, with a timer running on my phone. I make a cup of tea, take a few sips, and di-ding! I’m heading back to my desk.

Thing is, there’s only so many times making tea is enjoyable. Or any beverage, for that matter.

I’m looking for something else to do in these 5-minute breaks. Ideally, something that isn’t staring at my phone, or involving any kind of screen. One thing I tried is juggling 3 balls. I’m not very good at it, which is great, because I feel that I’m improving slightly. But I’d like to add more activities. Any ideas?

Here’s the compilation of answers I’ve received:

  • Music room [I play bass]
  • Take a 5 minute nap on a bean bag
  • Doodle
  • Stairs
  • Rubik’s cube
  • Meditation (breathing, etc)
  • Do nothing, just relax, maybe reflect on what you did since the last break and why what you’re going to do next matters
  • Juggling a squash ball [I was offered instructions on how to do it]
  • Basic wrist / arm stretches
  • Go strike up a conversation with someone on another floor
  • Origami. Yodeling. Bonsai. Learning to play the xylophone. Chemistry experiments. Psychological experiments.
  • Push-ups. Skipping rope. Chin-ups. Hold a pillar bridge/plank for 5 mins. Balance things on your head. Meditate. Journal.
  • Take up smoking [plus a suggestion of a smoking companion, haha]
  • Do two things with a Mobius strip. Do one thing with a piece of knot theory.
  • Deep Diaphragmatic Breathing, five minutes of that a day will revolutionize your life.
  • Work on doing impressions.

So far, the easiest thing to do was striking up a conversation with whoever happened to be around the micro kitchen. I’ve talked to several people whom I had only been passing in the corridor without even a greeting. Now I know their names and a little bit about them. It was satisfying. It will take me time to try out the other ones.

phpBB static archive

I looked online for instructions on how to create a static phpBB archive of a retired forum, and didn’t find much, apart from other people asking the same thing. I’ve investigated it myself.

UPDATE 2016-05-09: New things I found: How to archive phpBB (similar writeup), and phpbb3-static (a converter script).

UPDATE 2016-11-28: I’ve decided to do it again, better, using phpbb3-static.

General options

When choosing your approach, one of the criteria is the future maintenance cost. It’s likely that the reason that you want a static archive is that you want it to not require maintenance, or require as little as possible.

Option 1: Lock the forum and continue to run phpBB

  • Pros:
    • There’s little to do, so it’s quick.
  • Cons:
    • High maintenance. It’s not static. You’re still running PHP, so you have to keep on upgrading your PHP installation and your phpBB installation, or your forum archive will get hacked.

Option 2: Download the whole forum using wget or httrack

  • Pros:
    • The result looks the same as the original.
  • Cons:
    • The result looks the same as the original. (e.g. hard to browse on phones)
    • Out of the box, it does not work! It requires tweaks as discussed below.
    • Lots of content duplication. If there are different URLs with the same content, they will exist as separate files on disk.
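
To get a feel for how much duplication a mirror contains, here is a minimal sketch that groups files by a hash of their content. The function name is made up for illustration, and it only finds byte-for-byte duplicates; pages that differ in a single link (for example because of a session id) would not be caught.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under `root` by the SHA-256 digest of their content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the groups where more than one file has the same content.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicates() on the mirror directory returns a list of groups, each group being a list of paths with identical content.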

Option 3: Write your own exporter

Query the database with SQL and write the output the way you want it.

  • Pros:
    • Low maintenance of the resulting site.
    • High level of control of how the output is structured.
  • Cons:
    • Writing the exporter is time consuming.
    • The output will most likely look different from the original forum, so people who are used to the forum will likely be confused by the navigation.
    • You need to put in additional work to preserve the old URLs.

Also… you could even generate a set of Markdown files to be fed as input to a static website generator such as hugo. This would give you a lot of things for free, including nice URLs and a sitemap.
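
As a rough illustration of the Markdown idea, the sketch below renders one topic as a Markdown document with a YAML front matter block, which is one of the front matter formats hugo accepts. The function name and the dictionary keys (author, date, body) are hypothetical; a real exporter would fill them from the phpBB database, and the post body would still need the bbcode handling discussed later.

```python
import json


def topic_to_markdown(title, posts):
    """Render one topic as Markdown with YAML front matter.

    `posts` is a list of dicts with the hypothetical keys
    "author", "date" and "body".
    """
    # json.dumps() produces a double-quoted, escaped string,
    # which is also a valid YAML scalar.
    lines = ["---", "title: " + json.dumps(title), "---", ""]
    for post in posts:
        lines.append("**{}** wrote on {}:".format(post["author"], post["date"]))
        lines.append("")
        lines.append(post["body"])
        lines.append("")
    return "\n".join(lines)
```

Each returned string would be written to its own .md file in the content directory of the static site generator.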

Option 4: Use an existing exporter

  • Pros:
    • Low maintenance result.
    • Takes less time than Option 3, with comparable results.
  • Cons:
    • You can’t expect the exporter to just work for you, especially if you’ve modified / heavily customized your forum. You will have to dig into the exporter script and fix issues in the (somebody else’s) code.
      Archiving a forum is a one-off job. Once the result is satisfying, the user will lose interest in the exporter and will most likely not improve it any further. When you pick up an exporter, you’ll pick it up where the previous user left off.

Post content / bbcode

In my experience, proper processing of the post content is the hardest problem. This is due to the format that phpBB uses to store posts in the database.

You would think that there is just one syntax – the one that forum users enter, which is stored in the database, and rendered into HTML when served on the web. In the case of phpBB it is not so: there are 3 formats! One for the user to edit, one to display (HTML) and something intermediate, that is stored in the database.

The existing exporter I found, phpbb3-static, used an existing bbcode parser to transform the database contents into HTML. The problem is that the database content isn’t bbcode, or at least it isn’t pure bbcode.

It’s a mix of HTML containing raw <a href=”…”>…</a> links, with bbcode links (“[]bbcode links[/url]”), and the existing bbcode parser tries to linkify bare URLs that it spots in the content. If there’s something like this in the content…


…the end result is (indentation added for readability)…

<a href="$valid_url">
  <a href="$truncated_url">

…and that doesn’t work, because $truncated_url is… truncated. This is what phpBB does with long links by default: it turns “longlonglonglink” into “lo…nk”. The first part still starts with “http://” so the bare link matcher catches it and adds an <a href=”…”></a> tag around it.

I examined the database representation and realized that it’s complex, that improving the parser on my own would be futile, and that in the best case I would merely be reimplementing what has already been implemented in phpBB itself. Perhaps I could just call the generate_text_for_display() function from phpBB to render the HTML? Theoretically yes. Unfortunately, this function isn’t just a parser. It uses a number of global variables, such as $user and $cache. The $cache is used to access the forum configuration, and makes SQL queries. As a result, what should be just a text parser requires the full phpBB environment.

I could wire the exporter to phpBB, but I thought that it would make it dependent on a certain phpBB version. What I could do instead is make an HTTP request to the live version of the forum, find the right snippet of HTML and save it.

I’ve tried it. This method was an order of magnitude slower than in-process parsing. But on the positive side, it gave me the right results!
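
The "find the right snippet of HTML" step could look roughly like the sketch below. It is not the code used for the archive. It assumes that the theme wraps each rendered post body in a <div class="content"> element, which may not hold for a customized forum, and the class name PostContentExtractor is made up for this example. It works on a string, so the page can be fetched in any way you like.

```python
from html.parser import HTMLParser


class PostContentExtractor(HTMLParser):
    """Collect the inner HTML of every <div class="content"> element.

    The "content" class name is an assumption about the forum theme.
    """

    def __init__(self):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.posts = []    # finished snippets, one per post
        self._depth = 0    # <div> nesting depth inside the current post
        self._chunks = []  # raw HTML pieces of the current post

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._chunks.append(self.get_starttag_text())
            if tag == "div":
                self._depth += 1
        elif tag == "div" and "content" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied as they are.
        if self._depth:
            self._chunks.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # The closing tag of the post container: snippet is complete.
                self.posts.append("".join(self._chunks).strip())
                return
        self._chunks.append("</{}>".format(tag))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._chunks.append("&{};".format(name))

    def handle_charref(self, name):
        if self._depth:
            self._chunks.append("&#{};".format(name))
```

Feed it the HTML of a topic page with feed() and read the snippets from the posts attribute, in page order.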



[Obsolete] The previous attempt, using wget

Left here for the record. Superseded by the above approach, using phpbb3-static.

I’m intentionally not trying to write the whole thing in a form of a script, even though it was tempting. I expect different phpBB installations to vary, and the chance that my script would work with somebody else’s forum is slim. So instead I’ll write up what I did step by step, and people can follow this howto and make alterations as they see fit.

Note: I’m using Apache and I’m quoting Apache specific configuration lines.

Mirroring the forum

I downloaded the database and the forum snapshot to a local computer to start a local instance. It’s a hassle but it makes things quicker. Once it was ready, I created a mirror on disk:

wget --mirror -k -p <Forum URL>

After downloading it turned out that I had 127 thousand files on disk, which takes up 5GB of space as shown by du -sh <directory>. I mean I’ve seen larger in my career, but I expected a smaller size from a generally text-based static forum archive.

I put the result of wget’s work on a test server to see how it works.

Question marks

During testing it turned out that the “?” in the URL is treated as a special character. For example, when the browser requests this:

GET /style.php?id=1 HTTP/1.1

…the WWW server is looking for a file on disk named style.php, fails to find it, and returns an HTTP 404 error.

HTTP 404: style.php not found

But in our case we want the server to serve the file named “style.php?id=1”!

$ ls -l style.php*
-rw-rw-r-- 1 maciej maciej 71445 Apr 24 15:58 style.php?id=1&lang=pl
-rw-rw-r-- 1 maciej maciej 71445 Apr 24 16:24 style.php?id=1&lang=pl&sid=2231c9b38ea28f9aa9e9bdd2a8452846

By the way, did you notice the file with sid? Ugh. Anyway…

With help from StackOverflow I’ve found these magic lines that I added to .htaccess:

RewriteCond %{ENV:REDIRECT_STATUS} !200 
RewriteCond %{QUERY_STRING} !^$ 
RewriteRule ^(.*)$ %{REQUEST_URI}\%3F%{QUERY_STRING} [noescape,last,qsdiscard]

I don’t fully understand what it does, but it seems to work. As far as I could understand: when the query string is not empty (“?foo=bar” in the URL), the request is rewritten in such a way that we’re putting it together again using REQUEST_URI and QUERY_STRING, and we’re connecting them with “%3F”, which is a URL-encoded question mark. When this is done, Apache understands that we mean a “?” on disk, and not a URL/query string combination. We also have to add “qsdiscard” to prevent Apache from appending the query string again onto the URL. In a way, Apache is trying to do the right thing: keeping the file part and the query string part of the URL meaningful and separate. But in this case we want to do the opposite: treat the “?” literally as a part of the file name.
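
To make the effect of the rewrite easier to see, here is a small Python model of it. This is only an illustration of the string manipulation, not a description of how Apache works internally, and the function name is made up.

```python
from urllib.parse import quote, unquote, urlsplit


def rewritten_path(request_target):
    """Glue the path and the query string together with an encoded "?"."""
    parts = urlsplit(request_target)
    if not parts.query:
        # Empty query string: the second RewriteCond does not match.
        return parts.path
    # quote("?") is "%3F", the URL-encoded question mark.
    return parts.path + quote("?") + parts.query


# Decoding the rewritten path gives the file name that exists on disk.
print(unquote(rewritten_path("/style.php?id=1&lang=pl")))
```

A request without a query string passes through unchanged, which is why ordinary files in the mirror keep working.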

By the way, the solution I found on StackOverflow was slightly different and didn’t work for me verbatim.

Done-ish? Probably not

OK, so this is the rudimentary version of the archive. It has a number of disadvantages, but it meets the main criteria: we have static files and the content is there, you can browse it.

What are the problems?

  1. The login form and the search box are still there, which is confusing for people: they will try to log in and wonder why it’s broken.
    Addressed below.
  2. A number of URLs won’t work. There is a number of reasons for this, one of them is the parameter ordering. The web server isn’t interpreting the query strings any more, so these two are different now:

    In the PHP world they were interpreted and became part of the URL parameter namespace regardless of the order, but now Apache is just looking for files on disk, and it just looks for files named exactly as specified in the URL. So some URLs that used to work, especially if somebody linked to your forum  from the outside, will not work.

    Not addressed as of 2016-05-05.

  3. URLs are ugly. I know that search engines can deal with this sort of stuff, and they can do things like filtering out the “sid” parameter from the URL. But still, I keep on thinking that the forum URLs should be more like:

    Not addressed as of 2016-05-05.

  4. No sitemap.

    Not addressed as of 2016-05-05.

  5. Not mobile friendly. This isn’t a problem with the archiving process per se, but it is a feature I would expect in a good archive.

    Not addressed as of 2016-05-05.
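
One possible direction for the parameter ordering problem (number 2 above), which I have not applied to the archive, is to pick a canonical form for every URL: drop the “sid” parameter and sort the remaining ones. The sketch below shows the idea; the function name and the example parameters are only for illustration, and files on disk plus incoming requests would both have to be mapped to the canonical form for it to help.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_name(request_target, drop=("sid",)):
    """Return the path plus query string with sorted parameters and no sid."""
    parts = urlsplit(request_target)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    params.sort()
    if not params:
        return parts.path
    return parts.path + "?" + urlencode(params)
```

With this, two URLs that differ only in parameter order or in the session id end up with the same name.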

Login form and the search box.

The next thing I noticed is that there still is a login form in the HTML. It is confusing for people because there’s nothing indicating that there’s nothing to log into. I wanted to remove the form, but it was duplicated across 127 thousand files!

First I tested it on one file:

sed -i -e '/<div id="search-box">$/,+9d' viewtopic.php?…

And then ran across all files:

find . -name '*.php*' -exec sed -i -e '/<div id="search-box">$/,+9d' {} \;

This took a fair bit of time, but was successful. I actually don’t know how much because I went out for a small hike.
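
For readers less familiar with sed: the expression deletes every line that ends with the search box marker, together with the 9 lines that follow it. A Python sketch of the same logic, with a made-up function name, looks like this:

```python
def drop_block(lines, marker='<div id="search-box">', extra=9):
    """Remove each line ending with `marker` and the `extra` lines after it."""
    kept = []
    skip = 0
    for line in lines:
        if skip:
            # Still inside the block that follows a marker line.
            skip -= 1
            continue
        if line.rstrip("\n").endswith(marker):
            skip = extra
            continue
        kept.append(line)
    return kept
```

The number 9 is specific to my forum template; check how many lines the form takes in yours before running anything across all files.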

Let’s make it smaller

The reason why the forum occupies a large amount of disk space is that a small file still occupies a full block on disk, so there’s a sort of file count tax that you have to pay when storing files on disk. But there’s something that you can do. I realized that the forum archive is static, so I can use a read-only file system, and there are read-only file systems which pack files efficiently. After a quick look around, SquashFS turned up as the best pick, with efficient file packing, compression, and support in the Linux kernel. The whole packed forum shrank from 5G to 517MB. I mounted it using the loopback device on the web server (added it to /etc/fstab), and voila! Almost 10× reduction in size. My web server only has 20G of disk space, so saving 4.5G is significant.
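
The “file count tax” can be illustrated with a bit of arithmetic. The sketch below rounds a file size up to whole blocks; the 4096 byte block size is an assumption, so check what your file system actually uses.

```python
def allocated_bytes(file_size, block_size=4096):
    """Space taken by a file when only whole blocks can be allocated."""
    # Ceiling division: a partly filled last block still counts in full.
    blocks = -(-file_size // block_size)
    return blocks * block_size


# One of the style.php files from the listing above is 71445 bytes long.
print(allocated_bytes(71445))
```

The rounding only explains part of the difference; compression in SquashFS accounts for the rest.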

Unresolved problems

At the time of writing there are a number of problems I haven’t addressed in my forum archive. If I manage to, I’ll update this page with new information.

Where to listen to jazz in Dublin

Two excellent places are JJ Smyths and Sweeney’s, but they are the best known ones, and there are many more that are also interesting. Here are a few which I know first-hand:

  • Weekly Sunday jazz brunch with Stella Bass – starts every Sunday at 2pm. Vocals, piano, double bass and drums. Mainstream jazz, they take requests, but they won’t play Led Zeppelin (a friend tried). You’ll have more luck asking for an Ella Fitzgerald, Nina Simone or Roberta Flack song. The venue has superb acoustics.
  • Louis Stewart plays first Wednesday of the month in The House restaurant in Howth. It’s a guitar + double bass duo, sometimes joined by a sax player. It’s the best jazz guitar I heard in Ireland.
  • The Essential Big Band – I saw them when they played in Bleu Note on Capel Street (photo). These days they play in the Grainger’s pub on Malahide road on Mondays.
  • Monday Jazz jam session in The Grand Social – starts at 9pm, runs until midnight. It’s an open stage jam, so anything can happen. Acoustics are so-so.
  • Hot House Big Band plays in Bad Bob’s in Temple Bar, admission €5

There’s a jazz night in the Bello Bar on Sundays, but I haven’t been there yet.

There’s also the Bray Jazz Festival 2014 coming up on the bank holiday weekend in May.

UPDATE 2016-08-16: Mercantile is in Bad Bob’s now, Sweeney’s closed, and the jam session is on Mondays.

HTTP PUT with multipart/form-data using pycurl

Let’s suppose you have a REST interface to talk to, and there’s a PUT request you want to make, sending data over using the multipart/form-data encoding (as opposed to application/x-www-form-urlencoded). If you’re using Python and pycurl, you’ll find out that if you try to combine setopt(pycurl.PUT, 1) with setopt(pycurl.HTTPPOST, [ (key1, val1), … ]), it doesn’t work. You could try to use setopt(pycurl.POSTFIELDS, “…”), but you’d have to handle encoding to multipart/form-data by hand, or use a third party library such as poster. In any case, it looks like more hassle than it should be. The pycurl.HTTPPOST option can already do what’s needed; it’s just that it implies the POST method, while you want to use PUT.

A solution came to me when reading a thread on the curl-with-python mailing list. I knew I could already do what I needed using the command line utility, like this:

curl -X PUT -F 'fieldname=@filename.json' http://localhost:8000/

If you add an option like --libcurl foo.c to such a call, you’ll get a C program which does what your command line invocation would do. This revealed that “-X PUT” did not translate into setopt(pycurl.PUT, 1), but into setopt(pycurl.CUSTOMREQUEST, “PUT”). It might look like a subtle difference, but the latter does what I wanted, while the former doesn’t. A minimal working example would look like this:

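import pycurl

c = pycurl.Curl()
c.setopt(pycurl.URL, "http://localhost:8000")
# HTTPPOST builds the multipart/form-data body (and implies POST).
c.setopt(pycurl.HTTPPOST, [('foo', 'bar')])
# CUSTOMREQUEST only replaces the method name sent in the request line.
c.setopt(pycurl.CUSTOMREQUEST, "PUT")
# Send the request; without this call nothing goes over the wire.
c.perform()
c.close()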

If you run “nc -l 8000” and run the above code, you’ll see:

PUT / HTTP/1.1
User-Agent: PycURL/7.26.0
Host: localhost:8000
Accept: */*
Content-Length: 141
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------2def70e0b37a

Content-Disposition: form-data; name="foo"


…which is exactly what I wanted.

Merging from trunk to a branch

You created a branch in subversion, and while you were working on it, trunk progressed. You now want to include the trunk updates in your branch. What should you do? Maybe merge from trunk into your branch?

svn merge ${url}/trunk branches/mybranch

Nope! This isn’t it. Think about the simple case: branch out, edit the branch, merge back. What does ‘merge’ mean in this case? If I understand correctly, it means replaying on trunk all the changes you made to your branch.

What happens when you run the above command then? You replay all the changes you made to trunk, on top of your branch. Once that is done, what happens when you want to merge your branch back to trunk? One of the changes to be replayed is the merge you did, but it contains changes that have already been made on trunk, and the merge does not work.

How to do it properly then? What you probably meant to do, is to have your branch as if you started your branch-work on the newer trunk. Let’s first consider the simple case, where you branch out and then merge back.

svn cp ${url}/trunk ${url}/branches/mybranch
svn update
...editing your branch...
svn commit -m "edits to my branch"
svn merge ${url}/branches/mybranch trunk
svn commit -m "merging mybranch back to trunk"

That works. And it cannot really be more complex than that. Maybe if you’re a subversion whiz, but I’m not, so I like to stick to simple scenarios I can understand.

Let’s try to accommodate an updated trunk into the above workflow. It starts as usual:

svn cp ${url}/trunk ${url}/branches/mybranch
svn update
...editing your branch...
svn commit -m "edits to my branch"

So far so good. Let’s say there are some updates to trunk we want to see in our branch. You would think: “Why didn’t I start working on my branch later, I would have all the updates already in my branch!”. It turns out, you can do that! You can create a new branch from the new trunk, and then replay all the changes from your branch on top of it. The result? You still have your changes in a separate branch, and you have the updates to trunk too.

svn status
# Make sure this returns nothing ‒ your working copy is clean.
svn cp ${url}/trunk ${url}/branches/mybranch2
svn update
svn merge ${url}/branches/mybranch branches/mybranch2
# There is potential for code conflicts here, you need to resolve them.
svn commit -m "Replaying changes made to mybranch onto mybranch2."
svn rm ${url}/branches/mybranch
# Let's go to the original branch name.
svn mv ${url}/branches/mybranch2 ${url}/branches/mybranch
svn update

Your branch is now updated and looks as if you’ve started to work on it using the new trunk. You can use the regular merging procedure.

svn merge ${url}/branches/mybranch trunk
svn commit -m "merging mybranch back to trunk"

Your changes are now merged back to trunk.

Canon XM2 (DV) to DVD, on Linux

I wanted to transfer some material from DV cassettes to DVD. My main workstation is running Ubuntu 12.04, and I decided to use the tools that are available with the distribution. I tried multiple ways of doing each of the tasks, and hit many dead ends, mainly due to crashing programs, bugs, or incompatible tools. For instance, tovid looked very promising until it turned out that it is not compatible with the new version of the ffmpeg utility. My source material was DV, recorded by Canon XM2, the video format was 768×576, interlaced (576i), with audio at 48kHz, PCM, stereo. Interlacing was giving me some headache, because the first attempts led to unsightly stripey output. The camera outputs double-scan interlace, which should be interpreted as 50 frames per second with reduced resolution. Interlacing might be tricky.

The first step is to capture the video from the camera. Connect the camera to the laptop, switch the camera to the playback mode, rewind the tape and:

dvgrab birthday-

The “birthday-” bit is a prefix that will be added to the saved .dv files. dvgrab will save multiple 1GB files, each file about 4 minutes long. Once the material is captured, you can merge the multiple files into one, by simply concatenating them:

cat birthday-001.dv birthday-002.dv birthday-003.dv > birthday.dv
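
If there are many files, listing them by hand gets tedious. Here is a minimal Python sketch of the same concatenation, assuming the files are named with the prefix followed by a zero-padded number, as in the cat example; the function name is made up.

```python
import shutil
from pathlib import Path


def concatenate(prefix, output, directory="."):
    """Append all <prefix><number>.dv files, in name order, into `output`."""
    # Zero-padded numbers mean that sorting by name is sorting by order.
    parts = sorted(Path(directory).glob(prefix + "[0-9]*.dv"))
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

For the example above the call would be concatenate("birthday-", "birthday.dv"). The output name does not match the pattern, so it is not picked up as an input.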

Once you have one file with the complete material, fire off a player and note down (I used paper and pencil) the times of segments you want to extract. You won’t be able to do a lot of cutting that way, but if it’s a couple of segments, it shouldn’t be too labor intensive. Once you know which segments you want to extract, you can extract them and encode them as .vob files. Suppose one fragment starts at 02:13 and is 135 seconds long:

avconv -i birthday.dv -target pal-dvd -flags +ilme+ildct -b:v 6000k -ss 02:13 -t 135 birthday-01.vob

The “+ilme+ildct” bit is responsible for correct handling of interlacing, because DV uses a different field order than DVD. Repeat the above command for each segment, and you’ll get a list of VOB files. These VOB files are DVD compliant, and they implement the interlacing correctly. They must not be re-encoded when transferred to DVD, otherwise the interlacing settings will most likely be lost. You can check whether your interlacing settings are correct by watching the VOB file using VLC with automatic deinterlace detection:

vlc --deinterlace -1 --deinterlace-mode bob --play-and-exit birthday-01.vob

You should see no stripes during movement in the video, and the displayed frame rate should be 50fps (although the video frame rate is set to 25fps).
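
If you have more than a couple of segments, the avconv calls from the extraction step can be generated from your notes. The sketch below only builds the command lines, using exactly the options from the avconv example above; the segment list and the function name are placeholders.

```python
def avconv_command(source, start, duration, output):
    """Build the avconv call for one segment, as a list of arguments."""
    return [
        "avconv", "-i", source,
        "-target", "pal-dvd",
        "-flags", "+ilme+ildct",
        "-b:v", "6000k",
        "-ss", start,
        "-t", str(duration),
        output,
    ]


# (start time, length in seconds), as noted down while watching.
segments = [("02:13", 135)]
commands = [
    avconv_command("birthday.dv", start, length, "birthday-%02d.vob" % number)
    for number, (start, length) in enumerate(segments, 1)
]
for command in commands:
    print(" ".join(command))
```

Each list can be passed to subprocess.check_call() to actually run the encoding.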

The next step is to create a DVD menu. There are a number of DVD authoring programs. I had the most success with DVD Styler. I also tried tovid and Bombono.

In DVD Styler, I managed to create a DVD directory structure, but not an ISO image, and I was not able to burn a DVD directly from DVD Styler. Instead, I only generated the DVD structure on disk, and used k3b, using its DVD template. I created a new project, found the generated VIDEO_TS directory from DVD Styler, and added it to the project in k3b. This was enough to arrive at a working DVD.

DVD Styler would recognize that the files are already DVD compatible and did not attempt to re-encode them.

The above method is rather basic and crude, but gets the job done. There isn’t a video editor used at any stage; instead we just note down the times and then extract time regions using the -ss and -t options of avconv. I tried to use pitivi for video editing, but there were issues with rendered video, and since I didn’t really need any editing, I dropped pitivi from the workflow. The main problem to solve in pitivi would be to encode a DVD compliant VOB video file. You can select a DVD VOB as the output format, but there’s still a lot of things you can mess up, for instance accidentally encode audio in 44.1kHz instead of 48kHz, which results in a DVD disc with no audio.

I suspect that tovid will soon be adapted for use with the new ffmpeg tools (using /usr/bin/avconv instead of /usr/bin/ffmpeg), which would make it easier to script the process if I had more such (e.g. archival) DVDs to make.