Dreamhost 100MB memory limit

I’ve recently found a thread on Google Groups that mentions a 100MB memory limit for FCGI processes on Dreamhost. Exceeding it could be why their process monitor kills those processes.

One of the posts says:

Interestingly, this limit doesn’t apply to Ruby processes. When I asked them if this was an admission that Ruby on Rails has a sad deployment story, the response was “Ahem.. =)”

This could explain why my trick with the “dispatch.fcgi” file name works, assuming this is how their process monitor detects Ruby on Rails. Again, these are only my guesses.

Dreamhost, kernel 2.6, FCGI and threads

My Dreamhost server got rebooted yesterday. After the reboot, I’ve noticed two things:

  1. Kernel 2.6
  2. My Django application was down. I’m not sure whether the 2.6 kernel caused the problem; it could be a coincidence.

I spent a whole day investigating the problem. It turned out that Python couldn’t start a new thread. Just like Grimboy, I changed the method from “threaded” to “prefork” in dispatch.fcgi and got my site up and running again.
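To confirm this kind of failure from a shell on the server, it can help to check whether the interpreter may create threads at all. The helper below is a hypothetical diagnostic, not part of Django or any FCGI library. It relies on CPython raising RuntimeError (“can’t start new thread”) when the operating system refuses to create one.

```python
import threading


def can_start_thread():
    """Return True if this process is allowed to start a new thread.

    Hypothetical diagnostic helper: CPython raises RuntimeError
    ("can't start new thread") when the OS refuses to create a thread,
    which is what a threaded FCGI server would run into.
    """
    try:
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print("threads OK" if can_start_thread() else "cannot start threads")
```

If it reports that threads cannot be started, the prefork method, which forks worker processes instead of spawning threads, is the safer choice.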

FCGI is pretty difficult to debug, I must say. To get a debug message, I had to run a Perl script that called the Python script with stderr redirected to a file.
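A similar effect might be achieved from inside the Python script itself by pointing sys.stderr at a log file before the FCGI server starts. This is a minimal sketch; the helper name and log path are hypothetical. It only captures output written through sys.stderr (including uncaught tracebacks), not writes that C code makes directly to file descriptor 2.

```python
import sys


def redirect_stderr_to_file(path):
    """Send everything written to sys.stderr to a log file.

    Hypothetical helper for a dispatch.fcgi-style script: call it before
    starting the FCGI server so that tracebacks end up in `path`.
    Returns the open file so the caller can close it if needed.
    """
    log = open(path, "a", buffering=1)  # line-buffered text file
    sys.stderr = log
    return log
```

In a dispatch script this would be one call near the top, for example redirect_stderr_to_file("/home/username/fcgi-errors.log") with a path of your choosing.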

API for full-text search in Django

Let me imagine how I’d like to use full-text search in Django. It would look like this:

from django.db import models

class Person(models.Model):
    first_name = models.CharField(maxlength=50)
    about = models.TextField()
    class TextSearch:
        pass

# This would return a QuerySet (search() is the imagined API)
people = Person.objects.search("Miles Davis")

That’s it.

The inner class “TextSearch” would take optional settings, such as the list of fields to index. By default, all fields would be indexed.
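To make the proposed behaviour a bit more concrete, here is a hypothetical, database-free sketch: an inner TextSearch class with an optional fields list that defaults to all fields, and a search() that returns the objects containing every query word. SearchIndex and tokenize are made-up names; a real implementation would be a Django manager backed by a proper full-text index rather than an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercased words; in Python 3, \w also matches accented letters.
    return re.findall(r"\w+", text.lower())


class SearchIndex:
    """Hypothetical in-memory stand-in for the imagined Person.objects.search()."""

    def __init__(self, model):
        options = getattr(model, "TextSearch", None)
        # None means "index every field", matching the proposed default.
        self.fields = getattr(options, "fields", None)
        self.postings = defaultdict(set)  # word -> positions of matching objects
        self.objects = []

    def add(self, obj):
        position = len(self.objects)
        self.objects.append(obj)
        for name in self.fields or list(vars(obj)):
            for word in tokenize(str(getattr(obj, name))):
                self.postings[word].add(position)

    def search(self, query):
        words = tokenize(query)
        if not words:
            return []
        # Every query word must appear somewhere in the indexed fields.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.objects[i] for i in sorted(hits)]
```

Requiring every word to match is just one design choice; a ranked, any-word search would be closer to what a real search engine does.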

I am aware of existing projects that provide search capabilities for Django.

  • Mercurytide uses MySQL-specific functions, so it wouldn’t work for other database backends.
  • Merquery doesn’t seem to have a nice API. For example, a system path is needed to initialize an indexer.

Any other search engines out there for Django?

PHP on Dreamhost also suffers from 500 errors

My idea of renaming the django.fcgi file to dispatch.fcgi mostly fixed the “500 Internal Server Error”, but not completely. Watching site stats and Google Webmaster Tools reports, I kept seeing “500” errors pop up every now and then. At first, I thought there might still be some Django-related problem. My latest observations suggest that it’s probably a general issue affecting both Python and PHP, and possibly Ruby as well.

My wild guess is that it’s got something to do with the server load. When a server is busy, some processes get killed. When fcgi doesn’t receive any data from a killed process, it returns 500.

So the question arises: should Dreamhost be removed from the list of Django-friendly hosts? Or should it be removed from the list of anything-friendly hosts? No, it shouldn’t, because it definitely belongs on the list of wallet-friendly hosts.

Killing phpBB softly

My Polish forum is powered by phpBB. It is undoubtedly the most popular bulletin board package. It’s free (as in freedom), easy to install and easy to use. Virtually every Internet user has had some exposure to it. When starting a new forum, it’s a safe choice.

As the years passed and my forum grew bigger, I became somewhat dissatisfied with it. Smaller and bigger annoyances kept biting me every now and then. I’d like to point out some of them.

  • Search. Its user interface is unnecessarily complicated and it yields unsatisfactory results. As a result, people don’t want to use it and tend to ask the same questions over and over again. A good forum engine needs a decent search. Look at Vanilla’s search, it’s so simple and functional! That said, I’d like to simplify it even a little more.
  • Uncomfortable add-on installation. So-called mods are distributed as instructions on how to modify the code. You have to open files and edit them by hand. One missed dot, BANG! Your forum is down. Want to upgrade your modified phpBB? It’s very likely that you will have to install it from scratch and install all the mods again. That’s why my moderators still don’t have the “merge topics” mod back. (Sorry! I’ll try to install it some time!)
  • Crufty URLs. Compare “/viewtopic.php?t=1234” with “/topics/1234/i-like-clean-urls/”.
  • Google won’t index it. It’s a mystery. Perhaps Google recognizes phpBB and avoids it. phpBB has a nasty habit of “enriching” its URLs with things that are different each time, generating an infinite number of addresses. Google can never know whether it has got all the topics from the forum. No wonder it gets discouraged. This causes a major problem: if the forum is not indexed, it doesn’t come up in search results and there ain’t no people coming! I consider it the biggest problem with phpBB.
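Producing a clean URL like “/topics/1234/i-like-clean-urls/” mostly comes down to turning the topic title into a slug. This is a minimal sketch with hypothetical function names; Django ships its own slugify, which a real project would use instead. Note that NFKD normalization turns most Polish letters into plain ASCII (ą becomes a) but simply drops ł, which has no decomposition.

```python
import re
import unicodedata


def slugify(title):
    """Turn a topic title into an ASCII URL slug (hypothetical helper)."""
    # NFKD splits "ó" into "o" + combining accent; the ASCII encode then
    # drops the accent (and any letter without a decomposition, like "ł").
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_title = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def topic_url(topic_id, title):
    """Build a clean topic URL in the style of /topics/1234/some-title/."""
    return "/topics/%d/%s/" % (topic_id, slugify(title))
```

For example, topic_url(1234, "I like clean URLs!") returns "/topics/1234/i-like-clean-urls/". Keeping the numeric id in the URL means the slug is only decoration, so a renamed topic doesn’t break old links.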

I could also complain about the lack of several features, including tags, ranking, finding similar topics, etc. Many of them are available… as mods, of course. In theory, I could fix three of the above problems, but once phpBB required an upgrade, I’d have to edit all the source code again, by hand. That’s the main reason why I haven’t added many things to the forum.

I tried installing Vanilla. It’s brilliant, but once I launched a test installation, the users who visited it complained about everything they could. I tried to fix the things they mentioned, but there was one major and unavoidable problem: Vanilla doesn’t look like phpBB. For example, the buttons are in different places. Users are so attached to the existing interface that they can’t stand a button moved from right to left. I gave up on Vanilla.

I considered writing my own forum engine, then started having doubts and finally gave up. It’s too much hassle. Loads of work, data migration, user complaints… I would have probably rewritten the whole thing if I were younger. I would work furiously for many weeks, then force users into the new version, take flame-war attacks on my chest… No, I don’t want to do that any more.

However, I’m still too young to just sit around. With just a few hours of spare time, I started playing around with Django, writing a model on top of the phpBB database. I was soon able to fiddle with forums, topics and posts using Django’s ORM. I created a read-only forum archive with clean URLs, an RSS feed and a Sitemap for Google. The forum sitemap consists of about three thousand URLs, each of them the starting point of a topic. Each topic can have several pages.
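A sitemap in the sitemaps.org format is essentially an XML list of url/loc entries. Django has its own sitemap framework; this sketch uses only the standard library to show the structure, with a hypothetical build_sitemap() helper and example URLs. A single sitemap file may hold up to 50,000 URLs, so about three thousand topic URLs fit comfortably in one file.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return sitemap XML listing base_url + path for every path (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        # Only <loc> is required; <lastmod> and friends are optional.
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")
```

Listing only the first page of each topic, as described above, keeps the sitemap small; crawlers can follow the pagination links from there.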

My models work directly on phpBB database tables without modifying them. phpBB itself doesn’t even “know” that someone else is reading its dear tables.

My forum users didn’t notice anything. They’re happily using the old phpBB forum. In the meantime, Googlebot is crawling the Django-powered forum archive with dogged persistence. I think it will soon include the archive in its index and start directing traffic to it.

I’ll keep on developing the Django-powered forum. I can do it slowly and on-line. I will add a nice search engine, posts ranking and all other stuff that will come to my mind. Thing is, I won’t be touching the original phpBB tables. If I ever need to extend some models, I’ll just use Django OneToOne mapping. Current phpBB users will be able to use their forum just as they were before. However, all the cool features will be appearing on the new, Django-powered forum. They might find it more useful and start using it instead of the PHP version. It doesn’t need to happen any time soon. I can take my time developing the features as I want them. If they don’t like it, they can always go back to the PHP version.

It will be all soft. There will be no data migration. No forced user interface change. I’m going to slowly attract phpBB users to the new, Django-powered forum interface.

I’ll put all the phpBB-related code in a separate package and, once it’s mature enough, publish it. It won’t necessarily be a forum implementation. It will be a Django-phpBB integration layer that will let Django programmers develop their own ideas for their phpBB-powered forums.

I’ll be killing phpBB softly.

MySQL encoding problems on Dreamhost

I’m running phpBB, MediaWiki and WordPress on Dreamhost. All of these applications use a MySQL database. Once I imported the data into the database, I checked what it looked like in phpMyAdmin. I was a little concerned when I saw latin1_swedish_ci collation on all the text columns in all tables. I checked the applications, expecting to see wrongly encoded text, but everything seemed fine.

I learned the truth later, when developing a Django application that sits on top of the existing phpBB tables. All the data in the tables was stored with the wrong encoding, but since encoding and decoding were symmetrically wrong, all the characters were displayed correctly. Unfortunately, the stored content itself is wrong.

The problem is that all databases on Dreamhost are created with LATIN1 as the default encoding (LATIN1 and ISO-8859-1 are synonyms), and it’s impossible to create a database with, say, UTF-8 as the default. As a result, all connections to the database use LATIN1 by default. An application can switch the connection to UTF-8, but typically it doesn’t. Django does.

Django stores all text correctly encoded; the other applications store it wrongly. Everything is fine unless Django reads data written by the other applications: then all the accented characters are trashed. I’ve written a small wrapper function that can bring some of the text back to the proper encoding:

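def repair_encoding(s):
    try:
        # UTF-8 bytes -> Unicode -> LATIN1 bytes, read as LATIN2 -> UTF-8 bytes
        return s.decode('utf-8').encode('latin1').decode('latin2').encode('utf-8')
    except UnicodeError:  # some step failed: return the original data
        return s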

What it does:

  1. Read the data (variable s), treat it as UTF-8 encoded text and decode it to Unicode
  2. Encode the Unicode object in LATIN1
  3. Treat the LATIN1-encoded bytes as LATIN2 and decode them to Unicode again
  4. Encode the result in UTF-8
  5. If any of the above fails, return the original data

Steps 1-4 can fail, especially step 2, because the Unicode object may contain characters that don’t exist in LATIN1.
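To see why this chain works, it helps to simulate how the garbled bytes come about. This sketch assumes, as repair_encoding() implies, that the PHP application sent LATIN2 (ISO-8859-2) bytes over a LATIN1 connection and that Django later read them over a UTF-8 connection. The function name is hypothetical.

```python
def as_read_by_django(text):
    """Simulate the double encoding (hypothetical helper).

    1. The PHP application sends LATIN2 bytes over a LATIN1 connection,
       so MySQL stores each byte as a LATIN1 character.
    2. Django reads the column over a UTF-8 connection, so MySQL
       re-encodes those LATIN1 characters as UTF-8.
    """
    php_bytes = text.encode("latin2")
    stored_characters = php_bytes.decode("latin1")
    return stored_characters.encode("utf-8")


garbled = as_read_by_django("zażółć")
print(garbled.decode("utf-8"))  # za¿ó³æ
# Reversing the chain, as repair_encoding() does, recovers the text:
print(garbled.decode("utf-8").encode("latin1").decode("latin2"))  # zażółć
```

Each Polish letter ends up as whatever LATIN1 character shares its LATIN2 byte value, which is why ż shows up as ¿ and ł as ³.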

This hack lets me read data from the PHP applications, but I wanted to repair the wrongly encoded text so that all the database content is straightened out. I saw some tutorials that involved dumping and restoring the database. I didn’t want that, because it would mean considerable downtime. I wanted to fix it in place, and I’ve finally figured out how. Here’s how to fix column colname in table tablename.

SET NAMES latin1;
ALTER TABLE tablename MODIFY COLUMN colname blob;

It should be run against every TEXT column in the database. The same applies to VARCHAR and CHAR columns.
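With many tables, writing these statements by hand gets tedious. This hypothetical helper generates them from a list of columns. It maps each character type to its binary counterpart (TEXT to BLOB, VARCHAR to VARBINARY, CHAR to BINARY), the convention the MySQL manual uses for changing a column’s character set without converting its bytes. It emits only the kind of statements shown above, nothing more.

```python
# Binary counterparts for character types, so that MySQL keeps the raw
# bytes instead of converting them between character sets.
BINARY_TYPES = {
    "char": "binary",
    "varchar": "varbinary",
    "tinytext": "tinyblob",
    "text": "blob",
    "mediumtext": "mediumblob",
    "longtext": "longblob",
}


def binary_conversion_sql(columns):
    """Build statements for (table, column, data_type, length) tuples.

    Hypothetical helper: `length` is required for CHAR/VARCHAR columns
    and ignored otherwise. Non-character columns are skipped.
    """
    statements = ["SET NAMES latin1;"]
    for table, column, data_type, length in columns:
        target = BINARY_TYPES.get(data_type.lower())
        if target is None:
            continue  # not a character column; leave it alone
        if target in ("binary", "varbinary"):
            target = "%s(%d)" % (target, length)
        statements.append(
            "ALTER TABLE `%s` MODIFY COLUMN `%s` %s;" % (table, column, target)
        )
    return statements
```

The column list can come from information_schema.COLUMNS (TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH). Keep in mind that MODIFY COLUMN redefines the whole column, so attributes such as NOT NULL or DEFAULT are lost unless they are restated.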

After applying the script, all the data in the database was encoded correctly. The problem was that the PHP applications started displaying trashed text on-line. This was due to the default LATIN1 connection encoding on Dreamhost. I fixed it by adding the query below just after the connection is established. Alternatively, it could be run before every query.


This line sets the connection encoding to UTF-8, so all the data is transmitted to the application in correct encoding.
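The same idea applies to any client that talks to MySQL. In MySQL, SET NAMES utf8 sets the character set used by the connection (on newer servers, utf8mb4 is the full UTF-8 variant). This hypothetical helper issues it through a generic Python DB-API connection right after connecting; it illustrates the technique and is not the PHP code used above. Many drivers, such as MySQLdb with its charset argument, can also do this at connect time.

```python
def use_utf8_connection(connection):
    """Switch a MySQL DB-API connection to UTF-8 right after connecting.

    Hypothetical helper; `connection` is any DB-API 2.0 connection object,
    for example one returned by MySQLdb.connect().
    """
    cursor = connection.cursor()
    try:
        cursor.execute("SET NAMES utf8")
    finally:
        cursor.close()
    return connection
```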

If I knew how to set the default encoding to UTF-8, this wouldn’t be necessary. I’ve posted a question about it on the Dreamhost forum; we’ll see if anyone answers.

Django development server on Dreamhost

My Django application on Dreamhost has been running fine since I applied one simple trick. There is one problematic moment, though: uploading new Python code and restarting the application.

I usually develop my applications locally and submit the new code to a repository. To make the changes live, I check out the new code from the repository to my Dreamhost account and issue a command:

touch dispatch.fcgi

If my application still doesn’t restart, I push a little harder.

killall /usr/bin/python2.4

If I do just one restart, it goes fine. The problems start when I do it a few times in a row. In that case, the old version of the application is already killed (killall’ed) but the new one isn’t alive yet, and I get some 500 and “incomplete headers” errors for a while. The application eventually starts up and works fine.
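One way to avoid rapid-fire restarts is to refuse to restart again until the previous restart has had time to finish. This sketch uses a hypothetical helper and an arbitrary 60-second default. It only replaces the touch step, not killall, and it assumes the script’s modification time reflects the last restart.

```python
import os
import time


def touch_to_restart(path="dispatch.fcgi", min_interval=60.0, now=None):
    """Touch the FCGI script to trigger a restart, but not too often.

    Hypothetical helper: the script's modification time doubles as the
    "time of the last restart". If the previous touch was less than
    `min_interval` seconds ago, do nothing and return False, so that a
    restart already in progress is not interrupted.
    """
    now = time.time() if now is None else now
    if now - os.path.getmtime(path) < min_interval:
        return False
    os.utime(path, (now, now))  # same effect as `touch dispatch.fcgi`
    return True
```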

When the code is ready, one check-out and one restart is enough, but it’s sometimes necessary to make small changes on-site and frequently restart the application.

Frequent restarts are handled very nicely by the development server. It monitors all the Python files and restarts itself whenever it detects a change. It would be perfect to run a development server on Dreamhost, make the necessary changes, and then restart the application just once.

It’s easy to run the development server itself.

./manage.py runserver

The problem is: how do I view what it serves in a local browser?

EDIT: As Ryan Berg suggested, it’s enough to tell the development server to listen on an external interface:

./manage.py runserver www.mydomain.com:8001

The Dreamhost servers aren’t firewalled, which means that you can open your development site by pointing your web browser to “http://www.mydomain.com:8001”.

Please note: if you host multiple domains on Dreamhost, your development site will be available under all of them, for example firstdomain.com:8001, seconddomain.com:8001, and so forth.

You can make frequent small changes with as many development server restarts as you need, and once you’re satisfied, do just one restart of the application served via FCGI.

Django on Dreamhost: incomplete headers

I’ve recently bought a hosting plan at Dreamhost. There were two reasons:

  1. It’s possible to run Django on it
  2. It’s cheap

It’s shared hosting, where many sites are served from a single physical machine. Each machine probably serves as many sites as its hardware can handle. The Dreamhost servers are pretty busy. My server, for instance:

[shasta]$ uptime
16:03:42 up 31 days, 13:59, 6 users, load average: 10.48, 9.74, 9.24

It’s not processing power that is the bottleneck here, at least on “my” server: there’s usually about 40% idle processor time. However, when I ran the tail command, it would get killed every now and then. Strange. Perhaps there’s a “garbage process collector” running on the server, terminating non-essential jobs here and there.
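When comparing numbers like these over time, it’s handy to pull the three load averages out of uptime’s output. This is a small hypothetical parser. On Linux, load averages also count processes waiting in uninterruptible sleep (typically disk I/O), which is one way a load near 10 can coexist with idle CPU time.

```python
import re

# Matches both "load average: 10.48, 9.74, 9.24" (Linux) and
# "load averages: 1.23 1.45 1.67" (BSD-style) output.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from `uptime` output."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in %r" % uptime_line)
    return tuple(float(value) for value in match.groups())
```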

Well, I bought the hosting and moved my main site to Dreamhost. The PHP software (phpBB and MediaWiki) is working great. I decided to try running Django, so I developed a small application and installed it. I followed the instructions and, voila, it worked. I was happy.

That lasted until the Django app started to stop responding now and then. I would click a link and the browser would just wait for data that never came. I looked into the logs.

[Thu Nov 30 14:56:16 2006] [error] [client 83.xx.xxx.xx] FastCGI: comm with (dynamic) server "/home/automatthias/atopowe.pl/django.fcgi" aborted: (first read) idle timeout (120 sec)
[Thu Nov 30 14:56:16 2006] [error] [client 83.xx.xxx.xxx] FastCGI: incomplete headers (0 bytes) received from server "/home/automatthias/atopowe.pl/django.fcgi"
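To see how often this happens, and for which script, the error log can be summarized with a few lines of Python. This is a hypothetical sketch that assumes the log puts plain double quotes around the script path, as Apache writes it, and that it contains the two message types shown above.

```python
import re
from collections import Counter

SCRIPT_RE = re.compile(r'server "([^"]+)"')
KINDS = ("idle timeout", "incomplete headers")


def summarize_fastcgi_errors(lines):
    """Count FastCGI idle timeouts and incomplete-header errors per script."""
    counts = Counter()
    for line in lines:
        if "FastCGI:" not in line:
            continue
        match = SCRIPT_RE.search(line)
        script = match.group(1) if match else "unknown"
        for kind in KINDS:
            if kind in line:
                counts[(script, kind)] += 1
    return counts
```

Run over the whole error log before and after a configuration change, it shows whether the change actually reduced the number of failures.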

Something was wrong. Django isn’t officially supported on Dreamhost, so I couldn’t file a complaint with support. I searched the Web and found out that some people had similar problems while others didn’t. Dreamhost has many servers, and I figure it’s got something to do with the load on each server and… perhaps with killing “non-essential” processes.

After some more research, I found out that it’s not only Django users who’ve been experiencing this. Rails users were affected too! Note that Rails is officially supported on Dreamhost. I got interested and read on. Dreamhost support responded to the affected Rails user:

Check with our support team and ask if our process monitor has been killing your processes. If you have a lot of processes hanging around that may be the case. We recently updated our process monitor to specifically handle dispatch.fcgi processes specially so that is probably not the problem but it’s worth asking.

“specifically handle dispatch.fcgi processes”? Aha!

Inspecting my processes with “ps -ef” revealed several “django.fcgi” processes running. Some of them were zombies (defunct). If “dispatch.fcgi” processes are handled specially, why not pretend to run them? So I changed my setup a little: I renamed my “django.fcgi” file to “dispatch.fcgi” and altered two lines in the “.htaccess” file so they would refer to the new name:

RewriteRule ^(dispatch\.fcgi/.*)$ - [L]
RewriteRule ^(.*)$ dispatch.fcgi/$1 [L]

Guess what?

No timeouts, no 500s, no incomplete headers. It works like a charm.