Slugify in a shell script

When constructing file names or URLs, it’s often handy to “slugify” a string, so that it consists only of alphanumerics separated by dashes. For instance, you may have a string like this:

Linux clover 2.6.19-gentoo-r5 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz

It has uppercase and lowercase letters, digits, dots, brackets… You need to remove everything but the alphanumerics while keeping the result readable. For instance, you may want this:

linux-clover-2-6-19-gentoo-r5-i686-genuine-intel-r-cpu-t2050-1-60ghz

If you append “.html” to it, it makes a very nice URL, doesn’t it?

Here’s a part of a pipe chain that slugifies strings:

sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z
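To see what each stage of the chain does, here is a small sketch that runs the example string from above through the three commands one at a time. The variable names (input, stage1, stage2, stage3) are made up for illustration, and printf is used here instead of echo only because its behavior is more predictable across shells.

```shell
# The example string from above.
input='Linux clover 2.6.19-gentoo-r5 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz'

# Stage 1: every non-alphanumeric character becomes a dash.
stage1="$(printf '%s' "$input" | sed -e 's/[^[:alnum:]]/-/g')"

# Stage 2: runs of consecutive dashes are squeezed into a single dash.
stage2="$(printf '%s' "$stage1" | tr -s '-')"

# Stage 3: uppercase ASCII letters become lowercase.
stage3="$(printf '%s' "$stage2" | tr 'A-Z' 'a-z')"

printf '%s\n' "$stage1" "$stage2" "$stage3"
```

The first line printed still contains doubled and tripled dashes (for instance where “(R) CPU” and “ @ ” used to be), the second has them squeezed, and the third is the final slug.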

If you have a shell script and you want to slugify the content of a variable, you can do this:

SLUGIFIED="$(printf '%s' "${VARIABLE}" | sed -e 's/[^[:alnum:]]/-/g' \
| tr -s '-' | tr 'A-Z' 'a-z')"

Note that WordPress likes to mess up quotes. The quotes in the commands above are meant to be plain ASCII ones, both the single and the double ones.
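One thing the chain does not do is trim dashes at the edges: a string that starts or ends with punctuation or whitespace yields a slug with a leading or trailing dash. Below is a minimal sketch of a wrapper function that adds one more sed step for that. The function name slugify and the extra trimming step are my own additions for illustration, and the sketch assumes single-line input.

```shell
# slugify: hypothetical helper that wraps the pipe chain from above.
# Takes the string as its first argument and prints the slug without a
# trailing newline.
slugify() {
    printf '%s' "$1" \
        | sed -e 's/[^[:alnum:]]/-/g' \
        | tr -s '-' \
        | tr 'A-Z' 'a-z' \
        | sed -e 's/^-//' -e 's/-$//'   # extra step: drop a leading/trailing dash
}

VARIABLE='  Hello, World!  '
SLUGIFIED="$(slugify "${VARIABLE}")"
printf '%s\n' "${SLUGIFIED}"
```

Because tr -s has already squeezed the dashes by the time the last sed runs, there is at most one dash at each end, so removing a single dash on each side is enough.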

Author: automatthias

2 thoughts on “Slugify in a shell script”

  1. I fiddled a lot with my locale variables, but could get neither coreutils’ own tr nor Perl’s uc (and related functions) to correctly change the case of a string with Polish diacritics. However, Tcl’s puts [string tolower {STRING}] worked right out of the box. I guess their legendary Unicode support is a serious claim.
    And while writing this, I just thought of Perl’s Text::Unaccent… Let’s see:
    echo 'ZAŻÓŁĆ' | perl -MText::Unaccent -ne 'print(lc(unac_string("utf-8", "$_")))'
    …seems to work just right.

    This requires installing the Text::Unaccent module, either directly from CPAN or via your package manager. Either way, this solution will most probably not work with the basic Perl that ships with a default OS install.
