posts | images | bookmarks

forced hiatus

By anders pearson 23 Jul 2007

For once, I have a good excuse for not posting anything here lately. Usually I don’t post for months at a time and it’s just out of laziness and a lack of anything interesting to write about. I’m not saying that wasn’t also the case this time, but this time I can at least pretend it was something else preventing me from posting.

Many of you noticed that thraxil.org was offline for the last month or so.

The root cause was that registerfly.com, the registrar that I’d registered my domains through, lost their ICANN accreditation for generally being sleazy. They transferred the domains in their control to a couple other registrars. Last time I renewed thraxil.org, I paid for a couple years at once to minimize the chances that it would expire on me when I wasn’t paying attention. Somehow, that got lost when they transfered it and the domain expired in June, a year before I was expecting it to.

Normally, a domain expiring isn’t that big a deal. ICANN gives you a 40 day grace period where the original owner is allowed to renew their domain without penalty and domain squatters can’t get it. This generally works pretty well to eliminate the really predatory domain squatting. However, you can only renew the domain through the original registrar. This obviously was a problem for me. I couldn’t renew it through registerfly since they were no longer accredited. Registerfly couldn’t give me the transfer authorization codes that I needed to transfer the domain to a different registrar because the domain was expired. As far as I can tell, there just isn’t a clear procedure in place for how to deal with the situation of an expired domain on a defunct registrar.

I won’t even bother going further into what I had to do to get everything back up. Obviously it involved a lot of bureaucratic nonsense that made me want to embrace violence as a problem solving technique. There was also a round of me having to prove my identity via several easily forged tokens (screenshots of my account page? emailing a scan of my photo id? ) that didn’t exactly make me feel secure with the general process.

Anyway, we’re back online now so I have no more excuses.

book

By anders pearson 11 Jan 2007

I’ve been drawing and painting a lot in the last couple years. Obsessively, perhaps.

A few months ago, I was looking at the shear volume of work I’d produced and realized that it was actually enough to fill a book. That idea kind of stuck with me and I became fascinated with the concept of having my own book. As an unknown, there was no way any publishing companies out there would bother with me, and I didn’t really relish the thought of having to deal with a publisher anyway. So I did some research and found that technology has recently made a print on demand model actually pretty feasible.

Lulu.com looked like the best of what’s out there so I ordered a couple of random books from their site to check the quality and I was pretty impressed. The prices are a little high, but considering that not too long ago it really wouldn’t have been financially possible to publish a book without doing a print run of hundreds of copies, it’s not too bad.

I don’t really know jack about anything print related so I roped in some friends to help me photograph the paintings that were too big to scan, lay everything out and put it into a pdf with the right settings, and design a professional looking cover. They came through for me and put together a complete package that makes it look like I’m actually a real artist or something. Neat.

Anyway, it’s now available. You can order a copy directly through lulu.com’s online store.

The 158 page color printed 9x7 inch paperback version is $28.23. That’s actually just the base manufacturing cost that lulu charges. I make no profit whatsoever on the book. This is purely a labor of love for me, not a money making venture. There’s also a free PDF download available if you either want to preview it or if you have access to your own book printing equipment and want to print it yourself.

The license on the book according to the site is the Creative Commons Attribution License 2.5 but that’s only because Lulu.com’s publishing interface didn’t include the option for the Public Domain which is what I prefer. In other words, do whatever the hell you want with the book. I don’t care. I’m just happy that it exists. If you like it, you can buy me a beer some day.

Again, thanks to Catherine, Marc, and Jessica for helping me out on this, one of my more involved crazy schemes.

triple plugged

By anders pearson 15 Nov 2006

For the third year in a row, MFR has been nominated for a PLUG Award. This year, they created a new “Best Music Blog” category separate from the “Best Music Website” category. That means that our little operation consisting of a couple college friends with a Movable Type install and a bit of free time isn’t competing directly against sites like Pitchfork or MySpace who have, like, budgets and employees and such.

We’re always happy just to get nominated. This year maybe we might even win. At the very least, it’ll be a pleasant change to lose to someone cool and deserving like brooklynvegan, MOKB, or stereogum instead of the mega-corporate MySpace like we did last year.

TurboGears Deployment with supervisord and workingenv.py

By anders pearson 13 Sep 2006

As my web development involves more of the microapps stuff, and the applications I’m building become clusters of larger and larger numbers of small services, the deployment challenges become more important to solve. This post is an explanation of my current best practices at the time of writing.

Most of what I build these days is built on the TurboGears Python web framework and runs on Gentoo Linux. I decompose applications into as many small services as possible. As a result, on our production server at work currently has dozens of TG applications running on it. We serve Perl, Java, and PHP apps too and they each have their own deployment challenges, but for now I’m going to focus on the Python stuff.

I started with TurboGears when it was at version 0.8 or so and have basically kept up with the current version (it’s at 1.0b as I’m writing this). The TG APIs haven’t quite stood still across those versions, so the apps written for 0.8 won’t run directly with 0.9 or 1.0. They still work fine though and I’d rather not spend my entire life continuously porting all my applications to the current version of the TG API. So one of the most important aspects of my deployment strategy is a way to have multiple, mutually incompatible versions of TG installed side by side on the same machine without interfering with each other.

It took me a while, but after being bitten too many times by upgrades to one library breaking other applications on the same machine, I eventually came to agree with Ian Bicking’s argument against site-packages (and I’ve been bitten by this in almost every language/environment I’ve ever worked in; it’s not just a Python issue).

Luckily, Ian actually did something about the problem and wrote workingenv.py which is a handy script that sets up an isolated Python environment. Workingenv.py combined with setuptools has been a maintenance dream come true. I now setup a working-env for every single application I write and easy-install TG and any other libraries that are needed for that application into it. I now have complete control over what versions of what libraries are in use by each application (with workingenv.py’s “–requirements” flag) and they can all peacefully coexist on the same machine. The cost is a bit of disk space and an extra shell command here and there to set up and activate the environments, but it saves so many upgrade related headaches that it’s an easy win.

I used to deploy all my TG apps with Apache and mod_python using the mpcp bridge. Actually, many are still deployed that way, but I’m moving away from it. Embedding python in an apache process works really well for some things, but it also made isolating environments with workingenv.py significantly more complicated (I do have a way of doing it, but it’s a messy hack that I’d rather not include in a post on “best practices”). Mod_python deployment also has memory usage issues the more apps you pile into it, and having them all tied to the same Apache process means that to restart or reconfigure one, the apache daemon has to be restarted so all the other apps tied to it get restarted. As some of them become more critical to our operations, the couple seconds of downtime each time apache gets restarted becomes more of a problem.

So I’ve now switched to the approach that the TG community seems to have basically agreed is best, which is running TG apps standalone and proxying to them with apache or lighttpd (we use apache at work and I use lighttpd for this site). The main drawback to this approach is that it means there are a lot more long-running server processes that have to be kept running. So that’s more start/stop scripts, monitoring, and logs for the sysadmin side of me to have to deal with. It also means more chance that if one of the apps manages to crash, someone has to notice and restart it.

Supervisord is the best solution to this problem that I’ve found. Supervisord is a daemon that starts your services as child processes and watches them, restarting them automatically if they die. It handles things like dependencies, monitoring, and is smart enough to know how to back off on restarts if the process goes crazy. Titus Brown’s introductory article is a good place to start to learn more about it.

Here’s a more concrete example of how all of these pieces fit together.

The first step in setting up a new application is to create a working-env for it. I like to include a requirements.txt that lists the exact versions of every library that I want to use for the application. The example ones that are linked to in Ian Bicking’s post are a good starting point.

$ python ~/bin/workingenv.py working-env --requirements requirements.txt

Then, to do development on the application, you activate the environment and start the server:

$ source working-env/bin/activate
$ ./start-foo.py

For production, we’ll need a single script that the supervisord can execute that starts the server in production mode. It looks something like:

:::bash
#!/bin/bash
#  this file is: start.sh
cd $1
source working-env/bin/activate
./start-foo.py $2.cfg &
echo $! > /tmp/foo.pid
wait $!

and the corresponding config section in supervisord.conf is

:::apacheconf
<program foo>
 command pidproxy /tmp/foo.pid /path/to/foo/start.sh /path/to/foo/ prod
 auto-start true
 auto-restart true
 logfile /var/log/foo-supervisor.log
 user pusher
</program>

Those require a little bit of explanation. start.sh takes a path to the project’s directory and the name of a config file as its arguments ($1 and $2) to be flexible for different deployment situations. Since start.sh spawns the python process seperately, if we just did:

:::bash
#!/bin/bash
cd $1
source working-env/bin/activate
./start-foo.py

Supervisord wouldn’t quite work right. It would be sending signals to the bash process instead of the python process and stopping and restarting the service wouldn’t work right. So instead, start.sh starts the python process in the background, writes its pid out to a file and then waits for the python process. The pidproxy program (which comes with supervisord for just such a purpose) then is used to run start.sh and send signals directly to the python process.

It took some work to figure all this out, but once you’ve done it for one or two apps, it becomes pretty quick and easy to set up any application for deployment like this. Probably my next step will be to create a paster template for tg-admin so these scripts get automatically put into quickstarted projects instead of having to copy them in manually.

I should also mention that there are a few more components of our deployment strategy that I didn’t really talk about. The first is version control. If you’re doing development and not using version control of some sort, you are insane and deserve whatever horrible fate befalls you. Second, we have a single application that handles deployment to production. You press one button and it checks out your app to a staging environment, runs the unit tests, then rsyncs it to the production server and runs any additional steps that are needed there (in the above case, it would do a ‘supervisorctl restart foo’ on the production server to get the new code running). It also logs every step, tags releases in svn, and allows for easily rolling back production to a previous release in case something goes wrong. Our pusher/deployment system is very particular to our environment, so I won’t explain it in too much detail here, but I highly recommend spending the time to set up something similar for your situation. At the very least, you’ll want a shell script that does the deployment process in a single step.

Erlang Challenge: Level 4

By anders pearson 01 Sep 2006

(Read [[Erlang Challenge: Level 1]], [[Erlang Challenge: Level 2]], [[Erlang Challenge: Level 2 (2)]], and [[Erlang Challenge: Level 3]] before reading this post or it won’t make sense)

Level 4 of the challenge gets a little meatier. The challenge page has a url that ends like ‘linkedlist.php?nothing=12345’ and the body of the page is just the text “the next nothing is 098093”. You put that number into the url and it gives you another page with the same text but a different number. So the challenge is to write a script that does an HTTP GET request, parses the next “nothing” out of the body of the response, constructs the new url and repeats until it reaches the end of the chain.

Parsing the number out of the text shouldn’t be any harder than the previous challenge, so first we just tackle that in its own function:

:::erlang
extract_nothing(Body) ->
    case regexp:first_match(Body,"the next nothing is [0-9]+") of
        {match,Start,Length} ->
            string:substr(Body,Start + 20,Length - 20);
        nomatch ->
            io:format("no match found in body text: ~s~n",[Body])
    end.

Again, we’re just using the regexp library to look for a match and then string:substr/3 to pull it out. This time there’s a little bit of error checking though in the form of the case statement. regexp:first_match/2 can return either ‘{match,Start,Length}’ or ‘nomatch’ when it doesn’t find anything. The case statement just lets us easily switch on the patterns and print out a message with the full body text when it doesn’t find a match. That condition would presumably mark the end of the chain that we’re following.

Now, the part of this challenge that really was useful for me was that the script has to make GET requests. I consider making HTTP requests to be a fundamental operation these days. The “microapp” architecture that I’ve been pushing lately involves exposing as much as possible over a REST/HTTP interface. Once I can make HTTP requests in Erlang, it means that I now have available all the functionality that I’ve previously exposed as a microapp. That means easy shared storage, tagging, threaded comments, full-text search, image processing, and a number of other special purpose microapps that I and other people have built. It instantly makes it more feasable for me to start building real web applications in Erlang.

It took me a while with google to find it, but Erlang does include a pretty decent HTTP client module. It’s pretty similar to python’s urllib or httplib2 libraries. The only difference is that it actually does the requests in a seperate Erlang process to allow for nice asyncronous requests. This actually seems to be typical of how Erlang interacts with any outside system. There’s always a special process that gets spawned that acts as a proxy. So your code just interacts with it like a regular Erlang process. That’s an important concept to understand, but for the purpose of just making a regular synchronous GET request, it basically just means that we have to call the:

:::erlang
application:start(inets).

function once to spawn the process before we can make any HTTP requests. Other than that, it’s just regular function calls in the http module.

It didn’t take long after finding the documentation to make a function which takes one “nothing” fetches the page, extracts the next one from the body text and returns it:

:::erlang
next_nothing(Nothing) ->
    Url = "http://www.pythonchallenge.com/.../linkedlist.php?nothing=" ++ Nothing,
    {ok, {{_, 200, _}, _, Body}} = http:request(get, {Url, []}, [], []),
    extract_nothing(Body).

It constructs the url, makes the request, uses pattern matching and liberal use of the ‘_’ “don’t care” variable to pull the body text out of the response tuple, and then uses the previous extract_nothing/1 function to parse it and get the next number.

Reading the full documentation for the http module I realized that I could simplify the http:request line to just:

:::erlang
{ok, {{_, 200, _}, _, Body}} = http:request(Url),

since a GET request with no extra stuff is just the default.

Error handling could be added with another case statement that just caught anything that didn’t come back with ‘ok’ as the first element of the tuple and printed a message or something. I’m pretty much expecting to not encounter errors here though so we’ll skip it.

All that’s left now is to wrap this in a “loop” now to have it repeat until it hits a page that it can’t parse a “nothing” out of:

:::erlang
all_nothings(Nothing,Count) when Count > 300 ->
    io:format("finished on: ~s~n",[Nothing]);
all_nothings(Nothing,Count) ->
    io:format("getting nothing: ~s~n",[Nothing]),
    NewNothing = next_nothing(Nothing),
    all_nothings(NewNothing,Count + 1).

all_nothings(Nothing) -> all_nothings(Nothing,0).

A hint on the challenge page says that it shouldn’t take more than 300 steps to get to the end, so I put in a counter to keep track of how many requests we’d made and put a guard on the first clause to stop it at 300. (Looking at the discussion of the challenge afterwards, I learned that this hint was there because there was a trick in the challenge. If you didn’t match on the whole phrase and just pulled out the first number from the page, one of them sends you off on a different chain that turns into a loop and will go on forever.)

The second clause does the main work, printing out the current “nothing”, getting the next via next_nothing/1 and then incrementing the counter and recursing. all_nothings/1 is just a driver function added for convenience that starts the whole thing with a counter of 0.

This level of the challenge has one or two other little snags that make it tricky, but I won’t mention them since they didn’t involve any programming to get around.

I’m happy to see that making HTTP requests in Erlang is about as painless as I’m used to with other languages. That really opens it up for me as a legitimate option for my own work.

The only mildly disappointing thing I discovered with this level (and it sort of showed up earlier, too) is that google doesn’t seem to have done a very good job of indexing the Erlang documentation that’s out there. Once I got a little comfortable with the language, I’ve found the terse format of the documentation to be perfectly adequate, but it’s a little scattered so you have to know where to look for something. Googling something like “erlang http module” doesn’t really turn up anything useful. Instead, I had to browse around the documentation site to find an index of the modules and scan the list looking for one that would be appropriate. Now that I know that that’s what I have to do, it’s not to bad, but it would be nice if Erlang docs were a little more googlable.

Oh, I should also mention that Level 5 of the challenge does indeed involve writing python code. Specifically, you have to reconstruct a pickled python object, so it isn’t really doable in Erlang. I’ll do that one in python so the next entry in this series will be Level 6. It may be a while before I get to it though since I won’t have internet access in my new apartment until later next week.

Erlang Challenge: Level 3

By anders pearson 24 Aug 2006

(Read [[Erlang Challenge: Level 1]], [[Erlang Challenge: Level 2]], and [[Erlang Challenge: Level 2 (2)]] before reading this post or it won’t make sense)

Level 3 involves regular expressions. I was a serious Perl programmer for a few years, so this shouldn’t take too long.

The problem presented is similar to the last level. It’s looking for needles in a 100K haystack of text that looks like:

kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJIHEmbT
MUKLECKdCthezSYBpIElRnZugFAxDRtQPpyeCBgBfaRVvvguRXLvkAdLOeCKxsDUvBBCwdpMMWmuELeG
ENihrpCLhujoBqPRDPvfzcwadMMMbkmkzCCzoTPfbRlzBqMblmxTxNniNoCufprWXxgHZpldkoLCrHJq
vYuyJFCZtqXLhWiYzOXeglkzhVJIWmeUySGuFVmLTCyMshQtvZpPwuIbOHNoBauwvuJYCmqznOBgByPw
TDQheAbsaMLjTmAOKmNsLziVMenFxQdATQIjItwtyCHyeMwQTNxbbLXWZnGmDqHhXnLHfEyvzxMhSXzd

The needle we’re looking for is a single lowercase letter with exactly three uppercase letters on either side.

My approach is basically going to boil down to: 1) save the data in a text file again 2) read it into an erlang string 3) use erlang’s regexp module to look for the pattern

(?<![A-Z])[A-Z]{,3}[a-z][A-Z]{,3})(?![A-Z]) 

That regexp is pretty gnarly, so let me break it down for anyone who hasn’t been immersed in Perl or a similarly regexp obsessed language. In the middle is “[a-z]”, which matches the lowercase letter we’re looking for. On either side of it is “[A-Z]{,3}” which matches exactly three uppercase letters. At the outside are “(?<![A-Z])” and “(?![A-Z])” which are zero-width negative lookbehind and lookahead (respectively) assertions. They are there to make sure that it doesn’t match cases with 4 or more uppercase letters since the instructions said “exactly 3”. The even funkier than normal syntax on them is so they’ll match the characters but those characters won’t get included in the results (hence “zero-width”).

Checking out Erlang’s regexp module, a problem appears. Erlang’s regexp support simply isn’t up to the same level as Perl or Python’s. I was checking because zero-width assertions are fancy enough that I was worried it wouldn’t support them. Naturally it doesn’t. But it also appears that it doesn’t support curly braces for specifying exact numbers of matches. That’s unfortunate, but ultimately both of those are just icing on a regexp engine and we can easily rewrite the pattern without them:

[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]

The catch is that without zero-width assertions, we’ll get back an extra character on either side of the interesting bits which we’ll have to strip out with Erlang code.

Otherwise, the regexp module is pretty straightforward. There’s a regexp:first_match/2 function which takes a string and a regexp and returns a tuple of ‘match’ or ‘nomatch’ and the start position and length of the first match in the string that it finds. Those we can pass to string:substr/3 to pull out the stuff we’re interested in:

:::erlang
findit(String) ->
    Pattern = "[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]",
    {match,Start,Length} = regexp:first_match(String,Pattern),
    string:substr(String,Start + 1,Length - 2).

The ‘Start + 1’ and ‘Length - 2’ are to strip off the extra characters on either end of it that we don’t care about. And again, in the interest of expediency, we’re ignoring the possible ‘nomatch’ condition since we expect our data file to have a match.

We can reuse the read_data/1 function from last time to read in the data (stored in “l3.txt” this time), but instead of copying and pasting and changing the filename, we might as well refactor it so the filename can be passed in as a parameter:

:::erlang
slurp(Filename) ->
    {ok,IoDevice} = file:open(Filename,[read]),
    {ok,Data} = file:read(IoDevice,10000000),
    Data.

Then, we can do:

:::erlang
findit(slurp("l3.txt")).

And it finds what we’re looking for.

Kind of. When I try using it as the key to get to the next level, I notice that I didn’t read the problem closely enough and there should be multiple matches in the data and we need all of them.

So regexp:find_first/2 isn’t going to cut it. Instead, we can swap it out with regexp:matches/2 which returns all the matches as a list of {Start,Length} tuples. It’s a quick change to just put a list comprehension in to handle the list of results instead of a single result:

:::erlang
findit(String) ->
    Pattern = "[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]",
    {match,Matches} = regexp:matches(String,Pattern),
    [string:substr(String,Start + 1,Length - 2) || {Start,Length} <- Matches].

And we’re on to Level 4.

I was tempted to see if I could use Erlang’s gen_fsm behavior to write a small finite state machine to find the matches instead of using regexps (since a regexp is essentially just a convenient syntax for defining a certain kind of FSM) but it’s getting late and I should really get back to my packing.

Erlang Challenge: Level 2 (2)

By anders pearson 23 Aug 2006

(read [[Erlang Challenge: Level 1]] and [[Erlang Challenge: Level 2]] before reading this one)

First off, I noticed that the challenge levels actually started numbering at 0 so this third level is really Level 2. So I’m fixing the numbering of my posts from here on out to keep it matched with the site.

Level 2’s challenge gives you a huge block of text (hidden in the source of the page) that’s almost entirely punctuation. It looks sort of like this:

:::text
%%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{*
@##&{#&{&)*%(]{{([*}@[@&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((@^@}$[**$&^{$!@#$%)!@(&
+^!{%_$&@^!}$_${)$_#)!({@!)(^}!*^&!$%_&&}&_#&@{)]{+)%*{&*%*&@%$+]!*__(#!*){%&@++
!_)^$&&%#+)}!@!)&^}**#!_$([$!$}#*^}$+&#[{*{}{((#$]{[$[$$()_#}!@}^@_&%^*!){*^^_$^
]@}#%[%!^[^_})+@&}{@*!(@$%$^)}[_!}(*}#}#___}!](@_{{(*#%!%%+*)^+#%}$+_]#}%!**#!^_
)@)$%%^{_%!@(&{!}$_$[)*!^&{}*#{!)@})!*{^&[&$#@)*@#@_@^_#*!@_#})+[^&!@*}^){%%{&#@

and basically goes on for about 100K. The hint makes it clear that there are a couple letters hidden in there and you need to write a program to pull them out.

I could solve this level in about 3 seconds with grep or slightly longer with a nice regexp in emacs. But as usual, we’ll stick with the tools that Erlang gives us since that’s the whole point of this exercise.

First, I copy and paste the the data out into a file called ‘level2_data.txt’. I suppose that the Erlang program could be written to fetch the data directly off the web and parse it out, but that’s a little more advanced than I’d like to get for this early in the challenge. Just figuring out how to read data in from a file will be difficult enough for me.

Once the data’s read in, the problem becomes just filtering non-alphabetical characters out of a string. In Python, I would probably use a list comprehension again or just the built in filter() function. Lo and behold, Erlang has a lists:filter/2 function that works like I expect it to. With either approach though, a helper function will be needed to discriminate between letters and everything else:

:::erlang
is_alpha(C) when C >= $a, C =< $z -> true;
is_alpha(C) when C >= $A, C =< $Z -> true;
is_alpha(_) -> false.

If you followed the last entry, this one should be pretty understandable. Simple pattern matching with guards. The only thing new is the ‘_’ variable, which is just a convention in most functional languages for a variable whose value you really don’t care about. So the last line is just there to catch any character that didn’t match one of the first two conditions.

lists:filter/2 works similarly to lists:map/2 which I explained already. Where map applies a function to every element in a list and returns a list of the results, filter applies a function to every element but only returns the elements for which the function returns true. In other words, it’s like grep inside a programming language.

Again, I seem to need an anonymous function to make it work right, but it’s still pretty simple:

:::erlang
filter_strip(Msg) -> lists:filter(fun (X) -> is_alpha(X) end, Msg).

I test this from the shell on short strings I type in and it seems to work nicely, so now we just need to pull the real data out of the file and into an Erlang string that we can pass to the filter_strip/1 function.

Another trip through the library documentation and I find the file module which has some familiar sounding functions like open/2 and read/2.

Doing file i/o is the kind of thing that tends to have lots of potential error conditions that you need to handle if you want your code to be robust in a real environment. Here though, we’re opening and reading in one file that we know ahead of time will exist, will have the proper permissions, etc. So I’m not going to worry about all those error conditions and just write some fast, brittle code:

:::erlang
get_data() ->
{ok,IoDevice} = file:open("level2_data.txt",[read]),
{ok,Data} = file:read(IoDevice,10000000),
Data.

file:open/2 takes a filename and a list of flags. In our case, we just want to read data from the file so the only flag we give it is ‘read’. It returns a tuple consisting of ‘ok’ (it would be ‘error’ if something went wrong but we’re ignoring that possibility for now) and an IoDevice which represents the open file. file:read/2 takes an IoDevice and the number of bytes to read in. Normally, you’d put this in a loop and read in the data in small chunks until you reach the end of the file. Again, since we’re only dealing with one file and we know it’s size and we’re not trying to do anything very robust or efficient, we can get away with just specifying a big number of bytes so it will just slurp in the entire file in one gulp. It also returns a tuple with ‘ok’ (or ‘error’) and Data, which is a string of data. That’s exactly what we’re looking for, so the function just returns that.

Now, just calling:

:::erlang
filter_strip(get_data()).

gives us the letters stripped out of the file and the level is solved.

I mentioned list comprehensions, so here’s the list comprehension equivalent:

:::erlang
lc_strip(Msg) -> [X || X <- Msg, is_alpha(X)].

I really do love list comprehensions :)

Erlang Challenge: Level 2

By anders pearson 21 Aug 2006

(Read [[Erlang Challenge: Level 1]] first or this post won’t make much sense.)

Level 2 gives you a ciphertext message and a key that basically tells you that it’s a Caesar cipher with a shift of 2. The message looks roughly like this (I’m just going to include the beginning so as not to give it away completely): “g fmnc wms bgblr rpylqjyrc gr zw fylb.”

The hints on the page make it pretty clear that they expect you to solve it with python’s string.maketrans() and string.translate() functions, which do indeed make short work of the problem.

Erlang doesn’t have direct equivalents to those functions so we either have to find another way, or write our own.

Since we’re dealing with text now instead of math, it’s important to understand how Erlang deals with strings. Technically, Erlang doesn’t really have strings. It just has lists of numbers and double quotes are just syntactic sugar. So “hello” is really the same thing as [$h,$e,$l,$l,$o] or [104,101,108,108,111].

It’s kind of similar to how strings in C are really just character arrays and characters are really just 8-bit integers. But Erlang’s are a linked list of 8-byte numbers (yes, byte, not bit). Clearly some string operations in Erlang aren’t going to be as fast or space efficient as in C (but Erlang has libraries for dealing with byte arrays if you really want to go there), but it does let you manipulate strings as easily as any other list and lists are pretty much the core data structure of functional languages so that’s important. It also means that Erlang is sort of unicode compliant from the ground up. At least, it can store and manipulate utf8/16/32 strings without any extra care. Apparently Erlang’s libraries don’t always do the right thing with unicode though, so you do still need to be careful.

Since Erlang strings are just lists of numbers, shifting each letter of a string up two places is roughly just adding two to each number (but not exactly, as we’ll see). Time to write a function. My first attempt:

:::erlang
-module(pychallenge2).
-export([plus2/1]).

plus2 ([H|T]) -> [H + 2 | plus2(T)];
plus2 ([])    -> [].

Ah, pattern matching, how I’ve missed you.

If you’ve ever worked in a pattern matching functional language, the code above should be pretty obvious. The first two lines just declare the module that the function’s in and export the single function that’s defined in it. Then the two lines starting with ‘plus2’ define our function. The first of those lines does most of the work, matching a list argument, using ‘|’ to pull apart the head and tail of the list, then returning a list consisting of the head incremented by 2 concatenated with a plus2 called on the tail recursively. The second line just defines the base case so it terminates when it hits the end of the list.

Then we go to the shell and do:

:::erlang
> c(pychallenge2).
> pychallenge2:plus2("g fmnc wms bgblr rpylqjyrc gr zw fylb.").
"i\"hope\"you\"didnt\"tr{nsl{te\"it\"|y\"h{nd0"

This actually is good enough to get the answer and proceed to the next level, but obviously not perfect. The Erlang code works fine, but simply adding two to each letter isn’t really enough. The two problems are that we really only want it to add two to alphabetic characters; adding two to the spaces gives us quotes and adding two to the period at the end gives us ‘0’. The other issue is that the cipher really needs to wrap around at the end of the alphabet, so ‘y’ turns into ‘a’, and ‘z’ turns into ‘b’. Above you can see that ‘y’ turned into ‘{‘ because that’s how the ASCII character set is arranged.

To make our code a little smarter, We’ll define a helper function:

:::erlang
 trans (C) when C >= $a, C =< $x -> C + 2;
 trans (C) when C >= $y, C =< $z -> C - 24;
 trans (C) -> C.

trans/1 just shifts a single character. It makes use of Erlang’s guards to only shift the letters. The first line handles the case where the letter is between ‘a’ and ‘x’, just returning the letter plus 2. The second line handles the case of ‘y’ or ‘z’ and subtracts 24 to map it back down to the beginning of the alphabet. Those two probably could have been combined with a clever use of modular arithmetic, but I think this is nice and clear. The third line just returns the original letter and handles the default situation when neither of the above conditions match (ie, for punctuation, etc.)

Now our plus2/1 function (which isn’t entirely accurately named anymore) becomes:

:::erlang
 plus2 ([H|T]) -> [trans(H) | plus2(T)];
 plus2 ([]) -> [].

And we can go back to the shell and do:

:::erlang
 > c(pychallenge2).
 {ok,pychallenge2}.
 > pychallenge2:plus2("g fmnc wms bgblr rpylqjyrc gr zw fylb.").
 "i hope you didnt translate it by hand."

Which is much better.

After I’d already completed Level 2 and was working on a higher level, I remembered reading that Erlang had Haskell style list comprehensions as well. I’m a big fan of list comprehensions and put them all through my Python code, so naturally, I had to do a version of Level 2 using a list comprehension:

:::erlang
 decipher(Msg) -> [trans(C) || C <- Msg].

Much nicer. Do you see why I’m a fan of list comprehensions? Once your familiar with the basic syntax, I think it makes the function much easier to understand than even the recursive, pattern matching version.

Erlang’s list library also has a map/2 function that can be used to achieve an effect similar to the list comprehensions:

:::erlang
 decipher_map(Msg) -> lists:map(fun(C) -> trans(C) end, Msg).

map/2 takes a function and a list as arguments and applies the function to each element in the list and returns a list of the results.

(I have an anonymous function in there (defined with ‘fun’) because for some reason, this:

:::erlang
 decipher_map(Msg) -> lists:map(trans, Msg).

Didn’t work. I’m still not sure why I had to wrap trans/1 in an anonymous function to get lists:map/2 to work right.)

Using map/2 doesn’t really gain you much in terms of clarity over list comprehensions, but it does mean that you could easily switch it to pmap/2 and your code will run in parallel across multiple processors or machines in a distributed environment without any other changes, Let that one sink in for a minute. Yes, it should remind you of google’s MapReduce.

I have to admit that I actually burned a fair amount of time on this level because of one stupid little syntax issue that took me forever to figure out. If you look closely at the trans/1 function, you’ll notice that Erlang uses =< as “less than or equal to” rather than <= which I’m more accustomed to.

My first attempt at writing trans/1 was basically what is up here, but with <=. Erlang gave me a syntax error on that and I assumed that I had messed up the syntax for guards somehow, so I attempted a version with an if statement instead. I still got a syntax error there so I thought that maybe ‘,’ wasn’t the right way to combine conditions and tried things like ‘and’, ‘&&’, etc., none of which worked. Eventually I bit my tongue and wrote a big, ugly version of trans/1 with nested if statements. When I still was getting a syntax error, it finally occurred to me that <= might not be right so I dug through the docs some more and discovered my problem. I probably should’ve been able to figure that out more quickly but <= is just so ingrained in my brain that it never occurred to me that that might be my problem.

Other than that, this was a very fun level to solve. Pattern matching, recursion, guards, pulling apart lists into heads and tails, map, and list comprehensions are all things that just warm this programmer’s heart.

On to Level 3!

Erlang Challenge: Level 1

By anders pearson 21 Aug 2006

I’ve been pretty fascinated with the concurrency and reliability aspects of the Erlang programming language lately. Lessons I’ve learned from reading about Erlang’s concurrency model have even been popping up in my day to day Python programming.

However, most of my knowledge of Erlang stuff is purely academic. I’ve read the documentation, the whitepapers and numerous advocacy blog posts, but I haven’t really written anything non-trivial in Erlang myself yet. Clearly I have to change that.

It’s been quite a while since I’ve done any programming in a functional language at all and all my ideas for Erlang projects are… ambitious to say the least. I realized that I really need some fairly simple tasks to get myself up to a level of basic familiarity with the syntax and standard library before I go too crazy with self assembling distributed applications designed to survive massive hardware failure, etc. etc.

There’s this great website called the Python Challenge that’s basically a 33 level riddle where every level requires some sort of programming to figure out the URL for the next level. It’s called the “Python” challenge since it was originally designed to get a programmer familiar with Python’s features and libraries. For the most part though, It can be done with any language and works nearly as well for familiarizing the programmer with that language’s features and libraries. Exactly what I needed. [Apparently, levels 5 and 23 do require python though, so I’ll probably skip them.]

So I decided to start working through the Python Challenge with my limited knowledge of Erlang. While I was at it, I figured I’d blog my efforts on each level, including my missteps and questions. I’m going to try not to give away the answers, but my entries are clearly going to be massive hints at least. So if you want to solve the puzzle yourself, you might not want to read any of these posts until you’ve completed the level yourself.

Entries may be few and far between as it takes a bit of time to write them up and this week I’m supposed to be packing to move to a new apartment. We’ll see how far I get before I get distracted and move on to something else :)

If it wasn’t clear already, you definitely shouldn’t expect any code here to be an example of how Erlang should be written. I’m still figuring out which way is up in the language and I fully expect that when I’m more comfortable with Erlang I’ll look back on these early entries and cringe. Any experienced Erlang programmers who want to offer suggestions for cleaner, more idiomatic solutions, please post comments.

Now then…

Level one asks you to raise 2 to the 38th power. This would be easy enough to solve with a good scientific calculator, but the point of this is to do everything in Erlang.

A little poking through the documentation and I find the math library which includes the pow/2 function. Erlang has arbitrary precision arithmetic so at least I don’t have to worry about overflowing a 16-bit integer type or anything stupid like that. Simple enough. I fire up the Erlang interactive shell (if your programming language doesn’t have a REPL, ask for your money back):

$ erl
Erlang (BEAM) emulator version 5.4.6 [source] [hipe] [threads:0]

Eshell V5.4.6  (abort with ^G)
1> math:pow(2,38).
2.74878e+11

OK, so Erlang’s default display of large numbers is in scientific notation. It’s right, but I suspect that the challenge is looking for the full number written out. In another language, I’d do some sort of sprintf or format string magic with a ‘%f’ in it somewhere to convert the large floating point number into a string or even just int() it to an integer.

Erlang has an io library with a format/2 function in it. Erlang’s format uses ‘~’ as the control character instead of ‘%’ like most of the C influenced languages. ‘~’ formatting still feels a little weird to me but it’s how Lisp does its formatting too, so at least Erlang’s in good company.

At the prompt, I put in:

:::erlang
2> io:format("~f",[math:pow(2,38)]).

and it gives me the number i’m looking for formatted as a regular decimal. It’s good enough to get me to the next level, but while I’m here figuring out number stuff, I figure it would be good to figure out how to properly convert between float and integer types.

It actually takes a fair amount of scouring the Erlang documentation looking for a function like python’s int(). I expect it to be called something like ‘int’, ‘integer’, or ‘float_to_integer’. Eventually, I stumble on the function that I want: round/1, which is a built-in function. I guess the name makes sense, it just wasn’t what I expected coming from Python. Ultimately, this is the simplest solution that I’ve found yet:

:::erlang
3> round(math:pow(2,38)).

prints out exactly the number that I’m looking for.

That was simple enough. I’m a long way from being a master of Erlang, but at least now I have the library reference bookmarked, have skimmed through it a little, and know a couple simple things like how to format strings and convert between floats and integers.

Tomorrow (hopefully): Level 2.

:::erlang
4> halt(). 

verizon pain

By anders pearson 17 Aug 2006

As the programmer in the family, I’m the defacto tech support guy. No way around it. I don’t mind too much except that I really only know Linux stuff well. I haven’t had Windows installed on any computer of mine since about 1999. If something there doesn’t work immediately, I’m about as helpless as the average Windows user would be on Linux.

So when my sister up in Maine bought her first computer a few weeks ago, I got in there early and installed Ubuntu on it while I was visiting her. All she really wants to do is basic web surfing and email and maybe upload some pictures from a digital camera. Ubuntu is way more than enough for this plus I know it well, it’s easy for me to set it up to remotely administer and troubleshoot it for her, and it’s pretty secure by default. If I’d left Windows on there it would have turned into a virus infested mess in no time and I’d probably be stuck walking her through a reinstall every few months. Honestly, if I’d been able to talk to her before she bought her computer, I’d have steered her towards a Mac of some kind, but I couldn’t so just got a cheap eMachines box and Ubuntu will have to do.

Installation was pretty straightforward. I ran into some problems just because the machine only came with 256MB of memory (which I think is criminally low). The current version of Ubuntu (Dapper Drake) installs from a livecd and livecds like to have a bit more memory to work with. So the installer was really dragging and I had to get a bit creative with going in and killing off every process that wasn’t completely necessary. If I’d known that she had that little memory, I would have either bought more for her before installing the OS, or I would have just used a slightly older version of Ubuntu that still had the seperate install disk that does everything from the command line and doesn’t need lots of memory to run. Anyway, other than that, it was completely painless. Ubuntu detected all the hardware automatically and ran smoothly once it was installed on the drive. I plugged in my digital camera and iPod and it detected them both and started up the appropriate programs without any trouble. From what I’ve seen, tables have turned and Ubuntu currently has better hardware detection and support than Windows. The camera that worked trivially on her computer, I can’t get a windows machine to recognize without manually installing drivers from a CD that I’ve long since lost.

Unfortunately, the one thing I couldn’t set up for her while I was visiting was the network connection. She’d called Verizon and ordered a DSL account from them. While I was there, she still hadn’t gotten the DSL modem/router yet in the mail. I tried to talk her into going with a cable modem instead of DSL because I know that it’s usually a much simpler setup, but the only option where she lived would have cost twice as much as the DSL and she would have had to have some things rewired to get cable to the room with her computer (or a really long ethernet cable I guess). Plus the DSL came with a phone line that she was planning on getting anyway (she’s been on just a cellphone for the last few years).

This left me in the position of having to walk her through the whole network setup thing over the phone this morning. Verizon, of course doesn’t officially support anything other than Windows XP and Mac OS X, but I have yet to meet an ISP that officially supports linux and yet I’ve always been able to get my stuff connected so I figured it couldn’t be too bad.

Oh god, was it bad. Ultimately we were able to get her online, but it was not a simple process and involved making use of a coworker’s Mac to complete one of the steps. I’m writing up the steps that we had to go through here in case anyone else out there is trying to get a similar setup working. Perhaps some of the information here will be useful.

The box she got in the mail contained a dsl modem/wireless router and a Windows/Mac CD-ROM installation disk. No instructions, no account info.

Got her to plug in the router and put the right cables into the right jacks and turn everything on. So far, so good. Ubuntu saw that it had an ethernet connection and got itself an IP address from the router via DHCP. That involved nothing beyond plugging things in and turning them on.

Then she starts up a browser and tries to go to google.com. What she gets is a website telling her that her operating system is unsupported. In other words, they set a portal system like you sometimes see with for-pay wireless networks in airports, etc. where you get an IP and everything via DHCP, but the router always sends you to a login page when you first get on so you can enter your account info and it will authenticate you. Except with the additional twist that the login webpage was doing a browser detect and not letter her even access that.

The easiest thing at this point would have been for her to borrow a windows or mac laptop from someone and go through the activation process on that. My sister lives in rural Maine though and basically doesn’t know anyone within a few hundred miles with a laptop. So our options were to either figure out some way to do it from Linux or give up and reinstall Windows on her computer for the DSL activation and then reinstall Linux after that. I don’t give that easily though.

I found some phone numbers that she could call to set up her account username and password over the phone and sent her off to do that while i did a bit more research on the problem.

My assumption at this point was that it would be like a typical portal setup and it was just a web page that she needed to get access to and there was an overly restrictive browser detect script keeping her from getting to something that would probably work just fine if she could get to it.

So next I took her through hacking her Firefox config to spoof the User-Agent string to make it report itself as Internet Explorer so she could get to the activation web page. The steps for this are:

  • enter ‘about:config’ into the browser url bar and hit enter
  • right click anywhere on the preferences page that comes up and select ‘New -> String’
  • put in ‘general.useragent.override’ and hit enter
  • enter ‘Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)’ into that box and hit enter

That worked nicely and she now had a browser that would look exactly like IE6 to Verizon’s servers. When she went back to the login page, she was able to see it this time, although then we learned that they weren’t kidding about only supporting IE and the page was a big garbled mess to Firefox (web standards? what are those?). Just to try to see what she was encountering, I got the url from her, which was for http://activatemydsl.verizon.net/ and loaded it on a Windows machine in our office. There’s a form there for entering a phone number and zip code to set up a new account, or entering a username/password to activate an existing account. So she probably hadn’t needed to do that stuff over the phone; we could have done it through the web here. But we already had the account info at this point, so I put that in and… got a page with an Active-X control. Ouch. There was no way we were going to get that working on her linux box. Actually, I couldn’t even get it to run on the Windows machine that I was on.

I got one of my coworkers with a Mac to go to the site and it gave them a .dmg file which we unpacked and ran the Mac installer program that it included. This program had us put in her account info and click through a Terms of Service agreement, etc. It ended by installing a bunch of crap on his computer and informing us that the account was now activated.

Back on her computer, she was still getting the same garbled login screen instead of websites so I figured that one of the things the installer program probably took care of was putting in the PPPoE username and password in the router’s settings.

This is usually easy enough to do manually though. I got her local IP address (192.168.1.65) and from that could figure out the router’s IP (192.168.1.1) which I had her connect to in Firefox. We hit a bit of a stumbling block here because the router didn’t come with a manual so she didn’t know the router’s admin username/password. They generally all come with the same defaults set for a given model though. I googled for the default for her router. At first I only searched for the manufacturer and not the exact model number and the default passwords listed for them didn’t work. I was starting to get worried that Verizon had installed custom firmware on the routers with a different admin password so users couldn’t mess with it or something. But when I got her to read off the exact model number to me I was able to find a different password that actually worked. From there, we just went through the router’s setup and entered her Verizon account info and everything worked after that.

At this point, I’m not sure if the step of running the installer program on my coworker’s computer was strictly necessary or if we’d gone straight into the router and put in her username/password there if it would have worked. My guess is that it was necessary though for the installer to run so the TOS could be agreed to and a flag set in Verizon’s central database saying that the account was valid. If we’d skipped that step, I think we probably wouldn’t have been able to connect even with the PPPoE username/password set. If that’s indeed the case, then it really is impossible to get a Verizon DSL connection set up without at least knowing someone with a Windows or Mac box that can do the activation for you. If it isn’t necessary, then we could have done it all from Linux (though without the router manual, we still would’ve needed to already have an internet connection somewhere so we could google the default router password).

Now, if Verizon had just made their activation process be a normal website that worked in any browser instead of a full desktop application that required a particular OS to run and had included the manual for the router they sent her, this whole process would have been almost completely painless.

My obligatory rant is how this sort of thing makes linux adoption harder than it should be. Ubuntu did absolutely everything it could to make it all easy and painless for the user. There’s nothing that any linux programmer could have done to make it easier for us. The problem was entirely that Verizon was doing a number of things that made it very difficult for a Linux box to connect. But for someone a little less persistent than me and a without the same knowledge and resources, if they’d installed Ubuntu on their computer and then tried to connect to Verizon with it, they’d come out of the experience thinking “Ubuntu must be broken. It can’t even connect to DSL.” Linux adoption is a chicken and egg problem. Until lots of people are running Linux, it doesn’t make sense for someone like Verizon to officially support it, but if it’s this hard to connect to their service, not very many of their customers are going to run Linux. As an open source developer myself, I feel some real empathic frustration for the Ubuntu developers. They’ve done an amazing job making an OS that just works out of the box for almost everything. If adoption was just a matter of writing code, they would have solved it by now. But instead, there are these external forces in play that make it an uphill battle for them no matter how great a product they create.