Thraxil.org

NaGraNoWriMo

By anders pearson • 2007-10-25

At some point a while back, my little meditational abstract doodles started occasionally getting boxes around them. When there were several of those on a page, I noticed that they sort of resembled the panel layout of a comic. I kind of liked that so I experimented more in that direction, playing with deliberate panel layouts and borrowing from the vocabulary of comics and sequential art while keeping my subject matter abstract. Now I’ve got a fairly large collection of these abstract comics and it’s a style that I think is still interesting and rich for experimentation.

Lately I’ve found myself in several conversations with friends about National Novel Writing Month (NaNoWriMo). The executive summary if you’re too lazy to click the link and aren’t familiar with it: you give yourself the month of November (30 days) to write a 50,000 word novel. It doesn’t have to be good or make much sense or anything, it just has to be started and finished within the month. I’ve always thought it was a cool idea. A bunch of my friends have attempted and at least one has even successfully completed a NaNoWriMo.

As a writer, I’m not really interested in writing fiction so it isn’t something I’d be up for myself. The idea continues to intrigue me though so I’ve decided that this year I want to try to do an abstract graphic novel NaNoWriMo style. A “NaGraNoWriMo” I guess (calling it a “NaAbsGraNoWriMo” would just be taking it too far.)

Obviously, I can’t use word count as a metric for my graphic novel. Laying out, drawing and inking a full page of panels I’ve found takes me about three to four hours with the level of detail that my style involves. The notion of “a picture is worth a thousand words” I think actually works out pretty well for my purposes. 50,000 words = 50 pages = 1.67 pages per day = about 6 hours per day of me drawing for 30 days. I think that’s a similar magnitude of challenge to what other NaNoWriMo writers are attempting.

That’s the goal, anyway. I’ve got a lot of other things going on in November so it’s not going to be easy to find all that time. I’ll try to post the pages to flickr as I go, but scanning and posting takes time that I could be drawing so that will be a lower priority than finishing.

If you don’t see me outside my apartment or hear from me next month, now you’ll know why.

[Yes, I know that someone else has used the term “NaGraNoWriMo” for something similar. Their approach was to write a “script” for a 175 page graphic novel. I honestly don’t know quite what a script for a graphic novel is but I’m pretty sure that it wouldn’t work for an abstract one. I like my idea better.]

periodic personal update

By anders pearson • 2007-08-25

I’m in the office now on a Saturday enjoying the free air conditioning and functioning ceiling, which, at the moment, my apartment is lacking. I don’t have anything else pressing to do at the moment so I figured I would take this opportunity to post an all too rare update on the miscellaneous stuff going on in my life for those of you that I’m not in frequent contact with.

For the most part, it’s been pretty quiet here. I work a lot, paint and draw obsessively (I think there should be enough material for a Volume 2 soon, in the meantime, it usually ends up on flickr), and occasionally go out for beers, movies, or concerts with my friends. I’ve even been running semi-regularly.

I currently have two plants which, by some miracle, I haven’t managed to kill yet with my neglect.

In July, a couple coworkers and I went skydiving. I can not possibly recommend it highly enough if you’ve never done it. I liked it for a very different reason than I expected. When you jump out of the plane, there’s about a minute of free fall which is pretty exhilarating. You don’t know which way is up for a few seconds, then you’re assaulted by 120mph wind and deafening noise. Your brain sort of shuts off because it can’t handle the intensity of the sensations which it has absolutely nothing in memory to compare to. I actually kind of forgot how to breathe during free fall and had to consciously remind myself. That was fun in its way, but I’m not an adrenaline junky type so it wasn’t the big draw for me. When your chute opens, you’re suddenly thrown into a different universe. In a second or two, you go from free fall to absolute stillness and dead silence. You’re far enough up that you have no sensation of movement at all and you can see for hundreds of miles to the horizon in all directions. You realize that there’s basically no other solid matter anywhere within thousands of feet of you in any direction. It’s an amazing sense of serenity and calm more powerful than anything I’ve ever experienced with yoga or meditation.

One thing I don’t recommend is getting an ear infection. I was pretty much knocked out of commission last week with strep throat that turned into an ear infection. The strep was quite pleasant in comparison. The ear infection was several days of continuous, unrelenting, maddening pain. Pain killers seemed to have no effect. If you’ve never had one, it’s basically like jabbing a pencil into your ear, sharp end first, then giving it a little tap with a ball peen hammer about once a second. For days. Antibiotics eventually were brought in and made short work of the infection, but it’s not an experience I would ever like to repeat.

On the technical front, I’ve been continuing my climb up the Erlang learning curve. The recently published book by Joe Armstrong has been a big help. I’m particularly smitten with Mnesia at the moment and I also see the OTP framework being very useful to me in the future. I’ve prototyped some simple microapps with Erlang and Mnesia and am scheming on how I can sneak some of it into work.

Speaking of work, we’ve been up to some cool stuff. I spent a couple weeks converting our whole (modest) server infrastructure to virtual servers running on Xen. So far it’s gone really smoothly and it should make things even more reliable and flexible in the future. I’m really impressed with Xen. We evaluated VMWare as well and it has some advantages, but for our situation, we found Xen to be more than adequate performance-wise and much simpler to deal with (I’m vastly more comfortable with arcane Unix commands than point and click stuff. YMMV).

We also recently got a nice grant from NIMH for a very worthwhile HIV prevention project. Sadly, I don’t think most of the grant money is going directly into my salary. We’ve got more cool projects in development too.

I’ve also been psyched that we recently hired a new programmer to help out with our general world domination plans. She’s been kicking ass and taking names, rapidly getting up to speed with the Python and TurboGears stuff we use, and she’s an industrial fan to boot. I expect the orbital laser to be operational very soon.

With that addition, we’re up to four desktop Linux users in the office now (all Ubuntu) with some others experimenting, which is a nice change from two years ago when it was just me.

Finally, it appears that the Japanese convenience store in my neighborhood is again carrying Choco Monaka Jumbo ice cream bars after months of being out of stock. I was beginning to panic.

forced hiatus

By anders pearson • 2007-07-23

For once, I have a good excuse for not posting anything here lately. Usually I don’t post for months at a time and it’s just out of laziness and a lack of anything interesting to write about. I’m not saying that wasn’t also the case this time, but this time I can at least pretend it was something else preventing me from posting.

Many of you noticed that thraxil.org was offline for the last month or so.

The root cause was that registerfly.com, the registrar that I’d registered my domains through, lost their ICANN accreditation for generally being sleazy. They transferred the domains in their control to a couple other registrars. Last time I renewed thraxil.org, I paid for a couple years at once to minimize the chances that it would expire on me when I wasn’t paying attention. Somehow, that got lost when they transfered it and the domain expired in June, a year before I was expecting it to.

Normally, a domain expiring isn’t that big a deal. ICANN gives you a 40 day grace period where the original owner is allowed to renew their domain without penalty and domain squatters can’t get it. This generally works pretty well to eliminate the really predatory domain squatting. However, you can only renew the domain through the original registrar. This obviously was a problem for me. I couldn’t renew it through registerfly since they were no longer accredited. Registerfly couldn’t give me the transfer authorization codes that I needed to transfer the domain to a different registrar because the domain was expired. As far as I can tell, there just isn’t a clear procedure in place for how to deal with the situation of an expired domain on a defunct registrar.

I won’t even bother going further into what I had to do to get everything back up. Obviously it involved a lot of bureaucratic nonsense that made me want to embrace violence as a problem solving technique. There was also a round of me having to prove my identity via several easily forged tokens (screenshots of my account page? emailing a scan of my photo id? ) that didn’t exactly make me feel secure with the general process.

Anyway, we’re back online now so I have no more excuses.

book

By anders pearson • 2007-01-11

I’ve been drawing and painting a lot in the last couple years. Obsessively, perhaps.

A few months ago, I was looking at the shear volume of work I’d produced and realized that it was actually enough to fill a book. That idea kind of stuck with me and I became fascinated with the concept of having my own book. As an unknown, there was no way any publishing companies out there would bother with me, and I didn’t really relish the thought of having to deal with a publisher anyway. So I did some research and found that technology has recently made a print on demand model actually pretty feasible.

Lulu.com looked like the best of what’s out there so I ordered a couple of random books from their site to check the quality and I was pretty impressed. The prices are a little high, but considering that not too long ago it really wouldn’t have been financially possible to publish a book without doing a print run of hundreds of copies, it’s not too bad.

I don’t really know jack about anything print related so I roped in some friends to help me photograph the paintings that were too big to scan, lay everything out and put it into a pdf with the right settings, and design a professional looking cover. They came through for me and put together a complete package that makes it look like I’m actually a real artist or something. Neat.

Anyway, it’s now available. You can order a copy directly through lulu.com’s online store.

The 158 page color printed 9x7 inch paperback version is $28.23. That’s actually just the base manufacturing cost that lulu charges. I make no profit whatsoever on the book. This is purely a labor of love for me, not a money making venture. There’s also a free PDF download available if you either want to preview it or if you have access to your own book printing equipment and want to print it yourself.

The license on the book according to the site is the Creative Commons Attribution License 2.5 but that’s only because Lulu.com’s publishing interface didn’t include the option for the Public Domain which is what I prefer. In other words, do whatever the hell you want with the book. I don’t care. I’m just happy that it exists. If you like it, you can buy me a beer some day.

Again, thanks to Catherine, Marc, and Jessica for helping me out on this, one of my more involved crazy schemes.

triple plugged

By anders pearson • 2006-11-15

For the third year in a row, MFR has been nominated for a PLUG Award. This year, they created a new “Best Music Blog” category separate from the “Best Music Website” category. That means that our little operation consisting of a couple college friends with a Movable Type install and a bit of free time isn’t competing directly against sites like Pitchfork or MySpace who have, like, budgets and employees and such.

We’re always happy just to get nominated. This year maybe we might even win. At the very least, it’ll be a pleasant change to lose to someone cool and deserving like brooklynvegan, MOKB, or stereogum instead of the mega-corporate MySpace like we did last year.

TurboGears Deployment with supervisord and workingenv.py

By anders pearson • 2006-09-13

As my web development involves more of the microapps stuff, and the applications I’m building become clusters of larger and larger numbers of small services, the deployment challenges become more important to solve. This post is an explanation of my current best practices at the time of writing.

Most of what I build these days is built on the TurboGears Python web framework and runs on Gentoo Linux. I decompose applications into as many small services as possible. As a result, on our production server at work currently has dozens of TG applications running on it. We serve Perl, Java, and PHP apps too and they each have their own deployment challenges, but for now I’m going to focus on the Python stuff.

I started with TurboGears when it was at version 0.8 or so and have basically kept up with the current version (it’s at 1.0b as I’m writing this). The TG APIs haven’t quite stood still across those versions, so the apps written for 0.8 won’t run directly with 0.9 or 1.0. They still work fine though and I’d rather not spend my entire life continuously porting all my applications to the current version of the TG API. So one of the most important aspects of my deployment strategy is a way to have multiple, mutually incompatible versions of TG installed side by side on the same machine without interfering with each other.

It took me a while, but after being bitten too many times by upgrades to one library breaking other applications on the same machine, I eventually came to agree with Ian Bicking’s argument against site-packages (and I’ve been bitten by this in almost every language/environment I’ve ever worked in; it’s not just a Python issue).

Luckily, Ian actually did something about the problem and wrote workingenv.py which is a handy script that sets up an isolated Python environment. Workingenv.py combined with setuptools has been a maintenance dream come true. I now setup a working-env for every single application I write and easy-install TG and any other libraries that are needed for that application into it. I now have complete control over what versions of what libraries are in use by each application (with workingenv.py’s “–requirements” flag) and they can all peacefully coexist on the same machine. The cost is a bit of disk space and an extra shell command here and there to set up and activate the environments, but it saves so many upgrade related headaches that it’s an easy win.

I used to deploy all my TG apps with Apache and mod_python using the mpcp bridge. Actually, many are still deployed that way, but I’m moving away from it. Embedding python in an apache process works really well for some things, but it also made isolating environments with workingenv.py significantly more complicated (I do have a way of doing it, but it’s a messy hack that I’d rather not include in a post on “best practices”). Mod_python deployment also has memory usage issues the more apps you pile into it, and having them all tied to the same Apache process means that to restart or reconfigure one, the apache daemon has to be restarted so all the other apps tied to it get restarted. As some of them become more critical to our operations, the couple seconds of downtime each time apache gets restarted becomes more of a problem.

So I’ve now switched to the approach that the TG community seems to have basically agreed is best, which is running TG apps standalone and proxying to them with apache or lighttpd (we use apache at work and I use lighttpd for this site). The main drawback to this approach is that it means there are a lot more long-running server processes that have to be kept running. So that’s more start/stop scripts, monitoring, and logs for the sysadmin side of me to have to deal with. It also means more chance that if one of the apps manages to crash, someone has to notice and restart it.

Supervisord is the best solution to this problem that I’ve found. Supervisord is a daemon that starts your services as child processes and watches them, restarting them automatically if they die. It handles things like dependencies, monitoring, and is smart enough to know how to back off on restarts if the process goes crazy. Titus Brown’s introductory article is a good place to start to learn more about it.

Here’s a more concrete example of how all of these pieces fit together.

The first step in setting up a new application is to create a working-env for it. I like to include a requirements.txt that lists the exact versions of every library that I want to use for the application. The example ones that are linked to in Ian Bicking’s post are a good starting point.

$ python ~/bin/workingenv.py working-env --requirements requirements.txt

Then, to do development on the application, you activate the environment and start the server:

$ source working-env/bin/activate
$ ./start-foo.py

For production, we’ll need a single script that the supervisord can execute that starts the server in production mode. It looks something like:

:::bash
#!/bin/bash
#  this file is: start.sh
cd $1
source working-env/bin/activate
./start-foo.py $2.cfg &
echo $! > /tmp/foo.pid
wait $!

and the corresponding config section in supervisord.conf is

:::apacheconf
<program foo>
 command pidproxy /tmp/foo.pid /path/to/foo/start.sh /path/to/foo/ prod
 auto-start true
 auto-restart true
 logfile /var/log/foo-supervisor.log
 user pusher
</program>

Those require a little bit of explanation. start.sh takes a path to the project’s directory and the name of a config file as its arguments ($1 and $2) to be flexible for different deployment situations. Since start.sh spawns the python process seperately, if we just did:

:::bash
#!/bin/bash
cd $1
source working-env/bin/activate
./start-foo.py

Supervisord wouldn’t quite work right. It would be sending signals to the bash process instead of the python process and stopping and restarting the service wouldn’t work right. So instead, start.sh starts the python process in the background, writes its pid out to a file and then waits for the python process. The pidproxy program (which comes with supervisord for just such a purpose) then is used to run start.sh and send signals directly to the python process.

It took some work to figure all this out, but once you’ve done it for one or two apps, it becomes pretty quick and easy to set up any application for deployment like this. Probably my next step will be to create a paster template for tg-admin so these scripts get automatically put into quickstarted projects instead of having to copy them in manually.

I should also mention that there are a few more components of our deployment strategy that I didn’t really talk about. The first is version control. If you’re doing development and not using version control of some sort, you are insane and deserve whatever horrible fate befalls you. Second, we have a single application that handles deployment to production. You press one button and it checks out your app to a staging environment, runs the unit tests, then rsyncs it to the production server and runs any additional steps that are needed there (in the above case, it would do a ‘supervisorctl restart foo’ on the production server to get the new code running). It also logs every step, tags releases in svn, and allows for easily rolling back production to a previous release in case something goes wrong. Our pusher/deployment system is very particular to our environment, so I won’t explain it in too much detail here, but I highly recommend spending the time to set up something similar for your situation. At the very least, you’ll want a shell script that does the deployment process in a single step.

Erlang Challenge: Level 4

By anders pearson • 2006-09-01

(Read [[Erlang Challenge: Level 1]], [[Erlang Challenge: Level 2]], [[Erlang Challenge: Level 2 (2)]], and [[Erlang Challenge: Level 3]] before reading this post or it won’t make sense)

Level 4 of the challenge gets a little meatier. The challenge page has a url that ends like ‘linkedlist.php?nothing=12345’ and the body of the page is just the text “the next nothing is 098093”. You put that number into the url and it gives you another page with the same text but a different number. So the challenge is to write a script that does an HTTP GET request, parses the next “nothing” out of the body of the response, constructs the new url and repeats until it reaches the end of the chain.

Parsing the number out of the text shouldn’t be any harder than the previous challenge, so first we just tackle that in its own function:

:::erlang
extract_nothing(Body) ->
    case regexp:first_match(Body,"the next nothing is [0-9]+") of
        {match,Start,Length} ->
            string:substr(Body,Start + 20,Length - 20);
        nomatch ->
            io:format("no match found in body text: ~s~n",[Body])
    end.

Again, we’re just using the regexp library to look for a match and then string:substr/3 to pull it out. This time there’s a little bit of error checking though in the form of the case statement. regexp:first_match/2 can return either ‘{match,Start,Length}’ or ‘nomatch’ when it doesn’t find anything. The case statement just lets us easily switch on the patterns and print out a message with the full body text when it doesn’t find a match. That condition would presumably mark the end of the chain that we’re following.

Now, the part of this challenge that really was useful for me was that the script has to make GET requests. I consider making HTTP requests to be a fundamental operation these days. The “microapp” architecture that I’ve been pushing lately involves exposing as much as possible over a REST/HTTP interface. Once I can make HTTP requests in Erlang, it means that I now have available all the functionality that I’ve previously exposed as a microapp. That means easy shared storage, tagging, threaded comments, full-text search, image processing, and a number of other special purpose microapps that I and other people have built. It instantly makes it more feasable for me to start building real web applications in Erlang.

It took me a while with google to find it, but Erlang does include a pretty decent HTTP client module. It’s pretty similar to python’s urllib or httplib2 libraries. The only difference is that it actually does the requests in a seperate Erlang process to allow for nice asyncronous requests. This actually seems to be typical of how Erlang interacts with any outside system. There’s always a special process that gets spawned that acts as a proxy. So your code just interacts with it like a regular Erlang process. That’s an important concept to understand, but for the purpose of just making a regular synchronous GET request, it basically just means that we have to call the:

:::erlang
application:start(inets).

function once to spawn the process before we can make any HTTP requests. Other than that, it’s just regular function calls in the http module.

It didn’t take long after finding the documentation to make a function which takes one “nothing” fetches the page, extracts the next one from the body text and returns it:

:::erlang
next_nothing(Nothing) ->
    Url = "http://www.pythonchallenge.com/.../linkedlist.php?nothing=" ++ Nothing,
    {ok, {{_, 200, _}, _, Body}} = http:request(get, {Url, []}, [], []),
    extract_nothing(Body).

It constructs the url, makes the request, uses pattern matching and liberal use of the ‘_’ “don’t care” variable to pull the body text out of the response tuple, and then uses the previous extract_nothing/1 function to parse it and get the next number.

Reading the full documentation for the http module I realized that I could simplify the http:request line to just:

:::erlang
{ok, {{_, 200, _}, _, Body}} = http:request(Url),

since a GET request with no extra stuff is just the default.

Error handling could be added with another case statement that just caught anything that didn’t come back with ‘ok’ as the first element of the tuple and printed a message or something. I’m pretty much expecting to not encounter errors here though so we’ll skip it.

All that’s left now is to wrap this in a “loop” now to have it repeat until it hits a page that it can’t parse a “nothing” out of:

:::erlang
all_nothings(Nothing,Count) when Count > 300 ->
    io:format("finished on: ~s~n",[Nothing]);
all_nothings(Nothing,Count) ->
    io:format("getting nothing: ~s~n",[Nothing]),
    NewNothing = next_nothing(Nothing),
    all_nothings(NewNothing,Count + 1).

all_nothings(Nothing) -> all_nothings(Nothing,0).

A hint on the challenge page says that it shouldn’t take more than 300 steps to get to the end, so I put in a counter to keep track of how many requests we’d made and put a guard on the first clause to stop it at 300. (Looking at the discussion of the challenge afterwards, I learned that this hint was there because there was a trick in the challenge. If you didn’t match on the whole phrase and just pulled out the first number from the page, one of them sends you off on a different chain that turns into a loop and will go on forever.)

The second clause does the main work, printing out the current “nothing”, getting the next via next_nothing/1 and then incrementing the counter and recursing. all_nothings/1 is just a driver function added for convenience that starts the whole thing with a counter of 0.

This level of the challenge has one or two other little snags that make it tricky, but I won’t mention them since they didn’t involve any programming to get around.

I’m happy to see that making HTTP requests in Erlang is about as painless as I’m used to with other languages. That really opens it up for me as a legitimate option for my own work.

The only mildly disappointing thing I discovered with this level (and it sort of showed up earlier, too) is that google doesn’t seem to have done a very good job of indexing the Erlang documentation that’s out there. Once I got a little comfortable with the language, I’ve found the terse format of the documentation to be perfectly adequate, but it’s a little scattered so you have to know where to look for something. Googling something like “erlang http module” doesn’t really turn up anything useful. Instead, I had to browse around the documentation site to find an index of the modules and scan the list looking for one that would be appropriate. Now that I know that that’s what I have to do, it’s not to bad, but it would be nice if Erlang docs were a little more googlable.

Oh, I should also mention that Level 5 of the challenge does indeed involve writing python code. Specifically, you have to reconstruct a pickled python object, so it isn’t really doable in Erlang. I’ll do that one in python so the next entry in this series will be Level 6. It may be a while before I get to it though since I won’t have internet access in my new apartment until later next week.

Erlang Challenge: Level 3

By anders pearson • 2006-08-24

(Read [[Erlang Challenge: Level 1]], [[Erlang Challenge: Level 2]], and [[Erlang Challenge: Level 2 (2)]] before reading this post or it won’t make sense)

Level 3 involves regular expressions. I was a serious Perl programmer for a few years, so this shouldn’t take too long.

The problem presented is similar to the last level. It’s looking for needles in a 100K haystack of text that looks like:

kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJIHEmbT
MUKLECKdCthezSYBpIElRnZugFAxDRtQPpyeCBgBfaRVvvguRXLvkAdLOeCKxsDUvBBCwdpMMWmuELeG
ENihrpCLhujoBqPRDPvfzcwadMMMbkmkzCCzoTPfbRlzBqMblmxTxNniNoCufprWXxgHZpldkoLCrHJq
vYuyJFCZtqXLhWiYzOXeglkzhVJIWmeUySGuFVmLTCyMshQtvZpPwuIbOHNoBauwvuJYCmqznOBgByPw
TDQheAbsaMLjTmAOKmNsLziVMenFxQdATQIjItwtyCHyeMwQTNxbbLXWZnGmDqHhXnLHfEyvzxMhSXzd

The needle we’re looking for is a single lowercase letter with exactly three uppercase letters on either side.

My approach is basically going to boil down to: 1) save the data in a text file again 2) read it into an erlang string 3) use erlang’s regexp module to look for the pattern

(?<![A-Z])[A-Z]{,3}[a-z][A-Z]{,3})(?![A-Z])

That regexp is pretty gnarly, so let me break it down for anyone who hasn’t been immersed in Perl or a similarly regexp obsessed language. In the middle is “[a-z]”, which matches the lowercase letter we’re looking for. On either side of it is “[A-Z]{,3}” which matches exactly three uppercase letters. At the outside are “(?<![A-Z])” and “(?![A-Z])” which are zero-width negative lookbehind and lookahead (respectively) assertions. They are there to make sure that it doesn’t match cases with 4 or more uppercase letters since the instructions said “exactly 3”. The even funkier than normal syntax on them is so they’ll match the characters but those characters won’t get included in the results (hence “zero-width”).

Checking out Erlang’s regexp module, a problem appears. Erlang’s regexp support simply isn’t up to the same level as Perl or Python’s. I was checking because zero-width assertions are fancy enough that I was worried it wouldn’t support them. Naturally it doesn’t. But it also appears that it doesn’t support curly braces for specifying exact numbers of matches. That’s unfortunate, but ultimately both of those are just icing on a regexp engine and we can easily rewrite the pattern without them:

[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]

The catch is that without zero-width assertions, we’ll get back an extra character on either side of the interesting bits which we’ll have to strip out with Erlang code.

Otherwise, the regexp module is pretty straightforward. There’s a regexp:first_match/2 function which takes a string and a regexp and returns a tuple of ‘match’ or ‘nomatch’ and the start position and length of the first match in the string that it finds. Those we can pass to string:substr/3 to pull out the stuff we’re interested in:

:::erlang
findit(String) ->
    Pattern = "[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]",
    {match,Start,Length} = regexp:first_match(String,Pattern),
    string:substr(String,Start + 1,Length - 2).

The ‘Start + 1’ and ‘Length - 2’ are to strip off the extra characters on either end of it that we don’t care about. And again, in the interest of expediency, we’re ignoring the possible ‘nomatch’ condition since we expect our data file to have a match.

We can reuse the read_data/1 function from last time to read in the data (stored in “l3.txt” this time), but instead of copying and pasting and changing the filename, we might as well refactor it so the filename can be passed in as a parameter:

:::erlang
slurp(Filename) ->
    {ok,IoDevice} = file:open(Filename,[read]),
    {ok,Data} = file:read(IoDevice,10000000),
    Data.

Then, we can do:

:::erlang
findit(slurp("l3.txt")).

And it finds what we’re looking for.

Kind of. When I try using it as the key to get to the next level, I notice that I didn’t read the problem closely enough and there should be multiple matches in the data and we need all of them.

So regexp:find_first/2 isn’t going to cut it. Instead, we can swap it out with regexp:matches/2 which returns all the matches as a list of {Start,Length} tuples. It’s a quick change to just put a list comprehension in to handle the list of results instead of a single result:

:::erlang
findit(String) ->
    Pattern = "[^A-Z][A-Z][A-Z][A-Z][a-z][A-Z][A-Z][A-Z][^A-Z]",
    {match,Matches} = regexp:matches(String,Pattern),
    [string:substr(String,Start + 1,Length - 2) || {Start,Length} <- Matches].

And we’re on to Level 4.

I was tempted to see if I could use Erlang’s gen_fsm behavior to write a small finite state machine to find the matches instead of using regexps (since a regexp is essentially just a convenient syntax for defining a certain kind of FSM) but it’s getting late and I should really get back to my packing.

Erlang Challenge: Level 2 (2)

By anders pearson • 2006-08-23

(read [[Erlang Challenge: Level 1]] and [[Erlang Challenge: Level 2]] before reading this one)

First off, I noticed that the challenge levels actually started numbering at 0 so this third level is really Level 2. So I’m fixing the numbering of my posts from here on out to keep it matched with the site.

Level 2’s challenge gives you a huge block of text (hidden in the source of the page) that’s almost entirely punctuation. It looks sort of like this:

:::text
%%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{*
@##&{#&{&)*%(]{{([*}@[@&]+!!*{)!}{%+{))])[!^})+)$]#{*+^((@^@}$[**$&^{$!@#$%)!@(&
+^!{%_$&@^!}$_${)$_#)!({@!)(^}!*^&!$%_&&}&_#&@{)]{+)%*{&*%*&@%$+]!*__(#!*){%&@++
!_)^$&&%#+)}!@!)&^}**#!_$([$!$}#*^}$+&#[{*{}{((#$]{[$[$$()_#}!@}^@_&%^*!){*^^_$^
]@}#%[%!^[^_})+@&}{@*!(@$%$^)}[_!}(*}#}#___}!](@_{{(*#%!%%+*)^+#%}$+_]#}%!**#!^_
)@)$%%^{_%!@(&{!}$_$[)*!^&{}*#{!)@})!*{^&[&$#@)*@#@_@^_#*!@_#})+[^&!@*}^){%%{&#@

and basically goes on for about 100K. The hint makes it clear that there are a couple letters hidden in there and you need to write a program to pull them out.

I could solve this level in about 3 seconds with grep or slightly longer with a nice regexp in emacs. But as usual, we’ll stick with the tools that Erlang gives us since that’s the whole point of this exercise.

First, I copy and paste the the data out into a file called ‘level2_data.txt’. I suppose that the Erlang program could be written to fetch the data directly off the web and parse it out, but that’s a little more advanced than I’d like to get for this early in the challenge. Just figuring out how to read data in from a file will be difficult enough for me.

Once the data’s read in, the problem becomes just filtering non-alphabetical characters out of a string. In Python, I would probably use a list comprehension again or just the built in filter() function. Lo and behold, Erlang has a lists:filter/2 function that works like I expect it to. With either approach though, a helper function will be needed to discriminate between letters and everything else:

:::erlang
is_alpha(C) when C >= $a, C =< $z -> true;
is_alpha(C) when C >= $A, C =< $Z -> true;
is_alpha(_) -> false.

If you followed the last entry, this one should be pretty understandable. Simple pattern matching with guards. The only thing new is the ‘_’ variable, which is just a convention in most functional languages for a variable whose value you really don’t care about. So the last line is just there to catch any character that didn’t match one of the first two conditions.

lists:filter/2 works similarly to lists:map/2 which I explained already. Where map applies a function to every element in a list and returns a list of the results, filter applies a function to every element but only returns the elements for which the function returns true. In other words, it’s like grep inside a programming language.

Again, I seem to need an anonymous function to make it work right, but it’s still pretty simple:

:::erlang
filter_strip(Msg) -> lists:filter(fun (X) -> is_alpha(X) end, Msg).

I test this from the shell on short strings I type in and it seems to work nicely, so now we just need to pull the real data out of the file and into an Erlang string that we can pass to the filter_strip/1 function.

Another trip through the library documentation and I find the file module which has some familiar sounding functions like open/2 and read/2.

Doing file i/o is the kind of thing that tends to have lots of potential error conditions that you need to handle if you want your code to be robust in a real environment. Here though, we’re opening and reading in one file that we know ahead of time will exist, will have the proper permissions, etc. So I’m not going to worry about all those error conditions and just write some fast, brittle code:

:::erlang
get_data() ->
{ok,IoDevice} = file:open("level2_data.txt",[read]),
{ok,Data} = file:read(IoDevice,10000000),
Data.

file:open/2 takes a filename and a list of flags. In our case, we just want to read data from the file so the only flag we give it is ‘read’. It returns a tuple consisting of ‘ok’ (it would be ‘error’ if something went wrong but we’re ignoring that possibility for now) and an IoDevice which represents the open file. file:read/2 takes an IoDevice and the number of bytes to read in. Normally, you’d put this in a loop and read in the data in small chunks until you reach the end of the file. Again, since we’re only dealing with one file and we know it’s size and we’re not trying to do anything very robust or efficient, we can get away with just specifying a big number of bytes so it will just slurp in the entire file in one gulp. It also returns a tuple with ‘ok’ (or ‘error’) and Data, which is a string of data. That’s exactly what we’re looking for, so the function just returns that.

Now, just calling:

:::erlang
filter_strip(get_data()).

gives us the letters stripped out of the file and the level is solved.

I mentioned list comprehensions, so here’s the list comprehension equivalent:

:::erlang
lc_strip(Msg) -> [X || X <- Msg, is_alpha(X)].

I really do love list comprehensions :)

Erlang Challenge: Level 2

By anders pearson • 2006-08-21

(Read [[Erlang Challenge: Level 1]] first or this post won’t make much sense.)

Level 2 gives you a ciphertext message and a key that basically tells you that it’s a Caesar cipher with a shift of 2. The message looks roughly like this (I’m just going to include the beginning so as not to give it away completely): “g fmnc wms bgblr rpylqjyrc gr zw fylb.”

The hints on the page make it pretty clear that they expect you to solve it with python’s string.maketrans() and string.translate() functions, which do indeed make short work of the problem.

Erlang doesn’t have direct equivalents to those functions so we either have to find another way, or write our own.

Since we’re dealing with text now instead of math, it’s important to understand how Erlang deals with strings. Technically, Erlang doesn’t really have strings. It just has lists of numbers and double quotes are just syntactic sugar. So “hello” is really the same thing as [$h,$e,$l,$l,$o] or [104,101,108,108,111].

It’s kind of similar to how strings in C are really just character arrays and characters are really just 8-bit integers. But Erlang’s are a linked list of 8-byte numbers (yes, byte, not bit). Clearly some string operations in Erlang aren’t going to be as fast or space efficient as in C (but Erlang has libraries for dealing with byte arrays if you really want to go there), but it does let you manipulate strings as easily as any other list and lists are pretty much the core data structure of functional languages so that’s important. It also means that Erlang is sort of unicode compliant from the ground up. At least, it can store and manipulate utf8/16/32 strings without any extra care. Apparently Erlang’s libraries don’t always do the right thing with unicode though, so you do still need to be careful.

Since Erlang strings are just lists of numbers, shifting each letter of a string up two places is roughly just adding two to each number (but not exactly, as we’ll see). Time to write a function. My first attempt:

:::erlang
-module(pychallenge2).
-export([plus2/1]).

plus2 ([H|T]) -> [H + 2 | plus2(T)];
plus2 ([])    -> [].

Ah, pattern matching, how I’ve missed you.

If you’ve ever worked in a pattern matching functional language, the code above should be pretty obvious. The first two lines just declare the module that the function’s in and export the single function that’s defined in it. Then the two lines starting with ‘plus2’ define our function. The first of those lines does most of the work, matching a list argument, using ‘|’ to pull apart the head and tail of the list, then returning a list consisting of the head incremented by 2 concatenated with a plus2 called on the tail recursively. The second line just defines the base case so it terminates when it hits the end of the list.

Then we go to the shell and do:

:::erlang
> c(pychallenge2).
> pychallenge2:plus2("g fmnc wms bgblr rpylqjyrc gr zw fylb.").
"i\"hope\"you\"didnt\"tr{nsl{te\"it\"|y\"h{nd0"

This actually is good enough to get the answer and proceed to the next level, but obviously not perfect. The Erlang code works fine, but simply adding two to each letter isn’t really enough. The two problems are that we really only want it to add two to alphabetic characters; adding two to the spaces gives us quotes and adding two to the period at the end gives us ‘0’. The other issue is that the cipher really needs to wrap around at the end of the alphabet, so ‘y’ turns into ‘a’, and ‘z’ turns into ‘b’. Above you can see that ‘y’ turned into ‘{‘ because that’s how the ASCII character set is arranged.

To make our code a little smarter, We’ll define a helper function:

:::erlang
 trans (C) when C >= $a, C =< $x -> C + 2;
 trans (C) when C >= $y, C =< $z -> C - 24;
 trans (C) -> C.

trans/1 just shifts a single character. It makes use of Erlang’s guards to only shift the letters. The first line handles the case where the letter is between ‘a’ and ‘x’, just returning the letter plus 2. The second line handles the case of ‘y’ or ‘z’ and subtracts 24 to map it back down to the beginning of the alphabet. Those two probably could have been combined with a clever use of modular arithmetic, but I think this is nice and clear. The third line just returns the original letter and handles the default situation when neither of the above conditions match (ie, for punctuation, etc.)

Now our plus2/1 function (which isn’t entirely accurately named anymore) becomes:

:::erlang
 plus2 ([H|T]) -> [trans(H) | plus2(T)];
 plus2 ([]) -> [].

And we can go back to the shell and do:

:::erlang
 > c(pychallenge2).
 {ok,pychallenge2}.
 > pychallenge2:plus2("g fmnc wms bgblr rpylqjyrc gr zw fylb.").
 "i hope you didnt translate it by hand."

Which is much better.

After I’d already completed Level 2 and was working on a higher level, I remembered reading that Erlang had Haskell style list comprehensions as well. I’m a big fan of list comprehensions and put them all through my Python code, so naturally, I had to do a version of Level 2 using a list comprehension:

:::erlang
 decipher(Msg) -> [trans(C) || C <- Msg].

Much nicer. Do you see why I’m a fan of list comprehensions? Once your familiar with the basic syntax, I think it makes the function much easier to understand than even the recursive, pattern matching version.

Erlang’s list library also has a map/2 function that can be used to achieve an effect similar to the list comprehensions:

:::erlang
 decipher_map(Msg) -> lists:map(fun(C) -> trans(C) end, Msg).

map/2 takes a function and a list as arguments and applies the function to each element in the list and returns a list of the results.

(I have an anonymous function in there (defined with ‘fun’) because for some reason, this:

:::erlang
 decipher_map(Msg) -> lists:map(trans, Msg).

Didn’t work. I’m still not sure why I had to wrap trans/1 in an anonymous function to get lists:map/2 to work right.)

Using map/2 doesn’t really gain you much in terms of clarity over list comprehensions, but it does mean that you could easily switch it to pmap/2 and your code will run in parallel across multiple processors or machines in a distributed environment without any other changes, Let that one sink in for a minute. Yes, it should remind you of google’s MapReduce.

I have to admit that I actually burned a fair amount of time on this level because of one stupid little syntax issue that took me forever to figure out. If you look closely at the trans/1 function, you’ll notice that Erlang uses =< as “less than or equal to” rather than <= which I’m more accustomed to.

My first attempt at writing trans/1 was basically what is up here, but with <=. Erlang gave me a syntax error on that and I assumed that I had messed up the syntax for guards somehow, so I attempted a version with an if statement instead. I still got a syntax error there so I thought that maybe ‘,’ wasn’t the right way to combine conditions and tried things like ‘and’, ‘&&’, etc., none of which worked. Eventually I bit my tongue and wrote a big, ugly version of trans/1 with nested if statements. When I still was getting a syntax error, it finally occurred to me that <= might not be right so I dug through the docs some more and discovered my problem. I probably should’ve been able to figure that out more quickly but <= is just so ingrained in my brain that it never occurred to me that that might be my problem.

Other than that, this was a very fun level to solve. Pattern matching, recursion, guards, pulling apart lists into heads and tails, map, and list comprehensions are all things that just warm this programmer’s heart.

On to Level 3!