Asymmetrical View

Locating a bad expression in Emacs and Clojure

Kyle Burton — 2013-08-18T00:00:00-04:00

I work frequently in Emacs and Clojure. Emacs has two extensions that let you do interactive development: tools.nrepl and swank-clojure.

I sometimes make the mistake of modifying many functions in a module before I try to evaluate or compile the whole file again, and I am then met with an error like:

Can't have more than 1 variadic overload
  [Thrown class java.lang.Exception]

Finding the source of this in a large file full of functions can be difficult.

There is a trick you can use in Emacs to make this simpler though. By using a simple keyboard macro you can step through all of the expressions in a buffer and evaluate each one in turn until you find the expression with the error in it.

The steps are as follows:

Jump to the top of your buffer: M-<
Start recording a macro by typing: C-x (
Move forward a single s-expression by typing: C-M-f
Evaluate that expression by typing: C-x C-e
End the macro by typing: C-x )

If the error was in the first expression then you’re done. If not, check the next one by typing C-x e. To keep checking, just keep pressing e until you’ve found the offending code.

Fixing A Broken Sudoers File on an Amazon's EC2

Kyle Burton — 2012-11-07T00:00:00-05:00

I was working on getting our CI server set up and wanted to grant Jenkins the ability to run a command as root. Pretty standard stuff really. I ran sudo visudo and discovered that it had dropped me into Nano. Being an editor snob, this was unacceptable. I immediately quit out of the weak little thing.

The next few moments were where I made my fatal mistake. I ran my own, personally preferred, more powerful editor (as root) on /etc/sudoers, made a few changes and exited my editor. I tested the changes and, lo and behold, I was met with the message:

"parse error in /etc/sudoers"

‘Ok’ I thought. I tried to edit /etc/sudoers again.

"parse error in /etc/sudoers"

Oh, er…well, now I can no longer use sudo. Ok, how about becoming root. Oh, this is an Ubuntu instance. root has no password that I can use. Can’t ssh into the box as root either.

Damn. I’m stuck.

The solutions I found on Google all talked about booting the machine from a Live CD, mounting the disk with the sudoers file and fixing it. This isn’t possible on EC2. What is possible, though, is attaching the root volume (an EBS volume) to another EC2 instance and then editing the file. After some reading and experimentation, I ended up doing the following:

1. I wrote down the following:

instance ID of the EC2 instance
a list of each mounted EBS volume and the device it is mounted as.

2. Shut down (but do not terminate) the EC2 instance.
3. Detach the EBS volume that has the bad sudoers file.
4. Attach and then Mount the EBS volume on another EC2 instance.

Attaching the volume is done in the AWS Management Console
Once attached, I noted the device name (in my case sdh, which showed up inside the instance as /dev/xvdh) and mounted it from within the running instance:
- sudo mkdir /mnt/other
- sudo mount /dev/xvdh /mnt/other

5. Edit the sudoers file, fixing the syntax error.
6. Unmount and then Detach the EBS volume from the alternate EC2 instance.

sudo umount /mnt/other

7. Re-attach the EBS volume to the original EC2 instance.
8. Start the original EC2 instance and confirm that the sudoers file is valid.

Once I had figured out these steps, it took me about 30 minutes to go through them and I had a working instance again.

Conclusion

The moral of the story is not to repeat any of this, but rather to always use visudo. Here’s how to do so using a custom editor:

EDITOR=my-editor sudo -E visudo

Narcissism, or don't try this at home

Kyle Burton — 2012-06-30T00:00:00-04:00

I’m a frequent reader of Hacker News, and when I saw the post about HackerRank and the challenge to be invited into the beta if you were on the leaderboard, I had a hard time resisting.

I played a few games manually to make sure I understood the game. Since you can take only 1-5 candies, and to win you must force your opponent to leave you with less than 6, then at the last move you must leave your opponent with 6 candies. So at each move you must leave them with a number of candies divisible by 6, meaning my next move should always be: numCandies % 6
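That reasoning is easy to check with a small simulation. The sketch below is only an illustration: simulate and its random opponent are hypothetical stand-ins (the real opponent is the HackerRank server), and it assumes that whoever takes the last candy wins.

```javascript
// Winning move: take enough candies to leave a multiple of 6.
// Returns 0 when the pile is already a multiple of 6 (no winning move).
function nextMove(current) {
  return current % 6;
}

// Play one game against an opponent that takes 1-5 candies at random.
// Returns true if we take the last candy, false if we have no winning move.
function simulate(startingCandies, random = Math.random) {
  let current = startingCandies;
  while (true) {
    const mine = nextMove(current);
    if (mine < 1) {
      return false;            // pile is a multiple of 6: losing position
    }
    current -= mine;           // pile is now a multiple of 6
    if (current === 0) {
      return true;             // we took the last candy
    }
    // Opponent takes 1..5, which can never leave a multiple of 6.
    const theirs = 1 + Math.floor(random() * 5);
    current -= theirs;
  }
}

console.log(simulate(2047));
console.log(simulate(12));
```

For any starting pile that is not a multiple of 6, simulate returns true regardless of what the opponent picks. For a multiple of 6, nextMove returns 0 and there is no winning move to make.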

The top entries already had hundreds of games solved so there was no way I was going to get anywhere unless I automated the game playing.

I opened up Chrome’s developer tools and watched the network as I played a game. Things looked straightforward enough: POST and PUT requests were used to start a game and to update it (make the next move), and detection of a completed game was right in the response from performing your next move: {game: {... solved: true}}

I wanted to optimize for my time, not necessarily run-time. I thought of using curl (too much work), of using Ruby’s Mechanize, or Clojure, but all of these solutions would require authenticating to the site first. I wanted to start with automating the game, so I opted to write up some JavaScript and paste it into the console so it could be called.

I started by writing a basic skeleton and a startGame function to initiate a game, then I cut and pasted it into the JavaScript console so I could try it interactively.

Foo = (function () {
  var self = {autoContinue: true};

  self.startGame = function (numCandies) {
    $.ajax({
      type: 'POST',
      url: '/splash/challenge.json',
      data: {
        n: numCandies,
        remote: true,
        utf8: '✓'
      },
      success: self.playRound
    });
  };

  self.playRound = function (data) {
    console.log('playRound');
    console.dir(data);
  };

  return self;
}());

I used some bash commands to make the iteration go faster on my Mac:

$ vim hrank.js
# background vim, and press up:
$ pbcopy < hrank.js && fg

This let me make changes and paste them back into Chrome with just a few keystrokes. Calling startGame indeed worked – I could see the response from the server that a new game had begun. From there I worked out the rest of the implementation. The only thing to really point out is autoContinue: I used it as a flag, together with a callback (self.continue), so that I could step through the process one move at a time while debugging, and then set autoContinue to true once I thought it was ready to go.

Full Program

The full JavaScript module that I used is below.

Foo = (function () {
  var self = {autoContinue: true};

  self.startGame = function (numCandies) {
    $.ajax({
      type: 'POST',
      url: '/splash/challenge.json',
      data: {
        n: numCandies,
        remote: true,
        utf8: '✓'
      },
      success: self.playRound
    });
  };

  self.nextMove = function (current) {
    return current % 6;
  };

  self.playRound = function (data) {
    var nextMove;
    console.log('play round: ' + JSON.stringify(data));
    if (data.message) {
      console.log(data.message);
    }

    //if (data.exit) {
    //  console.log('Game is over!');
    //  return;
    //}

    if (data.game) {
      data = data.game;
    }

    if (data.solved) {
      console.log('Game is solved!');
      self.continue = null;
      return self.gameWon();
    }

    // n: the starting numCandies
    // current: the number of candies left
    // limit: not sure what this is
    // moves: array of moves taken?
    nextMove = self.nextMove(data.current);
    console.log('the next move is: ' + nextMove);
    if (nextMove < 1 ) {
      console.error('Whoops something went wrong, the next move is zero? ');
      console.dir(data);
      return;
    }

    self.continue = function () {
      $.ajax({
        type: 'PUT',
        url: '/splash/challenge.json',
        data: {
          move: nextMove,
          remote: true,
          utf8: '✓'
        },
        success: self.playRound
      });
    };


    if (self.autoContinue) {
      self.continue();
    }
  };

  self.playGames = function ( startingValue, numGames ) {
    // round down to a multiple of 6, then add 1, so the opening move is never 0
    var currentValue = startingValue - (startingValue % 6) + 1;

    self.gameWon = function () {
      if (numGames < 1) {
        console.log('games played, currentValue: ' + currentValue);
        return;
      }
      numGames = numGames - 1;
      currentValue += 6;
      console.log('playing next game[' + numGames + ']: ' + currentValue);
      return self.startGame(currentValue);
    };

    numGames = numGames - 1;
    console.log('playing next game[' + numGames + ']: ' + currentValue);
    return self.startGame(currentValue);
  };

  return self;
}());

Glitch?

I think that there is a glitch in the server-side code: during some of the games the current count of candies would regress (in the log, current jumps from 734 to 870). I added a conditional to detect that and halt the automated play (see the check for nextMove < 1).

Setting it off and watching it run I got my ego boost and my name started climbing the board.

Conclusion

I strongly recommend against taking and running this code, as playing the game uses resources on the server side. After I had gotten up towards the top I stopped running the script. It also uses quite a bit of CPU on your machine when running inside Chrome, and as the starting count gets larger the number of moves increases linearly, so each subsequent game takes longer and longer to play (you don’t get credit for replaying the same game). Also, I’m not giving everything away: the maximum starting value at the time of this writing is 2048, which results in about 341 possible games – but if you look at the leaderboard there are players with higher scores.

I hope that by the time you’re reading this HackerRank will have either come up with a new challenge or, hopefully, prevented this method from being used to ‘game’ the game.

Look Ma, No Threads!

Kyle Burton — 2011-10-15T00:00:00-04:00

Java’s NIO, node.js, Netty, Tornado, Twisted, Perl’s POE…they all have one thing in common: non-blocking IO.

Node.js is based on V8. V8 is a JavaScript implementation. It comes from a place (the browser) where blocking is anathema. It’s just not permitted. You can’t block the browser. Node.js’ community follows this to its core: thou shalt not block. Ever. It permeates its culture and all the software the community accepts. If something blocks, well, sorry, but it’s going to be rejected by the community.

This is why the approach isn’t as strong in the cultures of Python, Java and even Perl. They’re all willing to accept blocking operations; in fact, it’s the default. They see threads where the Node community sees non-blocking.

epoll? kqueue? select? If your goal is to learn about non-blocking IO, you can start simpler than that. You can start with a basic C program interacting with stdin and stdout. Consider:

/* first.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* int_to_binary_ascii( int x ) {
  static char buff[8*sizeof(int) + 1];
  unsigned int ii, num_bits = (8 * sizeof(unsigned int));
  memset(buff,'\0',sizeof(buff));
  for ( ii = 0; ii < num_bits; ii++ ) {
    buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
  }
  return buff;
}

/* Use a terminal escape to clear the line...*/
void clear_line () {
  printf("%c[2K\r", 27);
}

int main (void) {
  long ii;
  for ( ii=0;; ++ii ) {
    clear_line();
    printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
    fflush(stdout);
  }
  return 0;
}

This program is a tight (infinite) loop printing out a counter as an integer and the counter represented as a string of ones and zeros. It’s mesmerizing to watch the ones march ever more slowly to the left isn’t it? If you run it, you should see output something like:

   2622411: 00000000001010000000001111001011

It will print continuously to the same line, clearing it and printing again. To run this program, let’s use a shell script. Make a file called run.sh:

set -e
set -u
set -x

trap "stty sane" INT TERM EXIT

F="$1"

if [ -e $F ]; then
  SRC=$F
  F=$(basename $F .c)
else
  SRC=$F.c
fi

function compile () {
  gcc -O2 -Werror -Wall -o $2 $1
}

compile $SRC $F
stty raw
./$F
rm $F

This script does an important thing that we’ll need in a moment: it puts the terminal into raw mode. This allows unfiltered (un-cooked, or raw) keyboard input to be passed directly to the program. In sane mode, the terminal driver holds your input until you press enter. We don’t want that; we want the characters you type to go to our program immediately, as you type them. We have to use raw mode to make this happen. When you use this script to run our program, you won’t be able to stop it with CTRL+C. You’ll have to go to another terminal, find its pid and kill it. You can do this with something like:

kill $(ps aux | grep './[f]irst' | awk '{print $2}')

The reason you won’t be able to use CTRL+C any longer is that bit about ‘raw’ mode. In raw mode, the terminal driver passes all keyboard input directly to the program. In ‘sane’, or normal, mode, keyboard input like CTRL+C or CTRL+Z is handled by the terminal driver. In the case of CTRL+C, it sends a signal to the foreground process: a SIGINT. For most programs, like ours, a SIGINT will kill the process.

The shell script compiles the program, puts the terminal into raw mode, runs the program and then ensures the terminal is returned to ‘sane’ mode with the ‘trap EXIT’.

The program gives us no way to tell it to exit short of killing it. It would be nice if we could ask it to stop. We could stop it by waiting for some input:

/* second.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char* int_to_binary_ascii( int x ) {
  static char buff[8*sizeof(int) + 1];
  unsigned int ii, num_bits = (8 * sizeof(unsigned int));
  memset(buff,'\0',sizeof(buff));
  for ( ii = 0; ii < num_bits; ii++ ) {
    buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
  }
  return buff;
}

/* Use a terminal escape to clear the line...*/
void clear_line () {
  printf("%c[2K\r", 27);
}

int main (void) {
  long ii;
  char c;
  for ( ii=0;; ++ii ) {
    clear_line();
    printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
    fflush(stdout);
    /* read blocks until a key arrives; only trust c if a byte was read */
    if ( 1 == read(0,&c, sizeof(char)) && 'q' == c ) {
      break;
    }
  }
  return 0;
}

This works, kinda. It certainly exits when we press ‘q’, but the program is blocked awaiting our input. We can keep pressing keys, but it doesn’t run without us. How can we get it to run independently of waiting for input, but react if input is present? You put stdin in non-blocking mode:

/* third.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

char* int_to_binary_ascii( int x ) {
  static char buff[8*sizeof(int) + 1];
  unsigned int ii, num_bits = (8 * sizeof(unsigned int));
  memset(buff,'\0',sizeof(buff));
  for ( ii = 0; ii < num_bits; ii++ ) {
    buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
  }
  return buff;
}

/* Use a terminal escape to clear the line...*/
void clear_line () {
  printf("%c[2K\r", 27);
}

int main (void) {
  long ii;
  char c;
  int flags = fcntl(0,F_GETFL,0);
  flags |= O_NONBLOCK;
  fcntl(0,F_SETFL,flags);

  for ( ii=0;; ++ii ) {
    clear_line();
    printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
    if ( 1 == read(0,&c, sizeof(char)) ) {
      printf("\r\n");
      printf("You pressed: '%c'\r\n",c);
      fflush(stdout);
      if ( 'q' == c ) {
        break;
      }
    }
    fflush(stdout);
  }
  return 0;
}

The key lines are where you see fcntl, where we get the current set of flags for stdin (file descriptor 0), ensure O_NONBLOCK is set in the flags and then set the newly configured flags for stdin. When you run the program and type a few characters into it, you should see something like:

> bash run.sh  third.c
+ trap 'stty sane' INT TERM EXIT
+ F=third.c
+ '[' -e third.c ']'
+ SRC=third.c
++ basename third.c .c
+ F=third
+ compile third.c third
+ gcc -O2 -Werror -Wall -o third third.c
+ stty raw
+ ./third
    343404: 00000000000001010011110101101100
You pressed: 'a'
    390059: 00000000000001011111001110101011
You pressed: 'b'
    416735: 00000000000001100101101111011111
You pressed: 'c'
    466926: 00000000000001110001111111101110
You pressed: 'd'
    490793: 00000000000001110111110100101001
You pressed: 'e'
    529303: 00000000000010000001001110010111
You pressed: 'f'

That change allows us to call read – and now read won’t block. In the cases where read is called and there is no input, it returns -1. If you read the man page for read you’ll see that it also sets errno, and if we print errno:

/* fourth.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

char* int_to_binary_ascii( int x ) {
  static char buff[8*sizeof(int) + 1];
  unsigned int ii, num_bits = (8 * sizeof(unsigned int));
  memset(buff,'\0',sizeof(buff));
  for ( ii = 0; ii < num_bits; ii++ ) {
    buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
  }
  return buff;
}

/* Use a terminal escape to clear the line...*/
void clear_line () {
  printf("%c[2K\r", 27);
}

int main (void) {
  long ii;
  char c;
  int flags = fcntl(0,F_GETFL,0);
  flags |= O_NONBLOCK;
  fcntl(0,F_SETFL,flags);

  for ( ii=0;; ++ii ) {
    clear_line();
    printf("% 10ld: %s errno:%d",ii, int_to_binary_ascii(ii), errno);
    if ( 1 == read(0,&c, sizeof(char)) ) {
      printf("\r\n");
      printf("You pressed: '%c'\r\n",c);
      fflush(stdout);
      if ( 'q' == c ) {
        break;
      }
    }
    fflush(stdout);
  }
  return 0;
}

On my Mac, it prints 35. If I look in /usr/include/sys/errno.h I see:

#define EAGAIN          35              /* Resource temporarily unavailable */
#define EWOULDBLOCK     EAGAIN          /* Operation would block */

Which can be taken to mean it would block, if it weren’t in non-blocking mode that is :)

Conclusion

These same techniques can be used on sockets as well as file handles. This technique is the foundation that all of these non-blocking frameworks are built on under the hood. If you’re interested in learning more, the next function to study is select (or poll, epoll, or kqueue), which allows you to quickly determine which of a set of file handles have input ready and can be read from, or can be written to, without blocking.
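For comparison, here is roughly what the third.c experiment looks like in Node.js, where the runtime’s event loop does the waiting for you. This is a sketch rather than a port: toBinary32, run and tick are names made up for this example, and it only starts counting when stdin is a terminal.

```javascript
// Render the low 32 bits of n as a string of ones and zeros.
function toBinary32(n) {
  return (n >>> 0).toString(2).padStart(32, '0');
}

// Count as fast as the event loop allows while reacting to single keypresses.
function run() {
  let counter = 0;
  let running = true;

  process.stdin.setRawMode(true);          // deliver keys immediately
  process.stdin.on('data', (chunk) => {
    const key = chunk.toString();
    process.stdout.write('\r\nYou pressed: ' + JSON.stringify(key) + '\r\n');
    // In raw mode CTRL+C arrives as the byte 0x03 instead of a SIGINT.
    if (key === 'q' || key === '\u0003') {
      running = false;
      process.stdin.setRawMode(false);
      process.stdin.pause();
    }
  });

  (function tick() {
    if (!running) {
      return;
    }
    // ESC[2K clears the line, \r returns to column zero (as in clear_line).
    process.stdout.write('\x1b[2K\r' + String(counter).padStart(10) + ': ' + toBinary32(counter));
    counter += 1;
    setImmediate(tick);                    // yield so pending input is handled
  }());
}

if (process.stdin.isTTY) {
  run();
}
```

setImmediate hands control back to the event loop after every line, which is what gives the 'data' handler a chance to run. A plain while loop would starve it.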

Kyle

Cucumber, Gherkin and Multi-line arguments

Kyle Burton — 2011-06-02T00:00:00-04:00

Gherkin is the language that the Cucumber testing tool uses for describing feature tests.

Writing a test today at $WORK, we needed to use a multi-line string and had some trouble finding an example in Google…

I am using Cucumber to test my Twilio based IVR and SMS applications. You can find examples in my Twilio in Ten Minutes repo, one of which I’ve included here:

The Scenario “I register my card in one step and sign up for all services.” shows how you embed a multi-line string into Gherkin – you surround your text with enclosing sets of triple double quotes.

I implemented two steps in my testing framework: the first supports a double quoted, single-line string. The second supports the multi-line string:

Note that the matcher for the multi-line string doesn’t include a capture group. The multi-line string will end up being passed in anyway, as Gherkin recognizes the triple set of double quotes as the bounding delimiters for the multi-line string and passes it in.

The multi-line string will be passed to your step definition as a single string, which you can use as-is or split apart as I’m doing in my step definition. Note that it will include any whitespace indentation present in your feature definition.

This will now make for an easy reminder for myself and hopefully it’ll show up the next time someone else searches Google for how to use multiline strings.

Kyle Burton, 06 Jun 2011 – Philadelphia PA

Simple Process Coordination with Tellmewhen

Kyle Burton — 2011-04-26T00:00:00-04:00

I subscribe to the Unix is my IDE philosophy, mixing and matching the GNU tools, Leiningen, and lots of other tools. I frequently go to bash, Ruby and Rake in order to tie or glue many of those tools together in ad-hoc combinations to get tasks done or create small automations.

I often want to be notified when a task completes, or to trigger an action after another task completes. bash has some features that allow you to background a process and await the completion of that process, like wait, but wait doesn’t fit all of my use cases.

I recently checked out Phil Hagelberg’s Slamhound dependency analysis and cleanup tool for Clojure to troubleshoot an issue I was having. Not knowing much about Leiningen plug-ins and how to effectively do interactive development with them, I started with the classical development cycle: edit the code (add print statements, make some tweaks); compile and install the code; and run the plug-in.

The disconnect for me was that my test case was in another project and wasn’t part of Slamhound itself, so I was doing the build and install of Slamhound, then switching to another tab and executing my test.

I created Tellmewhen for exactly this kind of use case. You can use it to implement a lightweight form of IPC using basic bash commands and files.

I stacked together the execution of Leiningen to build the software and then touch a trigger file in my $HOME directory that my other process could then wait on:

user@host ~/projects/slamhound $ lein deps && lein install && touch ~/x

Meanwhile…in another terminal, I used tellmewhen to await the update to the file ~/x and then execute my test:

user@host ~/personal/projects/impresario $ tellmewhen -m ~/x ; lein deps && lein slamhound src/impresario/core.clj

I could run this first (or just after I started the slamhound build) and it would (almost) immediately kick off when the slamhound install successfully completed.

This was a lightweight way for me to script and coordinate between these activities. Automating it one step further was as easy as wrapping a while loop around the test case:

user@host ~/personal/projects/impresario $ while :; do \
  tellmewhen -m ~/x ; \
  lein deps && lein slamhound src/impresario/core.clj; \
done

Then my test case will execute, and await another build + install of slamhound from the other terminal.
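The idea of waiting on a trigger file is simple enough to sketch. The code below is not tellmewhen’s implementation, just a rough illustration of polling a file’s modification time; sleep, mtimeOf and waitForUpdate are hypothetical names, and the polling interval is an arbitrary choice.

```javascript
const fs = require('fs');

// Block the current thread for ms milliseconds (Node permits Atomics.wait
// on the main thread).
function sleep(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Modification time of path in milliseconds, or 0 if it does not exist yet.
function mtimeOf(path) {
  try {
    return fs.statSync(path).mtimeMs;
  } catch (err) {
    if (err.code === 'ENOENT') {
      return 0;
    }
    throw err;
  }
}

// Poll until path is newer than baselineMs. Returns true once it is,
// or false if timeoutMs elapses first.
function waitForUpdate(path, baselineMs, options = {}) {
  const intervalMs = options.intervalMs === undefined ? 250 : options.intervalMs;
  const timeoutMs = options.timeoutMs === undefined ? Infinity : options.timeoutMs;
  const deadline = Date.now() + timeoutMs;
  while (mtimeOf(path) <= baselineMs) {
    if (Date.now() >= deadline) {
      return false;
    }
    sleep(intervalMs);
  }
  return true;
}
```

One terminal touches the trigger file when the build finishes. The other records mtimeOf(trigger) up front, calls waitForUpdate(trigger, baseline), and runs the test once it returns true.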

The Unix philosophy focuses on modularity – small, re-usable, composable components. I see parallels to the composability of Functional Programming constructs as well. tellmewhen is becoming a tool I’m using more frequently, mixing it into my other tool-set. I hope you find it useful too.

Kyle Burton, 04 Apr 2011 – Philadelphia PA

An Interactive Voice Response System in 10 Minutes

Kyle Burton — 2011-02-20T00:00:00-05:00

In this post I’m going to walk you through deploying a telephony application. If you’ve ever called an automated system, listened to a menu and then pressed 1 to continue then you’ve interacted with a telephony application.

By following this guide you’ll deploy an application that you can call from your own phone. It will read a random quote to you, allow you to repeat the quote (by pressing 1), then disconnect (by pressing 2).

I must make many assumptions about you, the reader of this guide, in order to achieve the goal set by the title of this guide. If you meet my assumptions then I hope that you’ll be able to get your first application up and running in a very, very short time.

If you’re a developer and you have a Ruby on Rails environment set up, you should be good to go.

If you don’t, don’t worry. I will provide more details about what to do in subsequent posts. So if you can’t get your app deployed in 10 min, don’t despair! You’re probably still only a few steps away, look for more details in future posts. Failing that, drop me (Kyle Burton) a line and I’ll help you work through things.

Anxious to pick up your phone and call your application? Then let’s get ’er done!

Deploying your First Application

Get Your App Deployed and Call it!

We’re going to make and deploy an application that you can call with your phone and it will speak to you. All the parts string together something like this:

1. Sign up for Heroku

Heroku is a PaaS (Platform as a Service) provider. They graciously allow you to create and deploy an application for free (as long as it doesn’t take up many of their resources – and what you’re about to do fits that bill).

Head to their home page and start the process.

You should then be shown your ‘applications page’.

Grab The Source Code For This Guide

You will need Git installed and have a connection to the Internet.

The URL to the git repository is: https://github.com/kyleburton/twilio-in-ten-minutes.git

Install the bundler gem

Run `bundle install`

Run `heroku create`

When prompted, enter the credentials you created when you signed up in the earlier step:

Note your application’s URL; this will be vital when you sign up for Twilio. For my example application it is http://warm-wind-609.heroku.com/. Yours will be different.

Heroku automatically added your application’s git remote to your local git repository.

Push the application code to heroku:

Open the App in your Browser

This is to ensure it’s working. Enter the url for your Heroku instance, make sure you append on ‘/quote’ so it routes to the right controller. You should see a random quote:

Sign up for the Twilio free trial.

Head to their main page.

Configure Twilio’s Voice URL

Quick! Call Your Application!

You’ve now got an application deployed on Heroku, and have integrated it with Twilio. When you call the number at Twilio, Twilio will make a request to your application at Heroku, which will tell it what to say.

Grab your phone and call your app. Call the phone number that was on your Twilio account page, enter the PIN (from the same page) and you should hear your very own application read a random quotation to you.

Conclusion

Hopefully there were no kinks and you have your first application deployed. In the next post I’ll walk through how to make changes to the application.

Kyle Burton, 20 Feb 2011 – Philadelphia PA

Upcoming Talk: Large Data and Clojure

Kyle Burton — 2010-10-19T00:00:00-04:00

I’ll be giving my Large Data and Clojure: the middle ground between RAM and EC2 talk at Philly Lambda on November 22nd. The RSVP Link is here.

This will be an updated version of the talk I gave at Barcamp Philly 2010.

Sampling a Sequence with Clojure

Kyle Burton — 2010-10-19T00:00:00-04:00

We needed to sample a data set that had around 392 million entries in it. The first thing we thought of was using the database. The second was creating an array of the record IDs (or line numbers), shuffling the array and then selecting those records that matched (via the line number or record ID).

SQL

SQL Databases do support selecting random samples of records. Looking at SQL to Select a random row from a database table, most use a variation of the following:

We asked PostgreSQL how expensive this would be on our data set and…it reported a cost in the hundreds of millions. We weren’t too confident that this would execute in any reasonable time frame based on my (limited) understanding of what was required: to generate a random value for each row and then sort the entire set of random values before the sample set could be taken (which would admittedly be fast – given the sorting has completed). Testing on a million rows, the database returned results after some time – thinking of having to scale that up by 2 orders of magnitude resulted in a lot of pessimism.

We started a query using this approach anyway, and then we started discussing and googling for other approaches.

Shuffle

If the database was going to be slow at it, would it be any faster if we took the same approach by hand? Could we do it in memory in the JVM?

With a 2048M heap, we tried allocating a long array with enough space to hold 392 million entries and it resulted in an OOM (at 8 bytes per long, that array alone needs roughly 3.1 GB). Going up to a 4096M heap and down to an array of ints (roughly 1.6 GB), we were able to allocate the array. Populating it with the sequential line numbers in Clojure took about 80 seconds on the box we were working with.

There is a Clojure function shuffle which under the hood defers to java.util.Collections/shuffle, an optimized shuffle that is supplied with the JRE. This version of shuffle would have required us to use non-native arrays and use the wrapper classes – at these sizes, it would seem, a non-starter.

A pure Java implementation of the array creation and shuffling (Fisher-Yates / Knuth shuffle) was pretty easy to create:
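As an illustration of the technique (a minimal sketch in JavaScript, not the Java code referred to above), a Fisher-Yates shuffle over a typed array of sequential IDs looks something like this. sequentialIds and shuffleInPlace are hypothetical names, and the sizes are tiny compared to the real data set.

```javascript
// Build a typed array holding the record numbers 1..count.
function sequentialIds(count) {
  const ids = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    ids[i] = i + 1;
  }
  return ids;
}

// Fisher-Yates: walk from the end, swapping each slot with a randomly
// chosen slot at or before it. Runs in O(n) with no extra allocation.
function shuffleInPlace(ids, random = Math.random) {
  for (let i = ids.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = ids[i];
    ids[i] = ids[j];
    ids[j] = tmp;
  }
  return ids;
}

// The first k entries of the shuffled array form a random sample of IDs.
const sampleIds = shuffleInPlace(sequentialIds(20)).slice(0, 5);
console.log(Array.from(sampleIds));
```

Using a primitive typed array mirrors the reason for avoiding the wrapper classes: each ID costs 4 bytes rather than a full object.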

And it ran pretty quickly as well at just over a minute (78s). That still left us with having to now use the result of that operation as a set in order to filter the records from the original data set.

We kept wondering if there wasn’t a way to do it in a single pass, so we kept googling. Then we ran across an article that talked about an alternate approach that would allow the set to be sampled while it was streamed in a single run: How to pick a random sample from a list

Based on that article we were able to craft an implementation in Clojure that could operate on any other sequence, randomly selecting records from a given prefix of the sequence to hit a target sample size:

We ran it on our 19 Gigabyte file (the 390 million record file) and it ran in under ten minutes! This didn’t feel much longer than it took to stream the file through Java’s IO. We ended up implementing make-periodic-invoker as a ‘throbber’ so we’d have a progress update – thanks to the way this approach works, it could estimate the amount completed (and could have estimated the total and time remaining).

The algorithm isn’t guaranteed to produce the target sample set, though it is very likely to. As it gets closer to the number of items you wished the sample to be taken from (the population-size), it becomes more and more likely to select elements – finally reaching a probability of 1 at the last element of the population if it hasn’t completed the set at that point.
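One well-known technique that behaves this way is selection sampling: each element is kept with probability (items still needed) / (items still remaining in the population). The sketch below assumes that this is the approach being described; it is not the clj-etl-utils code, and sampleSeq and its parameters are hypothetical names.

```javascript
// Stream seq once, yielding roughly sampleSize items drawn from its first
// populationSize elements. Each element is kept with probability
// needed / remaining, which reaches 1 when every remaining item is needed.
function* sampleSeq(seq, populationSize, sampleSize, random = Math.random) {
  let seen = 0;
  let taken = 0;
  for (const item of seq) {
    if (taken >= sampleSize || seen >= populationSize) {
      return;
    }
    const remaining = populationSize - seen;
    const needed = sampleSize - taken;
    if (random() * remaining < needed) {
      taken += 1;
      yield item;
    }
    seen += 1;
  }
}

// Works on any iterable, including a lazily generated one.
function* lineNumbers(count) {
  for (let i = 1; i <= count; i++) {
    yield i;
  }
}

console.log([...sampleSeq(lineNumbers(1000), 1000, 10)]);
```

If the sequence turns out to be shorter than populationSize, the sample can come up short, which is one way the target size can be missed.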

We did a quick analysis of the distribution of the values and things looked great. We’re happy to have the new technique and the new library code; you can leverage it as part of the clj-etl-utils project.

Kyle Burton, 10 Jun 2010 – Philadelphia PA

Clojure and Large Result Sets

Kyle Burton — 2010-10-14T00:00:00-04:00

I’ve been working with the very useful clojure.contrib.sql package. One wrinkle I’ve encountered, in an otherwise lazy language, is that the SQL connections typically are not lazy. By default they load the entire result set at once – even if the java.sql.ResultSet is lazily accessed. This isn’t so much an issue with clojure.contrib.sql as with how the JDBC classes are being used.

For many use cases, fetching a handful of rows, a user profile, etc., this is a perfectly reasonable default. When you’re working with result sets that are larger than will fit into memory this default behavior isn’t so desirable.

First Attempt: `LIMIT` using an Offset

I like the interface that the contrib library offers – a lazy sequence of the result set. Since PostgreSQL supports ranges as part of its SQL (via LIMIT and OFFSET) it would seem that you could just write up a quick chunking function that pulls ranges of records from the table using a lazy approach, while still presenting things as a single stream of records from a single result set.

The first function, get-record-range, simply takes a table name, a starting offset and a record count. It executes a select for the given offset and count. The second function uses the first to create a lazy sequence, generating the next block of records when the previous block was exhausted. It includes a check to ensure it terminates when there are no more records.

The only piece missing from the above is our db/exec-sql, which is just a convenience wrapper around clojure.contrib.sql/with-query-results that forces the lazy record sequence (this was not the source of our memory troubles). db/exec-sql is something I often use to quickly grab the entirety of a result set.
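The shape of that chunking approach can be sketched without a database. What follows is a minimal illustration in Ruby rather than the Clojure used in this post. TABLE, get_record_range and lazy_records are hypothetical names, and the in-memory array is an assumption standing in for a query that uses LIMIT and OFFSET.

```ruby
# Hypothetical stand-in for a database table: ten rows with ids 1..10.
TABLE = (1..10).map { |id| { id: id } }

# Stand-in for a "SELECT ... LIMIT count OFFSET offset" query.
# Returns an empty array once the offset runs past the end of the table.
def get_record_range(offset, count)
  TABLE[offset, count] || []
end

# Present the whole table as one lazy stream of records, fetching a block
# at a time and stopping when a fetch comes back empty.
def lazy_records(block_size)
  Enumerator.new do |out|
    offset = 0
    loop do
      rows = get_record_range(offset, block_size)
      break if rows.empty?
      rows.each { |row| out << row }
      offset += block_size
    end
  end.lazy
end

# Only the first two blocks are fetched to satisfy this request.
p lazy_records(3).map { |row| row[:id] }.first(4)   # => [1, 2, 3, 4]
```

The array here always returns rows in the same order, which is exactly the guarantee a real table does not give you without an ORDER BY.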

The problem with this approach though, is that using a SELECT * and a LIMIT without an ORDER BY clause doesn’t guarantee the ordering of the records. The database (for an example, see PostgreSQL) is free to return rows in whatever order it pleases. The impact of this is that the rows you get back will not be consistent – you are not guaranteed to get all of the rows through this approach, nor are you guaranteed to not get duplicate rows! For a table where we tested this approach, with approximately 600k records, we got about 400k unique rows back.

Adding an ORDER BY id (since most of our tables contain an id column) resolves these issues, only to incur a large performance penalty while the database computes an ordered list of ids so that it can return the requested block of records.

Back to the drawing board: Try Database Cursors

Databases have long had a way of supporting this use case via Cursors. JDBC, and specifically the ResultSet interface, was designed to support them.

It turns out that the default settings on the java.sql.Connection configured by clojure.contrib.sql are not set to use a Cursor. According to ‘PostgreSQL JDBC Driver: Getting results based on a cursor’ there are several settings that need to be in place for the result set to use a cursor:

Auto Commit must be turned off
The java.sql.ResultSet must be created with a ResultSet type of: ResultSet.TYPE_FORWARD_ONLY
The Query must be a single statement, meaning it must not contain semi-colons joining together multiple statements.
You must use a java.sql.Statement and set the FetchSize on the statement to a non-zero value.

It turns out that clojure.contrib.sql has a handy form that will turn off AutoCommit and ensure it will be set back to its previous value so you don’t put the Connection into an inconsistent state (and so that you join any transactions that are in progress). That form is transaction, and trying it out:

The output shows that the AutoCommit is indeed off, but we’ll still need to set a FetchSize ourselves.

The default ResultSet type is ResultSet.TYPE_FORWARD_ONLY, which is what we needed. Adding in setFetchSize is straightforward, so, wrapping it all up into a helper function (following the conventions in the sql package), I end up with:

This brings me back to my goal: being able to wield very large result sets.

Kyle Burton, 14 Oct 2010 – Philadelphia PA

How We Deploy Our Clojure Services

Kyle Burton — 2010-08-26T00:00:00-04:00

How We Deploy Our Clojure Services

We’re using Clojure as a core development technology at my company. We have a basic web service implemented in Clojure running inside Jetty. I’m not using a full J2EE container; I just wanted a fairly simple strategy for running the Jetty-based service. We’re already using Chef for provisioning our servers and for installing base dependencies, so that’s what I started with.

The main components are:

chef recipe to install the foundation
init script for controlling the service
ruby based, capistrano-esque deployer
maven based project using the clojure-maven-plugin

Chef

recipes/default.rb

The first part of our deployment is a Chef recipe to install our base components for the service. We set up an unprivileged user for the service to be run as, set up /var/lib and /var/log directories to act as its install target and for its log files, set the correct permissions and install the init and start scripts for the service.

templates/default/the-clj-serviced.init

The init script is a standard LSB style init script that I copied from another daemon already in /etc/init.d and then trimmed down and modified for our needs. The init script manages the running process via its pid (process id), which it expects to be stored in the file identified by $PIDFILE. It uses the-clj-service.sh to start the service – stopping the service is done by sending a signal to the PID of the JVM process. I’d like to introduce a more graceful shutdown procedure by asking our service to exit, but this serves our needs for now.

/var/lib/the-clj-service/the-clj-service.sh

The JVM isn’t really able to fork and background itself. The start-stop-daemon utility can do this on behalf of the daemon, but according to its man page, that’s not recommended. This shell script represents that bridge between the init script controlling the daemon and the actual daemon, the JVM. It provides a simple runner that forks itself, writes out the child’s PID and then exits. This is one spot I’m not yet satisfied with: there is a hole here, where if the Jetty application fails to start or crashes, the init script will think it started when it in fact did not.
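The runner's job (start the process in the background, write out its PID, return) can be sketched as follows. This is an illustration in Ruby, not the shell script the service actually uses. start_in_background and running? are hypothetical helper names, and the pidfile path is whatever the caller passes in.

```ruby
# Start cmd in the background, record its PID in pidfile, return the PID.
def start_in_background(pidfile, *cmd)
  pid = Process.spawn(*cmd)   # returns immediately; the child keeps running
  Process.detach(pid)         # reap the child when it exits (no zombie)
  File.write(pidfile, "#{pid}\n")
  pid
end

# True if the PID stored in pidfile belongs to a process that still exists.
def running?(pidfile)
  pid = File.read(pidfile).to_i
  return false unless pid > 0
  Process.kill(0, pid)        # signal 0 only checks that the process exists
  true
rescue Errno::ESRCH, Errno::ENOENT
  false                       # no such process, or no pidfile at all
end
```

A check like running? made a short time after the start is one possible way for an init script to notice that the process died right away. It would not detect an application that stays up but never finishes starting.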

deploy.rb

Finally we have our deployer, modeled after Ruby’s capistrano, for pushing the application up to the boxes where we’re going to run the service. I’ve left out the ServiceUtils class because in our environment it contains some system configuration information. What you see below contains all the actual steps, sans the configuration and credentials information for those hosts. install_path matches the location you saw above in the init and service scripts.

The deployer builds the service using maven, using the assembly plugin to build a stand-alone Jar file that contains our application along with all of its dependencies. This produces a larger artifact than if it just contained our application code, but has some advantages: it freezes the dependencies into one archive; it is easily relocatable; we don’t have to perform a build or manage dependencies on the systems where we run the application.

As part of the deploy process we also create a few configuration files: a logging configuration, which for now we just copy the file appropriate for the environment from our project’s; a database configuration file and a service configuration file. The service properties file is built dynamically and is available via a special api call that allows us to monitor and validate that the version of the service that we think is installed is installed on a given server.

I’d love to hear how others are deploying their Clojure applications and about what we could improve. I’m happy with this as a starting point, but will definitely focus on improving it as we move forward.

Kyle Burton, 26 Aug 2010 – Radnor PA

Automating Capistrano Password Prompts with Expect

Kyle Burton — 2010-07-06T00:00:00-04:00

Automating Capistrano Password Prompts with Expect

I just started using Capistrano for deploying my Rails applications (like Snapclean.me). I also just started using capistrano_rsync_with_remote_cache to help push releases out faster than the :copy deploy strategy.

I’m very happy with how much faster it is than :copy, but I’m impatient, and having to provide the password more than once per invocation is frustrating to me. I know the old Unix standby Expect can do this easily; my only problem is that I don’t remember Tcl very well, and every time I’ve done this I forget how I did it. This post is a write-up so I know where to come back to the next time I need to do this (and I know I’ll run into it again).

Spawn and Expect

Conceptually the basic use of Expect is very straightforward. You execute another program and you can register handlers that will be invoked when the program emits specific output. In the case of executing capistrano, and it executing other commands, I’m looking for anything that looks like a password prompt. When a password prompt is emitted I want to send the password.

The script below does exactly that, first asking the user for the password (disabling terminal echo) before spawning the capistrano command itself.

#!/usr/bin/env expect
stty -echo
send_user -- "Login Password: "
expect_user -re "(.*)\n"
send_user "\n"
stty echo
set the_password $expect_out(1,string)

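# Disable the default 10 second timeout so a long-running deploy step
# does not end the script before the next prompt appears.
set timeout -1
spawn cap deploy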
expect {
  -re " *\[Pp\]assword: *" {
    send "$the_password\n"
    exp_continue
  }
}

I put the above script in a file called wrap-cap, and did a chmod 755 ./wrap-cap on the file.

The exp_continue is used to reset (or loop) expect so that it will continue to look for password prompts and provide the password as many times as it is requested – since there are various commands that may ask for it (rsync, ssh, sudo, etc.) and some of them are optional (especially sudo), this ‘looping’ behavior is very handy.

Expect is a great Unix tool for developers and system administrators to have in their toolboxes.

Kyle Burton, 6 Jul 2010 – Philadelphia PA

New Clojure Libraries: Bloom Filter and LFSR

Kyle Burton — 2010-07-01T00:00:00-04:00

New Clojure Libraries: Bloom Filter and LFSR

I created two new clojure libraries as part of my continued study of all things computing related. I was introduced to the Bloom Filter through Hacker News and to the Linear Feedback Shift Register [LFSR] through Toby DiPasquale. It turns out that each of these will have practical application for me in the near future. They are the kind of thing I wish I had learned about sooner.

Bloom Filter

You can download clj-bloom from github.com/kyleburton/clj-bloom. A Bloom filter is “a probabilistic data structure for testing set membership”; Bloom filters “sacrifice determinism in favor of significantly lower memory usage”. They make this trade-off at the cost of false positives – but you can tune the filter’s false positive probability. They are useful in situations when you need to test for membership in a very large set of data and can’t (or don’t want to) hold the set members themselves in memory. A common use case is a large corpus of documents – quickly checking whether a new document is already in the set, storing it if it’s not, or pulling the existing copy and comparing it if it appears to be. In this type of use, a false positive has little impact as you’d pull the document and compare it anyway. When the filter reports that the document is not present, you save the cost of making the initial query.
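To make the trade-off concrete, here is a minimal sketch of the idea in Ruby. It is not the clj-bloom implementation or its API: TinyBloom and its parameters are made up for illustration, and salted SHA-256 digests are an assumption standing in for a family of independent hash functions.

```ruby
require 'digest'

# A toy Bloom filter: num_bits flags and num_hashes hash positions per item.
class TinyBloom
  def initialize(num_bits, num_hashes)
    @num_bits   = num_bits
    @num_hashes = num_hashes
    @bits       = Array.new(num_bits, false)
  end

  # Set every bit position the item hashes to.
  def add(item)
    positions(item).each { |i| @bits[i] = true }
    self
  end

  # false means "definitely not added"; true means "probably added".
  def include?(item)
    positions(item).all? { |i| @bits[i] }
  end

  private

  # Derive num_hashes bit positions by salting one digest with a seed.
  def positions(item)
    (0...@num_hashes).map do |seed|
      Digest::SHA256.hexdigest("#{seed}:#{item}").to_i(16) % @num_bits
    end
  end
end

filter = TinyBloom.new(1024, 3)
filter.add("document-1")
puts filter.include?("document-1")   # prints "true": added items are never missed
```

The filter stores only the bit array, never the items themselves. Making the bit array larger relative to the number of items added lowers the false positive rate.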

Linear Feedback Shift Register [LFSR]

LFSRs are related to Pseudo Random Number Generators [PRNGs]. There is a very interesting special class of LFSRs that have maximal period length – these LFSRs can be used as binary numbering systems since they iterate through all (2^n) – 1 possible non-zero bit combinations. Unlike counting up in binary (0, 1, 10, 11, 100, 101, 110, 111, 1000, …) they look much less predictable in how they iterate through the values. I am currently looking at using them in ID generation, so that the IDs generated are not adjacent numeric values, i.e. they look pseudo-random, but are still guaranteed to be unique. LFSRs are also easily serializable, which makes it possible for you to use them as a kind of pseudo-random unique sequence. You can download the code from github.com/kyleburton/clj-lfsr
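A small worked example shows the maximal-period property. This is a sketch in Ruby of a 4-bit Galois LFSR, not the clj-lfsr code. The names TAPS, lfsr_next and lfsr_cycle are hypothetical, and the tap mask is chosen for the polynomial x^4 + x^3 + 1.

```ruby
# Tap mask for the polynomial x^4 + x^3 + 1, which yields the maximal
# period of 2^4 - 1 = 15 for a 4-bit register.
TAPS = 0b1100

# Advance the register one step: shift right, and if the bit shifted out
# was 1, XOR the tap mask into the result.
def lfsr_next(state)
  lsb = state & 1
  state >>= 1
  state ^= TAPS if lsb == 1
  state
end

# Collect every state visited, starting from seed, until the register
# returns to the seed.
def lfsr_cycle(seed)
  states = [seed]
  state = lfsr_next(seed)
  until state == seed
    states << state
    state = lfsr_next(state)
  end
  states
end

p lfsr_cycle(1)
# => [1, 12, 6, 3, 13, 10, 5, 14, 7, 15, 11, 9, 8, 4, 2]
```

Every value from 1 to 15 appears exactly once before the sequence repeats, and the only thing needed to resume the sequence later is the current state.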

I found these two constructs fascinating, I hope you find the implementations useful.

Kyle Burton, 1 Jul 2010 – Philadelphia PA

How We Run Cucumber

Kyle Burton — 2010-06-10T00:00:00-04:00

How We Run Cucumber

Cucumber is a wonderful behavior driven development tool by Aslak Hellesøy. There is a getting started guide in the wiki at github that describes how to get it set up and running.

We’ve written a wrapper around Cucumber to help make it easier to run along with our application and to make it less intrusive when it runs the browser (at least under Linux).

Server Execution

Most of our test driven development doesn’t require script/server to be running the web application, and we want it to run under either the cucumber or test rails environment when it does. Stopping (if it’s running) the development server, starting it and then shutting it down again when the tests are done was a manual process that we automated.

Xnest

When we run Cucumber under X Windows, it often grabs focus and keeps us from doing anything else with the desktop while the tests are running. This is annoying to say the least. Thankfully X11 has a utility called Xnest which allows you to run a window which is a contained X desktop. Running this and then telling the browser to run inside it keeps it out of our way and allows us to keep using the machine without being interrupted by the browser constantly jumping to the foreground.

`script/server` Arguments

To pass arguments (such as the port, or other options) through to script/server, put a -- between the arguments to script/cucumber-runner and they will be passed on verbatim to script/server. In this example, -p 8080 will be passed to script/server instructing it to use port 8080.

$ script/cucumber-runner -g 1024x768 -- -p 8080

`cucumber-runner`

#!/usr/bin/env ruby
require 'optparse'

# Controller for running Cucumber feature tests on your Rails application.
# Supports stopping/starting WEBRick in cucumber or test rails environment and
# (on supporting systems, eg: Linux) running the browser under Xnest to keep it
# away from your other desktop activities.
#
#
# Author::     Kyle Burton 
# Copyright::  Copyright (c) 2010 Kyle Burton
# License::    Distributes under the same terms as Ruby

#
# Command line runner.
#
class CucumberRunner

  # location of the rails scripts
  SCRIPT_ROOT = File.dirname(__FILE__)

  # Set up defaults, can be overridden with command line parameters.
  def initialize
    @opts = {
      :env             => 'cucumber',
      :geometry        => '1280x800',
      :display         => ENV['DISPLAY'],
      :xnest_display   => ':2',
      :control_webrick => true,
      :try_use_xnest   => true
    }
  end

  # Determine if Xnest is present, and if it should be used (Not on OS X).
  def use_xnest?
    return false unless @opts[:try_use_xnest]
    on_darwin = (`uname` =~ /Darwin/)
    has_xnest = !(`which Xnest`.empty?)
    !on_darwin && has_xnest
  end

  # Start Xnest in the background, save off the pid so it can be stopped later.
  def start_xnest
    @opts[:display] = @opts[:xnest_display]
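    # Pass the configured geometry, and detach Xnest's output so the
    # backticks return as soon as the PID has been echoed.
    cmd = "Xnest #{@opts[:xnest_display]} -geometry #{@opts[:geometry]} >/dev/null 2>&1 & echo $!"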
    @xnest_pid = `#{cmd}`.to_i
  end

  # Stop Xnest with a SIGTERM.
  def stop_xnest
    Process.kill("TERM", @xnest_pid) if @xnest_pid
  end

  # Stop the WEBRick server by looking for it in the process tree.
  def stop_webrick
    res = `ps aux | grep [r]uby | grep [s]cript/server`
    pid = res.split[1]
    return unless pid
    puts "#{$0} STOPPING WEBRick pid=#{pid}"
    Process.kill "KILL", pid.to_i
  end

  # Start WEBRick with the configured rails environment as a daemon (background).
  def start_webrick
    cmd = "#{SCRIPT_ROOT}/server -e #{@opts[:env]} -d"
    puts "#{$0} Starting WEBRick: #{cmd}"
    system cmd
  end

  # Process any command line arguments.
  def parse_opts
    OptionParser.new do |opts|
      opts.banner = "Usage: #{$0} [options]"

      opts.on("-h", "--help") do
        puts <<-EOH
       #{$0} [[opts]]

        -e ENV  --environment ENV   Rails environment to run under default=#{@opts[:env]}
        -g GEO  --geometry GEO      Xnest geometry to use default=#{@opts[:geometry]}
        -X      --no-xnest          Do not use Xnest (even if present)
        -d DSP  --display DSP       DISPLAY to use default=#{@opts[:display]}
        -x DSP  --xnest-display DSP Xnest's DISPLAY to use default=#{@opts[:xnest_display]}
        -W      --no-webrick        Don't start WEBRick (assume running) default=false
        EOH
        exit 0
      end

      opts.on("-e", "--environment ENV") do |env|
        @opts[:env] = env
      end

      opts.on("-g", "--geometry GEO") do |geo|
        @opts[:geometry] = geo
      end

      # The settings live in @opts; :try_use_xnest is the key use_xnest? reads.
      opts.on("-X", "--no-xnest") do
        @opts[:try_use_xnest] = false
      end

      opts.on("-d", "--display DSP") do |dsp|
        @opts[:display] = dsp
      end

      opts.on("-x", "--xnest-display DISP") do |disp|
        @opts[:xnest_display] = disp
      end

      opts.on("-W", "--no-webrick") do
        @opts[:control_webrick] = false
      end
    end.parse!
  end

  # Process options, start Xnest, [re]start WEBRick, execute the cucumber
  # suite, then stop Xnest and stop WEBrick
  def run
    parse_opts
    start_xnest if use_xnest?

    if @opts[:control_webrick]
      stop_webrick
      start_webrick
    end

    cmd = "#{SCRIPT_ROOT}/runner #{SCRIPT_ROOT}/cucumber #{ARGV.join(' ')}"
    ENV['DISPLAY']   = @opts[:display]
    ENV['RAILS_ENV'] = @opts[:env]
    puts "#{$0} using RAILS_ENV=#{ENV['RAILS_ENV']}"
    res = system cmd

  ensure
    stop_webrick  if @opts[:control_webrick]
    stop_xnest    if @xnest_pid
  end

  def self.main
    self.new.run
  end
end

CucumberRunner.main

You can download or check out cucumber-runner from my github cucumber-example project.

Running Cucumber with this wrapper helps make it easier and less intrusive to use. If anyone has suggestions for how to make it less intrusive while running under OS X, please email me, I’d love to hear your suggestions.

Kyle Burton, 10 Jun 2010 – Philadelphia PA

Creating Standalone Java Applications with Leiningen

Kyle Burton — 2010-06-08T00:00:00-04:00

Creating Standalone Java Applications with Leiningen

Leiningen is a simpler build tool for Clojure. Previously I covered a few basic aspects, including how to run the HEAD version of Leiningen. Leiningen can also build standalone jars in the same way the Maven assembly plug-in does. There are a few steps you need to follow; I’ll walk you through them here. This example is available in my sandbox.

`project.clj`

For the example, I started by creating a new project (for testing a csv parsing library) by running lein new cljcsv. I added clojure-csv as a dependency, set cljcsv.core to be compiled, by specifying the name-space with the :aot parameter and specified cljcsv.core to be the main class invoked for the jar file by specifying it with :main.

(defproject cljcsv "1.0.0-SNAPSHOT"
  :description "CSV Example."
  :dependencies
    [[org.clojure/clojure "1.1.0"]
     [org.clojure/clojure-contrib "1.1.0"]
     [clojure-csv/clojure-csv "1.0.0"]]
  :aot  [cljcsv.core]
  :main cljcsv.core)

`src/cljcsv/core.clj`

To have a Clojure program produce a class file with a main function, you need to ensure you have :gen-class specified in the name-space for your program and that you define a -main function to act as the main.

(ns cljcsv.core
  (:require [com.davidsantiago.csv :as cdc]
            [clojure.contrib.pprint :as pp])
  (:gen-class))

(defn parse-cdc [file]
 (let [rs (cdc/parse-csv (slurp file))]
  (prn (str "rs=" rs))
  (prn (pp/cl-format nil "cdc: rows: ~a~&" (count rs)))
  (prn (pp/cl-format nil "cdc: ~a~&" rs))))

(defn -main [& args]
  (prn (format "args=%s" args))
  (if (not (empty? args))
    (do
      (parse-cdc (first args)))))

Then to build and run the jar:

$ lein compile
$ lein uberjar
$ java -jar cljcsv-standalone.jar input.csv

Leiningen makes creating and building Clojure projects quick and easy. It makes creating an executable jar for your project, which may have many dependencies, about as easy as you can get.

Kyle Burton, 08 Jun 2010 – Philadelphia PA

Special Thanks

Special thanks to technomancy, for creating Leiningen.

Leiningen

Kyle Burton — 2010-06-03T00:00:00-04:00

Leiningen

Leiningen is a simpler build tool for Clojure. It is easier to get started with than either Maven (though you can use Maven, as more than one post shows) or Ant.

Leiningen is, as maven is, rooted in convention over configuration to reduce the complexity of its build configuration. Leiningen expects your Clojure source code to be in /src and your tests to be in /test. These can be overridden with :source-path and :test-path respectively.

You create a project.clj in your project’s root directory to control how Leiningen builds your project. Leiningen manages your dependencies in a brief syntax using the same dependency resolution system as maven, and it integrates with Clojars, a jar repository for Clojure projects (which is also a maven repository), both for pulling dependencies as well as allowing you to upload your own open source projects to the remote repository.

There is a full walk through for installing and getting started with Leiningen that is part of the Leiningen documentation, so I am going to focus on what I found useful that isn’t already part of that guide and only show the basics of getting started – you should refer to that page (or the google group) for more information.

Dependency Version Strings

For those of you who have never used Maven, it is worth reading about maven version strings to familiarize yourself with how they work and the effect that something like SNAPSHOT in the version string will have on your build.

`:warn-on-reflection`

:warn-on-reflection is used to set the Clojure compiler option for warning on reflection. This will point out where adding Clojure type declarations will help speed up your code.

`:dependencies`

:dependencies specifies a list of pairs of project and version string of the libraries you depend on.

`:dev-dependencies`

:dev-dependencies is similar to :dependencies, but the dependencies are not considered to be run-time dependencies, they are only for use by Leiningen during the build. The one I most commonly use is swank-clojure.

Example project.clj

This example is from the clj-bloom library

(defproject com.github.kyleburton/clj-bloom "1.0.1"
  :description "Bloom Filter implementation in Clojure, see also: http://github.com/kyleburton/clj-bloom"
  :warn-on-reflection true
  :dependencies
  [[org.clojure/clojure "1.1.0"]
   [org.clojure/clojure-contrib "1.1.0"]]
  :dev-dependencies
  [[swank-clojure "1.2.1"]])

Common Leiningen Commands

`lein deps`

Prior to version 1.2 of Leiningen, it did not have a build life cycle. With 1.2 it does, which means that it understands that test depends on compile, similar to how Maven’s life cycle works. If you’re using a version older than 1.2 and you get a message like: "Exception in thread "main" java.lang.NoClassDefFoundError: clojure/main" it is due to the dependencies not yet having been pulled down. Run lein deps to fetch them.

This most often happened to me right after a lein clean. Thankfully Leiningen uses the same caching strategy that Maven does so the second time you run lein deps it will be much quicker as all it needs to do is pull the dependencies from your ~/.m2/repository directory. Currently it copies the file, in the future my hope is that it will use a symlink, which will be supported by Java7.

`lein test`

You can run your clojure.test based tests with lein test:

kyle@indigo64 ~/personal/projects/clj-bloom[master]$ lein test
[null] Testing com.github.kyleburton.clj-bloom-test
[null] Ran 5 tests containing 21 assertions.
[null] 0 failures, 0 errors.
[null] --------------------
[null] Total:
[null] Ran 5 tests containing 21 assertions.
[null] 0 failures, 0 errors.
kyle@indigo64 ~/personal/projects/clj-bloom[master]$ echo $?
0
kyle@indigo64 ~/personal/projects/clj-bloom[master]$

As the example shows, a drawback of the pre-1.2 versions of Leiningen is that they do not set the exit code of the process to a non-zero value on test or compilation failure. This makes those versions difficult to use from other software (like deployment or continuous integration tools).

Thankfully this is one of the fixes in the 1.2 branch, among many others. You can run the 1.2 branch, or the HEAD pretty easily by first having a working stable release of Leiningen, building it and then either symlinking or aliasing lein to point to the checked out copy:

kyle@indigo64 ~/personal/projects$ git clone http://github.com/technomancy/leiningen.git
kyle@indigo64 ~/personal/projects$ cd leiningen
kyle@indigo64 ~/personal/projects/leiningen[master]$ lein deps
kyle@indigo64 ~/personal/projects/leiningen[master]$ lein compile
kyle@indigo64 ~/personal/projects/leiningen[master]$ alias lein=~/personal/projects/leiningen/bin/lein
kyle@indigo64 ~/personal/projects/leiningen[master]$

After doing that, going back into the project directory, now lein test sets the exit code:

kyle@indigo64 ~/personal/projects/clj-bloom[master]$ lein test
     [null] Testing com.github.kyleburton.clj-bloom-test
     [null] FAIL in (make-hash-fn-hash-code-test) (clj_bloom_test.clj:9)
     [null] test the hash-fn helper
     [null] expected: (not (= (.hashCode "foo1") ((bf/make-hash-fn-hash-code "1") "foo" 4294967295)))
     [null]   actual: (not (not true))
     [null] Ran 5 tests containing 21 assertions.
     [null] 1 failures, 0 errors.
     [null] --------------------
     [null] Total:
     [null] Ran 5 tests containing 21 assertions.
     [null] 1 failures, 0 errors.
kyle@indigo64 ~/personal/projects/clj-bloom[master]$ echo $?
1
kyle@indigo64 ~/personal/projects/clj-bloom[master]$
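With a meaningful exit code, a deployment or continuous integration script can gate on the test run. The following is a minimal sketch in Ruby. step_passed? is a hypothetical helper, and the child Ruby processes are stand-ins for a passing and a failing lein test.

```ruby
require 'rbconfig'

# Run a command and return true only if it exited with status 0.
# Kernel#system returns true for a zero exit status, false for a non-zero
# one, and nil if the command could not be run at all.
def step_passed?(*cmd)
  system(*cmd) == true
end

# Stand-ins for a passing and a failing test run: a child Ruby process
# that exits with a chosen status. In a real script the command would be
# something like "lein", "test".
passing = [RbConfig.ruby, "-e", "exit 0"]
failing = [RbConfig.ruby, "-e", "exit 1"]

puts step_passed?(*passing) ? "deploy" : "abort"   # prints "deploy"
puts step_passed?(*failing) ? "deploy" : "abort"   # prints "abort"
```

A tool that always exits with 0, as the older versions did, would make both branches print "deploy".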

Emacs + SLIME Integration

Leiningen eases the integration with Emacs and SLIME. SLIME provides a rich development environment for Clojure. I’ve written previously about SLIME and Clojure (Connecting to a remote REPL). Leiningen provides the ability to run a swank server with the lein-swank plug-in which is part of swank-clojure.

See Setting up Emacs & Clojure with Emacs Starter Kit or Setting up Clojure with Emacs and Leiningen for how to get Emacs configured for Clojure development.

The swank server is the back-end portion of SLIME. It listens to a socket in the JVM where your Clojure code is running and allows Emacs to connect to and interact with it, compiling and executing code, inspecting values and providing services like code completion and documentation look-up.

The example project.clj above shows how to reference swank-clojure as a dev-dependency.

You can start a swank server for your project, with all your project’s dependencies by running:

kyle@indigo64 ~/personal/projects/clj-bloom[master]$ lein swank
     [null] user=> Connection opened on local port  4005
     [null] #

Then from Emacs, execute M-x slime-connect and accept the defaults (localhost and a port of 4005).

That runs the swank server on the default port, 4005. If you want to be able to run multiple swank servers on different ports (perhaps you’re working with multiple projects), just pass the additional arguments to the lein-swank plug-in, which are the port then the host (if your machine is known by more than one name or has more than one interface):

kyle@indigo64 ~/personal/projects/clj-bloom[master]$ lein swank 4006
     [null] user=> Connection opened on local port  4006
     [null] #

I’ve found Leiningen to be a simple, easy to use alternative to using Ant or Maven for building your code.

Kyle Burton, 03 Jun 2010 – Philadelphia PA

Special Thanks

Special thanks to technomancy, for creating Leiningen.

From the Agony of JUnit to the Ecstasy of RSpec

Kyle Burton — 2010-02-19T00:00:00-05:00

From the Agony of JUnit to the Ecstasy of RSpec

In the past I have had a love hate relationship with unit testing and testing tools in general. I have had a long, healthy, even dreamy relationship with test driven development though. Aahh how it improves design and prevents issues from reaching users!

I am often unhappy with the amount of effort it takes to write tests using some of the tools and libraries out there. I’ve been around the block from Perl to Lisp to Java, even C++ and C (I found using Chicken to be an easy way to write tests for C libraries, but I digress…) I have found, both with myself and with teams I’ve worked with, that if the effort of writing tests is high enough, fewer (or no) tests will be written. This is partly because tests themselves do not represent business value or end-user features, so most things you can do to reduce the effort of writing or running tests will cause more tests to be written. This includes not just the boilerplate setup and typing you have to do, but the amount of time it takes to run your tests (long running tests unfortunately re-introduce the flow destroying edit-compile-debug cycle back into modern dynamic application development). Reducing effort is, of course, dependent on ROI, which is different for each person and team.

A good friend of mine, Jon Tran, pushes the idea that if you can make something abundant (nearly free) you haven’t just made something better, you’ve made something entirely different. I love this idea and try to use it to guide process and tool improvements for myself and my team.

When I started working with Ruby on Rails I found testing to be so well integrated into how you worked on your Rails project that it was, in fact, abundant. It was so much easier to test your code that writing tests was no longer a chore, but it fit easily, even joyfully into the work-flow. Working more recently with a Rails expert, Trotter Cashion, I was introduced to RSpec. Our team was also blessed one fine day by a visit from Kevin Fitzpatrick who bootstrapped our use of Cucumber in a single afternoon pairing session.

These two tools were pure love. They gave me the same kind of feelings that declarative and functional programming give me – you say what you want it to be, not just how you want it to get there. RSpec made unit testing even more abundant, Cucumber made front end or integration testing abundant. Those two tools went a long way to helping our team write more tests and more effective tests, improving the design of our code, applications and ultimately reducing the number of issues (especially regressions) that made it both to and past our excellent QA testers (we’re only human after all).

Writing unit tests for the Java portions of our systems, I have really started to miss the clarity and abundance that RSpec provided. Then I stumbled across the rspec-maven-plugin, which uses JRuby to execute RSpec-based tests during the test phase of the Maven life cycle.

The storm clouds have parted and the sun is shining again. This is exactly what I was looking for; I just hadn’t known it until I found it.

We’re already using JRuby within the project and have been happy with the access to all of the JVM based libraries it provides to us (not to mention easy integration with our Clojure code, but that’s another post altogether). The tools are a great fit for our team: we’re already familiar with Ruby, RSpec and writing those kinds of tests – just not yet in our Java portions of the application.

On your mark, get set…I followed the instructions in that post, but ended up with a NullPointerException when the plugin tried to write out the runner shell script. I checked out the code from the CodeHaus svn repository and poked around until I found a workaround: it seems that the plugin requires a systemProperties tag in its configuration, even if it’s not used, to prevent the NPE. It just has to contain at least one property.

Rob DiMarco discovered that the plugin, rspec or something about this confluence of tools and libraries doesn’t work on JRuby 1.4 (see the stacktrace for more information), so make sure you download and install JRuby 1.3.1 for now.

To get started you’ll need JRuby installed (make sure you get 1.3.1), and you’ll have to set the JRUBY_HOME environment variable to point to your installation. You’ll need maven as well, of course.

The plugin configuration we were able to get working is the pom.xml fragment you see here:

   
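   <plugin>
     <groupId>org.codehaus.mojo</groupId>
     <artifactId>rspec-maven-plugin</artifactId>
     <configuration>
       <jrubyHome>${env.JRUBY_HOME}</jrubyHome>
       <sourceDirectory>${basedir}/src/test/specs</sourceDirectory>
       <outputDirectory>${basedir}/target</outputDirectory>
       <systemProperties>
         <property><name>testProp</name><value>testValue</value></property>
       </systemProperties>
     </configuration>
     <executions>
       <execution>
         <id>test</id>
         <phase>test</phase>
         <goals>
           <goal>spec</goal>
         </goals>
       </execution>
     </executions>
   </plugin>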

This configures rspec to be run during the test phase of the maven build. It also configures the plugin to look for spec tests within the src/test/specs directory. When first executed, the plugin will create two helper scripts in the project’s target/ directory: rspec-runner.rb and run-rspecs.sh. The run-rspecs.sh can be used to run the tests more quickly and succinctly from the command line, as opposed to relying on mvn test to execute the tests. The runner shell script passes its arguments through to rspec, so you can use all the features the rspec runner supports, like running a single spec file or a single test within a spec (by line number or name).

Example Maven Project

I’ve created an example Maven project in my github sandbox that you should be able to use as a reference or starting point for your own project. If it doesn’t work for you, after following the instructions, please let me know and I’ll try to work through any issues with you.

The sample project includes a single Java class (SomeClass) and a single spec test which exercises it. It is meant as a minimal working example at the moment. As I explore using RSpec for our Java testing I’ll try to keep it up to date with any new findings (and I hope to figure out how to get it working with JRuby 1.4).

The Java code is a simple class with a few members including a method to fetch a url:


package com.github.kyleburton;

import org.apache.commons.io.IOUtils;

import java.net.*;
import java.io.*;

public class SomeClass {
  private String _userName;
  private String _remoteUrl;
  private String _theContent = null;
  private Downloader _downloader;

  public SomeClass() {
    _userName = "*a default*";
    _remoteUrl = "http://localhost/";
    _downloader = new Downloader(_remoteUrl);
  }

  public SomeClass(String name, String url) {
    _userName = name;
    _remoteUrl = url;
    _downloader = new Downloader(url);
  }

  public String getContent() throws IOException {
    if ( null == _theContent )
      _theContent = _downloader.download();

    return _theContent;
  }

  public String getUserName() {
    return _userName;
  }

  public void setUserName(String userName) {
    _userName = userName;
  }

  public String getRemoteUrl() {
    return _remoteUrl;
  }

  public void setRemoteUrl(String remoteUrl) {
    _remoteUrl = remoteUrl;
    _downloader = new Downloader(_remoteUrl);
  }

  public void setDownloader(Downloader downloader) {
    _downloader = downloader;
  }

  public static class Downloader {

    private String url;
    public Downloader(String url) {
      this.url = url;
    }

    public String download() throws IOException {
      return IOUtils.toString(new URL(url).openStream());
    }
  }

}

The rspec test exercises the class, including using Mockito to ensure that getContent actually delegates to the Downloader class to fetch the remote data (without it actually going out and hitting a website).

import com.github.kyleburton.SomeClass
import org.mockito.Mockito

describe SomeClass do
  before(:each) do
    @the_name = "a user"
    @the_url = "http://asymmetrical-view.com/"
    @some_class = SomeClass.new @the_name, @the_url
  end

  it "should accept constructor parameters" do
    @some_class.user_name.should == @the_name
    @some_class.remote_url.should == @the_url
  end

  it "should download the content when content is accessed" do
    some_content = "this is some content"
    downloader = Mockito.mock(SomeClass::Downloader.java_class)
    Mockito.when(downloader.download()).then_return(some_content)
    @some_class.downloader = downloader
    @some_class.content.should_not be_empty
    @some_class.content.should == some_content
  end
end

Finally, here is an example of how you can run jirb (JRuby’s irb) to get a Ruby REPL for your project with all of your project dependencies on the classpath and available in the REPL (assuming you’ve compiled it with Maven first):

# run-jirb.sh
CLASSPATH="target/test-classes:target/classes"
CLASSPATH="$CLASSPATH:`mvn -Dmdep.outputFile=/dev/stderr dependency:build-classpath 2>&1 > /dev/null`"
export CLASSPATH

jirb "$@"

Conclusion

Being able to use RSpec to test our Java and Clojure code-bases is going to help our team be more productive. It brings the same sense of fun to working on these parts of our system, reduces the effort it takes to write tests and should lead to more of them. I am quite satisfied with the improvement in morale it has brought to the team when we’re working in these areas, just in the first few days we’ve had it.

Kyle Burton, 03 Feb 2010 – Philadelphia PA

Special Thanks

Special thanks to Jon Tran for reading drafts of this post.

Image and Photo Credits

“Duke” Pictures Courtesy of Project Kenai
Diagrams

Influential Books

Kyle Burton — 2009-11-30T00:00:00-05:00

Influential Books

I started with a new team in the middle of this year, and as I get to know the other members of the team and we share our skills and experiences, the books that each of us has found influential have come up. This has given me the opportunity to reflect a bit on which ones I still find worthy of suggesting that others read. Over my career, and as my family has grown, I have become acutely aware of the value of my free time and that of my co-workers. With that in mind, I’m recommending and sharing a lot less than I used to, and since a coworker (Aaron Feng) recently made a point of thanking me for my recommendations I thought I’d journal what I’ve been pushing at this new team.

My current bookshelf contains the books it does after several rounds of being distilled over the past ten years or so. Even so, there is a mix of titles that have been there for years as well as some recent acquisitions that I suspect will have longevity.

Higher-Order Perl: Transforming Programs with Programs, by Mark Jason Dominus

If you program in Java, C#, Python, Ruby or another imperative, stateful language, this is perhaps the best book you can pick up to learn the core fundamentals of functional programming and how to apply them in the language you already use. Mark goes through the process of showing you how to implement things like Haskell’s infinite sequences, the use of higher order functions, and several other fundamental tenets of functional programming.

I lent my copy to Aaron and it was he that started to tell me where in the book he is seeing some of the techniques that I use on a regular basis. Mark is a great developer and more than just functional programming comes across in the book; a lot of his skill and clarity of thought does as well.

I feel very sorry for any of you that can not get over the fact that this book’s code examples are in Perl.

Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp, by Peter Norvig

I picked up PAIP along with ANSI Common Lisp (by Paul Graham) with the intention of learning Common Lisp. I wasn’t prepared for what was in PAIP when I first struggled through the book. I started with an expectation of learning about AI, with little expectation of picking up practical techniques that I’d be able to apply in my day job developing web applications. What I got out of the book was much more than I anticipated. I’ve gone back to it multiple times and it sits on the top shelf in my book case. PAIP was my first introduction to how deep a subject search really is. It jump started my knowledge about Lisp, continuations, functional programming, declarative programming and domain specific languages, and teased me with CLOS. Each of these topics is deep enough to warrant its own book, yet Dr. Norvig introduces each, often including implementations, in a single book.

JRM’s syntax-rules primer for the merely eccentric

After being introduced to and exploring Common Lisp through PAIP and ANSI Common Lisp, I had started to use, but not completely understand, Lisp macros. I was only beginning to understand the implementation of, and recognize the appropriate application of, the with-gensyms and once-only macros. I had a fuzzy conception of what hygiene meant for macros. JRM’s syntax primer introduces R5RS’s alternate macro system: define-syntax and syntax-rules. syntax-rules is based on pattern matching and recursive expansions, and it is automatically hygienic. This seemed like magic and seemed to eliminate many of the hygiene problems that you encounter with the traditional Lisp style macros. It forced me to think about code generation in a completely different light. The primer is fairly short for the topic it introduces and the material helped me with some fundamental understanding. The greatest influence this had was to introduce to me new ways to develop and maintain DSLs and code generators.

Programming Erlang: Software for a Concurrent World

Joe Armstrong said that when Ericsson was creating Erlang, they decided their primary concern was robustness, that to have robustness you needed at least two computers and that most of the rest of the features in Erlang flowed forth from there. To me Erlang has come to embody what robustness really means: a platform that is capable of achieving nine 9’s, that concentrates on fail-fast or crash early so that you have more hope of recovery, supervision trees of processes so you know when things go awry and can handle them from a safe distance, the ability to update code live in a running system so that you never have to bring it down. Erlang is the only platform and language that I’ve encountered that has internalized these principles into its core.

The book increased my exposure to pattern matching, introduced me to list comprehensions and message passing concurrency (gave me the name for the actors model of concurrency).

The 22 Immutable Laws of Marketing: Violate Them at Your Own Risk!, and The 22 Immutable Laws of Branding, by Al Ries

Rob Di Marco recommended these to me when I had the privilege of working directly with him. They were outside my core area of expertise and interest (technology, software development in specific). I keep seeing the principles they detail show up all around me – they gave me another model for observing interactions between people, the purpose of blogs, twitter, and how you can use every communication and point of contact to help shape how people perceive you. They made me realize, and gave me a starting guide for, how to market and brand myself. Mostly as a developer and technologist, but also in a few other ways. Given that IT folks tend to be introverted and not understand the power that how others perceive them has over their interactions, this is something I try to repeatedly recommend to those that I work with.

Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition, by Bruce Schneier

I read Applied Cryptography from cover to cover as fast as I could manage to. At the time I read it there was a lot that was new to me: the historic context, some of the information theory, and the idea that there were known best ways to attack the best known algorithms that reduced the effectiveness but did not outright invalidate the approaches. It drove home the idea that security cannot be absolute, that it is a balance between usability and restrictiveness. This helped hone my sense of information leakage in my own development, from what is stored in files and databases, to log statements, to what is exposed through error messages and stack traces, and where user input comes from and goes to.

Design Patterns: Elements of Reusable Object-Oriented Software, by the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson, John M. Vlissides)

The Gang of Four book was at first a set of design guidelines. Over the years its importance as a design guide has waned and, for me, it has shifted more towards being important because it establishes a shared vocabulary with other developers. Having the vocabulary allows us to describe the systems and frameworks we’re using and creating in a way that increases understanding and minimizes misconceptions.

Refactoring: Improving the Design of Existing Code, by Martin Fowler

This emphasized the importance of unit testing in the ability to effectively maintain and improve an existing codebase. Along with The Pragmatic Programmer, it introduced me to the ideas of test driven development.

TCP/IP Illustrated, Volume 1: The Protocols, and UNIX Network Programming: Networking APIs: Sockets and XTI; Volume 1 by Richard Stevens

Having read these books has served my career well over the years. The introduction to the protocols, and the networking APIs, has been invaluable. The single most important thing I took away from these books was how I/O Multiplexing (select/poll, non-blocking IO) worked and how efficient it could be.

Programming Perl (3rd Edition), by Larry Wall, Tom Christiansen, and Jon Orwant

Perl was the first dynamic language I learned (having programmed only in compiled languages before it). I think the camel book was my first O’Reilly title – many, many more followed. O’Reilly as a publisher deserves a lot of credit for helping to bootstrap my programming career. Programming Perl showed me that a language, its creators and practitioners, and especially its community could have character. It emphasized that programming should be enjoyable and fun. “Easy things should be easy and hard things should be possible” is a great slogan for a language – a language that had a slogan!

Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching (3rd Edition) (Pts. 1-4), by Robert Sedgewick

Sedgewick provided me with my first introduction to ‘Big Oh’ notation. His examples displayed an enormous amount of thrift, some of the best examples I’ve seen of using the minimum necessary to implement the chosen algorithm. This was not minimalism to the point of terseness, it exemplified that with a clear understanding of the problem, the choice of an algorithm that closely fit the problem, implementations should be clear and uncluttered. The aesthetic of using minimal language features so as to be as clear as possible was a clear influence on me.

Mastering Algorithms with Perl

I was first introduced to fuzzy string matching through this book, which turned out to be an interesting problem domain. I spent about eight years working for a data integration company applying many of the things I first discovered from these algorithms books.

The Pragmatic Programmer: From Journeyman to Master, by Andrew Hunt and David Thomas

Continual improvement. Keep questioning and seeking better processes, techniques, solutions. Learning a new programming language on a regular basis was the single most memorable part of that book for me. That particular advice has worked particularly well for me. I believe that each programming language was created to address a core issue or difficulty that its creator was experiencing. The language is a manifestation of their understanding of the issue and a simplified design that makes that issue easier to contend with. Some address multiple issues. My impression of C is that it made for more structured interaction with the hardware’s basic features (at a time when there was little). C++ improved the expressiveness of C, introducing OO without moving too far away from the machine. Java eliminated memory management and pointer manipulation mistakes that programmers were so often tripped up by in C and C++ applications, simplified platform dependency issues, and smoothed over some of the complexity of the OO features in C++. Perl worked to integrate other tools, most often that produced or consumed text, it was an integration toolbox and grew from there towards a powerful and terse high level language. Lisp removed all barriers to abstraction, it eliminated the sacred parts of the language and put all their power into the hands of the developer: to do their best, or their worst.

The Practice of Programming, by Brian Kernighan and Rob Pike

They tell you to continue to hone your skills, develop your craft, continually improve. Develop your vocabulary and ability to name things clearly and concisely as it very much matters in the clarity of your software. Learn your tools. This is one of the books I have to thank for my never-ending stream of reading and continued exploration of this field.

Surely You’re Joking, Mr. Feynman! (Adventures of a Curious Character), Richard P. Feynman

Intellectual curiosity. Looking outside the box. There is another way to view things. Trust your intuition and be tenacious – if there are only 2 possibilities and the evidence is inconclusive, discover new evidence, don’t just discard the original axioms – this attitude has been a huge help in debugging. In some ways it comes down to having faith in yourself. I hope my children choose to read and are inspired by Feynman’s writing.

The Diamond Age: Or, a Young Lady’s Illustrated Primer (Bantam Spectra Book), Neal Stephenson

Though I enjoyed both Snow Crash and The Diamond Age, I was more inspired by his vision of a young lady’s illustrated primer. You see the ideas he was exploring in the primer continuing to manifest themselves over time: how the Internet is permeating all our devices and experiences, the collaborative nature of how Wikipedia and other crowd sourced assets are being created, Amazon’s Kindle ebook reader (with on-line access to new content) and discretized task sites like the Mechanical Turk. I want to be able to gift my own children with their own primers.

Penn and Teller’s How to Play with Your Food, Penn Jillette and Teller

Thinking outside the box. Looking at things for what they are, not just for what they seem. Seeing alternate, unexpected, uses for common things. In showing how effectively you can deceive your senses, the book helped give me a good sense of skepticism – not taking something at face value if it didn’t seem right (I tie this to debugging believe it or not).

Other Lists of Influential Books

Since the time I wrote this post I’ve been contacted by others (one at this time) with their lists. I’ll update this periodically with those where I find new books:

Tom Panzarella: Influential Books

Conclusion

These are books that have had staying power in my bookshelf. I’d love to hear your own thoughts if you do decide to pick any of these up, or even if you’ve already been through their pages. I always ask people what books they’ve found influential as a standard question when I interview developers. What do you recommend?

Kyle Burton, 30th November 2009 – Philadelphia PA

Thanks

Special thanks to Jonathan Tran, Aaron Feng, Rob Di Marco and others who have lent and recommended books to me over the years. A second thanks to both Jonathan and Aaron for reading drafts of this post.

Connecting SLIME to a remote Clojure Repl

Kyle Burton — 2009-08-20T00:00:00-04:00

Connecting SLIME to a remote Clojure Repl

SLIME is a powerful extension for Emacs that transforms Emacs into an IDE for Lisp. It sets up Emacs so that it can connect to and interact with a Lisp running as a separate process – even on a separate server. This configuration allows you to connect to a running application with Emacs, allowing you to inspect the state of and debug a running application. In this post I will walk you through how to set up and use this scenario using Emacs, SLIME and SSH to connect to a remote JVM running Clojure.

SLIME and Swank

Slime largely provides the ‘IDE’ portion of the extension, where Swank provides the inter-process communication between Emacs and the running Lisp image. Swank is broken into two parts, a layer run within Emacs and a server which runs within the Lisp image. This provides Emacs the ability to execute forms (expressions) within the running Lisp instance, query it for information and otherwise interact with it in a structured manner.

Clojure

Slime was extended by Jeffrey Chu to support Clojure through swank-clojure. Swank-clojure creates a server within the JVM running in its own thread to handle connections from the Emacs part of Swank.

Getting Emacs, SLIME, Clojure and swank-clojure set up is covered by others; technomancy.us has a good getting started guide. My own configuration is available as an example in my GitHub account.

Giving your JVM some Swank

Before you can connect to your running image, you’ll first need to start up Clojure (the Repl) and run the swank-server. You will need to ensure the swank-clojure directory is on your class-path as well as the clojure and clojure-contrib jar files. I use a bash script like the following to run a Clojure Repl. This script sets up a class-path that includes the clojure and clojure-contrib jar files as well as swank-clojure, which we’ll be using in a minute. It also adds all of the files in my $HOME/.clojure directory. I keep commonly used jar files in that directory so this is a convenient way to pull in multiple dependencies.

CLOJURE_JAR="$HOME/.clojure/clojure.jar"
CONTRIB_JAR="$HOME/.clojure/clojure-contrib.jar"
CLASSPATH="$CLOJURE_JAR:$CONTRIB_JAR:$HOME/personal/projects/krbemacs/swank-clojure"
# append every file in ~/.clojure to the class-path
for f in "$HOME"/.clojure/*; do CLASSPATH="$CLASSPATH:$f"; done
java -server -cp "$CLASSPATH" clojure.lang.Repl "$@"

In addition to the above script, I have the following in a ‘run-swank.clj’ file. It requires swank, sets a required protocol version and finally starts swank in its own thread.

(require 'swank.swank)
(swank.swank/ignore-protocol-version "2009-08-19")
(swank.swank/start-server "/tmp/slime.krb" 
                          :encoding "iso-latin-1-unix"
                          :port 4005)

The ignore-protocol-version call is needed by swank-clojure to ensure that the swank-elisp and swank-clojure are in sync (I realize it is named confusingly). The first argument to start-server is the path of a file into which swank writes the port it is listening on. When you run Lisp from within Emacs these functions are executed by swank-elisp and this file is used so that both sides know what the port is. I run this from either the clojure script or from a clojure repl (outside of Emacs) with load-file:

kyle@indigo64 ~/personal/projects/krbemacs[master]$ clojure
Clojure 1.1.0-alpha-SNAPSHOT
user=> (load-file "run-swank.clj")
Connection opened on local port  4005
4005
user=>

Swank is now running within this lisp image.

The current version of swank-clojure (as of August 20th 2009) does not bind to only localhost, allowing connections from external hosts. Ensure you have either firewalled off the swank port (4005 is the default) or patched swank-clojure to bind only to localhost (there is a swank-clojure in my Emacs GitHub repository with this modification).

Since I can’t connect directly to this remote host (the port is not exposed to the network), I need to use ssh to securely forward a port from my local workstation through an ssh-tunnel to the other machine:

kyle@macbook ~/$ ssh -L1099:localhost:4005 indigo64
Linux indigo64 2.6.18.8-x86_64-ubuntu #1 SMP Thu Apr 10 11:20:13 EDT 2008 x86_64
Last login: Fri Aug 21 01:50:59 2009 from macbook
kyle@indigo64 ~$

I can now connect to this from my Emacs by executing M-x slime-connect, providing localhost or 127.0.0.1 as the Host-name and 1099 as the port. I chose 1099 arbitrarily; you can use any unused local port, and 4005 is a fine choice since it is the swank default.

At this point, with Emacs connected, forms evaluated at the repl within Emacs, or from Clojure buffers, will actually be run within the remote JVM.

Executing a print-line from Emacs running on my laptop:

Produces output on the remote Clojure image:

Conclusion

Being able to connect to a remote running image can be useful – especially for times when you can’t have your development environment present on the remote server and still need to introspect the running process.

Kyle Burton, 20th August 2009 – Philadelphia PA

Thanks

Special thanks to Jonathan Tran, and Rob DiMarco for reading drafts and providing valuable feedback and suggestions.

Safer Bash Scripting

Kyle Burton — 2009-08-07T00:00:00-04:00

Safer Bash Scripting

I was pairing with a colleague today on a moderately sized shell script, and during the session some of the best practices I try to follow when programming, be it in shell or any other language, came up. We took a little time to talk about two of the habits I have picked up, which encouraged me to share them here. The first has two parts: treating warnings as errors and logging what happens. The second has to do with clean argument and variable handling in Unix (bash) shell scripts.

Enable All Warnings

This is a universal best practice for all my software development. For gcc you can do this by adding -Wall and -Werror which enables all warnings and treats them as errors. For perl you use -w, or within the code, the warnings and strict pragmas. The additional strictness is more work up front but pays off in not having to debug later. gcc even goes so far as to ensure your printf format strings have corresponding arguments of the correct type! As you learn what constitutes an error or warning to the compiler or run-time you will write cleaner code and it will no longer feel like a burden, and whole classes of errors will cease to happen in your software.

set -e

Bash supports a setting, enabled via set -e, that causes your shell-script to immediately cease if any of the commands you’ve called return a non-zero exit value back to the shell (a zero exit value is the standard indication of success for a program or command). Enabling this feature is similar to exception handling, like getting a free ‘if this errors, abort the program’.

As a pedagogical example, if you were writing a shell script to create a sub-directory and create a file with the current date and time as its contents, you could easily write:

kyleburton@indigo64 ~/tmp$ rm -rf logs/
kyleburton@indigo64 ~/tmp$ cat dlog.sh 
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh 
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log 
Fri Aug  7 22:30:18 EDT 2009
kyleburton@indigo64 ~/tmp$ bash dlog.sh 
mkdir: cannot create directory `logs': File exists
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log 
Fri Aug  7 22:30:18 EDT 2009
Fri Aug  7 22:30:21 EDT 2009
kyleburton@indigo64 ~/tmp$

This works every time it is run (permissions problems and insufficient disk space notwithstanding). The second time it is run though, and every subsequent time, mkdir reports an error informing you the logs directory already exists, and the script carries on regardless. With set -e added to the same script, it stops as soon as mkdir fails and does not append the current date and time to the log file:

kyleburton@indigo64 ~/tmp$ cat dlog.sh 
set -e
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh 
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log 
Fri Aug  7 22:31:07 EDT 2009
kyleburton@indigo64 ~/tmp$ bash dlog.sh 
mkdir: cannot create directory `logs': File exists
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log 
Fri Aug  7 22:31:07 EDT 2009
kyleburton@indigo64 ~/tmp$

This protects us against the script continuing and, depending on the behavior and goals of the script, potentially causing damage or corruption. You could wrap each command in an if/else branch, exiting in the case when the command fails, but the set -e effectively inverts that from having to check for errors as the default behavior to errors aborting processing as the default behavior.
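To make the comparison concrete, here is a rough sketch of the explicit checking that set -e saves you from writing. The function name log_date and its directory argument are made up for this illustration; they are not part of the dlog.sh script above.

```shell
# Hypothetical helper: the dlog.sh steps with an explicit check after
# each command instead of relying on set -e.
log_date() {
  dir="$1"
  if ! mkdir "$dir"; then
    echo "could not create $dir, giving up" >&2
    return 1
  fi
  if ! date >> "$dir/uptime.log"; then
    echo "could not write to $dir/uptime.log" >&2
    return 1
  fi
}
```

Calling log_date logs a second time fails at the mkdir step without touching the log file, just as the set -e version does, but every command needs its own check.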

set -x

The second declaration I often use at the top of my scripts is set -x, which causes bash to echo each command that it executes:

kyleburton@indigo64 ~/tmp$ cat dlog.sh 
set -x
set -e
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh 
+ set -e
+ mkdir logs
+ date
kyleburton@indigo64 ~/tmp$

Combined with the tee command, set -x gives you a log of what your bash script (which is a program after all) was trying to do. These are often invaluable in determining where and why failures have occurred.
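As a sketch of what that combination can look like: bash writes the set -x trace to stderr, so stderr has to be merged into stdout before it reaches tee. The run_logged name and its two arguments are assumptions for this example, and bash -x is used here as the command line equivalent of putting set -x at the top of the script.

```shell
# Hypothetical wrapper: run a script with tracing on and keep a copy of
# everything it printed, trace included, in a log file.
run_logged() {
  script="$1"
  logfile="$2"
  # 2>&1 sends the trace (stderr) down the same pipe as normal output
  bash -x "$script" 2>&1 | tee "$logfile"
}
```

For example, run_logged dlog.sh dlog.trace shows the trace on the terminal and leaves a copy in dlog.trace. Note that the exit status of a pipeline is that of its last command (tee here) unless you enable bash's pipefail option.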

Safe-r Variable Handling

Bash scripts, almost by definition, call other programs, some of them bash scripts. With any but the most trivial bash scripts, you will also encounter variables. The main gotcha with variables and the shell is the way argument parsing takes place. The default is to use spaces to separate arguments, which if you’re not careful about how you handle your variables, can cause them to be parsed in unexpected ways. You protect against this by wrapping your variable usage with double quotes. You must be diligent about this, anywhere you see a dollar-sign, it should be wrapped in double quotes.
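A small sketch of the difference the quotes make, using a made-up count_args helper that just reports how many arguments it was given:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
  echo "$#"
}

name="three for five"
count_args $name     # unquoted: the shell splits the value into 3 words
count_args "$name"   # quoted: the value stays a single argument
```

The same splitting happens when the variable holds a file name with spaces in it, which is usually where this bites first.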

In the case of a bash script wanting to pass its arguments on to another program without modification (perhaps you’re writing a wrapper script to log timing information or log other information about the execution), you use $@ to refer to all of the command line arguments. To pass them on with no additional accidental parsing, you just surround it with double quotes ("$@") as in the following example. Here we have 2 scripts, the first calls the second, first without the double quotes and then again with the double quotes:


kyleburton@indigo64 ~/tmp$ cat first.sh
set -e
bash second.sh not $@
bash second.sh with "$@"

kyleburton@indigo64 ~/tmp$ cat second.sh
set -e
echo 1 = $1
echo 2 = $2
echo 3 = $3
echo 4 = $4
echo 5 = $5
echo 6 = $6
echo 7 = $7
echo ""

kyleburton@indigo64 ~/tmp$ bash first.sh 1 2 "three for five"
1 = not
2 = 1
3 = 2
4 = three
5 = for
6 = five
7 =

1 = with
2 = 1
3 = 2
4 = three for five
5 =
6 =
7 =

kyleburton@indigo64 ~/tmp$

The first invocation of second.sh from first.sh does not use the quotes, which causes bash to re-parse the arguments within the first.sh script, resulting in "three for five" being split into 3 distinct parameters to be passed to second.sh. In the second case, using "$@", the argument is passed through intact.
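The wrapper case mentioned earlier might be sketched like this. The with_status name is invented for the example; the only part that matters is that the command is run as "$@".

```shell
# Hypothetical wrapper: run the given command with its arguments passed
# through untouched, then report its exit status on stderr.
with_status() {
  status=0
  "$@" || status=$?
  echo "exit status: $status" >&2
  return "$status"
}
```

Running with_status bash second.sh 1 2 "three for five" hands second.sh exactly three arguments, the same as if it had been called directly.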

Conclusion

Writing robust bash shell scripts is helped significantly through the enabling of error checking as well as being diligent when using and passing command line arguments. I hope these tips help you in your scripting.

Kyle Burton, 7th August 2009 – Wayne PA

Thanks

Special thanks to Jonathan Tran, and Aaron Feng for reading drafts and providing feedback and suggestions.

Departure

Kyle Burton — 2009-07-19T00:00:00-04:00

Departure

I recently moved on from a company I was with for eight years (my longest time at any one company). My experience at HMS was absolutely without equal. I was given better opportunities than I could have reasonably asked for. I grew both professionally and personally, and both of my children were born while I was with HMS. I was presented with what I believe to be a unique situation and did not make the decision to leave lightly. I know they will continue to do great things. Based on some of the feedback I received from friends I’ve decided to share the email I sent to them as I departed.

Kyle Burton, 17th July 2009 – King of Prussia PA

You’ll have to…

…do the enthusiastic white boarding; come up with the snide nick names, and make bad puns. I won’t be here to do it.

Don’t be afraid to fail. Don’t be afraid that you might not know it all. Try.

You are all phenomenal at the technical aspect of what you do for HMS.

If I could give you advice that you’d follow, it would be this:

Work hard at developing relationships with your peers in the other business units. They’re in this as much as you are. Work to understand the music they have to face standing in front of customers. Appreciate that you’re shielded from this.

Learn about every role in the organization: find out what sales faces when presenting what you make; find out what the auditors are validating about the data your systems create.

If you do this, you will find that they will come to understand you as you develop empathy for them. You will be the one who has to do it, no one else can do it for you or instead of you. Do it even if no one else does it back. If you keep doing it eventually you’ll see people start to take your lead.

Chaperone your peers across the building. Walk with them. Be the one who says let’s do it now, let’s go see if they’re over there now.

When you see something you sincerely believe to be exceptional, say so, out loud. Tell their manager even if that manager is the CIO or CEO. You will gain more by doing this than I can possibly tell you. You will also be surprised at how people will try to live up to the virtues you point out in them.

You are all living through a shared challenge right now. Realize that what you are doing now is a story; it is a history, it is a rare shared experience. You have an opportunity to develop deep respect and friendships. Events like this do not come along often, you may not get the chance again for a long, long time. You are seeing those around you prove themselves. As busy as you are, make an effort to see what they are doing. From what I am seeing it is incredible.

It has been my distinct pleasure working alongside each and every one of you. Thank you for helping me grow these past eight years.

Thank you for being who you are.

Kyle Burton

Array Type Hints in Clojure

Kyle Burton — 2009-07-02T00:00:00-04:00

Array Type Hints in Clojure

How to encode type hints for Java array types came up recently in conversation with a friend. I found it difficult to Google for, so I decided to write it up here. It is hard to search for because Java doesn’t have what you’d normally think of as a class name for its typed arrays. But first, a brief explanation of type hints…

Clojure Type Hints

Clojure allows you to use type declarations, or hints, in two ways. The first is as declarations to the Clojure compiler which aid in resolving method signatures. Type-hinted code will be faster in many cases (when the type was otherwise ambiguous to the compiler) because the Clojure run-time doesn’t have to spend time using Java’s Reflection API to figure out which underlying method is appropriate for the type of your arguments. The second place types are used in Clojure is in multimethods, where the types of the arguments are typically used to determine which method implementation is called.

You can see Clojure doing reflective lookups by setting *warn-on-reflection* to true. Having this set to true while you run your unit tests, or at the end of your development pushes, is a good habit to get into, since the performance cost of all the default reflective look-ups is worth eliminating from your code once things are stable.

Enabling those warnings produces output that looks like this:


user> (set! *warn-on-reflection* true)
true
user> (def x (StringBuilder.))
#'user/x
user> (.append x "foo")
Reflection warning, NO_SOURCE_PATH:1 - call to append can't be resolved.
#
user> (.append x 1)
Reflection warning, NO_SOURCE_PATH:1 - call to append can't be resolved.
#
user> 
user> (defn second-ch [s]
        (.charAt s 1))
Reflection warning, NO_SOURCE_PATH:2 - call to charAt can't be resolved.
#'user/second-ch
user> (second-ch "twenty")
\w
user>

Note that the warning happened at the time the function was compiled, not when it was called. Clojure is warning us that it couldn’t generate code to call the method directly, and instead had to generate code that uses the reflection API to first find the proper method and then call it. We can avoid the expensive reflective look-up and quiet the warning by introducing a type hint for the parameter, telling Clojure that it is a String:

user> (defn second-ch [#^String s]
        (.charAt s 1))
#'user/second-ch
user> (second-ch "thirty")
\h
user>

This time there is no warning and the generated code will call charAt directly.

Multimethods

Multimethods are the other area where type annotations frequently come into play. There are two steps to defining a basic multimethod. The first is the defmulti declaration, where you name the multimethod and provide a function which Clojure will use to dispatch the call to one of the method implementations you later declare. The second is one or more defmethod declarations, each of which supplies the implementation for one dispatch value. One of the most common dispatch functions is class, which will match up the Java class of the argument to the one declared in the defmethod.

This example declares a multimethod that takes either a String or an Integer:


user> (defmulti bar class)
#'user/bar
user> (defmethod bar String  [s] (str "the-str:" s))
#
user> (defmethod bar Integer [s] (str "the-int:" s))
#
user> (bar "this")
"the-str:this"
user> (bar 123)
"the-int:123"
user>

And if you attempt a call with an argument that doesn’t match any of the declarations:

user> (bar 4.56)
; Evaluation aborted.
No method in multimethod 'bar' for dispatch value: class java.lang.Double
  [Thrown class java.lang.IllegalArgumentException]
...

That’s effectively what Emacs, Slime and Clojure report to me.

There are more nuances to multimethods, like specifying defaults, handling multiple arguments and using your own dispatch functions – for now, the documentation is probably the best place to find out more.
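As a minimal sketch of those nuances (the name describe and its messages are made up for illustration), the following multimethod uses its own dispatch function over two arguments and falls back to a :default implementation. In current Clojure versions integer literals are Longs rather than Integers, so the sketch dispatches on Number instead:

```clojure
;; Hypothetical example: dispatch on the classes of both arguments.
(defmulti describe (fn [a b] [(class a) (class b)]))

(defmethod describe [String String] [a b]
  (str "two strings: " a " " b))

;; Vector dispatch values are matched element by element with isa?,
;; so Number also covers Long, Integer, Double and so on.
(defmethod describe [String Number] [s n]
  (str "string then number: " s " " n))

;; Called when no other dispatch value matches.
(defmethod describe :default [a b]
  "something else")

(describe "a" "b")   ; "two strings: a b"
(describe "a" 1)     ; "string then number: a 1"
(describe 1 2)       ; "something else"
```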

Arrays and Class Names

So, back to why we’re here. As you start using type hints you may (as I did) run into a situation where you want to use a typed array as a hint, or as a dispatch value in a multimethod. Javadoc presents these as Type[] and that is how you write them in your Java source code. The problem, though, is that that’s not what the byte-code or JVM calls them at run-time, and what it does call them is not syntactically valid in either your Java or Clojure source code.

So what are typed arrays called in Java? Let’s ask Java what they’re called…you can find out what a String array is called with this bit of example code:

kyleburton@indigo64 ~$ cat Test.java
public class Test {
  public static void main(String [] args) {
    System.out.println("String array: " + args.getClass());
    System.out.println("Byte Array: " + "foo".getBytes().getClass());
  }
}
kyleburton@indigo64 ~$ javac Test.java
kyleburton@indigo64 ~$ java Test
String array: class [Ljava.lang.String;
Byte Array: class [B
kyleburton@indigo64 ~$

You can get the same information from the Clojure REPL as well by using Clojure’s into-array:

user> (class (into-array ["a"]))
[Ljava.lang.String;
user> (class (.getBytes "foo"))
[B
user>

So you now know how to ask for the class name of an array you have an instance of. You can use this to ask Class for the class based on its name (as a String):

user> (Class/forName "[B")
[B
user> (Class/forName "[Ljava.lang.String;")
[Ljava.lang.String;
user>

Clojure supports hints for arrays of primitive types directly, by pluralizing the name of the primitive type:

user> (defn foo [#^bytes b] (String. b))

That approach doesn’t work for non-primitive types (classes) though.
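For arrays of classes, one option (a sketch, not dedicated array syntax) relies on the fact that the tag in a type hint may be a string, so the JVM’s name for the array class shown above can be used directly. The function names here are made up for illustration, and the example uses ^, the newer spelling of #^:

```clojure
;; Without a hint the compiler cannot tell which alength overload to
;; call; with *warn-on-reflection* set to true, compiling this
;; function prints a reflection warning.
(defn count-unhinted [strs]
  (alength strs))

;; A string tag naming the JVM array class lets the compiler resolve
;; the call at compile time.
(defn count-hinted [^"[Ljava.lang.String;" strs]
  (alength strs))

(count-unhinted (into-array String ["a" "b" "c"]))   ; 3
(count-hinted   (into-array String ["a" "b" "c"]))   ; 3
```

Both functions return the same answer; the difference is only in whether the call is resolved at compile time or through reflection at run-time.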

Java Array Type Hints

Putting these together we can now declare defmethods using either of these techniques. Asking the JVM what the class is based on a hard-coded example value looks like this:


user> (defmethod bar (class (into-array String [])) [s]
        (str "the-string[]:" s))
#
user> (bar (into-array String ["a" "b"]))
"the-string[]:[Ljava.lang.String;@14f3cf72"

Using a hard-coded string of the class name as the JVM sees it (which also works for the primitive types) looks like the following:


user> (defmethod bar (Class/forName "[Ljava.lang.String;") [s]
        (str "the-string[]:" s))
#
user> (bar (into-array String ["b" "c"]))
"the-string[]:[Ljava.lang.String;@4597871d"
user> 

user> (defmethod bar (Class/forName "[B") [s] ;; same as #^bytes
        (str "the-bytes:" s))
#
user> (bar (.getBytes "foo"))
"the-bytes:[B@7eedec92"
user>

Even though this works, I recommend against hard-coding the string representation and passing it to Class/forName. I worry that the representation might change in a future JVM release, breaking the code.
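One way to follow that advice without repeating the example value everywhere is to ask the JVM for the class once and give it a name. This is only a sketch; string-array-class, byte-array-class and kind are names I made up for the example:

```clojure
;; Ask the JVM for the array classes instead of spelling out their names.
(def string-array-class (class (make-array String 0)))
(def byte-array-class   (class (byte-array 0)))

(defmulti kind class)

;; defmethod evaluates its dispatch value, so a var holding the
;; class works just as well as a literal class name.
(defmethod kind string-array-class [_] :string-array)
(defmethod kind byte-array-class   [_] :byte-array)
(defmethod kind :default           [_] :other)

(kind (into-array String ["a" "b"]))   ; :string-array
(kind (.getBytes "foo"))               ; :byte-array
(kind 123)                             ; :other
```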

Conclusion

Even though Clojure does not have direct syntax support for hints for Java arrays, it’s still possible to use them.

Kyle Burton, 14th July 2009 – Wayne PA

Thanks

Special thanks to Jonathan Tran, and Mike DeLaurentis for reading drafts and providing feedback and suggestions.

Creating Executable Jars For Your Clojure Application

Kyle Burton — 2009-06-22T00:00:00-04:00

Creating Executable Jars For Your Clojure Application

It is possible to create standalone executable Jar files for your Clojure programs. In this post I walk you through the issues you need to keep in mind and the steps you need to take to create the jar. You can download the example code this post walks through in its entirety from my GitHub account (under examples/exec-jar).

I used Ant to build the jar; other Java build tools can do the same job.

Jar Files

Jar files are Zip files with a few conventions for their contents and where things go. Jar files support a manifest file which tells Java what the archive contains. The manifest has a simple format of key/value pairs, very similar to HTTP headers. This is the manifest for the Clojure Jar:

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 11.2-b01 (Sun Microsystems Inc.)
Main-Class: clojure.main
Class-Path: .

The important bit there is ‘Main-Class’. The Main-Class specifies the Java class that will be executed by default when run as java -jar clojure.jar. In clojure.jar the main class is clojure.main.
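To make the format concrete, here is a small sketch that builds a manifest with the JDK’s java.util.jar.Manifest class and renders it as text. The helper name manifest-text is made up, and I am assuming the generated main class for the example application is named after its namespace, com.github.kyleburton.app:

```clojure
(import '(java.util.jar Manifest Attributes$Name)
        '(java.io ByteArrayOutputStream))

;; Hypothetical helper: render a minimal manifest naming main-class.
(defn manifest-text [main-class]
  (let [manifest (Manifest.)
        attrs    (.getMainAttributes manifest)
        out      (ByteArrayOutputStream.)]
    ;; Manifest-Version has to be present, otherwise nothing is written.
    (.put attrs Attributes$Name/MANIFEST_VERSION "1.0")
    (.put attrs Attributes$Name/MAIN_CLASS main-class)
    (.write manifest out)
    (.toString out "UTF-8")))

(print (manifest-text "com.github.kyleburton.app"))
```

The printed text has the same key/value shape as the clojure.jar manifest above, with Main-Class pointing at the application instead of clojure.main.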

Clojure Application

The example application performs a Google search, printing each link and its text:


(ns com.github.kyleburton.app
  (:gen-class)
  (:use [com.github.kyleburton.sandbox.web :as kweb]
        [com.github.kyleburton.sandbox.landmark-parser :as lp]
        [clojure.contrib.str-utils :as str-utils]))

(defn fetch-page [terms]
  (let [page (kweb/get->string (format "http://www.google.com/search?q=%s" (str-utils/str-join "+" terms)))]
    (doseq [link (filter #(.contains % "class=l") (lp/html->anchors page))]
      (let [href (lp/anchor->href link)
            text (kweb/strip-html (lp/anchor->body link))]
        (println text)
        (println href)
        (println "\n")))))

(defn show-help []
  (println
   "app term [term2 [term3 ...]]

Performs a Google Search for the given terms

"))

(defn -main [& terms]
  (cond (empty? terms)
        (show-help)
        true
        (fetch-page terms)))

Compilation

Java will only run Java byte code, so we need to compile the application ahead of time. I’ve set up the build.xml to do so.

The jar-with-manifest target then takes the compiled classes, generates a manifest file and uses Ant’s jar task with a series of zipfilesets to combine the clojure, clojure-contrib and other dependency jars into a single executable jar.

Running The Application

Our application can now be run all by itself with java -jar ...:


kyleburton@indigo64 exec-jar[master*]$ java -jar target/exec-jar-0.1.jar ant zipfileset

ZipFileSet Type
http://ant.apache.org/manual/CoreTypes/zipfileset.html

Zip Task
http://ant.apache.org/manual/CoreTasks/zip.html

Ant Best Practices: Use ZipFileSet | The Build Doctor
http://www.build-doctor.com/2008/07/13/ant-best-practices-use-zipfileset

Top 15 Ant Best Practices - O'Reilly Media
http://www.onjava.com/pub/a/onjava/2003/12/17/ant_bestpractices.html

'cvs commit: ant/src/main/org/apache/tools/ant/types ZipFileSet ...
http://marc.info/?l=ant-dev&m=112197754500691&w=2

[picocontainer-scm] [CVS java] improved maven scri: msg#00056 java ...
http://osdir.com/ml/java.picocontainer.cvs/2004-07/msg00056.html

ZipFileSet Apache Ant 1.6.5 API Documentation and Javadoc
http://www.jdocs.com/link/org/apache/tools/ant/types/zipfileset.html

svn commit: r719578 - /ant/core/trunk/src/tests/antunit/types ...
http://mail-archives.apache.org/mod_mbox/ant-notifications/200811.mbox/%3C20081121133918.572D2238889E@eris.apache.org%3E

ZipFileSet (Apache Ant API)
http://lia.deis.unibo.it/Courses/TecnologieWeb0607/materiale/laboratorio/ant/api/org/apache/tools/ant/types/ZipFileSet.html

[news.eclipse.platform] Re: Problem with ANT zipfileset ... Re ...
http://dev.eclipse.org/newslists/news.eclipse.platform/msg07778.html

kyleburton@indigo64 exec-jar[master]$

Conclusion

Bundling your application into a single Jar can simplify the deployment and distribution of your application. What tends to be less desirable about this technique, though, is that you have to track your dependencies carefully (you risk ClassNotFoundExceptions) and the jars you create are often quite large. You cannot add additional jars or paths to the classpath when running java -jar, which means that you won’t be able to share the libraries with other applications.

This technique allows you to create a single file that your users can download to run your Clojure-based applications.

Kyle Burton, 28th June 2009 – Wayne PA

Visualizing AMQP Broker Behavior with Clojure and Incanter

Kyle Burton — 2009-06-02T00:00:00-04:00

Visualizing AMQP Broker Behavior with Clojure and Incanter

I’m working with our system architect (Mark Mehalik) evaluating AMQP as a messaging implementation for a new project. One of the things we want to prove out is how the brokers behave under widely varying numbers of messages, message sizes, numbers of producers, consumers and queues, and in differing clustering configurations. Since AMQP is a standard, we should be able to build a single test suite and then execute it against the different brokers to measure their behavior as well as how robust they are under those conditions.

Scripting and Automation

I started the process of mocking up and automating the tests with the goal of helping us figure out what we want to test. I chose to do this with Clojure and to use Incanter for quick and easy visualization. You can check the code out of my sandbox project on GitHub. The example code exercises a local AMQP v0.8 broker using the RabbitMQ Java Client Libraries (which support 0.8 at this time).

Producer

To test the broker, I created a simple producer and consumer and had them pass small (371 byte) messages. They use the simple Rabbit Java client library primitives for Channel.basicPublish and Channel.basicGet. The message consists of an array of two values: a message id (integer); and a time-stamp (java.util.Date). They get serialized and placed into the AMQP message body. The producer just loops pushing all the messages to the broker and then logs its start and end times:

(defn producer [producer-num cnt]
  (rabbit/with-amqp
   {}
   (let [start-time (now)]
     (dotimes [ii cnt]
       (object-publish [ii (Date.)]))
     (log-producer-stat producer-num cnt start-time (now)))))
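The serialization step isn’t shown in the producer above. As a rough sketch, assuming plain Java serialization is used for the message body (the helper names serialize and deserialize are made up and are not the functions in my rabbit namespace), a round trip of one [id timestamp] message looks like this:

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream
                  ByteArrayInputStream ObjectInputStream)
        '(java.util Date))

;; Hypothetical helper: turn any Serializable object into a byte array.
(defn serialize [obj]
  (let [bytes-out (ByteArrayOutputStream.)]
    (with-open [obj-out (ObjectOutputStream. bytes-out)]
      (.writeObject obj-out obj))
    (.toByteArray bytes-out)))

;; Hypothetical helper: read the object back out of the byte array.
(defn deserialize [^bytes data]
  (with-open [obj-in (ObjectInputStream. (ByteArrayInputStream. data))]
    (.readObject obj-in)))

(let [message [42 (Date.)]
      body    (serialize message)]
  (println "body size in bytes:" (alength body))
  (println "round trip ok?" (= message (deserialize body))))
```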

Consumer

The consumer is a bit more complex, doing more bookkeeping. It tracks the number of messages it receives, its own elapsed time, and the total amount of time the messages spent on the queue. Just as with the producer, it logs its timings after it has consumed all of the messages on the queue.

(defn consumer [consumer-num]
  (rabbit/with-amqp
   {}
   (let [start-time (now -1)
         num-msgs   (atom 0)
         msg-age    (atom 0)]
     (loop [[ii dt] (rabbit/object-get)]
       (if ii
         (let [end         (now -1)
               msg-elapsed (- end (.getTime dt))]
           (reset! num-msgs (+ 1 @num-msgs))
           (reset! msg-age  (+ msg-elapsed @msg-age))
           (recur (rabbit/object-get)))))
     (log-consumer-stat consumer-num @num-msgs start-time (now) @msg-age)
     (let [elapsed (/ (- (now) start-time) 1000.0)
           rate    (/ @num-msgs elapsed)]
       (prn (format "%s messages in %s elapsed seconds @ %s/second"
                    @num-msgs
                    elapsed
                    rate))))))
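The bookkeeping itself can be tried out without a broker. This is a sketch of the same totals computed over an in-memory list of messages; the function name summarize and the [id sent-at-millis] message shape are assumptions made for the example:

```clojure
;; Hypothetical helper: total up the message count and the time each
;; message spent waiting, given the time (in millis) it was received.
(defn summarize [messages received-at]
  (reduce (fn [stats [_ sent-at]]
            (-> stats
                (update :num-msgs inc)
                (update :msg-age + (- received-at sent-at))))
          {:num-msgs 0 :msg-age 0}
          messages))

;; Two messages sent at t=100 and t=150, both received at t=200.
(summarize [[0 100] [1 150]] 200)   ; {:num-msgs 2, :msg-age 150}
```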

Benchmark

I chose to run a series of tests with message counts from 1 up to 500k, repeating each test three times. This is only to get some data to play with; we plan on creating longer running and distributed tests over the next week to measure the behavior of the brokers in other configurations. This test was enough to produce data for me to try out Incanter with, though, and given how easy it was to create, we’ll likely continue to modify this set of files.

(defn run-benchmark [broker num-prods num-cons]
  (doseq [msg-count [1 5 10 50 100 500 1000 5000 10000 50000 100000 500000]]
    (run-single-benchmark (format "Series-%sm-%sp-%sc" msg-count num-prods num-cons)
                          broker
                          3 
                          msg-count
                          num-prods))
  (prn (format "run-benchmark: %s completed" broker)))

(defn run-single-benchmark [series broker num-runs num-msgs num-producers]
  (dotimes [run-number num-runs]
    (binding [*testing-info* (merge *testing-info*
                                    {:series  series
                                     :broker  broker
                                     :run-num (+ 1 run-number)})]
      (let [msgs-per-producer (/ num-msgs num-producers)]
        (dotimes [ii num-producers]
          (.start (Thread. (fn [] (producer (format "p%s" ii) msgs-per-producer))))))
      (Thread/sleep 100)
      (consumer "c1"))))
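One thing to watch in msgs-per-producer: dividing integers with / in Clojure yields a ratio when the count doesn’t divide evenly (for example (/ 10 3) is 10/3), so with several producers a run may publish fewer messages than requested. A sketch of an even split using quot and rem, where split-messages is a made-up helper rather than part of the benchmark code:

```clojure
;; Hypothetical helper: how many messages each producer should send
;; so that the counts add up to exactly num-msgs.
(defn split-messages [num-msgs num-producers]
  (let [base  (quot num-msgs num-producers)
        extra (rem  num-msgs num-producers)]
    (for [ii (range num-producers)]
      ;; the first `extra` producers each take one leftover message
      (if (< ii extra) (inc base) base))))

(split-messages 10 3)    ; (4 3 3)
(split-messages 1 2)     ; (1 0)
(split-messages 500 1)   ; (500)
```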

Incanter Visualization

Once we had run this against Apache’s Qpid and RabbitMQ there was enough data to produce some charts. There is code to support loading the collected data from files on disk, and the Incanter code is wonderfully easy to use.

get-stat-data pulls the data from the consumer log for the named broker as a sequence of maps where the column headers are the keys and the row fields are the values. get-xy-data pulls two of the fields and separates them into two sequences – just what the plotting functions in Incanter expect. simple-xy-plot then passes the two sequences to Incanter line-plot to visualize the data.

(defn get-xy-data [broker xname yname]
  (let [stat-data (get-stat-data broker *consumer-stats-file*)
        count-and-rate (map (fn [ent] [(ent xname) (ent yname)]) stat-data)
        x-vals    (map #(Double/parseDouble (% 0))  count-and-rate)
        y-vals    (map #(Double/parseDouble (% 1))  count-and-rate)]
    [x-vals y-vals]))

(defn simple-xy-plot [broker xname yname]
  (let [rabbit-data (get-xy-data broker xname yname)]
    (line-plot (rabbit-data 0) (rabbit-data 1)
               :title (format "%s vs %s" xname yname)
               :x-label xname
               :y-label yname)))

(view (simple-xy-plot "RabbitMQ"    "TOTAL-MESSAGES" "M/S"))
(view (simple-xy-plot "Apache-Qpid" "TOTAL-MESSAGES" "M/S"))

I ran the broker and a single JVM hosting both the producer and consumer on my workstation (along with X, KDE, Emacs and a ton of other processes). I did no tuning of the brokers, running them in their default configurations. I don’t expect these numbers to be representative of the brokers’ performance in any kind of production configuration.

There are a few other interesting findings. With no tuning, Qpid failed at 100k messages with an out of memory exception (it crashed hard). This was with the default 1024MB heap setting; at 2048MB it was able to handle 100k messages, and the graph shows Qpid using the 2048MB heap setting. Rabbit survived up to 500k messages (as far as I pushed it), but its performance degraded significantly, down to about 170 messages per second.

Qpid RSS

This shows the 2048M heap configuration growing up to the 100k message mark.

Qpid VSS

The flatness of this isn’t surprising as the JVM allocated its entire heap all at once.

Rabbit RSS

Erlang’s a bit more interesting: you can see it trying to cope with the various message loads, spiking up to about 3/4 of a gig with 500k messages. In this configuration Rabbit’s rate dropped significantly from its average, to about 170/s. It did stay up though. The next step here is to see what we can learn about how Rabbit is handling this deluge of messages.

Rabbit VSS

This pretty closely follows the shape of the RSS graph.

Rabbit vs Qpid Messages per Second

Clojure and Incanter

It was very straightforward to set up these benchmarks and the graphing code with Clojure and Incanter. We were able to try different approaches and visualizations very rapidly. Clojure’s ease of development and Incanter’s high level graphing functions turned the profiling from a chore into a fun task.

Kyle Burton, 2nd June 2009 – King of Prussia PA

Trixx Gets a Build

Kyle Burton — 2009-05-21T00:00:00-04:00

Trixx Gets a Build

I just implemented an Ant build for Trixx. Trixx now builds, pulls down its dependency jars, and contains and builds the required version of RabbitMQ.

To build using ant (you will need to have Erlang already installed):

kyleburton@indigo64 ~/personal/projects/$ git clone git@github.com:kyleburton/trixx.git
kyleburton@indigo64 ~/personal/projects/$ cd trixx
kyleburton@indigo64 ~/personal/projects/trixx[master]$ ant fetch-deps compile-rabbit-server compile-rabbit-java-client jar

Then, in one terminal, run RabbitMQ:

kyleburton@indigo64 ~/personal/projects/trixx[master]$ ant run-rabbit
Buildfile: build.xml

run-rabbit:
     [exec] RabbitMQ %%VERSION%% (AMQP 8-0)
     [exec] Copyright (C) 2007-2009 LShift Ltd., Cohesive Financial Technologies LLC., and Rabbit Technologies Ltd.
     [exec] Licensed under the MPL.  See http://www.rabbitmq.com/
     [exec]
     [exec] Logging to "/home/kyleburton/local/var/rabbit/log/rabbit.log"
     [exec] SASL logging to "/home/kyleburton/local/var/rabbit/log/rabbit-sasl.log"
     [exec]
     [exec] starting database             ...done
     [exec] starting core processes       ...done
     [exec] starting recovery             ...done
     [exec] starting persister            ...done
     [exec] starting guid generator       ...done
     [exec] starting builtin applications ...done
     [exec] starting TCP listeners        ...done
     [exec]
     [exec] broker running

In another terminal, run Trixx:

kyleburton@indigo64 ~/personal/projects/trixx[master]$ ant run-trixx
Buildfile: build.xml

run-trixx:
     [echo] Ensure you've run rabbit before this task...
     [java] "com.leftrightfold.trixx: *cookie*=FONEINRZCWQWZOERIHXH"
     [java] "com.leftrightfold.trixx: *server*=localhost"
     [java] "com.leftrightfold.trixx: *rabbit-instance=rabbit"
     [java] 2009-05-22 00:15:58.317::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
     [java] clojure.proxy.javax.servlet.http.HttpServlet
     [java] 2009-05-22 00:15:58.395::INFO:  jetty-6.1.15
     [java] 2009-05-22 00:16:00.462::INFO:  Started SocketConnector@0.0.0.0:8080

Aaron’s stated goal is to make Trixx a management console for RabbitMQ. He just started implementing a RESTful interface using Compojure which you can hit with a browser after running ant run-trixx.

Now that things are easier to build and run I expect the development to go more smoothly.

Kyle Burton, 21 May 2009 – Wayne PA

Experience knows a Name

Kyle Burton — 2009-05-20T00:00:00-04:00

Experience knows a Name

Reading Your Language Features Are My Libraries, a thought occurred to me. Not about language features, but about people.

I see new developers latch onto the new languages and the features that they support (Java has garbage collection; Ruby lets you extend classes and makes DSLs easy; C# has list comprehensions; and so on). I see new developers lead themselves down the road of re-implementing solutions when pre-existing libraries may have been available (sometimes core libraries). I see senior developers lament that the junior devs aren’t paying attention to what already exists.

Senior Developers have an Index

Seasoned or experienced developers have a large mental index of existing example code, libraries, design patterns, frameworks and full application stacks. Experienced developers can often qualify and evaluate suitable pre-existing approaches (existing language support, core libraries, available libraries, pre-existing applications) – much more frequently and with more success (where success is re-use rather than re-implementation) than junior developers.

I’m not so sure that the only reason for this is just that new developers don’t have the mental index of language features and libraries that more experienced developers do.

Both types of developers can use Google, right? (It’s a big “index” after all.) Well…not in the same way. Experienced developers have a higher chance of already understanding core aspects of the problem they are facing. Experienced developers are more likely to know the commonly used names for many of the aspects of the problem they are facing. So when they search, they’re searching for something very specific, something that is more likely to be found.

Junior developers will not be searching by name, rather by definition. They search by using the phrases that describe what they’re after. This isn’t as successful. Look at the difference between searching for carbonated sugar water versus coca-cola.

This has a huge impact on the likelihood that a suitable pre-existing solution will even be identified. I have improved in this respect over the years – partly just by accumulating knowledge of the names of things (this is the idea that naming something or knowing its name gives you some amount of power over it).

Then we come to the issue of understanding the problem. Developers will more often have to work to build a working model of the problem before any search or selection could take place. Developers by their very nature will write code to help them develop understanding of a problem. Writing a prototype is a very powerful tool for helping you understand a problem – if you can instruct a computer to perform a task, you must have a pretty good understanding of it.

Here’s the rub, though: once you have a prototype, you have something with value. It becomes less attractive to qualify and acquire an alternative when you already have a working, or close to working, “bird in hand”. Junior developers often overvalue their prototypes, partly due to the actual effort it took to create them. Senior developers create more minimal prototypes that explore only the core parts of the problem domain they don’t already understand – and often with different tools, not caring about input, output, fixed data models or well-defined types. The point is to learn just enough, and senior developers are more prescient about what ‘just enough’ is.

Junior Developers Implement to Learn

The transfer of the names of things, of these patterns of problems, is a core value that developers get out of mentoring each other.

Junior developers implement to learn. I encourage the junior developers I work with to write prototypes of existing algorithms and design patterns – it is vital practice and helps them develop a deeper understanding of problems. I try to discuss with them what they did, why they did it and try to transfer terminology and discuss with them existing solutions that I may know of for contrast.

As they gain experience, they will know its name.

Kyle Burton, 20 May 2009 – King of Prussia PA

Exploring Quartz from Clojure

Kyle Burton — 2009-05-19T00:00:00-04:00

Exploring Quartz from Clojure

From the Quartz website:

Quartz is a job scheduling system that can be integrated with, or used along side virtually any other software system. The term “job scheduler” seems to conjure different ideas for different people…in short, a job scheduler is a system that is responsible for executing (or notifying) other software components when a pre-determined (scheduled) time arrives.

I wanted to be able to explore how Quartz worked from Clojure. Quartz executes Jobs. You do not schedule job instances in Quartz though, instead you pass the scheduler a factory, specifically a JobDetail. The JobDetail will specify the Class of the class implementing the job, which must support a no-arg constructor. This meant I couldn’t use Clojure’s proxy to implement the Job instance – since it needed to be constructible via a call like Class.newInstance. Looking briefly at the Quartz source it appears to be the technique used in at least the SimpleJobFactory.

ClojureJob.java

To enable the calling of Clojure from a Quartz Job I implemented a very basic ClojureJob class:

package com.github.kyleburton.sandbox.quartz;

import clojure.lang.Namespace;
import clojure.lang.RT;
import clojure.lang.Symbol;
import clojure.lang.Var;
import clojure.lang.IFn;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

/**
 * Quartz Job class for executing a clojure function.  The clojure
 * function will take a single argument, the JobExecutionContext which
 * is passed to this Job's execute method.
 *
 * NB: There is no way to capture or propagate an error from the
 * called function back to the context where the job was scheduled.
 *
 * http://www.opensymphony.com/quartz/
 *
 * @author Kyle R. Burton 
 */
public class ClojureJob implements Job {
  /** JobExecutionContext/JobDetail/JobDataMap Parameter for the namespace of the function that will be called. */
  public  static final String NAMESPACE_PARAMETER     = "job.clojure.namespace";
  /** JobExecutionContext/JobDetail/JobDataMap Parameter for the name of the function that will be called. */
  public  static final String FUNCTION_NAME_PARAMETER = "job.clojure.function";
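  /** JobExecutionContext/JobDetail/JobDataMap Parameter for a function object (IFn) that will be called directly. */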
  public  static final String FUNCTION_PARAMETER = "job.clojure.fn";
  private static final Class CLASS = ClojureJob.class;

  /**
   * Quartz required no-arg constructor.  Does nothing.
   */
  public ClojureJob() {
  }

  /**
   * Execute implementation, look up the clojure function specified in
   * the JobDataMap, invoke it with the JobExecutionContext.
   * @param context the JobExecutionContext passed in by quartz
   * @throws JobExecutionException if the function can not be looked up,
   * or if the function throws an exception.
   */
  @Override
  public void execute(JobExecutionContext context) throws JobExecutionException {
    if ( null != contextParameter(context,FUNCTION_PARAMETER) ) {
      executeFunction(context,(IFn)contextParameter(context,FUNCTION_PARAMETER));
      return;
    }

    executeVar(context);
  }

  private void executeVar( JobExecutionContext context) throws JobExecutionException {
    Var fn = lookupClojureFunction(context);
    if ( null == fn ) {
      throw new JobExecutionException(
        String.format(CLASS.getName() + ".execute: unable to find the specified function, namespace=%s; function=%s", 
                      contextParameterString(context,NAMESPACE_PARAMETER),
                      contextParameterString(context,FUNCTION_NAME_PARAMETER)));
    }

    try {
      fn.invoke(context);
    }
    catch(Exception ex) {
      throw new JobExecutionException(ex);
    }
  }

  private void executeFunction( JobExecutionContext context, IFn fn) throws JobExecutionException {
    try {
      fn.invoke(context);
    }
    catch(Exception ex) {
      throw new JobExecutionException(ex);
    }
  }

  /**
   * Helper function for pulling parameters from the JobDataMap
   *
   * @param context the context to pull the parameter from
   * @param name the parameter to pull from the JobDataMap
   * @return the string value from the JobDataMap
   */
  private String contextParameterString(JobExecutionContext context, String name) {
    return context.getJobDetail().getJobDataMap().getString(name);
  }

  private Object contextParameter(JobExecutionContext context, String name) {
    return context.getJobDetail().getJobDataMap().get(name);
  }

  /**
   * Helper function for looking up the clojure function specified in
   * the quartz job.
   *
   * @param context the context passed in from quartz
   * @return the clojure Var which is the function
   */
  private Var lookupClojureFunction(JobExecutionContext context) { 
    String namespaceName = contextParameterString(context,NAMESPACE_PARAMETER);
    String functionName  = contextParameterString(context,FUNCTION_NAME_PARAMETER);
    Symbol symNamespace  = Symbol.create(namespaceName);
    Namespace namespace  = Namespace.findOrCreate(symNamespace);
    return Var.intern(namespace,Symbol.create(functionName));
  }
}

This class calls a Clojure function that is either looked up by name or passed in directly via the JobDataMap of the JobDetail in the JobExecutionContext. When an actual function (Clojure IFn) is passed, it is called without checking whether a named function was also specified. The JobExecutionContext is passed to the invoke call so that parameters can be passed through to the Clojure function in either case.
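For comparison, the same lookup by namespace and function name can be sketched from the Clojure side. The helper name lookup-fn is made up for this example. Unlike Var.intern in the Java class, which creates the var when it doesn’t already exist (so, as far as I can tell, the null check in executeVar never triggers), find-ns and ns-resolve return nil when there is nothing to find. In both cases the namespace has to be loaded already:

```clojure
;; Hypothetical helper: find the var for a function given the names
;; of its namespace and of the function, or nil if either is unknown.
(defn lookup-fn [ns-str fn-str]
  (some-> (find-ns (symbol ns-str))
          (ns-resolve (symbol fn-str))))

((lookup-fn "clojure.core" "inc") 41)        ; 42
(lookup-fn "clojure.core" "no-such-fn")      ; nil
(lookup-fn "no.such.namespace" "testfn")     ; nil
```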

quartz.clj

This allowed me to then create and schedule jobs which are backed by Clojure functions. Here is quartz.clj:

(ns com.github.kyleburton.sandbox.quartz
  (:import (org.quartz SchedulerFactory Scheduler TriggerUtils JobDetail)
           (org.quartz.impl StdSchedulerFactory)
           (com.github.kyleburton.sandbox.quartz ClojureJob)))

(def *schedule-factory* (StdSchedulerFactory.))

(def *scheduler* (atom nil))

(defn ensure-scheduler-started []
  (if (or (not @*scheduler*)
          (.isShutdown @*scheduler*)
          (not (.isStarted @*scheduler*)))
    (do
      (reset! *scheduler* (.getScheduler *schedule-factory*))
      (.start @*scheduler*)
      true)
    nil))

(defn stop-scheduler []
  (if (and @*scheduler*
           (.isStarted @*scheduler*))
    (.shutdown @*scheduler*)))

(defn schedule-job [job-detail trigger]
  (ensure-scheduler-started)
  (.scheduleJob @*scheduler* job-detail trigger))

(defn delete-job [job-detail]
  (.deleteJob @*scheduler*
              (.getName job-detail)
              (.getGroup job-detail)))

(defn job-exists? [job-detail]
  (not (nil? (.getJobDetail @*scheduler*
                            (.getName job-detail)
                            (.getGroup job-detail)))))

(defn testfn [context]
  (prn (format "testfn: context=%s time=%s" 
               context
               (java.util.Date.))))

(defn quartz-test []
  (let [job-detail (JobDetail. "myJob" nil ClojureJob)
        trigger (doto (TriggerUtils/makeSecondlyTrigger 10)
                  (.setStartTime (TriggerUtils/getEvenSecondDate (java.util.Date.)))
                  (.setName "My Second Trigger"))]
    (.put (.getJobDataMap job-detail) ClojureJob/NAMESPACE_PARAMETER "com.github.kyleburton.sandbox.quartz")
    (.put (.getJobDataMap job-detail) ClojureJob/FUNCTION_NAME_PARAMETER "testfn")
    (schedule-job job-detail trigger)))

(defn quartz-test-fn [fn]
  (let [job-detail (JobDetail. "myJob" nil ClojureJob)
        trigger (doto (TriggerUtils/makeSecondlyTrigger 10)
                  (.setStartTime (TriggerUtils/getEvenSecondDate (java.util.Date.)))
                  (.setName "My Second Trigger"))]
    (.put (.getJobDataMap job-detail) ClojureJob/FUNCTION_PARAMETER fn)
    (schedule-job job-detail trigger)))


;; (quartz-test)
;; (stop-scheduler)
;; (def *count* (atom 0))
;; (quartz-test-fn (fn [context] 
;;                   (reset! *count* (inc @*count*))
;;                   (prn (format "anon scheduled function! context=%s called %d times!" context @*count*))))
;; (stop-scheduler)

To play with these, pull down my sandbox and run ant in the clojure-utils subdirectory with the fetch-deps and jar targets:

kyle@indigo64 ~/personal/projects/sandbox/clojure-utils[master]$ ant fetch-deps jar
Buildfile: build.xml

fetch-deps:
      [get] Getting: http://asymmetrical-view.com/personal/repo//ant-1.7.0.jar
      [get] To: /home/mortis/personal/projects/sandbox/clojure-utils/lib/ant-1.7.0.jar
      ...and lots more...

compile:
    [mkdir] Created dir: /home/mortis/personal/projects/sandbox/clojure-utils/target/classes
    [javac] Compiling 1 source file to /home/mortis/personal/projects/sandbox/clojure-utils/target/classes

jar:  
      [jar] Building jar: /home/mortis/personal/projects/sandbox/clojure-utils/target/krb-clojure-utils-0.1.jar

BUILD SUCCESSFUL
Total time: 23 seconds
kyle@indigo64 ~/personal/projects/sandbox/clojure-utils[master]$ ant repl

You can use ant repl to run a Clojure REPL with the necessary dependencies on the classpath. Paste or type in the example functions to see them in action:

(quartz-test)
;; wait a bit to see it run (it should run every 10s)
;; stop the scheduler to remove all tasks
(stop-scheduler)

;; define an atom to hold the count
(def *count* (atom 0))
;; schedule an anonymous function
(quartz-test-fn (fn [context] 
                  (reset! *count* (inc @*count*))
                  (prn (format "anon scheduled function! context=%s called %d times!" context @*count*))))
;; wait a bit to see it run (every 10s)
;; stop and exit the JVM
(stop-scheduler)
(System/exit 0)

The next thing I’d like to do is implement a prototype scheduler service exposing Quartz over AMQP (RabbitMQ).

Kyle Burton, 19 May 2009 – Wayne PA

Introduction to Git talk at PLUG West Tonight at 7pm

Kyle Burton — 2009-05-18T00:00:00-04:00

Introduction to Git talk at PLUG West Tonight at 7pm

I’m giving my Introduction to Git talk tonight at the West chapter of the Philadelphia Linux Users Group. I’m bringing some minor give-aways, USB drives and post-it note pads, in case that is enticing :).

Intro To Git

Kyle Burton, 17 May 2009 – Wayne PA

Extending Jekyll

Kyle Burton — 2009-05-17T00:00:00-04:00

Extending Jekyll

After using Jekyll to build my personal website, I was having some issues with my stylesheet not being reloaded by Firefox. Frameworks like Ruby on Rails provide an HTML helper that adds a short query string, which is simply the last modification time of the file being linked to. This interacts nicely with the browser’s cache: it can cache the stylesheet, and when you change the file the URL will be different (with the new mtime), so the browser will load the new stylesheet.
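The cache-busting idea itself does not depend on Rails. As a rough sketch (written in Python purely for illustration; the helper name `cache_busted_url` and the paths are made up for this example), appending the file's mtime as a query string could look like this:

```python
import os


def cache_busted_url(url_path, site_root):
    """Return url_path with the file's mtime appended as a query string.

    url_path is the path as it appears in the page (e.g. "/css/style.css");
    site_root is the directory on disk that the site is served from.
    Both names are assumptions made for this sketch.
    """
    file_path = os.path.join(site_root, url_path.lstrip("/"))
    mtime = int(os.path.getmtime(file_path))
    return "%s?%d" % (url_path, mtime)
```

Every time the stylesheet is saved its mtime changes, so the generated URL changes with it and the browser treats it as a new resource.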

Rails provides stylesheet_link_tag, which you use as follows:

  <%= stylesheet_link_tag 'app', :media => "all" %>

stylesheet_link_tag then renders HTML along these lines:

  <link href="/stylesheets/app.css?1234672637" media="all" rel="stylesheet" type="text/css" />

Where 1234672637 is the last modification time (or mtime) of the stylesheet file. I wanted to have the same feature in Jekyll. Jekyll uses the Liquid text templating system, and Liquid supports being extended, allowing you to add your own filters, tags and tag-blocks. Tags are used for inserting content:

   {{ post.content }}

Filters are used to transform content, an example from the atom.xml formats the date:

   {{ post.date | date_to_xmlschema }}

What I wanted to be able to write was something like:

   {% stylesheet "/css/style.css" %}

Which meant I had to make a new tag. I created the tag file lib/jekyll/tags/stylesheet_link.rb:

  module Jekyll

    class StylesheetTag < Liquid::Tag
      def initialize(tag_name, file, tokens)
        super
        @file = file
      end

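      def find_stylesheet(context)
        # normalize the tag argument: trim whitespace and surrounding quotes
        @file = @file.strip.gsub(/^["']/, "").gsub(/["']$/, "")
        # default to a .css extension when none was given
        if @file !~ /\.[a-z]+$/
          @file = "#{@file}.css"
        end

        source = context.registers[:site].source
        candidates = [File.join(source, @file),
                      # strip a leading slash
                      File.join(source, @file[1..-1]),
                     ]
        candidates.each {|candidate|
          if File.exist? candidate
            return candidate
          end
        }
        return @file
      end

      def render(context)
        file = find_stylesheet(context)

        mtime = nil
        if ! File.exist?(file)
          warn "Stylesheet file: '#{@file}' not found (#{file})"
          mtime = rand 1000000000
        else
          mtime = File.mtime(file).to_i
        end

        # The markup below is an assumed reconstruction: a plain <link> tag
        # with the mtime appended as the query string.
        return %Q{<link rel="stylesheet" type="text/css" href="#{@file}?#{mtime}" />}
      end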
    end

  end

  Liquid::Template.register_tag('stylesheet', Jekyll::StylesheetTag)

All tags derive from Liquid::Tag and implement both initialize and render. In the constructor the StylesheetTag saves off the argument (file). In render it attempts to find the css file and then either uses a random value or the mtime of the file (if it was found), finally returning the string representing the stylesheet link.

Finally I added the module to the list of requires in lib/jekyll.rb so it is loaded when Jekyll runs:

  ...
  # internal requires
  require 'jekyll/core_ext'
  require 'jekyll/pager'
  require 'jekyll/site'
  require 'jekyll/convertible'
  require 'jekyll/layout'
  require 'jekyll/page'
  require 'jekyll/post'
  require 'jekyll/filters'
  require 'jekyll/tags/highlight'
  require 'jekyll/tags/include'
  require 'jekyll/tags/stylesheet_link'
  require 'jekyll/albino'
  ...

Once that was completed the stylesheet tag generated links with the mtime as a query parameter and my stylesheets no longer had the caching issue. I’m offering my changes back to Tom Preston-Werner, but you can fork or clone my repository if you want to try them yourself.

Kyle Burton, 17 May 2009 – Wayne PA

Getting Started With Jekyll

Kyle Burton — 2009-05-14T00:00:00-04:00

Getting Started With Jekyll

I took a look at Toby Di Pasquale’s website tonight and it inspired me to get around to sprucing up my own site.

Toby used Jekyll, a site generation and maintenance tool. Jekyll allows you to maintain your site using Liquid (a text-based templating tool), YAML (for configuration) and Markdown (a simplified wiki-ish style markup language). Jekyll is a bit different from dynamic blogging or publishing platforms like WordPress in that it intentionally generates your site as static content. Most of the other publishing platforms dynamically generate their content at the time the page is requested by the browser.

Jekyll allows you to refactor your content, still allowing you to use templates and simplified markup, while allowing your site to be completely static. This means you can deploy your site (or blog) to servers where you only have a web-server and no application software – it also means that your site will perform as well as your web-server can serve up static content, which is about as fast as you can get.

Setting the site up with Jekyll took only a few minutes…tweaking the css for the site and adding in some basic content took me a few hours.

Installation

You will need Ruby installed along with RubyGems. You will not need Ruby on Rails. Once Ruby and Gem are set up, follow the remaining instructions in the installation guide. A good place to start is by cloning one of the example sites:

kyle@asymmetrical-view ~/projects$ git clone git://github.com/mojombo/mojombo.github.com.git

This will give you a point of reference for creating your own site. I made a directory and am using git (http://git-scm.com/) to version my site.

Create your _config.yml

I had an incomplete _config.yml when I first tried to run Jekyll, and it raised an exception. Start with the downloaded example or use mine:

    destination: ./_site
    auto:        false
    lsi:         false
    server_port: 4000
    pygments:    true
    markdown:    maruku
    permalink:   date
    maruku:
      use_tex:   false
      use_divs:  false
      png_dir:   images/latex
      png_url:   /images/latex

Directory Structure

I created a sub-directory for each of the major areas in the site (posts, projects, contact, talks and about), a directory for the ‘blog’ posts (_posts) and for the layouts.

./site/_posts/2000-05-14-starting-with-jekyll.textile
./site/projects/index.html
./site/index.html
./site/contact/index.html
./site/css/style.css
./site/_layouts/post.html
./site/_layouts/default.html
./site/_config.yml
./site/about/index.html

Set up a Layout

In the examples you’ll see a layout file in _layouts/default.html.

The layouts are your page templates. The template has the surrounding portion of the HTML page, with an inner yield block for the actual page content:

  
  
  
  ...
  {{ content }}
  ...

Then make use of your layout. For example, in my ./site/index.html I have a YAML front matter that specifies the layout and the page title.

  ---
  layout: default
  title: Seeking No Barriers To Abstraction
  ---
  
    Welcome to Asymmetrical View

    
      This is the home site of Kyle Burton on the Internet.
    

    Recent Posts
    
      {% for post in site.posts limit:5 %}
        {{ post.date | date_to_string }} » {{ post.title }}
      {% endfor %}

Creating Posts

My posts are all in Textile (a lightweight markup) format. Using a similar technique as with the HTML files, they have a front matter section followed by the main content in the markup:

  ---
  layout: post
  title: Getting Started With Jekyll
  ---

  h1. {{ page.title }}

  I took a look at Toby Di Pasquale's website tonight and it inspired me to get around to sprucing up my own site.

Building and Testing Your Site

To build out your site, just run jekyll. You can also run it in a simple server mode in which it will attempt to recognize when you change files (via their last modification time) and rebuild the affected portions of your site. One thing to keep in mind is that when you run jekyll with --auto it will not report on exceptions as robustly as it will when only invoked to perform the build. The workflow I use involves periodically restarting the jekyll server and removing the site build directory:

  mortis@kburton-lin ~/personal/projects/this-blog/site[this-blog*]$ rm -rf _site/ && jekyll  --auto --server 4001

When I stop seeing my changes reflected in the site I’ll CTRL-C Jekyll and re-run that command.

Conclusion

Installing and learning Jekyll was easy and it provides me with a much lighter weight process for maintaining my personal web site.

Kyle Burton, 14 May 2009 – Wayne PA

Introduction to Git talk at PLUG North Tonight at 7pm

Kyle Burton — 2009-05-11T00:00:00-04:00

Introduction to Git talk at PLUG North Tonight at 7pm

I’m giving my Introduction to Git talk tonight at the north chapter of the Philadelphia Linux Users Group. I’m bringing some minor give-aways, USB drives and post-it note pads, in case that is enticing :).

Intro To Git

Kyle Burton, 11 May 2009 – Wayne PA

Introduction to Git talk at PLUG Central Tonight at 7pm

Kyle Burton — 2009-05-06T00:00:00-04:00

Introduction to Git talk at PLUG Central Tonight at 7pm

I’m giving my Introduction to Git talk tonight at PLUG Central, the main chapter of the Philadelphia Linux Users Group. I’m bringing some minor give-aways, USB drives and post-it note pads, in case that is enticing :).

Intro To Git

Kyle Burton, 06 May 2009 – Wayne PA

List Comprehensions in Clojure

Kyle Burton — 2008-11-18T00:00:00-05:00

List Comprehensions in Clojure

This is a basic example of the list comprehension support in Clojure, the same feature supported by Erlang, Haskell and Common Lisp (via libraries like incf-cl).

;; generate the positions of a chess board:
(for [file "ABCDEFGH"
      rank (range 1 9)]
  (format "%c%d" file rank))

;; Evaluate and put the result into the buffer: C-u 8 C-x C-e
;; ("A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8" 
;;  "B1" "B2" "B3" "B4" "B5" "B6" "B7" "B8" 
;;  "C1" "C2" "C3" "C4" "C5" "C6" "C7" "C8" 
;;  "D1" "D2" "D3" "D4" "D5" "D6" "D7" "D8" 
;;  "E1" "E2" "E3" "E4" "E5" "E6" "E7" "E8" 
;;  "F1" "F2" "F3" "F4" "F5" "F6" "F7" "F8" 
;;  "G1" "G2" "G3" "G4" "G5" "G6" "G7" "G8" 
;;  "H1" "H2" "H3" "H4" "H5" "H6" "H7" "H8")

(count (for [file "ABCDEFGH"
              rank (range 1 9)]
          (format "%c%d" file rank)))
;; Make sure it's 8x8 (64)

;; the pythagorean triples example:
(for [aa (range 1 10)
      bb (range 1 10)
      cc (range 1 10)
      :when (= (* cc cc)
               (+ (* aa aa)
                  (* bb bb)))]
  (list aa bb cc))
;; ((3 4 5) (4 3 5))

;; all permutations
(defn all-permutations [things]
  (if (= 1 (count things))
    (list things)
    (for [head things
          tail (all-permutations (disj (set things) head))]
      (do
        (cons head tail)))))

(all-permutations '(a b c))
;; ((a c b) (a b c) (b a c) (b c a) (c a b) (c b a))

Kyle Burton, 11 Nov 2008 – Wayne PA

Cloud Con East 2008

Kyle Burton — 2008-10-21T00:00:00-04:00

Cloud Con East 2008

Overall, Chariot Solutions’ Computing Among The Clouds was a great conference.

Cloud computing is not well defined and mostly correlates to the movement of applications, services and compute resources (machines, storage, queuing services) into hosted data centers and billed based on usage. It brings with it concepts of dynamic provisioning and relinquishing of resources.

So, how do you evaluate the current cloud offerings?

Stick With Your Own Data Center If

You have a steady baseline load.

AWS, used 24×7×365, is more expensive (in 2008) than owning and hosting physical machines in terms of capex for many organizations.

You have hard SLAs

If you need more than 3 or 4 9’s, you are better off with more traditional hosting or your own data center. Currently AWS doesn’t guarantee up-time as well as many medium and large businesses can with their own dedicated IT staffs and data centers.

You can handle your peak loads

If you have constant processing loads or have an SLA that requires you to have enough spare capacity to handle any given peak, then you’re better served with your own data center. There is no reason to rent capacity if you already have it.

You have sensitive data

You may require more certainty about where the data gets stored and who has access to it. This is more likely to be a legislative or contractual issue than an organizational one, though there are cloud computing platforms and initiatives that are working towards security and data protection certifications.

On Demand (AWS EC2) Is Right For You If

You have intermittent load

Your needs scale up and then can scale back down, thus saving on keeping spare capacity on-line.

You can plan for the Peaks

If you can anticipate load spikes, then you will have time to provision resources to handle those loads.

You have no capital budget

If you have no capital budget but must do large scale testing or data analysis, then renting resources will be significantly cheaper than buying hardware.

You want to charge based on utilization

You have a service where you can charge in direct proportion to utilization rather than based on capacity.

Higher Level Application Stacks, PaaS Google App Engine

Keep in mind these are new and the space is still being explored; more will appear. Models will develop around these PaaS providers [Platform as a Service], and more languages and frameworks will be supported – though they will mostly be based on those that can be easily hosted, sandboxed and run-time instrumented, which is why you’re seeing Python first and Java following closely along.

These providers hide the provisioning from you. Google’s offering dynamically scales your application up and down based on utilization. This provides a significant reduction in your design and administrative overhead for web development projects.

Simplicity of application development and scalability are rarely found together in existing technologies, so this will be one of the more interesting segments to watch mature.

Organizations and individuals are starting to learn how they need to change the design of their services and applications to take better advantage of the Cloud. It takes a change in mind-set – the phrase was dropped “Machine Instances are the new processes” and I think that’s an appropriate framing of one of the changes in mindset for taking advantage of the mass of resources that is becoming available.

Change your software so that it is more easily bootstrapped. Eliminate the assumption that you have access to local, disk-based services: pull everything remotely, use URLs and services, and assume remote rather than local interfaces. Design to come up (boot) faster, since when you scale up and down quickly you don’t get the same amortization of start-up costs over time. Design with the crash-fast mentality: as robust as these systems are, you should still design with the idea that the system could go away at any moment. In addition to the benefit of quickly recovering from unexpected outages, this allows you to scale down faster, not just up. Keep your persistent data in the provider’s data stores and use the provided queuing systems to distribute work.

Gaining Wider Acceptance

There are some things these offerings need to do to gain wider acceptance.

Harder SLAs will develop as there is more competition. Higher level tools will develop on top of the instance-based cloud offerings (EC2) to allow for more automated provisioning – this will make it easier for you, but not as easy as the PaaS stacks like GAE will make it.

We’ll see tools and offerings develop that will come down towards traditional data centers to allow a simpler mixing of a traditional service with bleed over to the cloud as resources need to be scaled up but also so that you can have control over the processing of your (sensitive) data in your own protected environment but push generic activity up into the cloud as necessary.

Other notable happenings

Microsoft is creating AMIs for Windows on EC2.

Google just announced that Java will be a supported App Engine development language – previously only Python was supported.

Haskell in the corporate environment

This session seemed out of place for the event – not really Cloud oriented. Though I personally see Functional Programming being a larger industry trend and something that facilitates concurrency and parallelization. It follows from structured -> procedural -> object oriented -> functional – with respect to the time line of coming out of academia at least, not necessarily the idea of one being ‘higher level’ than the other – though so far, time has implied that with the other programming paradigms.

The presenter, Jeff Polakow, is using it extensively at his current employer.

Those kinds of firms (Wall St.) allow a lot of latitude to the technical staff, so it’s easier to experiment (R&D) with new technologies. It’s much harder for a company like my own to decide to take on something like this – it’s hard to find developers who know how to develop, deploy, monitor and design with these technologies.

The Functional Programming trend is being pushed into industry by the shift to multi-core, the past difficulties of developing concurrent software, the more widespread need for parallel/distributed applications (concurrency is the new garbage collection – it will become something that developers no longer control manually), the need for infrastructure-level automatic scaling, and the easier path to robustness that languages like Erlang offer.

In languages like Java, you have to take into consideration all the libraries (where the default development practice in many cases is to not consider the re-entrancy – you can’t make the assumption that code is thread-safe in Java) you’re using with respect to their referential transparency – it’s not the default. In the FP languages referential transparency is the default case, so you can, in general, make that assumption. The underlying stack can also make that assumption about your code as well – which is why the concurrency / distribution model is less coupled to the implementation than it is in the more imperative languages.

Horizontal Scaling with HiveDB

CafePress has a large catalog. I was surprised to hear that they have 265 million products across all their customers’ catalogs. They have a low margin based on the aggregate amount of data they have to store and serve up, so commercial solutions like Oracle were just not an option for them simply due to cost.

Cafe spent time analyzing their options and didn’t find anything that fit their needs (cost, performance, on-line resharding), and went down the path of creating a more scalable data storage architecture themselves.

The solution they created performs better, scales better, is more robust and has a better SLA than many of the commercial solutions (their words).

Cafe’s DAL is effectively a hibernate extension that uses MySQL to do data partitioning (pseudo-automatically) by using a set of replicated MySQL databases as a catalog to map to where the data is stored for your shard (replicated 3x). The system supports dynamic repartitioning – migration of shards away from a shard-host to get less busy data away from data that is more ‘hot’ – the busiest data sets end up on their own shard-node with everything else having been pushed away from them.
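To make the directory idea concrete, here is a toy in-memory sketch in Python. It is not HiveDB's API: the class `ShardDirectory`, its method names and the "emptiest shard" placement rule are all assumptions made for illustration. It only shows the shape of the technique, a lookup table from key to shard plus a migration step that repoints the table.

```python
class ShardDirectory:
    """Toy stand-in for the replicated catalog that maps a key to its shard."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}  # shard -> {key: row}
        self.locations = {}                               # key -> shard name

    def put(self, key, row):
        # New keys go to the emptiest shard; known keys stay where they are.
        shard = self.locations.get(key)
        if shard is None:
            shard = min(self.shards, key=lambda name: len(self.shards[name]))
            self.locations[key] = shard
        self.shards[shard][key] = row

    def get(self, key):
        # Every read consults the directory first, then the owning shard.
        return self.shards[self.locations[key]][key]

    def migrate(self, key, new_shard):
        # Copy the row, repoint the directory, then drop the old copy.
        old_shard = self.locations[key]
        if old_shard == new_shard:
            return
        self.shards[new_shard][key] = self.shards[old_shard][key]
        self.locations[key] = new_shard
        del self.shards[old_shard][key]
```

In a real system the directory would live in replicated databases and writes to the key would be blocked for the duration of `migrate`; reads can keep going because the old copy is only dropped after the directory has been repointed.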

The only ‘lock’ they need is for a single user while that user’s data is being migrated off the shard. This is a write-lock, not a read lock – it only keeps the user from updating their own catalog of products while the move is taking place. Most users never notice when this happens. The system as a whole doesn’t go down (their words). The MySQL catalogs are replicated (3 machines, master-master, writing to 1) and can be upgraded by taking 1 of the 3 out of the cluster at a time. The same kind of approach goes for the other shard servers.

Panel Discussion

The panel discussion was most memorable for how Chris and Toby seemed to dominate the discussion.

Hive and Hadoop

Hive is a data storage system developed on top of Hadoop with its own query language (HiveQL), built by Facebook. The goals are a bit different from HiveDB – HiveDB is more for OLTP, while Hive is more for large-scale analytics. Being built on top of Hadoop, Hive is much more batch oriented. Facebook uses it for doing analytics, data-mining, and machine-learning of their user and transactional data sets (logs, user activities, etc.) to mine out aggregate and trending intelligence from the large data set.

Interesting fact: Facebook’s Hive sees 2Tb of growth per day.

Building Scalable Web Applications with Google App Engine

PaaS stacks like GAE take a more managed environment approach than the more raw or primitive services provided by AWS style on-demand services. The two fit into different use cases though and, IMO, one will not necessarily eliminate the other.

GAE takes away from you all the concerns about deployment, production architecture, system management or administration. It gives you a data store with an OO API, and a web-app development environment that you develop your application within. There are things you can’t do, for example, you can’t run arbitrary software or services on GAE like you can on the more machine-image based cloud services (AWS EC2).

What you gain from giving up those capabilities is Google’s infrastructure for scaling: it becomes your infrastructure for scaling. Your app is designed in a pseudo-functional way – the stack encourages you to design your app to perform all dynamism at put/post time and to just render/display at get time. This approach helps with the scaling of the system. Storage location transparency helps with spooling up other instances of the app in disparate data centers, etc.
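The "do the work at put/post time, just display at get time" idea can be sketched without any App Engine APIs. The following is a plain-Python illustration; the `Guestbook` class and its methods are hypothetical and have nothing to do with GAE's actual datastore or request handlers:

```python
import html


class Guestbook:
    """Precompute the rendered page on every write so reads are trivial."""

    def __init__(self):
        self.entries = []
        self.rendered = "<ul></ul>"

    def post(self, author, message):
        # All of the dynamic work happens here, at write time.
        self.entries.append((author, message))
        items = "".join(
            "<li>%s: %s</li>" % (html.escape(a), html.escape(m))
            for a, m in self.entries
        )
        self.rendered = "<ul>%s</ul>" % items

    def get(self):
        # Reads only hand back the precomputed page.
        return self.rendered
```

Because `get` does no computation, any number of copies of the rendered result can be served from wherever the data happens to live, which is the property that makes this style scale.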

This kind of stack really makes it easy to develop the most common case of web applications – it is both easy to do and it scales. This is a combination that you rarely see in a platform or technology.

I see these kinds of stacks as becoming more established and a large part of Internet based application development – I think that more organizations will offer these kinds of stacks across more technologies.

My advice: You should sign up for an account and try GAE out.

Developing and Deploying Java applications on Amazon EC2

Chris Richardson has created cloud-tools, a package of utilities (and a maven plug-in) for provisioning EC2 instances, pushing your application up and executing tasks across your cluster of instances. The tools look like they make it very easy to get your Java app into EC2.

Conclusion

The main theme I took away from it is that on-demand computing is a continuing trend. Services will continue to appear and be developed that will make taking advantage of these resource pools easier and more cost effective.

The trend for physical data centers will continue to become more and more outsourced to organizations that can provide those services with greater economy of scale. Currently Amazon’s offerings are slightly more expensive than a hosted system that you own – in the case where you need up-time or have high constant utilization. More guidelines are being developed showing when the trade off is appropriate. As a trend, cloud computing is still new and not well defined – it is likely that these trade offs will shift – even as soon as over the next few years (eg: it is likely, in my opinion, that the raw cost of 24×7 allocation for SMBs will fall below the cost of ownership due to these on-demand providers’ economies of scale).

We’re past the point of asking if your organization can make use of these on-demand providers and to the point where you should be identifying the areas where you can realize savings by taking advantage of these services.

Kyle Burton, 21 Oct 2008 – Malvern PA

Common Lisp - destructuring-bind

Kyle Burton — 2008-09-18T00:00:00-04:00

Common Lisp – destructuring-bind

Destructuring bind examples in Common Lisp:

;; This is a typical usage, for pulling apart a list
(destructuring-bind
      (first second)
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))
;;; => first:1 second:2

;; You can also pull apart improper lists:
(destructuring-bind
      (first . second)
    '(1 . 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))

;;; => first:1 second:2

;; The first argument to destructuring-bind is a lambda list, but you
;; can grab the remainder by either using a dotted list:

(destructuring-bind
      (first second . stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))

;;; => first:1 second:2 rest:(3 4 5)

;; or you can grab the remainder with &rest, just like you do for
;; functions that take a variable number of arguments:
(destructuring-bind
      (first second &rest stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))

;;; => first:1 second:2 rest:(3 4 5)

;; It really is a lambda list, you can use default parameters:
(destructuring-bind
      (first second &optional (third 'default))
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:DEFAULT

(destructuring-bind
      (first second &optional (third 'default))
    '(1 2 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:3

;; And you can use keyword parameters:
(destructuring-bind
      (first second &key third)
    '(1 2 :third 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:3

;; Finally, you can use it to 'unparse' trees as well, which is a
;; really great feature, since your variable declaration matches the
;; 'shape' of the data structure you're pulling apart.  This technique
;; is really handy for dealing with XML after it's been converted to
;; s-expressions.
(destructuring-bind
      (a (b (c d e (f g) h i j)) &rest remainder)
    '(1 (2 (3 4 5 (6 7) 8 9 10)) 11 12 13 14 15)
  (format t
          "~%~%;;; => a:~a b:~a c:~a d:~a e:~a f:~a g:~a h:~a i:~a j:~a remainder:~a ~&"
          a b c d e f g h i j remainder))

;;; => a:1 b:2 c:3 d:4 e:5 f:6 g:7 h:8 i:9 j:10 remainder:(11 12 13 14 15)

Introduction to Lisp talk at PLUG North Tonight at 7pm

Kyle Burton — 2008-08-11T00:00:00-04:00

Introduction to Lisp talk at PLUG North Tonight at 7pm

I’m giving my Introduction to Lisp talk tonight at the north chapter of the Philadelphia Linux Users Group.

Introduction To Lisp

Kyle Burton, 11 Aug 2008 – Wayne PA

A Survey of Fuzzy String Matching Algorithms at PLUG Central Tonight at 7pm

Kyle Burton — 2008-08-06T00:00:00-04:00

A Survey of Fuzzy String Matching Algorithms at PLUG Central Tonight at 7pm

I’m giving my Survey of Fuzzy String Matching Algorithms talk tonight at the central chapter of the Philadelphia Linux Users Group.

Fuzzy String Matching

Kyle Burton, 06 Aug 2008 – Wayne PA

OSCon Day 1

Kyle Burton — 2008-07-21T00:00:00-04:00

OSCon Day 1

Morning

Andrew and I arrived for OSCon 2008 registration and took advantage of the continental breakfast before heading up to the Intro to Python.

O’Reilly had the registration process pretty streamlined. They had a long bank of laptops at which you needed only to enter your registration code or your email address (if you registered on the OSCon conference web site). Register, then walk up to the materials station and pick up your ID and badge card.

There were plenty of juices, coffee, fruit and pastries. There was also plenty of seating. To either O’Reilly’s or the Oregon Conference Center’s credit, things were very well organized.

Python in 3 Hours

The first conference room we were in must have had seating for a few hundred people and it was effectively full. There was limited space for each attendee and their items (it was at least cramped for me) – though they anticipated a laptop per person – so there were plenty of power strips laid along every other row of tables within easy reach of every single seat. It was well planned and laid out.

The intro to Python got underway at 8:30 and although it was geared toward an audience with some programming experience, it assumed (as the title suggested) no Python experience. Steve Holden was a great speaker, filling in twice with anecdotes while technical issues were worked out with equipment (once was a mis-configuration of his laptop, the other was a power interruption).

Python is a very capable language. It is more consistent about its OO and syntax when compared to Perl. The Python community is also a lot bigger on the use of common conventions, mostly around formatting (one expression per line), in-line documentation and coding style in general.

Functions are first-class values: you can assign a function to a variable, pass it as an argument, and implement the equivalent of funcall and apply in Python. Python supports positional parameters and default values for function parameters, and any function can be called with positional arguments, named arguments, a tuple of arguments (similar to apply), or a dictionary (an indirect way of using named arguments).
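As a rough illustration of those calling conventions, here is a minimal Python 3 sketch. The function names (describe, call_with) and the sample values are made up for this example and are not from the talk.

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Build a greeting; greeting and punctuation have default values."""
    return f"{greeting}, {name}{punctuation}"

# A function is an ordinary value: bind it to another name or pass it around.
say = describe

def call_with(fn, *args, **kwargs):
    """Hypothetical helper, roughly Lisp's apply: call fn with collected arguments."""
    return fn(*args, **kwargs)

positional = describe("Andrew", "Hi")                          # positional arguments
named = describe(name="Andrew", punctuation="?")               # named arguments
from_tuple = describe(*("Andrew", "Hey"))                      # a tuple of arguments
from_dict = describe(**{"name": "Andrew", "greeting": "Yo"})   # a dictionary
via_helper = call_with(say, "Andrew", punctuation=".")         # function as argument

print(positional, named, from_tuple, from_dict, via_helper, sep="\n")
```

The `*` and `**` prefixes do the unpacking at the call site, so the same function accepts all four styles without any changes to its definition.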

Python actually has a lot of features which were inspired by functional programming (including one of my favorites: list comprehensions).
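For readers who have not seen one, a list comprehension looks something like the following sketch (the numbers are arbitrary sample data). It is shown next to the equivalent map/filter form for comparison.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Read as: "n squared, for each n in numbers, if n is even".
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same computation written with map and filter.
same_result = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

print(even_squares)  # [4, 16, 36]
```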

Python is byte-compiled, like Java. You write code in a .py file, and the first time it is loaded as a module (import), Python compiles the code for you. The time stamp check of the .pyc against the .py file is handled transparently by Python, so there is no explicit make or compile step.

Strings are immutable, which is something that helps Jython be a natural fit in the JVM.

Python supports list destructuring, based on tuples. It’s easier to show an example than to try to explain:

  a, (b, c) = (1, (2, 3))

  print(a, b, c)  # prints: 1 2 3 (Python 3 syntax)

Tuples, and this kind of binding syntax, are widely used in processing things like lists, and maps.
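A short sketch of that binding syntax at work when walking a map and a list of pairs. The data here is invented purely for illustration.

```python
prices = {"coffee": 3, "pastry": 2}   # hypothetical sample data
lines = []

# Each item of a dictionary is a (key, value) tuple, unpacked in the loop header.
for item, price in sorted(prices.items()):
    lines.append(f"{item} costs {price}")

# Nested unpacking: enumerate yields (index, element), and each element is a pair.
pairs = [("a", 1), ("b", 2)]
for index, (letter, number) in enumerate(pairs):
    lines.append(f"{index}: {letter}={number}")

print("\n".join(lines))
```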

An interesting feature of the language is the pair of functions locals() and globals(). locals() returns a dictionary (Python’s name for a map) of all of the variable bindings (and values) that are visible in the current local scope (exclusive of global variables). globals() returns the variables in the entire module’s scope (not local, lexical or class scope, and not global in the sense of a Perl global: not universally global).
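A minimal sketch of the difference between the two, using made-up names (MODULE_LEVEL, show_scopes):

```python
MODULE_LEVEL = "visible to the whole module"

def show_scopes(argument):
    inner = argument * 2
    # locals() is a dictionary of the names bound in this function call so far.
    local_names = sorted(locals())
    # globals() is the enclosing module's namespace, not a program-wide one.
    module_name_is_global = "MODULE_LEVEL" in globals()
    inner_is_global = "inner" in globals()
    return local_names, module_name_is_global, inner_is_global

print(show_scopes(21))  # (['argument', 'inner'], True, False)
```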

Other Highlights

The yield form is like a weak kind of continuation: a function that uses it can hand back a value, pause, and later resume where it left off.
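A small sketch of that pause-and-resume behaviour, using a hypothetical countdown generator:

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing after each value."""
    while start > 0:
        yield start      # hand a value back to the caller and suspend here
        start -= 1       # execution resumes on this line at the next request

ticker = countdown(3)
first = next(ticker)     # runs the body up to the first yield
rest = list(ticker)      # resumes the suspended body until it finishes

print(first, rest)  # 3 [2, 1]
```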

for and while loops can have an else clause, which is executed when the loop runs to completion without being exited by a break.

Python’s exception-handling form (try/except/finally) can also have an else clause, which is executed only if no exception was raised in the try block.
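Both else forms can be sketched as follows. The function names and inputs are invented for this example.

```python
def find_first_negative(values):
    for value in values:
        if value < 0:
            result = value
            break
    else:
        # Runs only when the loop finished without hitting break.
        result = None
    return result

log = []

def parse_and_double(text):
    try:
        number = int(text)
    except ValueError:
        return None
    else:
        # Runs only when the try block raised no exception.
        return number * 2
    finally:
        # Runs in every case, after the except or else branch.
        log.append(text)

found = find_first_negative([3, -1, 4])
missing = find_first_negative([1, 2])
doubled = parse_and_double("21")
failed = parse_and_double("x")
print(found, missing, doubled, failed, log)
```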

Introduction to Django

After a break for lunch, both Andrew and I attended the Introduction to Django, presented by Jacob Kaplan-Moss.

Django is an MVC framework for Python for rapid development of interactive web sites, very much in the spirit of Ruby on Rails. I’ve done some work in Rails and the parallels between the two frameworks are very close.

Django has a code generation framework, an ORM layer (which is very similar to Rails’ ActiveRecord), an HTML template system (with a default syntax based on PHP’s Smarty template system), and integrated support for testing.

Django supports an interesting testing feature called doctests, which come from Python’s standard doctest module. If you’ve worked with an interactive language with a REPL, you have probably used it to explore the behavior of code and to informally test the code. Doctests are a way of (almost literally) taking a cut and paste of the interactive session and vivifying the transcript as a regression test. I like the idea of a recorded test, but as Andrew and I talked about it he convinced me that the literal representation wasn’t the best choice for implementing those kinds of tests. I do like the reduction of effort that comes with that kind of testing and recognize the inherent informality of it.
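To make the idea concrete, here is a minimal sketch using Python's standard doctest module directly, outside of Django. The slugify function is a hypothetical example and was not part of the talk.

```python
import doctest

def slugify(title):
    """Turn a post title into a URL slug.

    The lines below are a pasted interpreter session; doctest replays them.

    >>> slugify("OSCon Day 1")
    'oscon-day-1'
    >>> slugify("  Look Ma, No Threads!  ")
    'look-ma-no-threads'
    """
    # Replace anything that is not a letter or digit with a space, then rejoin.
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)

if __name__ == "__main__":
    # testmod() re-runs every ">>>" example in this module's docstrings and
    # reports any whose printed result differs from the transcript.
    print(doctest.testmod())
```

The literal comparison is what makes these tests cheap to write, and it is also the weakness discussed above: any change in how a result prints will fail the test.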

All that said, Django (like Rails) is big on doing test driven development.

I looked up the status of Django on Jython and apparently it’s close to being a 1.0 release (nothing I’d recommend for use at my employer at the moment, but Sun has hired people to work on Jython and Django is one of the frameworks they are concerned with supporting).

I’m looking forward to tomorrow.

A Survey of Fuzzy String Matching Algorithms at Philly Lambda

Kyle Burton — 2008-06-26T00:00:00-04:00

A Survey of Fuzzy String Matching Algorithms at Philly Lambda

I’m giving my Survey of Fuzzy String Matching Algorithms talk tonight at Philly Lambda, hosted by Algorithmics. See you at 7.

Kyle Burton, 25 Jun 2008 – Wayne PA

Idempotency or Singleton Memoization in Perl

Kyle Burton — 2008-06-18T00:00:00-04:00

Idempotency or Singleton Memoization in Perl

This is an example of a factory for creating a function whose body will only fire once, returning the first computed result each time it is invoked thereafter.

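sub makeDoOnce {
  my($sub) = @_;
  my $alreadyDone = undef;
  my @result;                 # stays empty until the first successful call
  my $exception   = undef;
  return sub {
    # replay the outcome of the first call, whether it died or returned
    die $exception if $exception;
    if ($alreadyDone) {return wantarray ? @result : $result[0];}

    # capture the caller's context before entering the eval block,
    # which is evaluated in a context of its own
    my $w = wantarray;
    eval {
      # exactly one branch runs, so $sub is invoked only once
      if (not defined $w) {             $sub->(@_)}
      elsif ($w)          {@result    = $sub->(@_)}
      else                {$result[0] = $sub->(@_)}
    };
    $exception = $@ if $@;
    die $exception if $exception;
    $alreadyDone = 1;
    return wantarray ? @result : $result[0];
  };
}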

This works by creating a closure over the $alreadyDone, @result and $exception variables. Within the returned sub, a previously recorded exception is re-thrown, and a previously computed result is returned. If there is neither, the original function is invoked, its result or exception is stored, and the sub returns or throws as appropriate.

An example usage is:

  my $getStartTime = makeDoOnce(sub { time });
  print "We started at: ", scalar(localtime $getStartTime->()), "\n";
  sleep 5;
  print "We started at: ", scalar(localtime $getStartTime->()), "\n";

I often use this pattern for one-time initializations (loading plugin systems, ensuring file structures exist, etc), where I want the time of call to be flexible but the action it performs to happen only one time.
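For comparison, the same do-once idea can be sketched in Python. This is an illustrative translation, not part of the original Perl: the names make_do_once and expensive_setup are invented here, and Python has no equivalent of wantarray, so the context handling disappears.

```python
import time

def make_do_once(fn):
    """Return a wrapper that runs fn at most once and replays its outcome."""
    done = False
    result = None
    error = None

    def wrapper(*args, **kwargs):
        nonlocal done, result, error
        if error is not None:
            raise error                  # replay the first failure
        if not done:
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                error = exc              # remember the failure for later calls
                raise
            done = True
        return result

    return wrapper

calls = []

def expensive_setup():
    # Hypothetical one-time initialization; records when it actually ran.
    calls.append(time.time())
    return len(calls)

setup = make_do_once(expensive_setup)
first = setup()
second = setup()
print(first, second, len(calls))  # 1 1 1
```

As in the Perl version, a failure is cached too: once the wrapped function has raised, every later call raises the same exception without running the function again.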

Kyle Burton, 18 Jun 2008 – Wayne PA

Basic Data Analysis at the Unix shell

Kyle Burton — 2008-04-28T00:00:00-04:00

Basic Data Analysis at the Unix shell

I often prefer the shell and Unix utilities to having to wait to load data into a relational database or MS Access. There are plenty of cases when an RDBMS is a better choice – especially when what you’re doing requires joins. At the shell it’s often possible to not even have to transform the encoding of the files before analyzing them. I have developed a couple of recipes for doing some SQL equivalents at the shell. These are a few that I just used so they’re fresh in my mind. Most of the time all it takes is a bit of imagination about how to create a simple data-flow by learning and composing a small handful of the ubiquitous Unix utilities.

All these examples will also work within the Cygwin environment for Windows, or at a Terminal in OS X (especially when combined with the additional software available via the Fink or MacPorts projects).

Counting Records

"SELECT COUNT(*) FROM TABLE"

Just selecting the count of records from an input file is one of the easiest things to accomplish (if your file is already line-oriented). The wc, or word count, utility can do this easily. By default it counts characters, words and lines. With ‘-l’ it will emit only the count of lines.

  user@host:~/data$ wc -l table.tab
  10 table.tab

If you want to ignore the header, start with the second line (see the next example for a more thorough explanation):

  user@host:~/data$ tail -n +2 table.tab | wc -l
  9

Counting Distinct Values from a Column

"SELECT COUNT(DISTINCT(FIELD1)) FROM TABLE"

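For getting the distinct values in a column, along with a count of how often each one occurs (the number of output lines is the distinct count; append | wc -l to get just that number):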

  user@host:~/data$ cut -f1 table.tab | tail -n +2 | sort | uniq -c

This operation starts with the cut utility. cut allows you to take particular columns from a tab-delimited file, or character ranges from a fixed-width file. cut also allows you to specify the delimiter, but be warned that the commonly encountered CSV format is more complex than cut can handle (CSV can support embedded delimiters and quote characters, which are beyond the scope of what cut attempts to handle). This usage of cut takes the first column out of the input file.

The next part is the tail command. tail outputs the end or ‘tail’ of a file. When no ‘+’ sign is present on the number, the ‘-n’ option instructs tail to emit that many lines, counted from the end of the file. With the ‘+’, as in ‘-n +2’, tail instead starts at the second line from the beginning and emits everything from there on. This effectively discards the header line.

Then the values from the first column themselves are sorted. This is necessary for the uniq command, which will only collapse or count duplicate lines when they are adjacent.

Finally we reduce duplicate lines with uniq. The ‘-c’ tells uniq to emit the count of duplicates when collapsing them.
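For readers who think more easily in code than in pipelines, the same data flow can be sketched in Python. This is only an illustration of what each stage does: the function name count_first_column and the sample rows are made up.

```python
import io
from collections import Counter

def count_first_column(lines, delimiter="\t"):
    """Mimic: cut -f1 | tail -n +2 | sort | uniq -c."""
    rows = iter(lines)
    next(rows, None)  # tail -n +2: skip the header line
    # cut -f1: keep only the first field; Counter does the work of uniq -c.
    counts = Counter(line.rstrip("\n").split(delimiter)[0] for line in rows)
    # sort: order the distinct values (by code point, not by locale).
    return sorted(counts.items())

# Hypothetical stand-in for table.tab.
sample = io.StringIO("state\tcity\nPA\tWayne\nOR\tPortland\nPA\tPhiladelphia\n")
table = count_first_column(sample)
for value, count in table:
    print(f"{count:7d} {value}")
```

Unlike uniq, Counter does not need its input sorted first, so the sort here only affects the order of the report.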

Dealing with various file archive types

I often work with files in zip archives and tar (unix tape archive) archives, sometimes with additional compression applied to them (.Z, unix compress; .gz, gzip; and .bz2, bzip2). It is possible to work with these files without having to unarchive or decompress them permanently if all you need is a simple count of lines or to only process them once.

Pulling a file from a Zip Archive

To pull one or more files from within a zip archive, and send them to another command (as part of a pipeline):

  user@host:~/data$ unzip -l archive.zip
  Archive: archive.zip
    Length    Date   Time   Name
   --------   ----   ----   ----
        34  04-28-08 11:01  table1.tab
        56  04-28-08 11:01  table2.tab
  user@host:~/data$ unzip -c archive.zip table1.tab table2.tab | wc -l
  36

That example uses the unzip command to pull 2 files out and send them to standard output (the -c option to unzip instructs it to print the files to standard out rather than extract them to the file system). Those two files were then sent to wc to get the combined line (record) count for the two files. Note that -c also prints the name of each file as it is extracted; unzip’s -p option sends only the file contents, which gives an exact count. We don’t have to worry about cleaning up the two files after the command has completed since they were never written to disk.
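The same read-without-extracting approach is available from Python's standard zipfile module. The sketch below builds a tiny in-memory archive so that it is self-contained; the function name count_lines_in_zip and the file contents are invented for the example.

```python
import io
import zipfile

def count_lines_in_zip(archive, member_names):
    """Count lines across the named members without writing them to disk."""
    total = 0
    with zipfile.ZipFile(archive) as zf:
        for name in member_names:
            # zf.open() streams the decompressed member, one line at a time.
            with zf.open(name) as member:
                total += sum(1 for _ in member)
    return total

# Build a small archive in memory to stand in for archive.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("table1.tab", "id\tname\n1\ta\n2\tb\n")
    zf.writestr("table2.tab", "id\tname\n3\tc\n")
buffer.seek(0)

total_lines = count_lines_in_zip(buffer, ["table1.tab", "table2.tab"])
print(total_lines)  # 5
```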

Kyle Burton, 28 Apr 2008 – Wayne PA

SRFI-26's cut macro

Kyle Burton — 2008-01-30T00:00:00-05:00

SRFI-26’s cut macro

I’m working on an introduction to Lisp presentation for PLUG West. I’m trying to think through examples of macros which are a good demonstration of what they’re for. Towards that end I picked the cut macro from Scheme’s srfi-26, which allows for specialization of arguments, and tried to extend it in a couple of ways. Below is the example code, including a simplified cut, a recursive (or tree) cut, and a pattern cut (which allows the cut points to be named).

  ;; the scheme cut macro...srfi-26, when written as:
  ;;
  ;;  (cut #'format t "~a~&" <>)
  ;;
  ;; it produces:
  ;;
  ;; => #'(lambda (x) (format t "~a~&" x))
  ;;
  ;; 'Lifting' the '<>' out of the form as an argument to the generated lambda.
  ;; It's 'flat' though, it'd be nice to have something which could work
  ;; on any form...a recursive cut, which is what we'll write, first a helper:
  
  (defmacro aprog1 (it &rest body)
    "Anaphoric prog1, returns the value of the first expression,
  executing all subsequent expressions for their side effects.  It binds
  the symbol 'it' to the result of the first expression.  This can also
  be seen as a 'construct and initialize' pattern.
  
    (aprog1
      (make-hash-table :test #'equal)
      (setf (gethash \"a\" it) 1)
      (setf (gethash \"b\" it) 2)
      (setf (gethash \"c\" it) 3))
    => #<HASH-TABLE ...>
  
  "
    `(let ((it ,it))
       ,@body
       it))
  
  (defmacro cut (fn &rest body)
    "srfi-26's cut in Common Lisp, some examples:

  (cut #'cons (+ a 1) <>) is the same as (lambda (x2) (cons (+ a 1) x2))
  (cut #'list 1 <> 3 <> 5) is the same as (lambda (x2 x4) (list 1 x2 3 x4 5))
  (cut #'list) is the same as (lambda () (list))
  (cut #'list 1 <> 3 <...>) is the same as (lambda (x2 . xs) (apply list 1 x2 3 xs))
  
  The following form is not supported in this version:

    (cut <> a b) is the same as (lambda (f) (f a b))

  Scheme does that simply by virtue of being a Lisp-1, I didn't go through the
  effort of doing the check.  The rest-slot marker <...> is not handled by
  this simplified version either. "
    (let* ((formals (list))
           (new-body
            (mapcar #'(lambda (item)
                        (if (equal '<> item)
                            (aprog1
                             (gensym)
                             (push it formals)
                             it)
                            item))
                    body)))
      `#'(lambda ,(reverse formals)
           (funcall ,fn ,@new-body))))

  ;; let's take a look at an expansion and try a couple of examples
  ;; (macroexpand-1 '(cut #'format t "~a: ~a~&" "thing" <>))
  ;; (funcall (cut #'format t "~a: ~a~&" "thing" <>) 10)
  ;; (funcall (cut #'format t "~a: ~a~&" <> <>) "thing" 10)

  ;; Our next helper, the visitor pattern, invoke the function 
  ;; on each non-branch (leaf node) element in the tree, replacing
  ;; the existing value with fn's result.  NB: this is a depth first 
  ;; search.
  (defun map-tree (fn tree)
    (cond ((null tree)
           tree)
          ((not (listp tree))
           (funcall fn tree))
          (t
           (mapcar #'(lambda (elt) (map-tree fn elt)) tree))))
  
  ;; test it out, this should increment each number in the tree:
  ;; (map-tree #'(lambda (elt) (format t "x:~a~&" elt) (1+ elt)) '(1 2 (3 4 (5 6 (7)))))
  
  ;; With those tools we can enhance cut to use map-tree instead of mapcar,
  ;; providing the recursive search for cut points.
  (defmacro rcut (&rest body)
    (let* ((formals (list))
           (new-body
            (map-tree #'(lambda (elt)
                          (if (equalp '<> elt)
                              (aprog1
                               (gensym)
                               (push it formals)
                               it)
                              elt))
                      body)))
      `#'(lambda ,(reverse formals)
           ,@new-body)))
  
  ;; see what it expands to
  (macroexpand-1
   '(rcut
     (cond ((> <> 1)
            (format t "first arg was >1, second arg is: ~a~&" <>))
           (t
            (format t "first arg was <1, third arg is: ~a~&" <>)))))
  

  ;; test out the resulting function  
  (let ((fn
         (rcut
          (cond ((> <> 1)
                 (format t "first arg was >1, second arg is: ~a~&" <>))
                (t
                 (format t "first arg was <1, third arg is: ~a~&" <>))))))
    (funcall fn 1/2 'a 'b)
    (funcall fn 2/1 'a 'b))
  
  
  
  
  ;; that's all well and good, but the whole depth first ordering can
  ;; be hard to think through with respect to how it maps to the
  ;; ordering of the function arguments.  What we want to try next
  ;; is pcut (pattern cut), for example:
  ;; 
  ;;    (pcut
  ;;      (cond ((> ?fst 1)
  ;;             (format t "first arg (~a) was >1, x is: ~a~&" ?fst ?x))
  ;;            (t
  ;;             (format t "first arg (~a) was <1, x is: ~a, y is:~a~&" ?fst ?x ?y))))
  ;;
  ;; The ordering still matters, it's still depth-first, but now we can re-use arguments by 
  ;; name, without inventing new locals.

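  ;; Helper predicate to see if we have a cut pattern symbol.  Only
  ;; symbols qualify, so strings (including the empty string) in the
  ;; body pass through untouched instead of erroring or being captured.
  (defun starts-with-? (sym)
    (and (symbolp sym)
         (plusp (length (symbol-name sym)))
         (char= #\? (char (symbol-name sym) 0))))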
  
  ;; the main difference here is DRY (don't repeat yourself, 
  ;; only capture each named binding once), and use the #'starts-with-? 
  ;; predicate instead of the equality test (#'equalp).
  (defmacro pcut (&rest body)
    (let* ((formals (list))
           (fml-hash (make-hash-table :test #'equal))
           (new-body
            (map-tree #'(lambda (elt)
                          (if (starts-with-? elt)
                              (aprog1
                               (or (gethash elt fml-hash)
                                   (aprog1
                                    (gensym)
                                    (push it formals)
                                    (setf (gethash elt fml-hash) it)
                                    it))
                               it)
                              elt))
                      body)))
      `#'(lambda ,(reverse formals)
           ,@new-body)))

  ;; take a look at the expansion
  (macroexpand-1
   '(pcut
          (cond ((> ?fst 1)
                 (format t "first arg (~a) was >1, x is: ~a~&" ?fst ?x))
                (t
                 (format t "first arg (~a) was <1, x is: ~a, y is:~a~&" ?fst ?x ?y)))))
    
  ;; test out the expansion
  (let ((fn (pcut
             (cond ((> ?fst 1)
                    (format t "first arg (~a) was >1, x is: ~a~&" ?fst ?x))
                   (t
                    (format t "first arg (~a) was <1, x is: ~a, y is:~a~&" ?fst ?x ?y))))))
    (funcall fn 1/2 'a 'b)
    (funcall fn 2/1 'a 'b))

Conclusion

Hopefully these are useful pedagogical examples of macros in Common Lisp.

Kyle Burton, 30 Jan 2008 – Wayne PA
