Pickling Case Classes to Memcached With Scala

Recently, I’ve been working on a rewrite of Sluggy Freelance - a friend’s site which I’ve worked on for about a decade now. Caching is a big part of keeping site cost down, and over the years I’ve come to trust Memcached. Fast, lightweight, and easy, Memcached has served me well through several iterations of the site… in Perl, PHP, and Python.

Today, I’m rewriting the site in Scala with Scalatra and React.js. As a result, I’m discovering all sorts of new fun that I haven’t dealt with in Scala yet. One of these is using Memcached, and specifically serialising/deserialising Case Classes. In a good Memcached library - shade, in this case - and Scala Pickling*, I’ve found a powerful combination.

So far, I’m working with a data structure representing navigation: the current book, chapter, and section as well as lists of others of those based on context. As a bonus… I’m actually serialising case classes that are fetched from Slick.

Distributing Akka Workloads - and Shutting Down Afterwards

Recently, as part of my role with the Professional Services team at Typesafe, I have been working on site at a customer who is using a lot of Akka and Play. During this time, I’ve gotten a chance to solve some interesting problems and answer obscure questions… which for those who like chasing these kinds of puzzles (like myself) is a fantastic way to spend the day (and if this kind of thing sounds exciting to you, we’re aggressively hiring for this kind of work ;) ).

One item in particular came up recently as we tried to create a cron-style job to do interval data processing – big blocks of input data would be separated into individual instructions for processing, using Akka 2.0.x. The developer I was working with found that, among other things, using only a single actor to process all of their data items was not particularly performant. Further, once we solved this problem we couldn’t figure out how to cleanly shut down Akka without interrupting any messages being processed. Fortunately, Akka offers simple answers to both of these problems… if you know where to look.

MongoDB Community – Call for Postcards

In the distant annals of MongoDB History, when Men were Real Men, Women were Real Women, and Small Furry Creatures from Alpha Centauri were real Small Furry Creatures from Alpha Centauri… There were postcards.

The tradition began before I started at 10gen: whenever we travelled somewhere for MongoDB, we would send a postcard back for Meghan Gill. Meghan, being the evil genius behind growing MongoDB’s community, was often indirectly responsible for our travel (which is why I of course sent a postcard from Armenia reminding her of this fact). When 10gen finally moved into our first New York office (having shared space, in the form of a few spare desks, with another company in the beginning), Meghan got her own office and created an entire wall for the postcards. As postcards came in they went up on the wall, and by the time 10gen moved to yet another office the wall was pretty impressive.

Unfortunately, between the new office not having a wall for postcards and newer employees not picking up the postcard tradition, our efforts fell off. Which makes me somewhat sad, because MongoDB’s community has grown all over the globe. With my departure from 10gen imminent, I had an evil thought – to bury poor Meghan in postcards from around the world, written and mailed by members of the community which she has helped create.

And so, I call upon you, MongoDB community members. Do your darndest, from your distinct corners of the globe, to flood Meghan with postcards about how much you love MongoDB. Bonus points if you send it from somewhere out of the way that nobody has sent her a postcard from yet!

If you are game, find your most awesome postcard from the town you live in (or even are travelling to), write a message and put a stamp on it. Send it away to this address:

Meghan Gill
c/o 10gen, Inc.
578 Broadway
7th Floor 
New York, NY 10012
United States of America

That’s it. My thanks in advance for helping advance my evil, evil plans.

-b

And Now for Something Completely Different!

As I write this, it is the end of two very long years working with 10gen (The MongoDB Company), a company of fewer than 20 people when I joined that is now approaching 200. Although my role has been fungible during my time here, my primary focus has been to improve Scala integration for MongoDB, and evangelize it around the world. In this pursuit I’ve worn out an entire US Passport, as well as my favorite pair of Doc Martens.

In addition to writing code, I’ve trained new users (customers and colleagues alike), given presentations to audiences large & small, and helped companies to better deploy MongoDB in their applications – in over 50 cities and 15 countries. (Those who know me now will expect a whinging anecdote about the time they sent me to Armenia…) I’ve spent a quarter of my time at 10gen working out of a London office that didn’t exist when I first arrived, traveling and helping build a MongoDB following in Europe.

Most excitingly, I’ve had the opportunity to meet, and in some cases the pleasure of working with, some of the most interesting and talented individuals I’ve ever encountered. Engineers, Developer Evangelists, Community Managers, Systems Administrators, and one self-described “Markitect” (which is how Jared Rosoff once described his job to me; a cross between Marketing and Software Architecture). People like Meghan Gill (and her growing team) have helped create and grow the Open Source community around MongoDB – Meghan’s the one you can blame for the MongoDB mugs that are multiplying like tribbles around the globe.

At the risk of offending any by singling so few out, I could spend all day just citing excellence in many of my colleagues at 10gen – there is some incredible talent on hand. In my own time here, I’ve managed to develop and hone some previously unknown skills as a teacher, consultant, and developer evangelist. Through building software with hundreds of thousands of open source users, I’ve learned to be a better engineer. The most important lesson? That the users, and how they use (or want to use) your software matters above all else; observe, listen and adapt. The user may not always be right, but in many cases they can point you in the direction from which the wind is blowing.

The reality is that all of this has been an incredible adventure – but also a lot of hard work, and tremendously tiring. 10gen today is a very different company from the one I joined – and about 10 times the size. I feel that I’ve reached the apex of what I wish to accomplish with 10gen, and I’ve spent a bit of time considering new challenges; it is far from easy to decide if I can leave all the great work I’ve done behind. So, when I finish my current tour of duty in Europe, I will pursue that challenge.

As such, it is time that I announce my impending departure from 10gen, to pursue something new. MongoDB is a tremendous product, and 10gen has built something quite amazing in a very short time. Being in the community from early on, it has been humbling to watch how quickly they (we) have built a product, company and community. Having had the opportunity to take part in that was more humbling still, and I look forward to seeing what the future brings, even if I won’t be part of this particular story.

I’m turning the page, and starting a new chapter…

What’s next? After returning home from Europe in December I plan to take a few days off to gather my wits. After that, I’m very happy to announce that I’ll be joining the incredible team at Typesafe, as a member of their burgeoning Professional Services team. While there, I look forward to helping build, teach, and promote the foundation for the next generation of scalable applications. This generation includes the Scala programming language, Akka framework for distributed & concurrent computing, and Play! for web applications. I have been working with Scala and Akka for almost as long as I have MongoDB; for me, it is a natural progression forward.

The team that Typesafe has built is awe-inspiring, and the plentitude of knowledge and experience they offer is hard to refuse for someone seeking new challenges and skills.

By no means is this the end of my MongoDB life, merely my 10gen one. Many of you will still see me at the same conferences, looking much the same. Maybe in a different t-shirt (though I still have an awful lot of comfortable MongoDB ones). I may be promoting a different product, but I still believe in MongoDB and am always glad to help see that forward as well. I helped set the tone for Scala and MongoDB and don’t intend to abandon it. I plan to continue my contributions to projects like Casbah as I move forward. Projects to rethink MongoDB on the Scala platform, such as Hammersmith, will continue to occupy what spare time I find; I’m always happy to help with other projects and ideas that strike my fancy.

And, as I shared with my coworkers already, to those of you I won’t see as much of in my new role… So Long, And Thanks For All The Fish!

Understanding Scala’s Type Classes

Over the last year or so, I have found myself making more and more use of Scala’s Type Class system to add flexibility to my code. This is especially evident in the MongoDB Scala Driver, http://github.com/mongodb/casbah, where the most recent work has been to simplify many features by migrating them to type classes.

During this work, however, I’ve found that many otherwise adroit Scala engineers seem befuddled or daunted by the Type Class. It does me no good to take advantage of clever features that my users don’t understand, and many will benefit from introducing these concepts to their own code. So let’s take a look at what type classes are, as well as how & why we can utilize them.

Wikipedia defines a Type Class as “… a type system construct that supports ad-hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types”. Admittedly, a bit of a mouthful – and not very helpful to those of us who are self taught and lack the benefit of a comprehensive academic Computer Science education (myself included). Surely, there must be a way to simplify this concept?

In evaluating these ideas, I’ve found it easiest to think of a Type Class (in Scala, at least) as a special kind of adapter, which can impart additional capabilities upon a given type or set of types. In Scala the Type Class is communicated through implicits, and imparts one, or both, of two behaviors. First, a Type Class can be utilized to filter what types are valid for a given method call (which I detailed in this earlier post). Second, a Type Class can impart additional features and behaviors upon a type at method invocation time. This latter is much along the lines of an enhanced kind of composition, rather than the weaker inheritance which often plagues similar behaviors in a language like Java.
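To make the “adapter” idea a little more concrete, here is a minimal sketch in plain Scala. It is not taken from Casbah; the `Describable` trait, its instances, and the `describe` method are hypothetical names invented purely for illustration.

```scala
// Hypothetical type class: the ability to describe a value of type A.
trait Describable[A] {
  def describe(value: A): String
}

object Describable {
  // Instances live in implicit scope; each one "adapts" an existing type
  // without modifying it or inheriting from it.
  implicit val intDescribable: Describable[Int] = new Describable[Int] {
    def describe(value: Int): String = s"the integer $value"
  }

  implicit val stringDescribable: Describable[String] = new Describable[String] {
    def describe(value: String): String = s"the string '$value'"
  }
}

object TypeClassSketch {
  // The implicit parameter does both jobs: it restricts A to types that
  // have an instance, and it supplies the behavior at the call site.
  def describe[A](value: A)(implicit ev: Describable[A]): String =
    ev.describe(value)
}
```

With these definitions, `TypeClassSketch.describe(42)` and `TypeClassSketch.describe("hi")` compile, while `TypeClassSketch.describe(3.14)` is rejected at compile time because no `Describable[Double]` instance exists.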

To better understand what I am describing, let’s compare a few concepts around the creation and interaction of custom domain objects. I have several sets of tasks I have had to accomplish in Scala in the past – and Scala solutions show some elegant Type Class oriented approaches which are rooted in the Standard Library. While this may seem a bit contrived, it is exactly the kind of problem through which I initially came to understand Type Classes – and is thus an ideal lesson.

Forcing Scala Compiler ‘Nothing’ Checks

Since early in its history, Casbah has had a helper method called getAs[T], where T is “Some type you’d like to fetch a particular field as”. Because of type erasure on the JVM, working with a Mongo Document can be annoying – the representation in Scala is the equivalent of a Map[String, Any]. If we were to work with the Map[String, Any] in a standard mode, fetching a field balance which is a Double would require manual casting.

val doc: DBObject = MongoDBObject("foo" -> "bar", "balance" -> 2.5)

val balance = doc.get("balance")

We have already hit another issue here – in Scala, invoking get on a Map returns Option[T] (Where, in this case, T is of type Any). Which means casting has become more complex: to get a Double we also have to unwrap the Option[Any] first. A lazy man’s approach might be something hairy like so:

balance.getOrElse(null).asInstanceOf[Double]

In the annals of history (when men were real men, and small furry creatures from Alpha Centauri were real small furry creatures from Alpha Centauri), the above became an annoyingly common pattern. A solution was needed - and so getAs[T] was born. The idea was not only to allow a shortcut to casting, but take care of the Option[T] wrapping for you as well. Invoking getAs[Double] will, in this case, return us an Option[Double].

But not everything is perfect in the land of getAs[T] – if the type requested doesn’t match the actual type, runtime failures occur. Worse, if the user fails to pass a type, the Scala compiler substitutes Nothing, which guarantees a runtime failure. Runtime failures are bad – but fortunately, Miles Sabin & Jon-Anders Teigen came up with an awesome solution.
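The sketch below illustrates the shape of the problem using a plain `Map[String, Any]` rather than a real Mongo document. It is not Casbah’s implementation, and it does not show the compile-time fix mentioned above; the free-standing `getAs` helper and the `Doc` alias are assumptions made for illustration.

```scala
object GetAsSketch {
  // Stand-in for a Mongo document: effectively a Map[String, Any].
  type Doc = Map[String, Any]

  // Hypothetical helper in the spirit of getAs[T]: look the key up and cast
  // the value, keeping the Option wrapper. Because T is erased on the JVM,
  // the cast is unchecked and cannot fail at this point.
  def getAs[T](doc: Doc, key: String): Option[T] =
    doc.get(key).map(_.asInstanceOf[T])

  val doc: Doc = Map("foo" -> "bar", "balance" -> 2.5)

  // Requested type matches the stored type: an Option[Double].
  val balance: Option[Double] = getAs[Double](doc, "balance")

  // Requested type does not match: the call itself still "succeeds".
  val wrong: Option[String] = getAs[String](doc, "balance")

  // The ClassCastException only surfaces when the value is used as a String.
  val failure = scala.util.Try(wrong.get.length)
}
```

The mismatch is invisible where `getAs` is called and only blows up where the value is consumed, which is what makes this class of runtime failure so unpleasant to track down.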

Casbah 2.3.0-RC1 Released

Today, I published the first Release Candidate of Casbah 2.3.0, available for SBT users as "org.mongodb" % "casbah" % "2.3.0-RC1". My release announcement to implicit.ly contains the details on all of the bugs fixed – I will also be posting another set of blog entries shortly outlining the specific improvements to the code and demoing features.

It has been just about a year since the last major release of Casbah, which was version 2.1.5-1. A number of factors led to the delay in getting a major update out the door, for which I apologize. Amongst other things I have spent much of the last year since Casbah’s prior release on the road doing training, consulting and evangelization of MongoDB to users around the globe; I had less time for code among all these things than I expected! Additionally, after releasing the 2.1.x series of Casbah I embarked on what quickly morphed from Casbah 2.2.0 to 3.0.0 – a major refactoring and cleanup of 2+ years of API cruft and “I’m Gonna Learn Me Some Scala!” detritus. In all the excitement to release a perfect release to end all releases, I did a poor job of making it easy to backport and maintain a compatibility series for 2.x users – a harsh lesson in the importance of creating small, bite sized git commits that can be cherry picked.

So What Happened to Casbah 3.0? And 2.2?

Casbah 2.2.x is dead – 3.0.x is certainly not! When the work following 2.1.x was begun, I had published a number of early snapshots as Casbah 2.2.0-SNAPSHOT. During this development cycle I found a lot of the aforementioned detritus such as overloaded methods (Casbah was begun as a Scala 2.7.x project and I never fully moved its core APIs over to use named and default arguments - some of these are corrected in Casbah 2.3.0 but in the interest of backwards compatibility with prior releases of 2.x, not completely). As I worked on coding improvements around these things the API drifted further and further away from compatibility and I chose to kill off the 2.2.x series, planning the next release of Casbah as 3.0.0. In addition to that, I intended Casbah 3.0.0 to coincide with MongoDB 2.2.0 which will have additional features such as the New Aggregation Framework. As MongoDB 2.2 hasn’t been released yet, it became clear I needed to provide an updated Casbah release with many of the improvements but without many of the API breakages introduced in 3.0 - including a vastly improved build of the Casbah Query DSL which has stronger type safety and compiler checks thanks to the inimitable Jon-Anders Teigen.

Casbah 3.0 is still very much alive and in development, with 2.3.0 representing a backporting of many of the changes and improvements from 3.0. Because of the abandonment of the original 2.2 development series, I felt it was saner to kill 2.2.x dead and bring the backports into a 2.3.x series. You can, if you wish, think of this as Casbah 2.3 - The Search for Casbah 2.2 (The long rumored sequel to Spaceballs has been said to be called Spaceballs 3: The Search for Spaceballs 2).

I will continue to support and improve Casbah 2.3.x moving forward as well as completing Casbah 3.0 (still intended to coincide with the release of MongoDB 2.2). If you have any questions, please don’t hesitate to contact me!

Later tonight or tomorrow I will post an entry or two detailing all of the wonderful changes in Casbah 2.3.0 and how to take advantage of them.

Immutability and Clever Variable Usage in the Land of Blocks and Branches

Last night, I found myself unconsciously refactoring some Scala code (I don’t recall at this point if it was something I wrote or someone else did). As I looked at what I was doing I realized that many Scala developers don’t seem entirely aware of one of my favorite features. What I’m talking about is effectively capturing values from multibranch block statements in Scala. Used correctly, they can greatly decruft complicated code as well as helping us use immutability in places where we might not expect an easy way to do so.

In typical C-like languages (such as C, C++, and Java) we are restricted in our syntax should we wish to capture a value when running many branching blocks such as if-else statements, switch statements and even for/foreach constructs. When we find ourselves wanting to set the value of a variable within each possible condition or iteration, we need to declare a mutable variable before the block. We then mutate this variable within each condition or iteration. Take this example from Java:

boolean valid = false;

String status = null;
if (valid) {
    status = "VALIDATED";
} 
else {
    status = "INVALID";
}
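For contrast, here is a minimal Scala sketch of the same logic. In Scala, `if`/`else` and `match` are expressions that yield a value, so the result can be bound directly to an immutable `val`. The `BranchValues` object and its method names are hypothetical, chosen only for this example.

```scala
object BranchValues {
  def statusFor(valid: Boolean): String = {
    // if/else is an expression: the chosen branch's value becomes the result,
    // so `status` is an immutable val and never holds a placeholder null.
    val status = if (valid) "VALIDATED" else "INVALID"
    status
  }

  // match blocks yield values in the same way.
  def describe(code: Int): String = code match {
    case 200 => "OK"
    case 404 => "NOT FOUND"
    case _   => "UNKNOWN"
  }
}
```

Because every branch must produce a value of a compatible type, the compiler also catches the case where one branch forgets to assign anything, something the mutable Java version allows silently.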

User Configurable Type Filtering With Scala Type Classes

When I woke up this morning and looked through my twitter mentions, I found this gem sitting there from the middle of the night:

@rit when using "_id" $lt new ObjectId(timestamp) it throws ValidDateOrNumericType, but we might want to select records after id timestamp (from @justthor)

The user in question is complaining that when using Casbah’s DSL, it doesn’t allow a MongoDB ObjectId as a valid type to the $lt operator. But as @justthor points out, it is entirely possible to use ObjectId with the $lt operator since it contains timestamp information (See the documentation for ObjectId if you want nitty gritty detail). When I wrote the code for $lt however, I needed to decide what types were valid and weren’t valid; I can’t exactly guarantee type safety with a DSL like Casbah’s, but I can enforce type sanity. Whether I forgot that you can use ObjectId in $lt or just decided that most people wouldn’t need to is irrelevant – I had in this case blocked a user from accomplishing something valid that they needed to.

It is a more than reasonable problem, and my initial reaction was “oh crap, I guess I need to patch that”. But what I forgot is that a few releases back, I rearchitected Casbah to obviate this kind of problem. Casbah now allows for a user definable (or, if you prefer, “adjustable”) type filter on any of its DSL operators. This is accomplished through a very simple application of Scala Type Classes, a term which gets batted around a lot in the Scala community, but few seem able to understand or articulate its meaning to us lesser mortals. Over the last few months I’ve come to understand Type Classes much more deeply than I think I ever expected, and applied these lessons to the design of my code. As I failed to document the power and usage of these features at the time, I am going to be writing some additional detailed articles about my understanding of Type Classes in the next few weeks, and this is the first of such explanations.
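As a rough illustration of what a user-adjustable type filter can look like, here is a self-contained sketch. Only the name `ValidDateOrNumericType` comes from the error message in the tweet; the `ObjectId` stand-in, the `QueryDsl.lt` method, and the way instances are declared are assumptions for illustration, not Casbah’s actual code.

```scala
import java.util.Date

// Stand-in for MongoDB's ObjectId so the sketch needs no driver dependency.
final case class ObjectId(timestamp: Long)

// Marker type class: an instance for T means "T is allowed with $lt".
trait ValidDateOrNumericType[T]

object ValidDateOrNumericType {
  // Instances a library might ship by default.
  implicit object IntOk extends ValidDateOrNumericType[Int]
  implicit object LongOk extends ValidDateOrNumericType[Long]
  implicit object DoubleOk extends ValidDateOrNumericType[Double]
  implicit object DateOk extends ValidDateOrNumericType[Date]
}

object QueryDsl {
  // Hypothetical operator: compiles only when evidence for T is in scope.
  def lt[T: ValidDateOrNumericType](field: String, value: T): Map[String, Any] =
    Map(field -> Map("$lt" -> value))
}

object UserCode {
  // The user widens the filter by supplying one more instance;
  // the library code above is left untouched.
  implicit object ObjectIdOk extends ValidDateOrNumericType[ObjectId]

  val byAge = QueryDsl.lt("age", 30)
  val byId  = QueryDsl.lt("_id", ObjectId(1300000000L))
}
```

Without the `ObjectIdOk` instance, the `byId` line fails to compile; with it, the same call is accepted, and no patch or new release of the library is involved.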

So the question at hand is, how exactly does Casbah allow us to do this magical type filtering that I just mentioned, without patching the driver or creating a new release? First, let’s look at how Casbah used to do things before the as-yet unexplained Type Classes were introduced.

MapReduce With MongoDB 1.8 and Java

In my last post, I covered the new MapReduce features introduced in MongoDB 1.8, which is now available as a release candidate. Most importantly the temporary collection system has gone away, now requiring that you specify an output parameter. With that required output comes new options for how to create incremental output using the merge and reduce output modes.
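To show how the merge and reduce output modes differ, the following sketch models an output collection as an in-memory Scala `Map`. This is a conceptual model only, not driver code: the `OutputModes` object and its two functions are hypothetical, and a real job would run against MongoDB through a driver.

```scala
object OutputModes {
  // "merge": results from the new run overwrite existing entries with the
  // same key; existing entries with other keys are kept.
  def merge[K, V](existing: Map[K, V], fresh: Map[K, V]): Map[K, V] =
    existing ++ fresh

  // "reduce": when a key exists on both sides, the reduce function combines
  // the existing value with the new one instead of overwriting it.
  def reduce[K, V](existing: Map[K, V], fresh: Map[K, V])(f: (V, V) => V): Map[K, V] =
    fresh.foldLeft(existing) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(old => f(old, v)).getOrElse(v))
    }
}
```

For example, with an existing output of `Map("1990" -> 10, "1991" -> 20)` and a new run producing `Map("1991" -> 5, "1992" -> 7)`, `merge` leaves `"1991"` at `5`, while `reduce` with `_ + _` leaves it at `25`.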

As I write this, we are prepping new releases of our Java Driver (v2.5) and our Scala Driver, Casbah (v2.1) which are intended to support MongoDB 1.8’s new features including incremental MapReduce. Since I implemented the APIs for the new MapReduce output in both drivers, I thought I’d demonstrate the application of these new output features to the previous dataset. This post is focused on the Java API, but a Scala one will likely follow.

As a reminder (or a primer for those who skipped my last post), I’ve been testing the 1.8 MapReduce using a dataset and MapReduce job originally created to test the MongoDB+Hadoop Plugin. It consists of daily U.S. Treasury Yield Data for about 20 years; the MapReduce task calculates an annual average for each year in the collection. You can grab a copy of the entire collection in a handy mongoimport friendly datadump from the MongoDB+Hadoop repo; here’s a quick snippet of it:

{ "_id" : ISODate("1990-01-10T00:00:00Z"), "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.95, "bc5Year" : 7.92, "bc10Year" : 8.03, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.75, "bc30Year" : 8.11, "bc1Year" : 7.77, "bc7Year" : 8, "bc6Month" : 7.78 }
{ "_id" : ISODate("1990-01-11T00:00:00Z"), "dayOfWeek" : "THURSDAY", "bc3Year" : 7.95, "bc5Year" : 7.94, "bc10Year" : 8.04, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.8, "bc30Year" : 8.11, "bc1Year" : 7.77, "bc7Year" : 8.01, "bc6Month" : 7.8 }
{ "_id" : ISODate("1990-01-12T00:00:00Z"), "dayOfWeek" : "FRIDAY", "bc3Year" : 7.98, "bc5Year" : 7.99, "bc10Year" : 8.1, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.93, "bc3Month" : 7.74, "bc30Year" : 8.17, "bc1Year" : 7.76, "bc7Year" : 8.07, "bc6Month" : 7.8100000000000005 }
{ "_id" : ISODate("1990-01-16T00:00:00Z"), "dayOfWeek" : "TUESDAY", "bc3Year" : 8.13, "bc5Year" : 8.11, "bc10Year" : 8.2, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.1, "bc3Month" : 7.89, "bc30Year" : 8.25, "bc1Year" : 7.92, "bc7Year" : 8.18, "bc6Month" : 7.99 }
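To give a feel for what the annual-average job computes, here is a small sketch using ordinary Scala collections instead of MongoDB’s MapReduce. It assumes the averaged field is `bc10Year`, which is a guess made for illustration; the `YieldAverage` object is likewise hypothetical.

```scala
object YieldAverage {
  // (year, bc10Year) pairs copied from the four sample documents above.
  val samples: Seq[(Int, Double)] = Seq(
    1990 -> 8.03,
    1990 -> 8.04,
    1990 -> 8.1,
    1990 -> 8.2
  )

  // "Map" step: each row is keyed by its year.
  // "Reduce" step: the values for each key are averaged.
  def annualAverages(rows: Seq[(Int, Double)]): Map[Int, Double] =
    rows.groupBy(_._1).map { case (year, group) =>
      year -> (group.map(_._2).sum / group.size)
    }
}
```

Calling `YieldAverage.annualAverages(YieldAverage.samples)` yields a single entry for 1990, whose value is the mean of the four sampled yields.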