Blog Posts

A Look At MongoDB 1.8's MapReduce Changes

27 Jan 2011

MongoDB 1.7.5 shipped yesterday, and is expected to be the last 'beta' release of what will become MongoDB 1.8. As part of the release, I've been testing the new MapReduce functionality and thought this a good time to highlight the changes for people.

If you aren't new to MongoDB MapReduce, the most important change since MongoDB 1.6.x is that temporary collections are gone; you are now required to specify an output. Previously, if you omitted the out argument, MongoDB would create a temporary collection and return its name with the job results. In non-sharded MongoDB setups these temporary collections would go out of scope and be cleaned up when the connection closed. Unfortunately, for sharded setups it wasn't possible to safely clean these up, so they would remain behind and clutter up the database. For this and other reasons the temporary collection feature was removed. There is good news though: they've been replaced with an even better system for saving the results of MapReduce jobs!

While the out argument is now a required parameter in MapReduce jobs, it has a number of options for controlling what MongoDB does with results. If you're running a truly one-off job where you don't need to keep the results later, MongoDB now supports returning results "inline". Be careful here though: your results are being returned in a single document and are subject to the document size limitations of MongoDB (16MB per document in 1.8). To use inline results, set the value of out to a document {inline: 1}. The result object will contain an additional key results which contains the MapReduce output; the result field will be omitted.

As with previous versions of MongoDB, you can specify a collection name (as a string) in the out argument. If the named collection already exists MongoDB will replace it entirely with the MapReduce results. Along with the inline mode, MongoDB 1.8 introduces support for "merge" and "reduce" output modes; instead of replacing the target collection MongoDB can be instructed to reconcile the MapReduce results with the existing data. To use these modes, set the value of out to a document with a key of either "merge" or "reduce" and a value of the collection to save to.

The difference between "merge" and "reduce" has to do with what MongoDB does when it encounters the same key in both the existing collection and the MapReduce results. In "merge" mode, MongoDB will simply overwrite the existing key with the new one from the MapReduce output. In "reduce" mode, MongoDB will run the reduce function again with both the new and old data, saving those results to the collection (you remembered to make your reduce function idempotent, right?). UPDATE: If you specified a "finalize" function, MongoDB will re-run this after the "reduce" runs.
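To make the three collection modes concrete, here is a toy model in Python that uses plain dicts in place of collections. It is only a sketch of the behavior described above, not MongoDB's implementation: write_output and reduce_fn are hypothetical names, the sample values are made up, and the finalize step is left out.

```python
# Toy model of the collection output modes, with dicts standing in
# for MongoDB collections. Hypothetical helper names throughout.

def reduce_fn(key, values):
    """Stand-in reduce: add up counts and sums, returning the same shape."""
    return {
        "count": sum(v["count"] for v in values),
        "sum": sum(v["sum"] for v in values),
    }

def write_output(existing, results, mode):
    """Return what the target collection would hold after the job."""
    if mode == "replace":
        # A plain collection name: old contents are thrown away.
        return dict(results)
    if mode == "merge":
        merged = dict(existing)
        merged.update(results)  # the new value wins on a shared key
        return merged
    if mode == "reduce":
        reduced = dict(existing)
        for key, new_value in results.items():
            if key in reduced:
                # Shared key: reduce the old and new values together.
                reduced[key] = reduce_fn(key, [reduced[key], new_value])
            else:
                reduced[key] = new_value
        return reduced
    raise ValueError("unknown mode: %r" % mode)

existing = {2000: {"count": 2, "sum": 10.0}, 2001: {"count": 1, "sum": 5.0}}
results = {2001: {"count": 3, "sum": 15.0}, 2002: {"count": 1, "sum": 4.0}}

replaced = write_output(existing, results, "replace")
merged = write_output(existing, results, "merge")
reduced = write_output(existing, results, "reduce")
```

Only the shared key (2001 here) behaves differently between "merge" and "reduce"; keys that exist on one side only are carried over unchanged in both modes.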

Now that I've thoroughly confused you, let's dig into examples of each of these behaviors. I've been testing the 1.8 MapReduce using a dataset and MapReduce job originally created to test the MongoDB+Hadoop Plugin. It consists of daily U.S. Treasury Yield Data for about 20 years; the MapReduce task calculates an annual average for each year in the collection. You can grab a copy of the entire collection in a handy mongoimport-friendly datadump from the MongoDB+Hadoop repo; here's a quick snippet of it:

{ "_id" : ISODate("1990-01-10T00:00:00Z"), "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.95, "bc5Year" : 7.92, "bc10Year" : 8.03, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.75, "bc30Year" : 8.11, "bc1Year" : 7.77, "bc7Year" : 8, "bc6Month" : 7.78 }
{ "_id" : ISODate("1990-01-11T00:00:00Z"), "dayOfWeek" : "THURSDAY", "bc3Year" : 7.95, "bc5Year" : 7.94, "bc10Year" : 8.04, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.91, "bc3Month" : 7.8, "bc30Year" : 8.11, "bc1Year" : 7.77, "bc7Year" : 8.01, "bc6Month" : 7.8 }
{ "_id" : ISODate("1990-01-12T00:00:00Z"), "dayOfWeek" : "FRIDAY", "bc3Year" : 7.98, "bc5Year" : 7.99, "bc10Year" : 8.1, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.93, "bc3Month" : 7.74, "bc30Year" : 8.17, "bc1Year" : 7.76, "bc7Year" : 8.07, "bc6Month" : 7.8100000000000005 }
{ "_id" : ISODate("1990-01-16T00:00:00Z"), "dayOfWeek" : "TUESDAY", "bc3Year" : 8.13, "bc5Year" : 8.11, "bc10Year" : 8.2, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 8.1, "bc3Month" : 7.89, "bc30Year" : 8.25, "bc1Year" : 7.92, "bc7Year" : 8.18, "bc6Month" : 7.99 }

The map function I'm using extracts the year from the date, and the 10 year benchmark value:

function m() {
    // _id is normally a date; a numeric _id is treated as an already-extracted year.
    var key = typeof( this._id ) == "number" ? this._id : this._id.getFullYear();
    emit( key, { count: 1, sum: this.bc10Year } );
}

The reduce function aggregates the data by year, creating a set that can be averaged. Remember that MongoDB reduce tasks have to be able to be called repeatedly, so the output is crafted to match the input: something that becomes even more important when we, say, ask MongoDB to re-reduce our output with the old data.

function r( year, values ) { 
  var n = { count: 0, sum: 0 };
  for ( var i = 0; i < values.length; i++ ){ 
      n.sum += values[i].sum; 
      n.count += values[i].count; 
  } 
   
  return n; 
} 
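The "output matches the input" requirement can be checked outside MongoDB. Below is a small Python port of the reduce function (a sketch, not something from the original job; the sample values are made up) showing that reducing partial results gives the same answer as reducing everything at once. For contrast, naive_avg is a deliberately broken, hypothetical reducer that returns a bare average and so cannot be re-reduced safely.

```python
def r(year, values):
    """Python port of the reduce function: same shape in and out."""
    n = {"count": 0, "sum": 0}
    for v in values:
        n["sum"] += v["sum"]
        n["count"] += v["count"]
    return n

def naive_avg(year, values):
    """Broken on purpose: the counts are lost, so re-reducing skews the result."""
    return sum(values) / len(values)

yields = [8.0, 7.5, 8.25, 7.75]
emitted = [{"count": 1, "sum": y} for y in yields]

# Reducing in one pass and reducing two partial results agree.
all_at_once = r(1990, emitted)
in_two_steps = r(1990, [r(1990, emitted[:2]), r(1990, emitted[2:])])

# The naive reducer gives a different answer once partial results are combined.
true_avg = naive_avg(1990, yields)
avg_of_avgs = naive_avg(1990, [naive_avg(1990, yields[:1]),
                               naive_avg(1990, yields[1:])])
```

This is why the average itself is left to the finalize step, which runs on the already reduced value.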

We'll round it all out with a quick and dirty finalize function which calculates the current average. Note that I'm keeping all the intermediate data around for demonstrating "reduce" mode.

function f( year, value ){
  value.avg = value.sum / value.count;
  return value;
}

First, a quick look at "inline" mode (I'll leave the plain old named-collection output as an exercise for you, my humble reader).

> res = db.runCommand(
...   { 
...     "mapreduce": "yield_historical.in",
...     "map": m,
...     "reduce": r,
...     "finalize": f,
...     "query" : { "_id" : { "$gt" : new Date(2000, 0, 1) } },
...     "verbose" : true , 
...     "out" : { "inline" : 1 }
...   }
... )
{
    "results" : [
        {
            "_id" : 1990,
            "value" : 8.552400000000002
        },
        /* ... */
        {
            "_id" : 2010,
            "value" : 3.3255026455026435
        }
    ],
    "timeMillis" : 218,
    "timing" : {
        "mapTime" : NumberLong(168),
        "emitLoop" : 215,
        "total" : 218
    },
    "counts" : {
        "input" : 2690,
        "emit" : 2690,
        "output" : 11
    },
    "ok" : 1
}

To demonstrate "merge" and "reduce" mode, I'm going to use queries to break out the data a bit. Let's look first at "merge", by running MapReduce against the first half of the data and then merging in the second half.

> res = db.runCommand(
...   { 
...     "mapreduce": "yield_historical.in",
...     "map": m,
...     "reduce": r,
...     "finalize": f,
...     "query" : { "_id" : { "$lt" : new Date(2000, 0, 1) } },
...     "verbose" : true , 
...     "out" : "yield_historical.merged",
...   }
... )
{
    "result" : "yield_historical.merged",
    "timeMillis" : 223,
    "timing" : {
        "mapTime" : NumberLong(166),
        "emitLoop" : 217,
        "total" : 223
    },
    "counts" : {
        "input" : 2503,
        "emit" : 2503,
        "output" : 10
    },
    "ok" : 1
}
> db.yield_historical.merged.find({}, {"value.avg": 1})
{ "_id" : 1990, "value" : { "avg" : 8.552400000000002 } }
{ "_id" : 1991, "value" : { "avg" : 7.8623600000000025 } }
{ "_id" : 1992, "value" : { "avg" : 7.008844621513946 } }
{ "_id" : 1993, "value" : { "avg" : 5.866279999999999 } }
{ "_id" : 1994, "value" : { "avg" : 7.085180722891565 } }
{ "_id" : 1995, "value" : { "avg" : 6.573920000000002 } }
{ "_id" : 1996, "value" : { "avg" : 6.443531746031743 } }
{ "_id" : 1997, "value" : { "avg" : 6.353959999999992 } }
{ "_id" : 1998, "value" : { "avg" : 5.262879999999994 } }
{ "_id" : 1999, "value" : { "avg" : 5.646135458167332 } }
> 

That gives us our first half of the data; we ran that with a normal named collection output. Let's merge in the second half:

> res = db.runCommand(
...   { 
...     "mapreduce": "yield_historical.in",
...     "map": m,
...     "reduce": r,
...     "finalize": f,
...     "query" : { "_id" : { "$gt" : new Date(2000, 0, 1) } },
...     "verbose" : true , 
...     "out" : { "merge" : "yield_historical.merged" },
...   }
... )
{
    "result" : "yield_historical.merged",
    "timeMillis" : 242,
    "timing" : {
        "mapTime" : NumberLong(173),
        "emitLoop" : 236,
        "total" : 242
    },
    "counts" : {
        "input" : 2690,
        "emit" : 2690,
        "output" : 21
    },
    "ok" : 1
}

> db.yield_historical.merged.find({"_id": {$gt: 1998}}, {"value.avg": 1}) 
{ "_id" : 1999, "value" : { "avg" : 5.646135458167332 } }
{ "_id" : 2000, "value" : { "avg" : 6.030278884462145 } }
{ "_id" : 2001, "value" : { "avg" : 5.020685483870969 } }
{ "_id" : 2002, "value" : { "avg" : 4.61308 } }
{ "_id" : 2003, "value" : { "avg" : 4.013879999999999 } }
{ "_id" : 2004, "value" : { "avg" : 4.271320000000004 } }
{ "_id" : 2005, "value" : { "avg" : 4.288880000000001 } }
{ "_id" : 2006, "value" : { "avg" : 4.7949999999999955 } }
{ "_id" : 2007, "value" : { "avg" : 4.634661354581674 } }
{ "_id" : 2008, "value" : { "avg" : 3.6642629482071714 } }
{ "_id" : 2009, "value" : { "avg" : 3.2641200000000037 } }
{ "_id" : 2010, "value" : { "avg" : 3.3255026455026435 } }

To close out, let's take "reduce" mode for a quick spin. We'll select half of a year for the first part, and then reduce in the second half.

> res = db.runCommand(
...   { 
...     "mapreduce": "yield_historical.in",
...     "map": m,
...     "reduce": r,
...     "finalize": f,
...     "query" : { "_id" : { 
...         "$gte": new Date(2001, 0, 1),
...         "$lte" : new Date(2001, 5, 1) 
...     } },
...     "verbose" : true , 
...     "out" : "yield_historical.reduced",
...   }
... )
{
    "result" : "yield_historical.reduced",
    "timeMillis" : 21,
    "timing" : {
        "mapTime" : NumberLong(6),
        "emitLoop" : 17,
        "total" : 21
    },
    "counts" : {
        "input" : 105,
        "emit" : 105,
        "output" : 1
    },
    "ok" : 1
}
> db.yield_historical.reduced.find()
{ "_id" : 2001, "value" : { "count" : 105, "sum" : 539.5599999999998, "avg" : 5.138666666666665 } }

That handles the first half... Let's grab the second:

> res = db.runCommand(              
...   { 
...     "mapreduce": "yield_historical.in",
...     "map": m,
...     "reduce": r,
...     "finalize": f,
...     "query" : { "_id" : { 
...         "$gt": new Date(2001, 5, 1),
...         "$lte" : new Date(2001, 11, 31) 
...     } },
...     "verbose" : true , 
...     "out" : { "reduce" : "yield_historical.reduced" },
...   }
... )
{
    "result" : "yield_historical.reduced",
    "timeMillis" : 26,
    "timing" : {
        "mapTime" : NumberLong(9),
        "emitLoop" : 22,
        "total" : 26
    },
    "counts" : {
        "input" : 143,
        "emit" : 143,
        "output" : 1
    },
    "ok" : 1
}
> db.yield_historical.reduced.find()
{ "_id" : 2001, "value" : { "count" : 248, "sum" : 1245.1299999999997, "avg" : 5.020685483870967 } }

Of course, this does us no good if the results don't add up. A quick comparison between the 'merged' output and the 'reduced' output validates our code:

> db.yield_historical.reduced.find({_id: 2001})
{ "_id" : 2001, "value" : { "count" : 248, "sum" : 1245.1299999999997, "avg" : 5.020685483870967 } }
> db.yield_historical.merged.find({_id: 2001}) 
{ "_id" : 2001, "value" : { "count" : 248, "sum" : 1245.1300000000003, "avg" : 5.020685483870969 } }

There are some minor differences in the last few decimal places since we are working with floating point numbers here, but the results are effectively the same.
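The last-digit differences come from the order of the additions: "reduce" mode adds two partial sums together, while the single pass keeps one running sum. A minimal Python illustration of that effect, using textbook values rather than the yield data, plus a tolerance-based comparison of the two averages shown above:

```python
import math

# Floating point addition is not associative, so grouping matters.
left_to_right = (0.1 + 0.2) + 0.3   # 0.6000000000000001
regrouped = 0.1 + (0.2 + 0.3)       # 0.6

same_bits = left_to_right == regrouped

# Compare with a tolerance instead of exact equality.
# These two averages are the 'reduced' and 'merged' values for 2001.
averages_match = math.isclose(5.020685483870967, 5.020685483870969,
                              rel_tol=1e-12)
```

Here same_bits is False while averages_match is True, which is the situation in the comparison above.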

These new MapReduce output parameters are available in MongoDB as of version 1.7.4 (which is part of the unstable/development branch) and will ship with MongoDB 1.8. Leave a comment; I'd love to hear what clever tricks you can pull off with these new options.

Exploring Scala with MongoDB

10 Jan 2011

2010 proved to be a great year for growth and adoption of many fledgling technologies, not least among them MongoDB and Scala. Scala is designed as an alternative language for the Java platform, with a focus on scalability. It merges many of the object-oriented concepts of languages like Java and C++ with the functional tools of Erlang, Haskell and Lisp, plus a bit of the dynamic nature of modern languages like Ruby and Python. This flexible nature has sped Scala's adoption in the technology stacks of platforms like LinkedIn, Twitter, Foursquare and many more. By running on the JVM, Scala has a strong affinity for working alongside existing Java applications, which allows users to build on their existing technology investments.

For 2011, MongoDB has added official support for Scala with the release of Casbah, a Scala driver for MongoDB. Casbah is built around the existing MongoDB Java Driver to give it a strong foundation, but designed to take advantage of many of the idioms of Scala such as a strong collections library, fluid syntax for building DSLs and functional concepts like closures and currying.

Because it is designed to be easy to work with for Scala users, Casbah introduces a more 'friendly' syntax for creating MongoDB Objects, using Scala's Map syntax:

 
import com.mongodb.casbah.Imports._

/** Create an object directly */
val newObj = MongoDBObject("foo" -> "bar",
                           "x" -> "y",
                           "pie" -> 3.14,
                           "spam" -> "eggs")

/** Or, use a builder interface */
val builder = MongoDBObject.newBuilder
builder += "foo" -> "bar"
builder += "x" -> "y"
builder += ("pie" -> 3.14)
builder += ("spam" -> "eggs", "mmm" -> "bacon")
val builtObj = builder.result // separate name: newObj is already defined above

The goal of this syntax is to be more readable, similar to what one might expect from a dynamic language like Ruby or Python. In contrast, the same statements in Java tend to be more verbose:

import com.mongodb.*;

DBObject newObj = new BasicDBObject();
newObj.put("foo", "bar");
newObj.put("x", "y");
newObj.put("pie", 3.14);
newObj.put("spam", "eggs");

/** or, builder style */

BasicDBObjectBuilder builder = BasicDBObjectBuilder.start();
builder.add("foo", "bar");
builder.add("x", "y");
builder.add("pie", 3.14);
builder.add("spam", "eggs");
builder.add("mmm", "bacon");
DBObject builtObj = builder.get(); // separate name: newObj is already declared above

The semantics of working with Collections and Cursors in Casbah are similar to the Java driver they wrap, with a bit of Scala-friendly syntactic sugar added for things like for comprehensions. Where Casbah really shines is in its use of a DSL syntax for creating MongoDB Queries.

/** This Query Object ... */
val query = MongoDBObject(
                "foo" -> MongoDBObject("$gte" -> 5, "$lte" -> 10),
                "baz" -> 5,
                "x" -> "y",
                "n" -> "r"
            )
/** Can be constructed instead with the Query DSL: */
val queryDSL = ("foo" $gte 5 $lte 10) ++ ("baz" -> 5) ++ ("x" -> "y") ++ ("n" -> "r")

/** Easily create negated statements. 
    Instead of a nested DBObject constructor like this: */

val ltGt = MongoDBObject(
            "foo" -> MongoDBObject(
                "$not" -> MongoDBObject(
                    "$gte" -> 15, 
                    "$lt" -> 35.2, 
                    "$ne" -> 16)
                )
            )

/** Use Casbah's Query DSL to say it much simpler */
val ltGtDSL = "foo" $not { _ $gte 15 $lt 35.2 $ne 16 }

All of MongoDB's $ Operators including Geospatial Queries are supported by Casbah's DSL.

This is just a small taste of what Casbah and Scala offer to the MongoDB user, but we encourage you to explore more. Version 2.01 is now available for download.

Talking to ActiveDirectory from IronPython

20 Oct 2009

We're building a new intranet system at work, and I've been toying with a few things that the Windows admin asked for. Namely, since the secretaries here will update the intranet data to add people's work and emergency contact numbers, AIM handles, email addresses, etc., the request was that we find a way to keep it all in sync with ActiveDirectory, thereby keeping all the Outlooks and Blackberries up to date with the latest contact information.

This seemed like a fairly reasonable request, presuming we could figure out how to do it, and since I've been using Mono and IronPython a lot more lately, I figured there would be a way to accomplish it. Most of the information I found online was either really old, crappy docs for doing it in C#, or more commonly covered PowerShell or VBScript. So, I poked around and sorted out how to get IronPython on Mono (IronPython 2.6RC + Mono 2.4.2.3) to find and update our users.

The end result is that I can now, from IronPython, find and update ActiveDirectory entries to reflect the latest and greatest information. One thing to note: the MS .Net ActiveDirectory APIs (System.DirectoryServices, which is mirrored in Mono) do something that confused and annoyed me. There is a limited set of 'valid' attribute keys for a user object in Active Directory (which is really just LDAP, in case you didn't know). The DirectoryEntry object has a Properties attribute, which contains a hashmap of these values.

The object will not allow you to set an "Invalid" key (see this list for valid keys). But if you call .Properties.Keys you only get back the Properties that have values set. So, it doesn't appear to be possible to actually ask "What keys are valid?" and do some introspective programming. I have written a wrapper class to make the DirectoryEntry properties look a bit more pythonic (but disabled support for multi-value attributes for now); at some point in the near future I'll likely add in a "valid value" filter.

The end result is, if I want to find my own user in ActiveDirectory by my name, I can do the following from the IronPython console:

>>> import ad_util
>>> adh = ad_util.ActiveDirectorySearcher('mydomaincontroller.hostname.or.ip', 'my.domain', 'myUsernameAllowedToChangeObjects', 'myPassword')
>>> adh
<ActiveDirectorySearcher object at 0x000000000000002D>
>>> userObj = adh.find_name('McAdams', 'Brendan')
[debug] Searching activedirectory for (&(objectCategory=user)(objectClass=person)(sAMAccountName=*)(sn=McAdams*)
(givenname=Brendan*)).  Allow multiple results? False
>>> userObj
<DirectoryEntryHelper:{sn=McAdams,givenName=Brendan,mail=None,sAMAccountName=brendan.mcadams}>

You'll note the object returned is a "DirectoryEntryHelper" type; this is a crappy little wrapper class I put together to simplify attribute access, etc. You can tell any of the find_ methods to return a raw .Net API object instead of a wrapped one by passing the kwarg nowrap=True; find_name() takes last_name, first_name. Several other utility methods exist on the class, including find by account name. Note that, in order to filter out entries in the Global Address List, I'm requiring the search to find objects that have a sAMAccountName. I know the bare minimum about ActiveDirectory, but my Windows admin here tells me that sAMAccountName is a required and unique attribute on any actual domain account object.
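The filter that find_name() sends is just a template with the names substituted in, so it can be reproduced in plain Python without a domain controller. The sketch below copies the PARTIAL_NAME_QUERY template from the script; build_name_filter and escape_filter_value are hypothetical helpers that are not part of the script. The escaping follows the LDAP search filter rules in RFC 4515 and is an addition the original code does not perform.

```python
# Template copied from the script; the helpers below are hypothetical.
PARTIAL_NAME_QUERY = ("(&(objectCategory=user)(objectClass=person)"
                      "(sAMAccountName=*)(sn=_LASTNAME_*)"
                      "(givenname=_FIRSTNAME_*))")

def escape_filter_value(value):
    """Escape characters that are special inside an LDAP filter value."""
    replacements = [
        ("\\", r"\5c"),   # backslash first, so later escapes survive
        ("*", r"\2a"),
        ("(", r"\28"),
        (")", r"\29"),
        ("\x00", r"\00"),
    ]
    for char, escaped in replacements:
        value = value.replace(char, escaped)
    return value

def build_name_filter(last_name, first_name):
    """Fill in the template the same way find_name() does, plus escaping."""
    return (PARTIAL_NAME_QUERY
            .replace("_FIRSTNAME_", escape_filter_value(first_name))
            .replace("_LASTNAME_", escape_filter_value(last_name)))

name_filter = build_name_filter("McAdams", "Brendan")
escaped = escape_filter_value("a*(b)")
```

With escaping in place a caller can no longer pass their own * wildcard, so whether to escape depends on how much you trust the input.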

You can see the keys that are already defined on the object with the keys helper property I added onto DirectoryEntryHelper:

>>> userObj.keys
['lockouttime', 'primarygroupid', 'msexchuseraccountcontrol', 'distinguishedname', 'cn', 'dscorepropagationdata', 
'whencreated', 'logoncount', 'msexchhomeservername', 'objectclass', 'memberof', 'lastlogontimestamp', 'displayname',    'msexchalobjectversion', 'objectguid', 'whenchanged', 'badpwdcount', 'useraccountcontrol', 'badpasswordtime', 'name', 
'samaccountname', 'mdbusedefaults', 'accountexpires', 'countrycode', 'msds-supportedencryptiontypes', 'homedirectory', 
'userprincipalname', 'lastlogon', 'objectsid', 'givenname', 'homedrive', 'usncreated', 'admincount', 'instancetype', 
'codepage', 'physicaldeliveryofficename', 'samaccounttype', 'sn', 'objectcategory', 'telephonenumber', 'pwdlastset', 
'usnchanged', 'lastlogoff', 'initials']

This returns a comprehended list of the keys, rather than calling resultObj.Properties.Keys, which returns a Hashtable of HashKeys. It just makes life easier.

If I want to set my mobile phone number (which isn't set yet) in Active Directory:

>>> userObj.mobile
# Nothing returned as mobile isn't set...
>>> userObj.mobile = '(646) 555-1212'
# It is committed on set to Active Directory, so another find gets it...
>>> newUserObj = adh.find_name('McAdams', 'Brendan')
[debug] Searching activedirectory for (&(objectCategory=user)(objectClass=person)(sAMAccountName=*)(sn=McAdams*)
(givenname=Brendan*)).  Allow multiple results? False
>>> newUserObj.mobile
'(646) 555-1212'

I'll leave the rest as an exercise for the reader, but I'm interested in comments and changes. You can fetch the latest code from my BitBucket toybox. I've also pasted it, for posterity's sake, below the fold.

#!IronPython Specific Script!
# 
# Brendan W. McAdams <bwmcadams@gmail.com>
#
#------------------------------------------------- 
# No copyright or licensing, made public
# as example code.  Feel free to use it as you like;
# No warranty, liability or guarantees are implied.  
# In other words, if you use it YOU ARE ON YOUR OWN. 
#-------------------------------------------------
#
# Utility script to interface from IronPython to 
# MS Active Directory, allowing queries of properties 
# such as telephone numbers, and changes to said properties
# if you'd like to update them.
# 
# Note that your authentication user has to have valid permissions.
# If you're uncertain as to what permissions are needed, please see
# your AD Admin.

import clr 

# Add the System.DirectoryServices Mono DLL in.
# not sure on Windows what you need to add,
# but for Mono/Linux you just need to set IRONPYTHONPATH
# to include the location of the dll

clr.AddReference('System.DirectoryServices.dll')

import sys

# Import the LDAP / ActiveDirectory interface
from System.DirectoryServices import *

# A few predefined values to simplify LDAP querying, pilfered from various internet-ey type places.
BINARY_PROPS = ('objectguid', 'objectsid', 'msexchmailboxsecuritydescriptor', 'msexchmailboxguid')
ACCOUNT_QUERY = "(&(objectCategory=user)(objectClass=person)(sAMAccountName=_ACCOUNTNAME_))"
PRINCIPAL_QUERY = "(&(objectCategory=user)(objectClass=person)(userPrincipalName=_PRINCIPALID__))"
GROUP_QUERY = "(&(objectCategory=group)(sAMAccountName=_GROUPNAME_))"
PARTIAL_LAST_QUERY = "(&(objectCategory=user)(sAMAccountName=*)(objectClass=person)(sn=_LASTNAME_*))"
PARTIAL_NAME_QUERY = "(&(objectCategory=user)(objectClass=person)(sAMAccountName=*)(sn=_LASTNAME_*)(givenname=_FIRSTNAME_*))"
PARTIAL_FIRST_QUERY = "(&(objectCategory=user)(objectClass=person)(sAMAccountName=*)(givenname=_FIRSTNAME_*))"



class DirectoryEntryHelper(object):
  """ Helper class for wrapping
  and simplifying AD search results.
  As much fun as dealing with Microsoft style
  object hierarchies can be...
  TODO: Is there an easy way to inject things into __dict__ for dir & ipy tab completion?
  """

  def __init__(self, entry):
      if isinstance(entry, SearchResult):
          self.__dict__['_entry'] = entry.GetDirectoryEntry()
      elif isinstance(entry, DirectoryEntry):
          self.__dict__['_entry'] = entry
      else:
          raise TypeError, "Invalid type '%s', don't know how to proxy it" % type(entry)

  @property
  def properties(self):
      return self._entry.Properties

  @property
  def keys(self):
      """Returns a comprehended list
      of the *defined* Keys on the Properties object.
      Any valid keys w/o values won't get listed.
      """
      return [key for key in self.properties.Keys]

  def __getattr__(self, key):
      """Proxies the Properties objects in the DirectoryEntry
      to provide a simple getter.
      First checks if DirectoryEntry has an attribute (property)
      matching requested key, and passes that if so.
      Otherwise, fetches a matching property.  If you have
      a name collision, use the .properties property on this object
      to get a clean copy of the Properties object
      TODO: Better support for multi-value keys (only returns first idx right now)
      """
      if hasattr(self._entry, key):
          return getattr(self._entry, key)
      elif self._entry.Properties[key].Count:
          return self._entry.Properties[key][0]
      else:
          return None

  def __setattr__(self, key, value):
      """Proxies the Properties objects in the DirectoryEntry 
      to provide a simpler setter.
      First checks if DirectoryEntry has an attribute (property)
      matching requested key, and sets on that if so.
      Otherwise, sets upon a matching property.  
      If the property has an existing count > 0, sets Index 0  
      If not, it does an Add().
      The way the MS Classes work, Properties.Keys only returns keys that
      have a value count.  However, any *valid* Key has a silent value which
      can be initialized via Add().
      Invalid keys throw an error, so you can't be arbitrary.
      I use a list of keyspace found at: 

          http://www.dotnetactivedirectory.com/\
              Understanding_LDAP_Active_Directory_User_Object_Properties.html

      If you have a name collision, use the .properties property on this object
      to get a clean copy of the Properties object
      TODO: Better support for multi-value keys (only sets first idx right now)
      """
      if hasattr(self._entry, key):
          setattr(self._entry, key, value)
      else:
          if self._entry.Properties[key].Count > 0:
              if self._entry.Properties[key].Count > 1:
                  print >> sys.stderr, "[warning] Key '%s' contains multiple values which we don't properly support yet.  Using Index 0" % key
              self._entry.Properties[key][0] = value
          else:
              self._entry.Properties[key].Add(value)
          # commit early, commit often...
          self._entry.CommitChanges()

  def __repr__(self):
      """Print all pretty like on console.
      My biggest pet peeve working in .Net is that most objects don't have
      any kind of __str__/__repr__ value on them to give you quick information
      on their contents.  ARGH!
      """
      return "<DirectoryEntryHelper:{sn=%s,givenName=%s,mail=%s,sAMAccountName=%s}>" %\
          (self.sn, self.givenName, self.mail, self.sAMAccountName)

class ActiveDirectory(object):
  """ ActiveDirectory interface class.
  Tested on IronPython 2.6RC + Mono 2.4.2.3 on Linux. YMMV.
  """

  # Active Directory Handle
  adh = None

  server = None
  domain = None
  baseDN = None

  def __init__(self, server, domain, username, password, baseDN=""):
      """Constructor to instantiate the LDAP connection.
      server is a resolvable address to your domain controller (e.g. the resolvable hostname or IP)
      domain should be the ActiveDirectory domain as specified by your admin.
      If you don't pass BaseDN, your 'domain' argument will be split on
      . and each piece will be passed as DC=<piece>. This will be used as your BaseDN
      E.G. domain aurigasv.local becomes "DC=aurigasv,DC=local"

      *** Any exception in connecting will be wrapped and rethrown, aborting construction.
      """
      try:
          # Setup the baseDN
          self.server = server
          self.domain = domain
          if baseDN:
              self.baseDN = baseDN
          else:
              for tier in domain.split('.'):
                  baseDN += "DC=%s," % tier
              self.baseDN = baseDN[:-1]
              #print "Established a parsed BaseDN of '%s'" % self.baseDN

          # Establish a directory object, with auth info, pointing at the authenticating user
          self.adh = DirectoryEntry("LDAP://%s/%s" % (server, self.baseDN))
          self.adh.AuthenticationType = AuthenticationTypes.Secure
          self.adh.Username = username
          self.adh.Password = password

          # connect and fetch a copy of the authenticating user
          filter = "sAMAccountName=%s" % username
          dsSystem = DirectorySearcher(self.adh, filter)
          dsSystem.SearchScope = SearchScope.Subtree
          result = dsSystem.FindOne()

          if not result:
              raise Exception, "Found no result searching for specified authentication user %s." % username

      except Exception, e:
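          # Re-raise as a new Exception; an exception instance can't be raised with a separate value.
          raise Exception, "Authentication failure.  Did you provide valid connection information / authentication credentials? (%s)" % e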

  def get_helper(self, result):
      if hasattr(result, "__iter__"):
          return (DirectoryEntryHelper(item) for item in result)
      else:
          return DirectoryEntryHelper(result)


class ActiveDirectorySearcher(ActiveDirectory):

  def _search(self, filter, multi=True):
      """Semi-private method for searching.
      Note that "filter" needs to be a string which is a 
      *VALID* LDAP query/filter.  You should probably use one of the useful
      utility find methods instead of querying this directly
      unless you understand LDAP queries
      By default returns every match as a generator; pass multi=False to require exactly one result.
      """
      print "[debug] Searching activedirectory for %s.  Allow multiple results? %s" % (filter, multi)
      searcher = DirectorySearcher(self.adh)

      searcher.ReferralChasing = ReferralChasingOption.All
      searcher.SearchScope = SearchScope.Subtree
      searcher.Filter = filter

      results = searcher.FindAll()

      if results.Count == 0:
          raise Exception, "No results found for search."

      if multi:
          # Generator for performance
          return (entry for entry in results)
      else:
          if results.Count > 1:
              raise Exception, "Found multiple results in single-search mode.  To support multiples, please pass multi=True to the find method."
          return results[0]  


  def find_last_name(self, last_name, multi=False, nowrap=False):
      """Searches by sn (lastname).
      This can be a partial query, as long as the first
      few letters of last_name are defined 
      By default searches for one only, pass multi=True for multiple results.
      If you want  raw .Net SearchResult(s) object(s), pass nowrap=True, otherwise
      you get it back wrapped in a helper class.
      """
      result = self._search(PARTIAL_LAST_QUERY.replace('_LASTNAME_', last_name), multi)
      if not nowrap:
          result = self.get_helper(result)

      return result

  def find_name(self, last_name=None, first_name=None, multi=False, nowrap=False):
      """Searches by sn (lastname) & givenName (firstname)
      This can be a partial query, as long as the first
      few letters of last_name are defined
      If firstname is None, it quietly passes into find_last_name.
      By default searches for one only, pass multi=True for multiple results.
      If you want  raw .Net SearchResult(s) object(s), pass nowrap=True, otherwise
      you get it back wrapped in a helper class.
      """
      if not last_name:
          raise Exception, "You must, at a minimum, specify last_name for a name search."

      if not first_name:
          # Pass nowrap through so it is honored on this path too
          result = self.find_last_name(last_name, multi, nowrap)
      else:
          result = self._search(PARTIAL_NAME_QUERY.replace('_FIRSTNAME_', first_name).replace('_LASTNAME_', last_name), multi)
          if not nowrap:
              result = self.get_helper(result)

      return result

  def find_account(self, account_name, multi=False, nowrap=False):
      """Searches by sAMAccountName (account_name), which should be unique.
      This is an exact match unless account_name itself contains
      a * wildcard.
      By default searches for one only, pass multi=True for multiple results.
      If you want  raw .Net SearchResult(s) object(s), pass nowrap=True, otherwise
      you get it back wrapped in a helper class.
      """
      result = self._search(ACCOUNT_QUERY.replace('_ACCOUNTNAME_', account_name), multi)
      if not nowrap:
          result = self.get_helper(result)

      return result

  def find_first_name(self, first_name, multi=False, nowrap=False):
      """Searches by givenName (first_name).
      This can be a partial query, as long as the first
      few letters of first_name are defined
      By default searches for one only, pass multi=True for multiple results.
      If you want  raw .Net SearchResult(s) object(s), pass nowrap=True, otherwise
      you get it back wrapped in a helper class.
      """
      result = self._search(PARTIAL_FIRST_QUERY.replace('_FIRSTNAME_', first_name), multi)
      if not nowrap:
          result = self.get_helper(result)

      return result
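
The BaseDN derivation in the ActiveDirectory constructor can also be tried on its own in plain Python. This is a minimal sketch of the same split-and-join step; base_dn_from_domain is a hypothetical name and dc01.example.com is a placeholder host, neither of which appears in the script.

```python
def base_dn_from_domain(domain):
    """Turn a dotted AD domain into a BaseDN, one DC= part per label."""
    return ",".join("DC=%s" % tier for tier in domain.split("."))

# Domain taken from the constructor's docstring example.
base_dn = base_dn_from_domain("aurigasv.local")

# The constructor builds its LDAP path from the server and the BaseDN.
ldap_path = "LDAP://%s/%s" % ("dc01.example.com", base_dn)
```

Joining the pieces with "," avoids the trailing comma that the constructor trims off with baseDN[:-1].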