Monday, January 31, 2011

Pythonic Extrinsic Visitor

Regardless of the language chosen to express it, the Visitor pattern remains one of the more basic behavioral patterns described by the "Gang of Four's" Design Patterns Book. Herein I describe a Pythonic Visitor implementation that overcomes common language-based limitations of C, C++, Java, and C# Visitors and present a usable implementation that goes well beyond available sample/snippets. In particular, the implementation provided addresses two commonly omitted but crucial features "left as an exercise to the reader": correct method resolution/dispatch of types within an inheritance hierarchy and caching method resolution results for performance's sake. More on those and other features later.

Implementing a visitor pattern in practice poses challenges:
  1. Applying visitors to existing type hierarchies demands refactoring potentially large sets of visitors'  visited types to include accept() method
  2. Refactoring described by the previous bullet, in-effect, couples the visitor and visited hierarchy through the visitor accept()/ visit() interface. Additions to the visited type tree hierarchy thus commonly demand commensurate changes to the visitor API and potentially its implementation
  3. Behavior of language-specific polymorphic dispatch (in this case double-dispatch) often elude both junior and senior programmers alike. 
Visitors take different forms, actually. Spurred by problems resulting from bullets #1 and #2 above, some design and implement Extrinsic Visitors. Informally, an extrinsic visitor makes decisions about how to 'visit' a visitable object without relying on double-dispatch. Put another way: the visitor need not call a method on the target visitable object itself to 'reclaim' its specific type before selecting the applicable visit() method.

Challenges to Extrinsic Visitation
When Developers choose languages such as Java or C# to implement an extrinsic visitor they must rely on reflection or introspection (respectively) in order to reclaim the visitable object's specific type without invoking a method of that object's instance. During code review, I've observed sloppy implementations revert to a switch/case statement failure mode--again introducing brittle dependence between the visitor implementation and the visited type hierarchy (#2 above). Bullet #3 (above) remains a problem, but challenges represented by bullets one and two are replaced with the following:
  1. Sandbox security policy (Applets, etc.) may prevent use of reflection/introspection in some environments
  2. Reflection/introspection can cause a costly speed hit depending on the language, and VM implementation (not nearly as much a problem with modern Java 1.5+)
  3. Depending on visitor design, visitor API may remain rigidly tied to visited hierarchy demanding an explicit concrete method for each visitable type (as in #2 above)
Search for Extrinsic Visitor in Python and Google will return plenty of results. Often, implementations provided express the problems described by the bullets above or leave their solutions as exercises to the reader. Unfortunately, many programmers I encounter simply do not possess the familiarity with Python's object model or method-resolution mechanics (mro) to quickly accomplish the goal. Having fought my own lack of Python knowledge (hard to un-seat understanding of Java), I can sympathize.

Pythonic Extrinsic Visitor: Design
The design of my Pythonic visitor bear the following properties:
  • Extrinsic
    • The visitor and its sub-classes require no invocation of visitable type methods (double-dispatch) in order to reclaim specific target type
    • The visitable type requires no refactoring (IE: no accept() method necessary)
  • The visitor employs Python mro for "C++-style" best-match visit() method dispatch
  • The first method resolution result is cached to avoid hit on subsequent all calls to the visit() method for a particular type
  • Support for NoneType (IE: visitor.visit(None))
Sections below describe each design element above in turn.

At its heart, the visitor class has only two public methods:
  • def visit(self, visitable)
  • def defaultVisitor(self, visitable)
Along with the class' constructor, these make up the visitor API. As with any extrinsic visitor, developers enjoy the follow the following simplified calling convention:

fruit = grocerer.pick("Banana") 
visitor = MyVisitor()
Snippet - A - Simplest Visitor Test Driver

Presume here that grocer.pick() represents a factory method capable of returning a variety of fruit sub-types based on their string name. The defaultVisitor() method provides developers the ability to specify a 'catch-all' visitation function, capable of handling any typed passed to visit() as its parameter. Developers simply override the  defaultVisitor() method in their subclass. Thus, the following represents a "Hello World" attempt at extending the visitor provided:
class MyVisitor(visitor):
    def defaultVisitor(self, visitable):
         print '%s' %(visitable)
The visitor base class provides a defacto implementation for the defaultVisitor() method so Developers need not override. The defacto implementation throws TypeError when the implementation's method resolution can not resolve a specific handler match.

The key feature of this visitor implementation is that those extending only need to implement those visit (handler) methods they desire explicitly. Developers do this by adding methods to their subclass in the form:
def visit(self, visitable): pass
So,  from the example above, a Developer might choose to do the following:

class MyVisitor(visitor):
    def visitBanana(self, visitable):
         print '%s' %(visitable) 

Because this visitor employs a 'best match' dispatch, the following would have the same effect in our test driver captioned "Snippet A".

class MyVisitor(visitor):
    def visitFruit(self, visitable):
         print '%s' %(visitable)

More on how the implementation accomplishes this in the next section.

Best Match
The best match property is the key to overcoming brittle coupling and tedious visitor implementation update. As designed, the visitor described herein employs a lookupVisitorMethod() method to accomplish this. Roughly, the algorithm employed follows:

def visit(self, visitable):
    clazz = visitable.__class__

    for _type in clazz.mro():
       handler =  'visit%s' %(_type.__name__)
       visit_method = getattr(self, handler, None)
       if not (visit_method is None):
          return visit_method(visitable)
    return visit_method(visitable)

Roughly, this algorithm iterates through the visitable objects' class hierarchy and looks for a method in the visitor object named visit(). Remember, mro() is an ordered list (Specific to most generic).

Caching method resolution proved the most difficult task in creating the presented implementation. The critical decision: what scope in which to place the method resolution cache. Placed at a high level (class object, module scope, or similar) the performance will increase but the caching scheme need be more advanced to handle circumstances in which multiple visitor sub-classes visit overlapping visitable type trees. Seeking to avoid this complexity, I chose to place the method cache in the visitor object instance.

def visit(self, visitable):     clazz = visitable.__class__                         try:         visit_method = self._method_map_cache[clazz]     except KeyError:         visit_method = self._lookupMethod(visitable, clazz)     return visit_method(visitable)                         
def _lookupMethod(self, target, clazz):    
   for _type in clazz.mro(): 
      handler =  'visit%s' %(_type.__name__) 
      visit_method = getattr(self, handler, None)
      if not (visit_method is None):          self._method_map_cache[clazz] = visit_method          return visit_method    self._method_map_cache[clazz] = self.defaultVisitor    return self.defaultVisitor
This implementation's scheme performs sufficiently well for even tight-loop operations (such as AST, DOM, and other large tree visitations) but avoids visitor/visited type tree collision. For more information, see the test suite. While not perfect, the implementation strategy above avoids opaque error messages and some more odd stack-frame insertions that complicate debugging (as experienced with other visitor schemes).

The dispatching schemes as adopted by this algorithm (and other samples available online)  fall prey to a simple failure mode: a caller passing a None rather than a valid instance (IE: v.visit(None)). Sample code available online almost always omits support for this corner case, despite how common it can be in tree visitations and collection iteration.  This implementation solves the problem of handling None types with a special case prefacing the lookup method:

class dummy_lookup(object):
   def mro():
       return [type(None)]
def _lookupMethod(self, target, clazz):
   if clazz is None:
       clazz = dummy_lookup()


The effect of the dummy lookup class is to serve the mro() iterator that follows in the lookup method.

Configurable Visit method
The implementation provided provides an addl. feature to avoid name-based collisions for existing visitation logic that an adopter may want to move to the visitor pattern. The implementation's constructor takes a visit_method=  keyword parameter that defines what prefix visitor subclass definitions will use to define specific visit methods. Use this feature when defining your visitor as follows:

class myVisitor(visitor):
    def __init__(self, *args, **kwargs):
        kwargs['visit_method'] = 'handle'
        visitor.__init__(*args, **kwargs)

   def handle(self, visitable): return self.visit(visitable)

   def handleType1(self, visitable): pass
   def handleType2(Self, visitable): pass

This mechanism allows developers to hide the visitor pattern's name from those calling it. Electing to call the visit method by another name (PrettyPrint, handle, ) can be more intuitive.

Download the current code base from its Google Code Repository  or use SVN to do a checkout using URL:

svn checkout code-poetic-read-only

  • Python 2.x


Thursday, January 20, 2011

Creating Dictionaries & Cracking Passwords on Mac OS X

If you've forgotten your password on a Mac OS X machine, for either a disk image (sparse bundle, sparse image, etc.) or keychain, you know a feeling of hopelessness. If you're going to attempt to break into your own encrypted store, you have two difficulties:
  1. Generating a minimal password dictionary
  2. Adapting a dictionary password test driver to OS X password-checking 
Test Drivers: Simple Examples Rarely Suffice 
When, frantically, I realized that I was unable to manually guess my image password back in 2008, I wasn't satisfied with what image-cracking utilities Google search returned. The simplified-for-demonstration-purposes test drivers provided in forums didn't handle the kinds of characters my passwords contained (redirects '<', '>' and pipes '|').

I created two Bash scripts to drive unlock attempts from a created password dictionary:
  • unlock_image (using hdiutil attach)
  • unlock_keychain (using security unlock-keychain)
Each conforms to shell scripting norms, returning 0 on success (and printing the successful password), returning 1 on failure, and returning 2 on error.  

Optimizing Minimal Dictionary Creation
Worse, because my passwords typically use a large variety of character sets, I feared coming up with a minimal dictionary would prove too difficult. Back-of-the-napkin estimations on 1) cost per password attempt (time) and 2) dictionary size indicated success would demand some tricks.

Generating an inclusive dictionary proved impossible quickly. The image I was trying to crack had a twelve (12) to fourteen (14) character password, containing (what I believe to be) some combination of fourteen (14) unique characters. This produced, without compression or clever storage solutions, a multi-hundred gigabyte dictionary that would require 48 years to exhaustively attempt using my four-machine set-up.

Robust rule-based password dictionary creation utilities exist but I found these projects a false-optimization to cracking one's own password. You know some aspects of your passwords character set and structure but I found that including too many rules into dictionary creation (creating few-minute or few-hour) attempts always came up empty--my dictionaries were too restrictive.

As a result, I created a simple dictionary creation utility that allowed its user to specify a few password qualities:
  • User-specified password dictionary character set
  • Minimum password length
  • Maximum password length
  • Are password characters unique (or may they be repeated?)
  • Fixed password preface
  • Fixed password suffix

With this utility I created dictionaries inclusive-enough to find the passwords I'd forgotten while (I believe) striking a decent balance between complexity of dictionary specification and resulting password dictionary size.

To download, build, and use the utilities, please go to its Google Code Repository:

Google Code - dmg_password_crack

  • Mac OS X
  • Maven2
  • Implementation requires only modest memory, requiring 384K of RAM at steady-state under default settings
  • Implementation appears I/O bound, not bound by syscalls (malloc/free), memory, or computation
  • Password dictionary creation code does not guard explicitly against bad or malicious input. Even careless values (such as negatives) create unexpected results.
  • Recursion and memory allocation, as implemented, lessen LoC but will not perform as well as a well set-up fixed array and iteration
  • hdiutil attach remains the 'long pole' (critical path, most onerous step, etc.) in utilities
  • Code compiles error-/warning-free on Apple's gcc 4.2.1, i686 and performs well on Valgrind leak check, but fails to conform to OOA/D and style conventions
  • Code passes four included integration-level tests on OS=OS X 10.6.6 arch=x86_64