
Python + UML = An Example

The following example is a starting point for our demonstration of what we have described as the benefits of using Python + UML. Our choice for this example is a key Python infrastructure component called Medusa.

Medusa is a high-performance internet server written in Python by Nightmare Software. It takes a unique approach to scalability by running a limited number of threads and by employing asynchronous event handling to multiplex the processing of a large number of connections.

Medusa has been deployed in several high-performance sites, a good example of which is eGroups, a discussion group service with over 5 million subscribers and 250,000 discussion groups. The eGroups site itself is implemented completely in Python. Medusa is also used by the Zope application server as the underlying web server.

A Class Diagram

The following class diagram shows a portion of the inheritance hierarchy in Medusa. The diagram was produced by using ObjectDomain as a UML modeling tool and reverse-engineering from the Python code.

[Figure: Class Diagram - Medusa HTTP Server]

Reverse Engineering

The reverse-engineering itself was fairly straightforward with some minor quirks. ObjectDomain lets you select the Python files to be reverse-engineered and, when the list is complete, you start the reverse-engineering process. The process parses the Python code and imports classes and their methods into the current model. However, the creation of a diagram is a separate sequence of operations which requires that the designer select the classes to appear on a class diagram. It would be nice if ObjectDomain could immediately produce a first-cut class diagram as part of the reverse-engineering process, as is done with some UML tools.

Another deficiency of this process is that ObjectDomain does not automatically draw the inheritance (generalization) relationships. These had to be created by hand, even though the tool could have determined them from the class statements in the Python code:

# from http_server.py:
class http_channel (asynchat.async_chat):                    

# from asynchat.py:
class async_chat (asyncore.dispatcher):

# from asyncore.py:
class dispatcher:
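These relationships can indeed be recovered from the class statements alone. As a rough illustration (not a description of how ObjectDomain works), the following sketch uses the ast module of modern Python (3.9+ for ast.unparse) to list each class together with its base classes. The function name inheritance_edges and the sample source are assumptions made for this example.

```python
import ast

def inheritance_edges(source):
    """Return (class_name, base_name) pairs for every class statement in source."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                # ast.unparse turns the base expression back into text,
                # e.g. the attribute node for asynchat.async_chat.
                edges.append((node.name, ast.unparse(base)))
    return edges

# The three class statements quoted above, given bodies so that they parse.
sample = """
class http_channel(asynchat.async_chat):
    pass

class async_chat(asyncore.dispatcher):
    pass

class dispatcher:
    pass
"""

for child, parent in inheritance_edges(sample):
    print(f"{child} --|> {parent}")   # UML-style generalization arrow
```

Base names come back exactly as written (for example asynchat.async_chat), so a real tool would still have to resolve module-qualified names across the parsed files before drawing the arrows.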

Attributes: Types, Visibility, Scope

An anomaly of Python reverse-engineering is the lack of type information for class attributes in the UML diagram. This is, of course, because Python, like most scripting languages, is dynamically typed. There are no type declarations; a variable comes into existence only when a value is first assigned to it.

ObjectDomain finds the attributes of a class by looking for variable initializations at the class level (directly in the class body) and at the instance level (assignments to self inside constructors and methods). The tool elects to show the type of each attribute as a question mark ('?'). This has the unusual effect of making the class specifications look incomplete. There are actually two alternatives that ObjectDomain could have employed:

  • The UML allows type information for attributes to be suppressed in the notation for a class. This approach is probably the simplest and preferable from a Python purist's standpoint.

  • The type of an attribute may be derived from the type of its initializer: String, Int, Long, Float, List, etc. Python defines these data types in the types.py module and even provides a built-in type() function to retrieve the type of an object at run time. The drawback of this approach is that it requires the parser in the reverse-engineering process to keep track of type information, and even then the result is only a guess, since an attribute may later be bound to a value of a different type.
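To make the trade-off concrete, here is a hedged sketch, assuming Python 3.8+ and its standard ast module, of an attribute finder that follows the search described above (class-body assignments give class scope, assignments to self inside methods give instance scope) and applies the second alternative, falling back to '?' when the initializer is not a literal. The helper names and the queue attribute in the sample are hypothetical; this is not ObjectDomain's algorithm.

```python
import ast

# Type names for literal initializers, following the list in the text above.
LITERAL_TYPES = {str: "String", int: "Int", float: "Float"}

def infer_type(value):
    """Guess an attribute's type from its initializer; '?' if unknown."""
    if isinstance(value, ast.Constant) and type(value.value) in LITERAL_TYPES:
        return LITERAL_TYPES[type(value.value)]
    if isinstance(value, ast.List):
        return "List"
    if isinstance(value, ast.Dict):
        return "Dict"
    return "?"  # e.g. a function call: the type is only known at run time

def is_self_attribute(target):
    """True for assignment targets of the form self.<name>."""
    return (isinstance(target, ast.Attribute)
            and isinstance(target.value, ast.Name)
            and target.value.id == "self")

def find_attributes(source):
    """Map each class to {attribute: (scope, type)}; the first assignment wins."""
    classes = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        attrs = classes.setdefault(cls.name, {})
        for stmt in cls.body:
            if isinstance(stmt, ast.Assign):
                # Class scope: assignments directly in the class body.
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        attrs.setdefault(target.id, ("class", infer_type(stmt.value)))
            elif isinstance(stmt, ast.FunctionDef):
                # Instance scope: assignments to self.<name> inside any method.
                for node in ast.walk(stmt):
                    if isinstance(node, ast.Assign):
                        for target in node.targets:
                            if is_self_attribute(target):
                                attrs.setdefault(target.attr,
                                                 ("instance", infer_type(node.value)))
    return classes

sample = """
class async_chat(asyncore.dispatcher):
    ac_in_buffer_size = 4096
    ac_out_buffer_size = 4096

    def __init__(self, conn=None):
        self.ac_in_buffer = ''
        self.ac_out_buffer = ''
        self.queue = make_queue()   # hypothetical attribute, for illustration
"""

for name, (scope, typ) in find_attributes(sample)["async_chat"].items():
    note = "(class scope: underlined in UML)" if scope == "class" else ""
    print(f"+ {name} : {typ} {note}".rstrip())
```

Treating self as the instance parameter is only a naming convention, so the sketch is as heuristic as the tool it imitates.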

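Note that the visibility is public ('+') for all class attributes and, for that matter, for all operations as well. This is by design: unlike Java and C++, Python has no private or protected access modifiers (the closest it comes is the name mangling applied to names that begin with two underscores).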

Note also that ObjectDomain accurately reflects the scope of attributes. Attributes with class scope are underlined while those with instance scope are not underlined, as specified by the UML. As an example, compare the following code fragment from the async_chat class with the UML diagram above:

# from asynchat.py:

class async_chat (asyncore.dispatcher):                      
    ac_in_buffer_size   = 4096
    ac_out_buffer_size  = 4096

    def __init__ (self, conn=None):
        self.ac_in_buffer = ''
        self.ac_out_buffer = ''

All instances of the async_chat class automatically inherit the ac_in_buffer_size and ac_out_buffer_size attributes, because these attributes have class scope, whereas each new instance gets its own ac_in_buffer and ac_out_buffer, because they are assigned through self and therefore have instance scope. The UML notation for scope reinforces this distinction visually and concisely.

From within a Python session, let's see the ramifications of class vs. instance scope:

Python 1.5.2 (#1, Apr 18 1999, 16:03:16)
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import asynchat

# create an instance
>>> channel = asynchat.async_chat()

# access to attributes from an instance
>>> channel.ac_in_buffer
''
>>> channel.ac_in_buffer_size
4096

# now let's look at the namespaces
>>> channel.__dict__['ac_in_buffer']
''
>>> channel.__dict__['ac_in_buffer_size']
Traceback (innermost last):
  File "<pyshell#11>", line 1, in ?
KeyError: ac_in_buffer_size

# not found in the instance's namespace!
# check the class namespace
>>> asynchat.async_chat.ac_in_buffer_size
4096
From this session we can explicitly see that the ac_in_buffer_size attribute is inherited by the instance (channel) from the class scope of async_chat.
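The asyncore and asynchat modules were removed from the standard library in Python 3.12, so the session above cannot be repeated on a current interpreter. The following sketch reproduces it with a hypothetical stand-in class that has the same four attributes, and adds one more consequence of the two scopes: rebinding the class attribute is seen by every instance, while assigning through an instance creates an instance attribute that shadows the class one.

```python
class Channel:
    """Hypothetical stand-in for async_chat with the same four attributes."""
    ac_in_buffer_size = 4096      # class scope: stored once, in Channel.__dict__
    ac_out_buffer_size = 4096

    def __init__(self):
        self.ac_in_buffer = ""    # instance scope: stored in each instance's __dict__
        self.ac_out_buffer = ""

a = Channel()
b = Channel()

print("ac_in_buffer" in a.__dict__)        # True: found in the instance namespace
print("ac_in_buffer_size" in a.__dict__)   # False: not in the instance namespace...
print(a.ac_in_buffer_size)                 # 4096: ...so the lookup falls back to the class

# Rebinding the class attribute is visible through every instance.
Channel.ac_in_buffer_size = 8192
print(a.ac_in_buffer_size, b.ac_in_buffer_size)   # 8192 8192

# Assigning through an instance creates an instance attribute
# that shadows the class attribute for that instance only.
a.ac_in_buffer_size = 1024
print(a.ac_in_buffer_size, b.ac_in_buffer_size)   # 1024 8192
print("ac_in_buffer_size" in a.__dict__)          # True
```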

Operations: Inheritance

Inheritance of implementation is a powerful OOP feature supported by Python. Medusa exploits this feature nicely. An inheritance hierarchy is created whereby general behaviour is abstracted and captured in the classes found in the asyncore and asynchat modules. Medusa contains servers for both the HTTP and FTP protocols, each built on these classes. Additionally, clients for such protocols as NNTP may also be built on top of the classes found in the asyncore and asynchat modules. This is an excellent example of the reuse promised by OOP.

To understand the inheritance hierarchy of this example, it helps to briefly understand the class responsibilities:

  • asyncore.dispatcher - Handle low-level, asynchronous send and receive on sockets using the event-driven model provided by the select() system call.

  • asynchat.async_chat - Handle buffering of data for both send and receive. Receive data is buffered until a protocol terminator (marker) is found. Send data is buffered while complete messages for a protocol are constructed.

  • http_server.http_channel - Handle the creation of http_request objects as HTTP requests (GET, POST) are received, and dispatch these requests to request handlers.
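The dispatcher's core responsibility, waiting on many sockets with select() and calling a handler method on whichever of them is ready, can be sketched in a few lines of modern Python. This is not asyncore's actual code; the class and function names are hypothetical, and a local socket pair stands in for a network connection.

```python
import select
import socket

class MiniDispatcher:
    """Hypothetical, much-reduced analogue of asyncore.dispatcher."""

    def __init__(self, sock, socket_map):
        self.socket = sock
        self.socket.setblocking(False)
        socket_map[sock.fileno()] = self    # register with the event loop

    def fileno(self):
        # select() accepts any object that has a fileno() method.
        return self.socket.fileno()

    def handle_read(self):
        pass                                # do-nothing default, meant to be replaced

class Collector(MiniDispatcher):
    """Hypothetical subclass that replaces handle_read()."""

    def __init__(self, sock, socket_map):
        super().__init__(sock, socket_map)
        self.received = b""

    def handle_read(self):
        self.received += self.socket.recv(4096)

def poll_once(socket_map, timeout=1.0):
    """Wait in select() until some socket is readable, then dispatch the event."""
    readable, _, _ = select.select(list(socket_map.values()), [], [], timeout)
    for dispatcher in readable:
        dispatcher.handle_read()

socket_map = {}
client, server_side = socket.socketpair()   # local stand-in for a TCP connection
channel = Collector(server_side, socket_map)

message = b"GET / HTTP/1.0\r\n\r\n"
client.sendall(message)
while len(channel.received) < len(message):
    poll_once(socket_map)
print(channel.received)

client.close()
server_side.close()
```

A single thread can serve many connections this way because it only ever blocks in select(), never in an individual recv().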

Now let's look at how inheritance is used in our example. We expect a subclass to specialize behaviour in some way. Specialization may be accomplished either by extending or replacing the behaviour in a superclass method. Refer to the class diagram above for the following discussion.

The first example of specialization is the handle_read() method in the async_chat class. The dispatcher class provides a do-nothing handle_read() method which is supposed to be replaced by a subclass method. The async_chat.handle_read() method provides this replacement. The first thing handle_read() does is to call self.recv() to read in the incoming data and add it to the buffer:

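# from asynchat.py

def handle_read (self):

    try:
        data = self.recv (self.ac_in_buffer_size)
    except socket.error, why:    # Python 1.5/2.x syntax, as in the module
        # report the error and abandon this read
        self.handle_error()
        return

    self.ac_in_buffer = self.ac_in_buffer + data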

Now this is interesting: which recv() is being called? Looking at the class diagram above, there is a recv() in both the ancestor class dispatcher and the descendant class http_channel. As you probably guessed, it is http_channel.recv() which is being called. Remember that self is a reference to an object instance, in this case an instance of http_channel. When Python searches for the method, it starts with the class of the instance, here the most derived class, and works its way up the hierarchy. This is a nice example of polymorphism in Python. But what about the dispatcher.recv() method which is the real workhorse?

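# from http_server.py

def recv (self, buffer_size):
    # delegate to the inherited recv(), found further up the hierarchy
    result = asynchat.async_chat.recv (self, buffer_size)
    # count the incoming bytes in the server's statistics
    self.server.bytes_in.increment (len(result))
    return result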

Here we see that the http_channel class extends the recv() method: it first invokes the superclass behaviour through its parent, async_chat, and then records the number of incoming bytes in the server's statistics. Because async_chat does not define recv() itself, Python continues searching up the class hierarchy from that point and finds the method in dispatcher. This is a nice example of specialization through extension.
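The two lookup rules just described (polymorphic dispatch through self, and extension by calling the parent explicitly) can be checked with a reduced, hypothetical mirror of the three classes. The names Dispatcher, AsyncChat and HttpChannel are illustrative, not Medusa's, and the method bodies are stand-ins: no sockets are involved and recv() simply fabricates bytes.

```python
class Dispatcher:
    def recv(self, buffer_size):
        # The "real workhorse"; here a stand-in that fabricates data.
        return b"x" * buffer_size

    def handle_read(self):
        pass                              # do-nothing default

class AsyncChat(Dispatcher):
    ac_in_buffer_size = 4

    def __init__(self):
        self.ac_in_buffer = b""

    def handle_read(self):                # replaces Dispatcher.handle_read
        data = self.recv(self.ac_in_buffer_size)   # lookup starts at type(self)
        self.ac_in_buffer += data

class HttpChannel(AsyncChat):
    def __init__(self):
        super().__init__()
        self.bytes_in = 0                 # stand-in for self.server.bytes_in

    def recv(self, buffer_size):          # extends Dispatcher.recv
        # AsyncChat has no recv(), so this lookup continues to Dispatcher.
        result = AsyncChat.recv(self, buffer_size)
        self.bytes_in += len(result)
        return result

channel = HttpChannel()
channel.handle_read()
print(channel.ac_in_buffer, channel.bytes_in)          # b'xxxx' 4
print([cls.__name__ for cls in HttpChannel.__mro__])   # ['HttpChannel', 'AsyncChat', 'Dispatcher', 'object']
```

The __mro__ tuple is the search order Python uses for the lookups discussed above.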

Back in async_chat.handle_read(), the received data is appended to the ac_in_buffer attribute that we saw earlier. The data in the buffer is then scanned for the HTTP terminator characters.

If a terminator is found, the data preceding it is passed on to the consumer through the collect_incoming_data() method, and then the found_terminator() method is called. These last two methods are what one might call virtual methods. Although the async_chat class calls these methods through self (e.g. self.found_terminator()), it doesn't define them at all; Python looks them up only at call time, so a subclass such as http_channel must supply them. This kind of virtual method is different from what you find in Java and C++, where the method must be declared in the superclass and then overridden (replaced) by a subclass!
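The receive-side protocol just described can be sketched as follows. This is a simplified, hypothetical reduction (a real async_chat also hands data to collect_incoming_data() before the terminator arrives, and supports numeric terminators); it keeps only the part relevant here: the base class calls collect_incoming_data() and found_terminator() through self without defining either, and a subclass supplies them.

```python
class AsyncChatSketch:
    """Hypothetical reduction of async_chat's receive-side buffering."""

    def __init__(self, terminator):
        self.terminator = terminator
        self.ac_in_buffer = b""

    def feed(self, data):
        # In async_chat this data would come from self.recv() in handle_read().
        self.ac_in_buffer += data
        while True:
            index = self.ac_in_buffer.find(self.terminator)
            if index == -1:
                break                     # no complete message yet: keep buffering
            # Hand the data before the terminator to the consumer, then signal
            # the end of the message. Neither hook is defined in this class.
            self.collect_incoming_data(self.ac_in_buffer[:index])
            self.ac_in_buffer = self.ac_in_buffer[index + len(self.terminator):]
            self.found_terminator()

class HeaderCollector(AsyncChatSketch):
    """Hypothetical subclass that supplies the two 'virtual' hooks."""

    def __init__(self):
        super().__init__(b"\r\n\r\n")
        self.pieces = []
        self.requests = []

    def collect_incoming_data(self, data):
        self.pieces.append(data)

    def found_terminator(self):
        self.requests.append(b"".join(self.pieces))
        self.pieces = []

c = HeaderCollector()
c.feed(b"GET / HTTP/1.0\r\nHost: a\r\n")     # no terminator yet
c.feed(b"\r\nGET /b HTTP/1.0\r\n\r\n")       # completes one request, then another
print(c.requests)
```

Calling feed() on an AsyncChatSketch instance directly raises AttributeError as soon as a terminator is found, which is the run-time counterpart of the missing declarations.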

Summary: There is More...

We recognize that the preceding example is not for beginners. Our intent was rather to target experienced Pythoneers and to demonstrate the power of using the UML to understand complex, object-oriented software written in Python. At the end of the day, the example code looks a lot tamer than it did before we applied the UML. We believe that the same power of clarification can be used to explain object orientation in Python to the newcomers targeted by the CP4E (Computer Programming for Everybody) project.

But there is more... To fully understand the Medusa software, we needed an additional tool. The class diagram was like an X-ray, allowing us to visually penetrate the code to find the underlying structure. To see how the pieces actually interact, however, we needed a sonogram to see the software in action. To that end we extended the generic Python debugger base class, Bdb, with a few extra lines of code and used it to run Medusa and observe the sequence of methods being called. Sounds like the makings of a sequence diagram? Yes, but it would need more work to be really presentable. Hopefully more later...
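Our Bdb extension itself is not reproduced here. As a hedged illustration of the idea only, the sketch below subclasses the standard bdb.Bdb and overrides its user_call() hook so that, instead of stopping, it records which class's method is entered on each call. Base, Child and scenario are hypothetical stand-ins for Medusa's classes.

```python
import bdb

class CallTracer(bdb.Bdb):
    """Hypothetical Bdb subclass that logs method calls instead of stopping."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def user_call(self, frame, argument_list):
        # Bdb invokes this hook when a new Python frame is entered
        # (the base-class version does nothing). We only record the call.
        self.calls.append(self.describe(frame))

    @staticmethod
    def describe(frame):
        """Return 'Class.method', naming the class whose code is running."""
        name = frame.f_code.co_name
        instance = frame.f_locals.get("self")
        if instance is None:
            return name                           # a plain function call
        for cls in type(instance).__mro__:
            func = cls.__dict__.get(name)
            if getattr(func, "__code__", None) is frame.f_code:
                return f"{cls.__name__}.{name}"
        return f"{type(instance).__name__}.{name}"

class Base:
    def recv(self):
        return b"data"

    def handle_read(self):
        return self.recv()                # polymorphic: resolved on type(self)

class Child(Base):
    def recv(self):
        return Base.recv(self) + b"!"     # extension: delegate, then add

def scenario():
    Child().handle_read()

tracer = CallTracer()
tracer.runcall(scenario)
print(tracer.calls)
```

The recorded list should read Base.handle_read, Child.recv, Base.recv: the raw material for a sequence diagram. Bdb uses the first call event after runcall() starts to set up its bottom frame, so the function passed to runcall() is not itself listed.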
