Design Standard for Object Identification (UUIDs)


Overview

The need to identify objects, both within and outside of a database, is a critical feature of any enterprise information systems solution. This identification should be non-semantic in nature and support generation in a distributed environment while supporting global uniqueness.

Universally Unique Identifiers (UUIDs or ObjectIds) have been standard practice in distributed environments for the last decade. As a result, there has been a vast amount of research into various algorithms and strategies. This document is a brief synopsis of those techniques and a specification of one particular algorithm.

Generation of unique identifiers by the database has several negative implications.

  1. It limits the database’s scalability by increasing its CPU demand.
  2. It does not support applications that are filesystem-based.
  3. It implies that for an object to have an ID it must be persistent (i.e., for any object graph to be created, records must also be created in the persistence store (database) for each object, regardless of whether any actual persistence occurs).
  4. It implies that the ID is not important between the time an object is created and the time it is persisted.

In a distributed environment, an object may be passed from one machine to another for processing. In passing this object across process/platform boundaries, its true instance identity is lost. This can present problems. See diagram below.
 

If process one passes an object to process two in a message, and process two then passes that same object back to process one, the instance of the object will be different. Process one has no consistent identity by which to determine whether the objects are the same instance. This is why middleware solutions such as CORBA assign IORs (interoperable object references).

One could call a network service or database stored procedure to create an identifier at object instantiation time. While functional, this has serious performance implications, since network (and possibly database) overhead can add significant latency to the generation process. Conceptually, every object that is created should have an identity.

Some platforms provide DCE-compliant generation utilities such as UUIDGEN or GUIDGEN. These utilities are not much different from calling a network service or stored procedure in that they imply significant overhead. In addition, use of these utilities couples the implementation to a platform.

In order to preserve instance identity in a distributed environment (that may or may not be based on CORBA or EJB), and to provide for the best possible performance, this design recommends that the identifier generation occur as a part of each application. This generation should occur at object instantiation time, using a local implementation of the generator. This design recommends an abstract data type (a class) which is the identifier. Assuming a Java implementation, this design recommends that an interface, a class, and a factory be created to support ObjectIds.

ObjectId is the class that contains the logic to generate the unique identifier. Identifier is an interface which allows for strategies of differing types of ids should the need arise. However, the strategy enforces that all ids manifest themselves as byte arrays.
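The relationship between the interface and a byte-array-backed id might be sketched as follows. This is an illustration only, not the iWombat implementation: getValue() is the accessor named later in this document, while the class name ByteArrayIdentifier and the defensive copying are assumptions.

```java
import java.util.Arrays;

// Strategy interface: every kind of id exposes itself as a byte array.
interface Identifier {
    byte[] getValue();
}

// Hypothetical concrete strategy; the real ObjectId also generates its value.
final class ByteArrayIdentifier implements Identifier {
    private final byte[] value;

    ByteArrayIdentifier(byte[] value) {
        // Copy so callers cannot change the id after construction.
        this.value = Arrays.copyOf(value, value.length);
    }

    @Override
    public byte[] getValue() {
        // Copy again so the internal state never escapes.
        return Arrays.copyOf(value, value.length);
    }
}
```

Any other id type, such as a longer identifier, would implement the same interface and differ only in how its bytes are produced.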

Some exceptions to this may exist and should be handled individually. Similar conventions can be used in other object-oriented languages. If the language does not support object identity, conceptually, the object address can be substituted.

To provide this support the generation of object identities will be based loosely on the DCE standard for Universally Unique Identifiers (UUIDs). Java is the stated primary implementation language for the iWombat libraries. Due to this, a minor limitation is imposed in the composition of the identifier. DCE calls for the creating host’s MAC address to be embedded in the identifier. Due to an apparent limitation of Java, only the IP address can be determined in a platform independent fashion.

Because of this limitation, a full implementation of the DCE specification requires that the MAC address be obtained through “abnormal” means (not native to Java). Thus, for iWombat.com purposes, the ObjectId class expects a MACADDR environment variable to be present and available to the JVM.

Extensibility

Given this object model, it is possible to implement a strategy pattern using the Identifier to create a multitude of other identifiers besides the ObjectId. One such potential Identifier is the AssetId in use by Teams, a 20-byte id.

ObjectId Design

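The ObjectId algorithm produces a 16-byte value that adheres to the standard DCE specification ( http://www.opennc.org/onlinepubs/9629399/apdxa.htm ) with the following exceptions:

  1. The Java clock has only millisecond resolution.
  2. The Java clock begins at 1/1/1970.
  3. Java does not support retrieving the MAC address.

Resolutions to these issues are as follows: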

Millisecond resolution

The ObjectId generator keeps track of the last system time recorded and compares it to the current time in a synchronized section. If the time signature is stale (the same), the generator treats it as a clock rollback and increments the clock-synch sequence number as per the DCE specification.
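A minimal sketch of this stale-time check follows. The class and method names are hypothetical, and the 14-bit wrap reflects the width of the DCE clock sequence field; the actual ObjectId code may differ.

```java
// Hypothetical helper: remembers the last time reading and bumps the clock
// sequence whenever the clock has not moved forward.
final class ClockSequenceTracker {
    private long lastTimeMillis = Long.MIN_VALUE;
    private int clockSequence = 0;

    // Returns the clock sequence to embed for the given time reading.
    synchronized int next(long nowMillis) {
        if (nowMillis <= lastTimeMillis) {
            // Same or earlier time: handle both as a clock rollback.
            // The DCE clock sequence is 14 bits wide, so wrap at 0x3FFF.
            clockSequence = (clockSequence + 1) & 0x3FFF;
        }
        lastTimeMillis = nowMillis;
        return clockSequence;
    }
}
```

Two ids requested within the same millisecond then share a timestamp but differ in clock sequence, which keeps them distinct.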

Clock begins at 1/1/1970

Calculations will be made to adjust the clock to begin at 10/15/1582 as per the DCE specification.
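The adjustment amounts to a unit change plus a fixed offset, sketched below. The DCE timestamp counts 100-nanosecond intervals from 15 October 1582, whereas Java counts milliseconds from 1 January 1970. The class and method names are illustrative assumptions.

```java
final class DceTime {
    // 100-nanosecond intervals between 1582-10-15 and 1970-01-01 (UTC):
    // 141,427 days * 86,400 s/day * 10,000,000 intervals/s.
    static final long EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    // Converts Java epoch milliseconds to a DCE timestamp in 100 ns units.
    static long fromJavaMillis(long millis) {
        return millis * 10_000L + EPOCH_OFFSET_100NS;
    }
}
```

For example, DceTime.fromJavaMillis(System.currentTimeMillis()) yields the current time on the DCE scale, at millisecond granularity.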

Java does not support MAC address

The ObjectId will look for the MAC address to be specified in a System property.

To minimize the time required to generate a value, the MAC address is initialized once and then stored in static attributes of the ObjectId. Because these attributes are static, they are stored only once regardless of the number of instances created.
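One way to perform this one-time initialization is sketched below. The property name MACADDR, the colon-separated text format, and the all-zero fallback are assumptions for illustration; this document does not specify them.

```java
final class NodeAddress {
    // Hypothetical property name; the text only says "a System property".
    static final String PROPERTY = "MACADDR";

    // Parsed once at class load, then shared by every id that is generated.
    // The all-zero default is an assumption for when the property is unset.
    private static final byte[] MAC =
            parse(System.getProperty(PROPERTY, "00:00:00:00:00:00"));

    // Parses six colon-separated hex octets, e.g. "00:1a:2b:3c:4d:5e".
    static byte[] parse(String text) {
        String[] parts = text.split(":");
        if (parts.length != 6) {
            throw new IllegalArgumentException("expected 6 octets: " + text);
        }
        byte[] mac = new byte[6];
        for (int i = 0; i < 6; i++) {
            mac[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return mac;
    }

    // Returns a copy so the shared array cannot be modified by callers.
    static byte[] get() {
        return MAC.clone();
    }
}
```

With this shape the JVM would be started with something like -DMACADDR=00:1a:2b:3c:4d:5e, and the parsing cost is paid once rather than per id.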

The generated value, being 16 bytes, is too large to fit in a primitive data type. The Java long integer data type is only 64-bits (8 bytes) long. Therefore, the generated value is stored as an array of bytes.


Identifier Implementation

There is one interface and one factory to support the implementation of unique identifiers: the factory for creation and the interface for persistence and manipulation.
 

Identifier Creation

The creation of Identifiers is performed via the factory pattern. There are only two creation methods in the IdentiferFactory. Creating an Identifier is simple, though this explicit call would typically never be required of an application. To create a new Identifier, use the following code:

Identifier anOid = IdentiferFactory.getUniqueIdentifier();

Additionally, since all identifiers have a byte array as their root primitive, the factory can create a new Identifier from an existing byte array:

Identifier anOid = IdentiferFactory.getUniqueIdentifier(myBytes);
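The two factory methods might take roughly the following shape. This is a sketch under stated assumptions: the no-argument method uses java.util.UUID random values purely as a stand-in generator, not the time-and-MAC-based ObjectId algorithm this document specifies, and the Identifier interface is reduced to getValue().

```java
import java.nio.ByteBuffer;
import java.util.UUID;

interface Identifier {
    byte[] getValue();
}

final class IdentiferFactory {
    private IdentiferFactory() { }

    // Creates a fresh 16-byte id. Stand-in generator only: random UUID bits,
    // NOT the DCE time-based algorithm described in this document.
    static Identifier getUniqueIdentifier() {
        UUID u = UUID.randomUUID();
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
        return getUniqueIdentifier(bytes);
    }

    // Rebuilds an id from bytes previously obtained from getValue().
    static Identifier getUniqueIdentifier(byte[] existing) {
        byte[] copy = existing.clone();
        return () -> copy.clone();
    }
}
```

The second method is what a persistence layer would call when it reads stored id bytes back and needs an Identifier again.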
 
 
 

Using Identifiers

To support persistence, instance checking, and the use of an identifier outside of the system, the Identifier interface supports four methods:
  1. getValue()
  2. toHexString()
  3. toString()
  4. equals()
The getValue() method simply returns the byte-array for the object, and is typically used in the following fashion:
byte[] myOidValue = myIdentifer.getValue();
 
 
 

The equals() method tests equality between two objects. The Identifier may be compared to any other object for equality. If the object that the Identifier is being compared to is not of the same class, the result is false. If the second object is another Identifier, the equality test is double-dispatched to that second Identifier in order to preserve encapsulation. Equality may be tested in the following fashion:

if( anIdentifer.equals(anObject) ) {

   //do something

}
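The double dispatch described above could be implemented along these lines. The class name SketchId and the helper name matches() are hypothetical; the point is only that the first id hands its bytes to the second, which compares them against its own private state.

```java
import java.util.Arrays;

final class SketchId {
    private final byte[] value;

    SketchId(byte[] value) {
        this.value = value.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (other == null || other.getClass() != getClass()) {
            return false; // not the same class: never equal
        }
        // Second dispatch: pass our bytes to the other id and let it
        // compare them against its own value.
        return ((SketchId) other).matches(value);
    }

    // Hypothetical helper: compares the given bytes with this id's value.
    private boolean matches(byte[] candidate) {
        return Arrays.equals(value, candidate);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(value); // keep hashCode consistent with equals
    }
}
```

Overriding hashCode() alongside equals() matters in practice, since ids are likely to be used as keys in hash-based collections.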

 

 
 

The method toHexString() returns the value of the ObjectId in a hexadecimal representation. This method may be used in the following fashion:

System.out.println( "Hex: " + anIdentifer.toHexString() );

Output: Hex: ac10011c13630703111d1e027b03f692
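A hexadecimal rendering of this kind can be produced with two table lookups per byte, as in the sketch below. The class name Hex is an assumption; only the method name toHexString() comes from this document.

```java
final class Hex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Renders each byte as two lowercase hex digits, high nibble first.
    static String toHexString(byte[] value) {
        StringBuilder sb = new StringBuilder(value.length * 2);
        for (byte b : value) {
            sb.append(DIGITS[(b >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }
}
```

Applied to the bytes 0xac, 0x10, 0x01, 0x1c (the first four bytes of the sample output above), this returns "ac10011c"; a full 16-byte value yields 32 characters.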
 
 
 

The toString() method provides a common string representation of an object, including for display in debuggers and watch lists. To facilitate debugging and other human interaction with the Identifier, the toString() method returns the class name followed by a human-readable rendering of the value. This method may be used in the following fashion:

System.out.println( anIdentifer );

Output: com.capitalstream.foundation.ObjectId:172.16.1.28-1999/6/30- 10:16:27.42-1323282


Performance Testing

The ObjectId class has been built and tested. Strict attention has been given to ensuring that no extra code is imported or included, minimizing the footprint of the object. Besides simple testing of the value generation, multiple tests have been performed. In a single-threaded test scenario, 1,000,000 ObjectIds (Identifiers) were generated, the elapsed time was measured, and the average time per ObjectId was calculated. The test consistently produced between 100 and 130 ObjectIds per millisecond.

The tests above were run on more than one machine. In each case, the ObjectId generated correctly reflected the MAC address of the host on which it was generated. Therefore, by definition, running the test on multiple hosts should not produce duplicates.

The final test involved running multiple concurrent processes on the same machine. This removes the MAC address as a distinguishing factor. To stress the algorithm, four concurrent processes were run, each generating 1,000,000 ObjectIds as rapidly as possible. Following the generation, the resulting 4,000,000 ObjectIds were examined. No duplicates were produced.
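A duplicate check of this kind can be approximated with the harness below. It is a sketch, not the test that produced the results above: it uses threads within one JVM rather than separate processes, and the class name, method name, and string-based id representation are all assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class DuplicateCheck {
    // Runs `threads` workers in parallel, each drawing `perThread` ids from
    // the generator, and returns how many ids had already been seen.
    static int countDuplicates(Supplier<String> generator, int threads, int perThread) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger duplicates = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (!seen.add(generator.get())) {
                        duplicates.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return duplicates.get();
    }
}
```

The generator would be a supplier that returns the hex string of a newly created id; a result of zero for four workers corresponds to the outcome reported above.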


Conclusions

Two key priorities drive the implementation of this ObjectId factory: platform independence and speed of implementation. In addition, a more widely supported UUID class is likely to appear, hence the factory and interface approach.

In the event that a third-party product generates a unique ID, these IDs will be treated as data, along with the other information being produced. Additionally, the relationship between ObjectId and Identifier easily lends itself to a classic GoF strategy pattern should the need arise.

Finally, it is not necessary that every instance of data be stored with an ObjectId. A server log may contain sufficient information to uniquely identify every row of data. In such a case, where uniqueness is inherent and access is limited to reporting and mining efforts, the storage of an ObjectId may be unnecessary overhead and can be dropped for space efficiency.