Stay away from UUIDs!

Stay away from UUIDs!

It wasn't until recently that I understood about selecting an appropriate primary key is crucial for efficient database design. I had to come out of a Hackathon mindset of "making things work" and adopt an engineer's mindset of "making things work better", I used to use UUIDs (Universally Unique Identifiers) every time I created a Data Model, it was easy, quickly and there was no problem of collision, but then life happened. In this article, I'll show you why you're wrong if you are a UUID-activist and if you wanna debate please take me out to dinner. That being said, let's get started!!

Before understanding why they're wrong, we need to understand how a typical SQL database stores our data. SQL servers use something called as B+ Trees to store your data.

NonClustered Index Structure – SQLServerCentral

The reason they use this approach is that these SQL servers believe in the concept of clustering. Clustering as you can see above helps the DB servers to retrieve your data faster, as you can see, if creates multiple node partitions based on the cluster indexes and then stores these indexes in sorted manner which helps in faster data retrieval. For example:
if you ask the DB for a roll number 6, it only searches the second node as it knows that roll number 6 is greater than 5 and less than 9 ( which are the starting points of second and thrid nodes ) .

The problem that arises using the UUIDs is, you guessed it, they aren't cluster-able ( if that's a word ). If you give the DB a UUID as a primary key, it won't be able to hash it, nor sort it as each UUID is expected to be completely different than the other.
But still the poor thing tries it level best to make you happy by providing the optimized storage, which results in something like this

DS: GATE CSE 1996 | Question: 1.14

A heavily distorted tree!

Now, if you're into this distorted stuff, please stop reading and use your beloved UUIDs, but if you're an optmization freak like me, continue..

I found that there are two types of IDs that you can use to avoid this problem
1. TSID (Time Stamp-based ID)
2. ULID (Universally Unique Lexicographically Sortable Identifier )

So, what are TSIDs, well these are primary keys that combine a random value with a timestamp, leveraging the timestamp for sorting data and the random component for ensuring uniqueness, similar to UUIDs. TSIDs offer an effective way to maintain a well-balanced B+ tree index, resulting in improved database performance and reduced memory wastage.

TSID Implementation:

public class TSIDGenerator
{
    private static Random random = new Random();

    public static string GenerateTSID()
    {
        DateTime timestamp = DateTime.UtcNow;
        long timestampTicks = timestamp.Ticks;
        long randomValue = random.Next();

        string tsid = $"{timestampTicks:x16}-{randomValue:x8}";

        return tsid;
    }
}

In the above code, we have a TSIDGenerator class with a static method GenerateTSID() that generates a TSID. It uses the current UTC timestamp (DateTime.UtcNow) to obtain the timestamp component and converts it to ticks. It also generates a random value using Random.Next() for uniqueness. The final TSID is constructed by combining the timestamp ticks and the random value.

Whereas ULID is even better ULID retains the unique nature of UUIDs while providing the ability to sort entities. By incorporating a timestamp component, ULIDs enable databases to organize data in a structured manner, optimizing data retrieval operations. With a low collision rate, ULIDs offer a promising solution for those seeking both uniqueness and sorting capabilities.

ULID Implementation:

public class ULIDGenerator
{
    private static Random random = new Random();

    public static string GenerateULID()
    {
        DateTime timestamp = DateTime.UtcNow;
        long timestampMillis = (long)(timestamp - new DateTime(1970, 1, 1)).TotalMilliseconds;

        byte[] randomBytes = new byte[10];
        random.NextBytes(randomBytes);

        StringBuilder ulidBuilder = new StringBuilder();

        for (int i = 0; i < 10; i++)
        {
            ulidBuilder.Append(ULIDCharacters[randomBytes[i] % ULIDCharacters.Length]);
        }

        string ulid = $"{timestampMillis:x10}{ulidBuilder}";

        return ulid;
    }

    private const string ULIDCharacters = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
}

In the above code, we have a ULIDGenerator class with a static method GenerateULID() that generates a ULID. It uses the current UTC timestamp (DateTime.UtcNow) to calculate the timestamp component as the number of milliseconds since the Unix epoch. It generates 10 random bytes using Random.NextBytes() for uniqueness. The ULID is constructed by appending the timestamp and random bytes, following the ULID character set.

Both these alternative approaches give you low-collision, unique values and sortable entities for your DB.
So, the next time you use a GUID, please feel sorry for your database.

Did you find this article valuable?

Support Kunal Dubey by becoming a sponsor. Any amount is appreciated!