Pierre-Henry Soria ✨

Posted on Nov 8 • Edited on Nov 25

Why it's time to ditch UUIDv4 and switch to UUIDv7!

#uuid #database #performance #node

I've been using UUIDv4 as my go-to identifier for database primary keys for quite a long time, ever since moving away from sequential integer IDs (auto-increment/SERIAL). UUIDv4 reminds me of a time when we didn't have better alternatives for distributed systems.

Despite its massive adoption, UUIDv4 has some issues that a "more modern" alternative fixes easily.

I recently started using UUIDv7, and it has many advantages over UUIDv4.

First of all, it's fast. UUIDv7's time-ordered structure is often claimed to make inserts 2-5x faster than UUIDv4, though the real figure depends on the database engine, the index type, and the table size. The speedup comes from cheaper index maintenance, not from generating the ID itself, and writing records to the database has been a real pleasure since I switched.

Second, it sorts naturally by creation time, whereas UUIDv4 has no ordering at all. With UUIDv4, each new record lands at a random position in your B-tree index, causing page splits and fragmentation that degrade performance over time, and you still need a separate created_at timestamp column to sort records chronologically. UUIDv7, on the other hand, appends new keys sequentially at the end of the index, which gives index operations much better cache locality.
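To make the B-tree argument concrete, here is a toy simulation in Node.js (a sketch, not a real index): it counts how often a new key sorts after every existing key, i.e. how often an insert is a pure append at the right edge of the index. The `v7LikeKey` helper is hypothetical: it only mimics UUIDv7's ordering (a 48-bit millisecond timestamp prefix plus random hex) and is not a spec-compliant UUID. It also assumes one insert per millisecond; within a single millisecond, a real generator needs extra logic to preserve order.

```javascript
// Toy model: how often does a new key land at the right edge of a sorted index?
// This is NOT a real B-tree; it only looks at key ordering.
const { randomUUID, randomBytes } = require("node:crypto");

// Hypothetical helper: mimics UUIDv7 ordering with a 12-hex-digit
// (48-bit) millisecond timestamp prefix followed by 80 random bits.
function v7LikeKey(ms) {
  return ms.toString(16).padStart(12, "0") + randomBytes(10).toString("hex");
}

// Fraction of inserts (after the first) that sort after every earlier key.
// Every other insert would land somewhere in the middle of the index.
function appendRatio(keys) {
  let max = keys[0];
  let appends = 0;
  for (let i = 1; i < keys.length; i++) {
    if (keys[i] > max) {
      appends++;
      max = keys[i];
    }
  }
  return appends / (keys.length - 1);
}

const N = 100_000;
const start = Date.now();
const v4Keys = Array.from({ length: N }, () => randomUUID());
// Assumption: one insert per millisecond, so each timestamp is newer.
const v7Keys = Array.from({ length: N }, (_, i) => v7LikeKey(start + i));

console.log("UUIDv4 append ratio:", appendRatio(v4Keys).toFixed(5));
console.log("UUIDv7-like append ratio:", appendRatio(v7Keys).toFixed(5));
```

With random keys, only about ln N of the inserts are appends on average (around 11 for N = 100,000), so nearly every insert targets an arbitrary leaf page. The time-ordered keys append every single time.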

UUIDv7 (along with other time-ordered identifiers such as ULID) handles this under the hood. So when you are dealing with high-volume inserts and large databases, you won't get bad surprises such as severe performance degradation or bloated indexes.

Here's how UUIDv7 lays out its data:

018c8e8a-9d4e-7890-a123-456789abcdef
018c8e8a-9d4e       48-bit Unix timestamp (milliseconds)
7                   version (always 7)
890                 12 random bits
a123-456789abcdef   2 variant bits + 62 random bits

The first 48 bits contain a Unix timestamp in milliseconds, so UUIDs generated over time are naturally sequential. You can also start using UUIDv7 in existing projects very easily, without migrating old UUIDv4 records: both can coexist in the same column, and a v4 value can never equal a v7 value because their version digits (4 and 7) differ.
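For illustration, here is a minimal, dependency-free sketch of that layout in Node.js. The function name `uuidv7Sketch` is hypothetical; it follows the RFC 9562 field positions (timestamp, version, variant) but skips the extra ordering logic that production libraries may use to keep IDs generated within the same millisecond in order, so prefer a maintained library in real code.

```javascript
const { randomBytes } = require("node:crypto");

// Minimal UUIDv7 sketch following the RFC 9562 field layout.
// No same-millisecond ordering guarantee (real libraries handle this).
function uuidv7Sketch(ms = Date.now()) {
  const bytes = randomBytes(16);        // start with 128 random bits
  bytes.writeUIntBE(ms, 0, 6);          // bytes 0-5: 48-bit Unix time in ms
  bytes[6] = (bytes[6] & 0x0f) | 0x70;  // high nibble of byte 6: version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80;  // top two bits of byte 8: variant 10
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const id = uuidv7Sketch();
console.log(id); // different on every run
```

Passing a fixed timestamp, e.g. `uuidv7Sketch(0x018c8e8a9d4e)`, yields an ID that starts with `018c8e8a-9d4e-7`, matching the layout above.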

// Node.js usage (v7 requires the uuid package, version 10 or later)
const { v7: uuidv7 } = require('uuid');
const id = uuidv7();
console.log(id); // e.g. 018c8e8a-9d4e-7890-a123-456789abcdef (different on every call)

// example database record
{
  "id": "018c8e8a-9d4e-7890-a123-456789abcdef",
  "user_id": "018c8e8a-9d50-7000-b345-6789abcdef01",
  "created_at": "2024-11-08T10:30:00Z"
}

It's also good to mention that UUIDv7 maintains the same 128-bit format as UUIDv4, so it works with all existing UUID columns in your database.
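Because both versions share the same 128-bit string format, rows of each kind can sit in the same column and still be told apart: the version is the first hex digit of the third group. The sketch below (hypothetical helper names `uuidVersion` and `uuidv7Date`, assuming canonical lowercase 36-character strings) reads that digit and, for UUIDv7 only, decodes the creation time from the first 48 bits.

```javascript
// Version = first hex digit of the third group (character index 14).
function uuidVersion(id) {
  return parseInt(id[14], 16);
}

// Decode the 48-bit millisecond timestamp of a UUIDv7; null for other versions.
function uuidv7Date(id) {
  if (uuidVersion(id) !== 7) return null;
  const hex = id.replace(/-/g, "").slice(0, 12); // first 48 bits
  return new Date(parseInt(hex, 16));
}

const v4 = "f47ac10b-58cc-4372-a567-0e02b2c3d479";
const v7 = "018c8e8a-9d4e-7890-a123-456789abcdef";

console.log(uuidVersion(v4), uuidVersion(v7));
console.log(uuidv7Date(v4), uuidv7Date(v7)?.toISOString());
```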

Benchmark

UUIDv4 generation (10M IDs): 2,847ms
UUIDv7 generation (10M IDs): 2,763ms

Note: I ran this benchmark with Node v20.10.0. It measures ID generation only; nothing is written to a database.

// Run: node benchmark-uuid-comparison.js
// Measures ID *generation* only; nothing is inserted into a database.
// Note: v4 comes from Node's built-in crypto and v7 from the uuid package
// (version 10 or later), so this is not a strict apples-to-apples comparison.
const { randomUUID } = require("crypto");
const { v7: uuidv7 } = require("uuid");

// note, I've used 10_000_000 with _ which are numeric separators
// https://github.com/pH-7/GoodJsCode#-clearreadable-numbers
const COUNT = 10_000_000;

// UUIDv4 benchmark
console.time("UUIDv4 generation");
for (let i = 0; i < COUNT; i++) {
  randomUUID();
}
console.timeEnd("UUIDv4 generation");

// UUIDv7 benchmark
console.time("UUIDv7 generation");
for (let i = 0; i < COUNT; i++) {
  uuidv7();
}
console.timeEnd("UUIDv7 generation");

Generation speed is nearly identical, so this benchmark says little about your database. The real performance difference shows up in actual database inserts, where UUIDv7's sequential nature prevents index fragmentation.

Downsides...

The only case where you still want UUIDv4 is when you explicitly don't want temporal ordering, for example when building security tokens, API keys, or session IDs, where predictability could be a security concern. UUIDv7's timestamp-based structure reveals when the identifier was created. In these scenarios, you need something completely unpredictable: purely random identifiers that leak no information about creation time.
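As a sketch of what "purely random" can look like in Node.js, both identifiers below come from the built-in `node:crypto` module, which uses a cryptographically secure pseudorandom number generator. Using 32 random bytes encoded as base64url for API keys is an illustrative assumption, not a requirement; the point is that neither value embeds a timestamp.

```javascript
const { randomUUID, randomBytes } = require("node:crypto");

// Session ID: a UUIDv4 carries 122 random bits and no timestamp.
const sessionId = randomUUID();

// API key (illustrative choice): 256 random bits, URL-safe, no padding.
const apiKey = randomBytes(32).toString("base64url");

console.log(sessionId);
console.log(apiKey);
```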

However, for database primary keys and foreign keys, UUIDv7 is the clear winner. It's definitely worth trying in your next project 😉

It can give your database operations a significant performance boost, along with better index efficiency over time, which is what matters most at the end of the day, right? 😊

Alternatives

UUIDv6 is another time-based UUID option (still relatively new and not very popular). It is essentially a fixed UUIDv1 whose timestamp bits are reordered so that IDs sort by time, giving the same sequential ordering as UUIDv7. However, it still includes a node field (a MAC address or random node ID) in its structure, which UUIDv7 avoids entirely for privacy reasons.

Where to continue using UUIDv4

Lastly (although you might not need this), UUIDv4 is still perfectly valid for security tokens, API keys, and session IDs, where you don't want to leak creation time. It is also more forgiving on PostgreSQL than on MySQL (InnoDB): PostgreSQL appends new rows to an unordered heap, whereas InnoDB stores rows in a clustered index ordered by primary key, so random keys force it to insert rows all over the table. PostgreSQL's primary-key B-tree index still takes the random-insert hit, though.

👉 See my projects on GitHub, GitHub.com/pH-7 💡
☕️ Was this helpful? You could offer me one of my favorite coffees: Ko-fi.com/phenry 😋

Top comments (4)

Pierre-Henry Soria ✨ • Nov 8

Feel free to share your thoughts or feedback on whether you use UUIDv4, UUIDv7, or sequential IDs. Always keen to discuss! 🤗

Paweł Świątkowski • Nov 10

UUIDv7, with its time-ordered structure, claims to be 2-5x faster for inserts than UUIDv4

I find this puzzling. After all this is just a certain number of bytes to be inserted. Why would v7 be faster and which DB engine we are talking about? Of course, I assume this is taking away index update times, which you cover in the following paragraph.

UUIDv7 is generally great. The main hurdle I found was that I apparently learned to read the first 5-6 characters when debugging and now I have to read last 5-6, but that's a minor inconvenience.

Matthew O. Persico • Nov 10

If you stick u7 data in a column already containing u4 data, the odds of a new u7 timestamp part matching the first 48 bits of an u4 entry are miniscule, but how would you be able to detect it if it happens? Do you have to create a current u7, then search your existing u4 data for any 48 bits >= the current u7 timestamp and replace them, just to be safe?

Lawrence Aberba • Nov 10

Started using uuidv7 myself. I still use uuidv4 like you mentioned though. Great writeup