What is MongoDB's _id Field and How to Use It

Posted by John Potocny and Ewen Fortune on Jul 6, 2017 7:43:12 PM

MongoDB users are known for valuing the database's capacity for growth. After all, the system's tagline is "for giant ideas," and for good reason: it supports extreme scalability and is designed to store exceptionally large numbers of documents. Among all the data that a MongoDB system handles, however, only one field exists in every document: the _id field.


The Makeup of the _id Field

The _id field is the unique identifier that MongoDB attaches to every document it stores. As such, it's important that you, as a user, understand its makeup and characteristics, so you can take advantage of the way your database organizes your documents.

As a quick, opening summary, these are a few of _id's principal characteristics:

  • _id is the primary key for the documents in a collection; it is what distinguishes one document from another by default.
  • _id is automatically indexed. Lookups specifying { _id: <someval> } use the _id index.
  • By default, the _id field is an ObjectId, one of MongoDB's BSON types. Users can also set _id to something other than an ObjectId, if desired.

As MongoDB's documentation explains, "ObjectIds are small, likely unique, fast to generate, and ordered." An ObjectId is 12 bytes long and is made up of several segments of 2 to 4 bytes each. Each segment encodes a specific aspect of the document's identity. The following values make up the full 12-byte combination (quoted from MongoDB's documentation):

  • "a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value."
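The layout quoted above can be sketched in a few lines of Python. This is only an illustration using the standard library, not a driver API: the function name `parse_object_id` is hypothetical, the sample id is an arbitrary example value, and reading the process id and counter as big-endian integers is an assumption.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 24-character hex ObjectId into the four quoted fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # seconds since the Unix epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),                    # 3-byte machine identifier
        "pid": int.from_bytes(raw[7:9], "big"),       # 2-byte process id (byte order assumed)
        "counter": int.from_bytes(raw[9:12], "big"),  # 3-byte counter (byte order assumed)
    }


parts = parse_object_id("507f1f77bcf86cd799439011")
print(parts["timestamp"].isoformat())  # 2012-10-17T21:13:27+00:00
```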

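In summary, the first nine bytes of an ObjectId distinguish it across machines and processes within a given second; the last three bytes distinguish it from other ObjectIds generated during that same second by the same process. Note that the documentation promises "likely unique," not a hard guarantee. Later MongoDB releases also document the five middle bytes as a single random value rather than separate machine and process identifiers, so the breakdown above reflects the layout as documented when this post was written.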

Again, all documents in MongoDB must have a populated _id field. If a document has not been assigned an _id value, MongoDB will automatically generate one.

Interesting Properties of _id

Because the _id field is universal and required, it offers several conveniences for organizing and handling documents. For instance, if you stick to the default ObjectId, you can read a document's creation timestamp with the ObjectId's getTimestamp() method, an easy and reliable way to see when a document was created, to the nearest second. Additionally, with a little client-side computation, you can use the built-in timestamp to find documents based on insertion time.
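As a rough sketch of that client-side computation, the snippet below reads the timestamp out of an ObjectId's hex form and builds the smallest possible ObjectId for a given moment, which can serve as a range boundary. The helper names `object_id_timestamp` and `object_id_floor` are hypothetical. The boundaries here are plain hex strings; in a real query they would need to be wrapped in your driver's ObjectId type (PyMongo, for example, offers `ObjectId.from_datetime` for this purpose).

```python
from datetime import datetime, timezone


def object_id_timestamp(hex_id: str) -> datetime:
    """Roughly what getTimestamp() reports: the first 4 bytes as a UTC time."""
    return datetime.fromtimestamp(int(hex_id[:8], 16), tz=timezone.utc)


def object_id_floor(moment: datetime) -> str:
    """Smallest ObjectId (as hex) whose timestamp equals `moment`."""
    seconds = int(moment.timestamp())
    return f"{seconds:08x}" + "0" * 16  # zero out the remaining 8 bytes


# Boundaries for "everything inserted on 1 July 2017 (UTC)".
start = object_id_floor(datetime(2017, 7, 1, tzinfo=timezone.utc))
end = object_id_floor(datetime(2017, 7, 2, tzinfo=timezone.utc))

# Shape of the filter only; wrap start/end in an ObjectId before sending it.
query = {"_id": {"$gte": start, "$lt": end}}
```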

Similarly, because the timestamp forms the leading bytes of the ID, sorting by _id (or any ObjectId field) also sorts your documents roughly in the order they were created. Be sure to note, however, that this ordering is not a strict record of insertion order: the timestamp has only one-second resolution, and within a given second the other components of the ID (machine, process, counter) determine the order.
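A small sketch makes the caveat concrete. It assumes ObjectIds are compared byte by byte, which sorting their fixed-width hex strings mimics; `make_id` is a hypothetical helper and all field values are made up for illustration.

```python
def make_id(seconds: int, machine: int, pid: int, counter: int) -> str:
    """Assemble a 12-byte id in the quoted layout and return it as hex."""
    raw = (
        seconds.to_bytes(4, "big")
        + machine.to_bytes(3, "big")
        + pid.to_bytes(2, "big")
        + counter.to_bytes(3, "big")
    )
    return raw.hex()


# (label, id) pairs listed in their true creation order.
created = [
    ("first", make_id(1499370192, machine=0xBBBBBB, pid=1, counter=5)),
    # Same second as "first", but generated on a different machine.
    ("second", make_id(1499370192, machine=0xAAAAAA, pid=1, counter=9)),
    ("third", make_id(1499370193, machine=0xBBBBBB, pid=1, counter=6)),
]

by_id = [label for label, _ in sorted(created, key=lambda pair: pair[1])]
print(by_id)  # ['second', 'first', 'third']
```

The document from the later second still sorts last, but the two documents created within the same second swap places because the machine bytes break the tie.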

Finally, keep in mind that the _id field is immutable: once a document exists in your MongoDB system, it has, by definition, been assigned an _id, and you cannot change its primary key. You can, however, supply your own _id when you insert a new document; if you don't, it is populated with an ObjectId by default. Overriding the _id field can be useful, but when you do so, you need to ensure that the value is unique for each document in the collection.
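The insert-time rules described above can be modelled with a toy in-memory collection. This is not a MongoDB or driver API: `ToyCollection` is a hypothetical stand-in, a UUID hex string takes the place of the generated ObjectId, and a `KeyError` takes the place of the server's duplicate-key error.

```python
import uuid


class ToyCollection:
    """In-memory stand-in for a collection, keyed by _id."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc: dict):
        doc = dict(doc)  # do not mutate the caller's dictionary
        # Generate an _id only when the caller did not supply one.
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


users = ToyCollection()
generated = users.insert_one({"name": "Ada"})                       # _id filled in
custom = users.insert_one({"_id": "user:grace", "name": "Grace"})  # _id overridden

try:
    users.insert_one({"_id": "user:grace", "name": "Impostor"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True  # a reused _id is refused
```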

What to Beware of When Leveraging _id

Because of these factors and properties of the _id field, there are (at least!) two things to be wary of when dealing with it:

  • Be careful when sharding on _id. Because ObjectIds begin with a timestamp, they are generated in roughly ascending order. Therefore, if you shard a collection by _id, make sure you use hashed sharding. Otherwise, all new inserts will land on a single shard, the one that holds the highest range of key values. MongoDB beginners frequently run into this problem if they are not warned that this is the default behavior.
  • Remember that overriding the _id field means you'll need to generate a unique value for each of your documents. While you can use UUIDs or some other value unique to your application, it's best to let MongoDB use its default if you aren't certain your replacement values are unique.
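The sharding pitfall can be illustrated with a simplified simulation. Everything here is a stand-in: the split points are hypothetical, the ids are synthetic, and MD5 modulo the shard count is only a rough proxy for hashed sharding; MongoDB's actual hash function and chunk assignment work differently.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 4

# Hypothetical chunk boundaries, chosen from ids that already exist.
SPLIT_POINTS = [
    "40000000" + "0" * 16,
    "48000000" + "0" * 16,
    "50000000" + "0" * 16,
]


def ranged_shard(hex_id: str) -> int:
    """Ranged sharding: pick the chunk whose key range contains the id."""
    return bisect.bisect_right(SPLIT_POINTS, hex_id)


def hashed_shard(hex_id: str) -> int:
    """Hashed sharding (simplified): place the id by a hash of its bytes."""
    digest = hashlib.md5(bytes.fromhex(hex_id)).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# 1000 synthetic ids with increasing timestamps, like freshly inserted documents.
new_ids = [f"{1499370192 + i:08x}" + "bcf86cd799" + f"{i:06x}" for i in range(1000)]

ranged = Counter(ranged_shard(hex_id) for hex_id in new_ids)
hashed = Counter(hashed_shard(hex_id) for hex_id in new_ids)

print("ranged:", dict(ranged))  # every insert lands on the last shard
print("hashed:", dict(hashed))  # inserts are spread across all shards
```

Because every new id is larger than the highest split point, the ranged scheme sends all 1000 inserts to the same shard, while hashing the key scatters them.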

Conclusions

MongoDB's _id field is fundamental to every collection, and its default form has useful properties that you can take advantage of once you know how it is generated. Understanding the field's default behavior, along with its advantages and pitfalls, helps you manage your collections and decide when overriding _id makes sense.
