Update 11/10/14: The next blog post in this series is on managing disk space in MongoDB.
As your MongoDB database grows in size, information from the db.stats() diagnostic command (or the database “Stats” tab in our management portal) becomes increasingly helpful for evaluating hardware requirements.
We frequently get questions about the dataSize, storageSize and fileSize metrics, so we want to help developers better understand how MongoDB storage works and what these particular metrics mean.
MongoDB storage structure basics
First, we’ll go over the basics of how MongoDB stores your data.
Every MongoDB instance consists of a namespace file, journal files and data files. For this discussion, we’ll focus only on data files, since that is where all of the data and indexes for your database reside.
Data files store BSON documents, indexes, and MongoDB-generated metadata in structures called extents. Each data file is made up of multiple extents.
Extents are logical containers within data files used to store documents and indexes.
The above diagram illustrates the relationship between data files and extents. Note:
- Data and indexes are each contained in their own sets of extents; no extent will ever contain content for more than one collection
- Data and indexes are never contained within the same extent
- The data and indexes for a collection will usually span multiple extents
- When a new extent is needed, MongoDB will attempt to use available space within current data files. If space cannot be found, MongoDB will create new data files.
Metrics from db.stats()
Now that we understand the basics of how MongoDB storage is organized, we can explore metrics commonly examined with db.stats(): dataSize, storageSize and fileSize.
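As a minimal sketch of reading these three metrics, the snippet below converts them from bytes to megabytes. The values in `sampleStats` are invented for illustration and `summarizeStats` is a hypothetical helper, not part of MongoDB; in the mongo shell you would pass it the document returned by db.stats() instead. It assumes the document carries a fileSize field, as it does for the data-file layout described in this post.

```javascript
// Hypothetical sample shaped like the output of db.stats(); the numbers are invented.
const sampleStats = {
  dataSize: 52428800,    // 50 MB of documents and padding
  storageSize: 83886080, // 80 MB of data extents
  fileSize: 201326592    // 192 MB of data files on disk
};

// Hypothetical helper: convert the three byte counts to megabytes.
function summarizeStats(stats) {
  const MB = 1024 * 1024;
  return {
    dataSizeMB: stats.dataSize / MB,
    storageSizeMB: stats.storageSize / MB,
    fileSizeMB: stats.fileSize / MB
  };
}

console.log(summarizeStats(sampleStats));
```

In the mongo shell, db.stats() also accepts a scale argument, so db.stats(1024 * 1024) reports the figures already scaled to megabytes.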
The dataSize metric is the sum of the sizes (in bytes) of all the documents and padding stored in the database.
While dataSize does decrease when you delete documents, dataSize does not decrease when documents shrink because the space used by the original document has already been allocated (to that particular document) and cannot be used by other documents.
Conversely, if a user updates a document with more data, dataSize will remain the same as long as the new document fits within its originally padded, pre-allocated space.
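The dataSize behavior described above can be illustrated with a toy model. This is only a sketch and not how MongoDB is implemented: `ToyCollection`, its methods and the padding factor of 1.5 are all hypothetical, chosen to show that dataSize tracks the space allocated to each document rather than the bytes the document currently uses.

```javascript
// Toy model only: each record reserves a fixed allocation (document + padding).
class ToyCollection {
  constructor() {
    this.records = new Map(); // id -> { allocated, used }
  }

  // The padding factor of 1.5 is an invented value for illustration.
  insert(id, docBytes, paddingFactor = 1.5) {
    const allocated = Math.ceil(docBytes * paddingFactor);
    this.records.set(id, { allocated, used: docBytes });
  }

  // Returns true if the update fit in place, false if the document had to move.
  update(id, newBytes) {
    const rec = this.records.get(id);
    if (newBytes <= rec.allocated) {
      rec.used = newBytes; // shrinking or growing within the padding
      return true;
    }
    this.insert(id, newBytes); // model a move to a new, larger allocation
    return false;
  }

  remove(id) {
    this.records.delete(id);
  }

  // dataSize counts allocated space (documents plus padding), not bytes in use.
  dataSize() {
    let total = 0;
    for (const rec of this.records.values()) total += rec.allocated;
    return total;
  }
}

const coll = new ToyCollection();
coll.insert("a", 100);
console.log(coll.dataSize()); // allocation for a 100-byte document plus padding
coll.update("a", 40);         // document shrinks
console.log(coll.dataSize()); // unchanged
coll.update("a", 140);        // document grows but still fits in its padding
console.log(coll.dataSize()); // unchanged
coll.remove("a");             // deleting the document releases its allocation
console.log(coll.dataSize());
```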
The storageSize metric is equal to the size (in bytes) of all the data extents in the database. This number is larger than dataSize because it includes yet-unused space (in data extents) and space vacated by deleted or moved documents within extents.
storageSize does not decrease as you remove or shrink documents.
The fileSize metric is equal to the size (in bytes) of all the data extents, index extents and yet-unused space (in data files) in the database. This metric represents the storage footprint of your database on disk. fileSize is larger than storageSize because it includes index extents and yet-unused space in data files.
While fileSize does decrease when you delete a database, fileSize does not decrease as you remove collections, documents or indexes.
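To make the relationship between the three metrics concrete, here is a rough sketch that splits fileSize into its parts using the definitions above. The `breakdown` helper and the sample numbers are hypothetical. The sketch also assumes that the indexSize field reported by db.stats() corresponds to the size of the index extents, so treat the last figure as an estimate rather than an exact accounting.

```javascript
const MB = 1024 * 1024;

// Hypothetical sample shaped like the output of db.stats(); the numbers are invented.
const exampleStats = {
  dataSize: 50 * MB,
  storageSize: 80 * MB,
  indexSize: 16 * MB,
  fileSize: 192 * MB
};

// Hypothetical helper: split fileSize into the pieces described in this post.
function breakdown(stats) {
  return {
    // Documents plus their padding
    documentsAndPadding: stats.dataSize,
    // Yet-unused space and space vacated by deleted or moved documents
    unusedInDataExtents: stats.storageSize - stats.dataSize,
    // Assumption: indexSize stands in for the size of the index extents
    indexExtents: stats.indexSize,
    // Space in data files not yet assigned to any extent (estimate)
    unusedInDataFiles: stats.fileSize - stats.storageSize - stats.indexSize
  };
}

const parts = breakdown(exampleStats);
for (const [name, bytes] of Object.entries(parts)) {
  console.log(name + ": " + bytes / MB + " MB");
}
```

A large unusedInDataExtents or unusedInDataFiles figure relative to dataSize suggests that much of the on-disk footprint is allocated but not currently holding documents.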
That’s it! The next time someone asks you how big your database is, you’ll know what to tell them.