
Telemetry Series: Page Faults

A key component of optimizing application performance is tuning the performance of the database that supports it. Each post in our Telemetry series discusses an important metric used by developers and database administrators to tune the database and describes how MongoLab users can leverage Telemetry, MongoLab’s monitoring interface, to effectively review and take action on these metrics.

Page Faults

Databases are optimized for working with data that is stored on disk, but usually cache as much data as possible in RAM in order to access disk as infrequently as possible. However, as it is cost-prohibitive to store in RAM all the data accessed by the application, the database must eventually go to disk. Because disks are slower than RAM, this incurs a significant time cost.

Effectively tuning a database deployment commonly involves assessing how often the database accesses disk with an eye towards reducing the need to do so. To that end, one of the best ways to analyze the RAM and disk needs of a MongoDB deployment is to focus on what are called Page Faults.

What is a Page Fault?

MongoDB manages documents and indexes in memory by using an OS facility called MMAP, which translates data files on disk to addresses in virtual memory. The database then accesses disk blocks as though it is accessing memory directly. Meanwhile, the operating system transparently keeps as much of the mapped data cached in RAM as possible, only going to disk to retrieve data when necessary.

When the database touches a mapped page that is not cached in RAM, a Page Fault occurs: the OS has to read the page from disk into memory before the access can complete.
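To make the mechanism concrete, here is a minimal Python sketch of reading a file through a memory mapping. It is not MongoDB's code; the temporary file, the `read_via_mmap` helper, and the `demo` function are hypothetical stand-ins chosen for illustration.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` from a file through a memory mapping."""
    with open(path, "rb") as f:
        # Map the whole file read-only (length 0 means "entire file").
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches the underlying pages. If a page is not already
            # cached in RAM, the OS services a page fault and reads it from disk.
            return mapped[offset:offset + length]


def demo():
    # Hypothetical stand-in for a data file: four pages of filler, then a marker.
    page = mmap.PAGESIZE
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (4 * page) + b"document")
        path = tmp.name
    try:
        return read_via_mmap(path, 4 * page, 8)
    finally:
        os.remove(path)
```

The calling code never issues an explicit disk read; it only indexes into the mapping, and the operating system decides whether that access is served from RAM or from disk.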

What do Page Faults mean for my cluster?

The frequency of Page Faults indicates how often the OS goes to disk to read data. Operations that cause Page Faults are slower because they necessarily incur disk latency.

Page Faults are one of the most important metrics to look at when diagnosing poor database performance because a high rate suggests the cluster does not have enough RAM for what you’re trying to do. Analyzing Page Faults will help you determine whether you need more RAM or need to use RAM more efficiently.

How does Telemetry help me interpret Page Faults?

Select a deployment and then look back through Telemetry over months or even years to determine the normal level of Page Faults. In instances where Page Faults deviate from that norm, check application and database logs for operations that could be responsible. If these deviations are transient and infrequent they may not pose a practical problem. However, if they are regular or otherwise impact application performance you may need to take action.

A burst in Page Faults corresponding to an increase in database activity.


If Page Faults are steady but you suspect they are too high, consider the ratio of Page Faults to Operations. If this ratio is high it could indicate unindexed queries or insufficient RAM. The definition of “high” varies across deployments and requires knowledge of the history of the deployment, but consider taking action if any of the following are true:

  • The ratio of Page Faults to Operations is greater than or equal to 1.
  • Effective Lock % is regularly above 15%.
  • Queues are regularly above 0.
  • The app seems sluggish.
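As a rough illustration of the first check, the ratio can be estimated from two `serverStatus` samples taken some time apart. The sketch below assumes the samples are plain dictionaries with the `extra_info.page_faults` and `opcounters` fields; both counters are cumulative, so it works on the difference between samples. How Telemetry itself computes the ratio is not stated here, and the sample numbers are made up.

```python
# Operation types counted in serverStatus "opcounters".
OP_TYPES = ("insert", "query", "update", "delete", "getmore", "command")


def total_ops(status):
    """Sum the cumulative operation counters in one serverStatus sample."""
    counters = status["opcounters"]
    return sum(counters.get(op, 0) for op in OP_TYPES)


def faults_per_op(before, after):
    """Page Faults per operation between two samples, or None if no ops ran."""
    faults = after["extra_info"]["page_faults"] - before["extra_info"]["page_faults"]
    ops = total_ops(after) - total_ops(before)
    if ops <= 0:
        return None
    return faults / ops


# Made-up samples for illustration only.
before = {"extra_info": {"page_faults": 1000},
          "opcounters": {"insert": 100, "query": 300}}
after = {"extra_info": {"page_faults": 1600},
         "opcounters": {"insert": 200, "query": 500}}

ratio = faults_per_op(before, after)  # 600 faults / 300 ops = 2.0
needs_attention = ratio is not None and ratio >= 1
```

With a driver such as PyMongo, the two samples could come from running the `serverStatus` command twice; that step is left out so the sketch stays self-contained.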

Note: Future Telemetry blog posts will cover additional metrics, such as Effective Lock % and Queues. See MongoDB’s serverStatus documentation for more information.

How do I reduce Page Faults?

How you reduce Page Faults depends on their source. There are three main reasons for excessive Page Faults.

  1. Not having enough RAM for the dataset. In this case, the solution is to add more RAM to the deployment by scaling either vertically to machines with more RAM, or horizontally by adding more shards to a sharded cluster.
  2. Inefficient use of RAM due to lack of appropriate indexes. The most inefficient queries are those that cause collection scans. When a collection scan occurs, the database is iterating over every document in a collection to identify the result set for a query. During the scan, the whole collection is read into RAM, where it is inspected by the query engine. Page Faults are generally acceptable when obtaining the actual results of a query, but collection scans cause Page Faults for documents that won’t be returned to the app. Worse, these unnecessary Page Faults are likely to evict “hot” data, resulting in even more Page Faults for subsequent queries.
  3. Inefficient use of RAM due to excess indexes. When the indexed fields of a document are updated, the indexes that include those fields must be updated. When a document is moved on disk, all indexes that contain the document must be updated. These affected indexes must enter RAM to be updated. As above, this can lead to memory thrashing.
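The cost of a collection scan (reason 2 above) can be sketched with a toy model. This is not how MongoDB lays out data: the fixed `DOCS_PER_PAGE`, the list standing in for a collection, and the dictionary standing in for an index are all simplifying assumptions, and the pages occupied by the index itself are ignored.

```python
DOCS_PER_PAGE = 4  # made-up page capacity for the toy model


def pages_touched_by_scan(docs):
    """A collection scan inspects every document, so every page is read."""
    return {pos // DOCS_PER_PAGE for pos in range(len(docs))}


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(pos)
    return index


def pages_touched_by_index(index, value):
    """An indexed lookup reads only the pages that hold matching documents."""
    return {pos // DOCS_PER_PAGE for pos in index.get(value, [])}


# 100 hypothetical documents spread over 25 pages; 2 of them match user == 3.
docs = [{"user": i % 50} for i in range(100)]
index = build_index(docs, "user")

scan_pages = pages_touched_by_scan(docs)
index_pages = pages_touched_by_index(index, 3)
```

In this model the scan touches all 25 pages to return 2 documents, while the indexed lookup touches only the 2 pages that contain them. Every extra page touched is a potential Page Fault and a chance to evict hot data.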

Note: For assistance determining what indexes your deployment needs, MongoLab offers a Slow Query Analyzer that provides index recommendations to Shared and Dedicated plan users.

Have questions or feedback?

We’d love to hear from you as this Telemetry blog series continues. What topics would be most interesting to you? What types of performance problems have you struggled to diagnose?

Email us at support@mongolab.com to let us know your thoughts, or to get our help tuning your MongoLab deployment.

Introducing flip-flop: MongoDB Replica Set demonstration and experimentation service

Greetings adventurers!

A lot of our users upgrade from single-node databases to replica set clusters without fully understanding how their driver, and therefore their application, will react to failover. In fact, we get so many questions about best practices with MongoDB replica sets that we thought it could be cool to host a replica set that anyone can connect to using their MongoDB driver of choice.

Today we invite you to check out flip-flop, a MongoDB Replica Set demonstration and experimentation service. The flip-flop service consists of:

  • A live replica set that fails over (i.e. “flips” and “flops”) every 60 seconds. This cluster is always running and available to all at the following address:
    mongodb://testdbuser:testdbpass@flip.mongolab.com:53117,flop.mongolab.com:54117/testdb
  • A set of example client scripts (currently just in Python) that simulate client interactions with the cluster that you can use as a starting point for your own experimentation
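When experimenting against a cluster that fails over this often, a common pattern is to retry an operation a few times while a new primary is elected. The helper below is a generic sketch, not one of the flip-flop example scripts; `with_retry` and its parameters are hypothetical names. With PyMongo, the retriable exception would typically be `pymongo.errors.AutoReconnect`.

```python
import time


def with_retry(operation, retriable, attempts=5, delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Only exceptions matching `retriable` trigger a retry; anything else
    propagates immediately. The last retriable error is re-raised on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retriable as exc:
            last_error = exc
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

A call might look like `with_retry(lambda: collection.insert_one(doc), AutoReconnect)`, where `collection` and `doc` are placeholders. Retrying writes blindly can apply an operation twice, so this is only appropriate for operations that are safe to repeat.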

The flip-flop service is also great for those of you working on third-party drivers. Gustavo Niemeyer, author of mgo, a MongoDB driver for the Go language, told us flip-flop helped him find and quickly fix a small bug in the driver: “This is brilliant. I actually managed to find an edge case coding a trivial example against it due to the timing of the server re-election.” Pretty cool!

Continue Reading →


[“Thinking”, “About”, “Arrays”, “In”, “MongoDB”]

Greetings adventurers!

The growing popularity of MongoDB means more and more people are thinking about data in ways divergent from traditional relational models. For this reason alone, it’s exciting to experiment with new ways of modelling data. However, with additional flexibility comes the need to properly analyze the performance impact of data model decisions.

Embedding arrays in documents is a great example of this. MongoDB’s versatile array operators ($push/$pull, $addToSet, $elemMatch, etc.) offer the ability to manage data sets within documents. However, one must be careful. Data models that call for very large arrays, or arrays with high rates of modification, can often lead to performance problems.

Continue Reading →


How to use MongoDB on RedHat OpenShift with MongoLab

Hey RedHat fans – we’ve got your MongoDB hosting needs covered!

In today’s post we’ll be presenting a quick-start guide on how to connect OpenShift, the free RedHat auto-scaling Platform-as-a-Service (PaaS), with our popular MongoDB Database-as-a-Service (DBaaS), MongoLab.

For demonstration purposes, we’ll be using a Node.js application that we’ve written (available for download here). All it takes to connect your OpenShift application is five easy steps!

Continue Reading →


Weekend Project: Send sensor data from Arduino to MongoDB


Arduino is an open-source electronics platform that can sense and interact with its environment through a variety of sensor types. It’s great for hardware prototyping and one-off projects.

I just got an Arduino board from our friends at SendGrid, who also gave me a little tutorial in the art of Arduino hacking. Inspired by the tutorial and armed with this new board, I bought a passive infrared (PIR) motion sensor from my local Radio Shack. Now I was ready to play; in particular, I wanted to be able to collect that continuous stream of hardware sensor data into a MongoDB database for logging, trend analysis, system event correlation, etc.

Continue Reading →


Object Modeling in Node.js with Mongoose

Check it out! We’ve just updated our Heroku Dev Center tutorial on object modeling in Node.js using Mongoose, a MongoDB ODM library. Mongoose gives your collections structure and simplifies Node’s callback patterns to make using MongoDB with Node.js even easier.

Learn more and download the sample Node.js app right here at the Heroku Dev Center.

Node.js and MongoLab on Windows Azure

(This tutorial was originally published on the Windows Azure documentation portal in January 2013)

Greetings, adventurers! Welcome to MongoDB-as-a-Service. Are you looking to create a Node.js Application on Windows Azure with MongoDB using the MongoLab Azure Store add-on?

In this tutorial you will:

  1. Provision the database – The Windows Azure Store MongoLab add-on will provide you with a MongoDB database hosted in the Windows Azure cloud and managed by MongoLab’s cloud database platform.
  2. Create the app – It’ll be a simple Node.js app for maintaining a list of tasks.
  3. Deploy the app – By tying a few configuration hooks together, we’ll make pushing our code a breeze.
  4. Manage the database – Finally, we’ll show you MongoLab’s web-based database management portal where you can search, visualize, and modify data with ease.

At any time throughout this tutorial, feel free to kick off an email to support@mongolab.com if you have any questions. To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see the Create a Windows Azure account and enable preview features tutorial.

In addition, ensure that you have the following installed:
Continue Reading →


DZone MongoDB Reference Card

As developers we’re always appreciative of documentation that lets us absorb a lot of detailed information as quickly as possible. While nothing can replace a detailed reading of the core MongoDB documentation from 10gen, a few pages of pithy reminders can make operational life a lot easier.

MongoLab sponsored the recent DZone reference card for MongoDB. Here’s a little snippet below. Download the rest of it here: http://refcardz.dzone.com/refcardz/mongodb (registration with DZone required)

[Image: DZone Refcard screenshot]

Thanks to Kristina Chodorow and the 10gen crew for a nice reference card and to the folks at DZone for producing it!

UPDATE 2013-02-06: added note saying that DZone requires registration