
Welcoming the Parse community

Last week, Parse announced that it is winding down its service, which will be fully retired on January 28, 2017.

Parse provided a great mobile backend platform for developers who didn’t want to wrangle with the complexities of server-side infrastructure. By encapsulating both the database and the app server into a single cloud service, Parse let mobile app developers rapidly build rich applications with much less effort than building the server component themselves would have required.

The good news is that even though the Parse service is shutting down, Parse has open sourced its underlying software so that customers can still experience the power of the platform. The missing component is now the hosting, and the Parse team has done a good job of giving its customers alternatives for hosting each component of the Parse platform, along with tools to help customers migrate their apps.

To run a Parse app in the cloud using the open-source software, two components need hosting:

  1. The Parse Server, written in Node.js
  2. The database that underlies Parse, which is MongoDB
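As a rough sketch of how the two pieces fit together, the snippet below mounts the open-source Parse Server on Express and points it at an externally hosted MongoDB database. The connection string, app ID, and master key are hypothetical placeholders, and the exact parse-server API may vary between releases; consult the parse-server documentation for your version.

```typescript
import express from "express";
import { ParseServer } from "parse-server";

// All values below are illustrative placeholders; substitute your own
// MongoDB connection string (e.g. from your MongoLab deployment) and keys.
const api = new ParseServer({
  databaseURI: "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/my-parse-db",
  appId: "myAppId",
  masterKey: "myMasterKey",
  serverURL: "http://localhost:1337/parse",
});

const app = express();

// Serve the Parse API at the /parse URL prefix.
app.use("/parse", api);

app.listen(1337, () => {
  console.log("parse-server running on port 1337");
});
```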

We have been working with the Parse team for some months to help ensure that Parse customers have a great option for where they host the database component of their app following the closure of the Parse service, and we welcome Parse customers to the MongoLab MongoDB-as-a-Service platform.

Using MongoLab to host the database component of your Parse app will free you from having to run and manage MongoDB yourself. We handle all of the automation around database provisioning and scaling, take timely backups of your database each day, and provide a suite of great tools to help you manage your data. Our multi-node MongoDB clusters also offer high availability and automatic failover, so your app stays running even in the face of infrastructure failure, and they are available whether you host your Parse app on Amazon, Azure, or Google.

In the past week, we have seen a good number of Parse customers migrating to our service. We feel we are starting to have a good handle on the process and the types of speed bumps customers may face while migrating. Over the next few weeks we will be releasing an FAQ that captures these learnings in order to help customers navigate the process smoothly.

In the meantime, if you have any questions or need migration help we invite you to email us at support@mongolab.com. We look forward to helping you build the future.

Help save Robomongo with your donation

We recently learned that the team behind Robomongo, a free and open-source MongoDB admin GUI, is in need of funds to keep the project going. The project, which has 3,622 stargazers on GitHub, is a valuable free tool for the MongoDB community. We know that many of you, our customers, use and love Robomongo, so let’s try to help them!

The Robomongo team is currently fundraising on Indiegogo. To help support them, we’re announcing that MongoLab will match all donations, starting now, up to a total of $15,000. Donate now and help save Robomongo!

New Telemetry features – metric descriptions and alert incidents

If you are running MongoDB in production, you should have a robust uptime and historical monitoring solution for every database deployment. Uptime monitoring tracks database stability and alerts you when action is needed, helping keep your application running smoothly. Historical monitoring helps you analyze and compare database and operating system metrics over time so that you can make informed decisions when tuning and scaling your database.

Telemetry, our real-time and historical monitoring tool, provides a customizable dashboard and alerting system that allows you to track key MongoDB metrics, analyze specific points in time, and configure custom alert thresholds. We’ve now made Telemetry even easier to work with by adding metric descriptions for each graph along with a list of alert incidents.

Metric descriptions in UI

Telemetry CPU graph

You may have noticed the new “?” icon located on every Telemetry chart. If you click on the icon, you can view the descriptions for each metric. For example, the CPU metric descriptions are the following:

Telemetry metrics help text

We hope these descriptions help you better understand each metric and serve as a quick reference when you are reading the charts or configuring alerts.

Telemetry alert incidents

Telemetry also allows you to create custom alerts. For example, you may want to configure an alert whenever the CPU User metric exceeds 75%. You can visit our Telemetry documentation for more information on how to configure Telemetry alerts and set up different notification channels.

If you have multiple alerts configured and your database is under duress, it is likely that multiple alerts will be triggered at once. To help you keep track of alert incidents, we have added a new tab in Telemetry called “Alert Incidents”. You can toggle between “open” and “closed” status, where “open” incidents are active issues and “closed” incidents are past events.

Questions or feedback?

For questions about Telemetry metrics or feedback on what you would like to see in Telemetry, please contact MongoLab support. We look forward to hearing from you!

Shared Cluster plans in AWS Oregon (us-west-2)

We are now offering Shared Cluster plans in the AWS Oregon region. You can visit the MongoLab create page if you would like to provision one.

Shared Cluster plans are configured with two data-bearing nodes and an arbiter. These plans are hosted on shared, multi-tenant resources but are also suitable for production use. If you have questions about our different plans, we have helpful documentation on each plan type. For additional help, you can also email us at support@mongolab.com.
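For context on what connecting to such a cluster looks like, here is a minimal sketch using the Node.js MongoDB driver. The hostnames, port, credentials, and replica set name are hypothetical; your deployment’s actual connection string is shown on its MongoLab dashboard.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical hosts, port, credentials, and replica set name. Both
// data-bearing members go in the URI; the arbiter holds no data and is
// typically not listed.
const uri =
  "mongodb://dbuser:dbpassword@ds012345-a0.mongolab.com:12345,ds012345-a1.mongolab.com:12345/mydb?replicaSet=rs-ds012345";

async function main() {
  const client = new MongoClient(uri);
  await client.connect();

  // With both members listed, the driver discovers the replica set and can
  // fail over to the newly elected primary if one node becomes unavailable.
  console.log(await client.db("mydb").command({ ping: 1 }));

  await client.close();
}

main().catch(console.error);
```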

 

Telemetry Series: Queues and Effective Lock Percent

A key component of optimizing application performance is tuning the performance of the database that supports it. Each post in our Telemetry series discusses an important metric used by developers and database administrators to tune the database and describes how MongoLab users can leverage Telemetry, MongoLab’s monitoring interface, to effectively review and take action on these metrics.

Queues and Effective Lock Percent

Any time an operation can’t acquire a lock it needs, it becomes queued and must wait for the lock to be released. Because operations that are queued on the database side often imply that operations are queued on the application side, Queues is an important Telemetry metric for assessing how well your MongoLab deployment is responding to the demands of your app.

In MongoDB 2.6 and earlier, you will find that Queues tend to rise and fall with Effective Lock %. This is because Queues refers specifically to the operations that are waiting on another operation (or series of operations) to release MongoDB’s database-level and global-level locks.

With MongoDB 3.0 (using the MMAP storage engine), locking is enforced at the collection level, and Effective Lock % is not reported as a server-level metric. This makes the Queues metric even more important. While the Telemetry interface may not show exactly which collections are heavily locked, elevated queueing is usually a consequence of locking.

The focus on Queues is also preferable because, by design, locking is going to happen on any MongoDB deployment that is receiving writes. As long as that locking isn’t resulting in queueing, it is usually not a concern.

High locks leading to high queues

What is Effective Lock Percent?

MongoDB uses multi-granular reader-writer locking. A read prevents a write from acquiring the lock, and a write prevents reads and other writes from acquiring the lock, but reads do not block other reads. In addition, each operation holds the lock at a granularity level appropriate for the operation itself.

In MongoDB 2.6 there are two granularity levels: a Global lock and a Database lock for each database. In this scheme, operations performed on separate databases do not block each other unless those operations also require the Global lock.

Effective Lock Percent in MongoDB 2.6 is a calculated metric that adds together the Global Lock % and the Lock % of the most-locked database at the time. Because of this computation, and because of the way operations are sampled, values greater than 100% may occur.    

In MongoDB 3.0 with the MMAP storage engine, MongoDB locks at the Global, Database, and Collection levels. A normal write operation holds the Database lock in MongoDB 2.6, but only holds a specific collection’s Collection lock in MongoDB 3.0. This improvement means separate collections can be concurrently read from or written to.

MongoDB 3.0 with the WiredTiger storage engine uses document-level locking for even greater parallelization. Writes to a single collection won’t block each other unless they are to the same document.

Note that locking operations can and do yield periodically, so incoming operations may still progress on a heavily locked server. For more detail, read MongoDB’s concurrency documentation.
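If you want to spot-check these counters outside of Telemetry, the serverStatus command exposes the raw numbers that metrics like Queues are built on. A minimal sketch with the Node.js driver follows; the connection string is a hypothetical placeholder.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string; substitute your own deployment's URI.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

async function main() {
  await client.connect();

  // globalLock.currentQueue reports operations queued waiting on a lock
  // at the moment the command runs.
  const status = await client.db("mydb").admin().command({ serverStatus: 1 });
  const queue = status.globalLock.currentQueue;
  console.log(
    `queued readers: ${queue.readers}, queued writers: ${queue.writers}, total: ${queue.total}`
  );

  await client.close();
}

main().catch(console.error);
```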

What do I do if I see locking and queueing?

Locking is a normal part of databases, so some level of locking and queueing is expected. First, consider whether the locking and queueing is actually a problem. You should typically not be concerned with Effective Lock Percent values of less than 15%, but each app is different. Likewise, queueing can be fine as long as the app is not blocked on queued requests.

If you see a rise in Queues and Effective Lock % in Telemetry that corresponds to problems with your application, try the following steps:

  1. If queues and locks coincide with Page Faults, check out Telemetry Series: Page Faults (the previous post in this series) for potential resolutions, such as optimizing indexes or ultimately increasing RAM.
  2. If locking and queueing don’t coincide with Page Faults, there are two potential causes:
    1. You may have an inefficient index. While poor indexing typically leads to page faulting, this is not the case if all of your data and indexes already fit into available RAM. Yet the CPU cost of collection scans can still cause a lock to be held for longer than necessary. In this case, reduce collection scanning using the index optimization steps in Telemetry Series: Page Faults.
    2. If operations are well-indexed, check your write operations and consider reducing the need for frequent incidents of the following (a sketch of a more targeted update appears after this list):
      • updates to large documents
      • updates that require document moves
      • full-document updates (i.e., those that don’t use update operators)
      • updates using array update operators like $push, $pull, etc.
  3. If queueing and locking cannot be reduced by improving indexes, write strategies, or the data model, it is time to consider heavier hardware, and potentially sharding.
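As an illustration of the write-strategy point above, the sketch below sends only the changed fields using update operators rather than rewriting the whole document. The collection, fields, and connection string are hypothetical.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical URI, collection, and fields; substitute your own.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

async function main() {
  await client.connect();
  const users = client.db("mydb").collection("users");

  // Rather than reading the document, modifying it in the app, and writing
  // the whole thing back, update only the fields that changed. Smaller
  // updates hold locks for less time and are less likely to grow the
  // document enough to force a move on disk.
  await users.updateOne(
    { _id: "user-123" },
    { $set: { lastLogin: new Date() }, $inc: { loginCount: 1 } }
  );

  await client.close();
}

main().catch(console.error);
```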

Importantly, queueing can occur because of a small number of long-running operations. If those operations haven’t finished yet, they won’t appear in the mongod logs.  Viewing and potentially killing the offending current operations can be a short-term fix until those operations can be examined for efficiency. To learn more about viewing and killing operations, refer to our documentation on Operation Management.
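For reference, this is roughly what inspecting and killing in-flight operations looks like from a driver. Note that the currentOp and killOp admin-command forms shown here are only available on newer MongoDB releases; on the server versions discussed in this post, the equivalent mongo shell helpers are db.currentOp() and db.killOp(<opid>). The connection string is a hypothetical placeholder.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string; substitute your own deployment's URI.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

async function main() {
  await client.connect();
  const admin = client.db("mydb").admin();

  // List operations that have been running for more than 30 seconds.
  // (Requires a server version that supports currentOp as an admin command.)
  const { inprog } = await admin.command({
    currentOp: true,
    secs_running: { $gt: 30 },
  });
  for (const op of inprog) {
    console.log(op.opid, op.op, op.ns, `${op.secs_running}s`);
  }

  // Kill a specific runaway operation by its opid (use with care).
  // await admin.command({ killOp: 1, op: someOpid });

  await client.close();
}

main().catch(console.error);
```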

Have questions or feedback?

We’d love to hear from you as this Telemetry blog series continues. What topics would be most interesting to you? What types of performance problems have you struggled to diagnose?

Email us at support@mongolab.com to let us know your thoughts, or to get our help tuning your MongoLab deployment.

 

Telemetry Series: Page Faults

A key component of optimizing application performance is tuning the performance of the database that supports it. Each post in our Telemetry series discusses an important metric used by developers and database administrators to tune the database and describes how MongoLab users can leverage Telemetry, MongoLab’s monitoring interface, to effectively review and take action on these metrics.

Page Faults

Databases are optimized for working with data stored on disk, but they cache as much data in RAM as they can in order to minimize disk access. However, because it is cost-prohibitive to keep in RAM all the data an application accesses, the database must eventually go to disk. Because disks are slower than RAM, this incurs a significant time cost.

Effectively tuning a database deployment commonly involves assessing how often the database accesses disk with an eye towards reducing the need to do so. To that end, one of the best ways to analyze the RAM and disk needs of a MongoDB deployment is to focus on what are called Page Faults.

What is a Page Fault?

MongoDB manages documents and indexes in memory by using an OS facility called MMAP, which translates data files on disk to addresses in virtual memory. The database then accesses disk blocks as though it is accessing memory directly. Meanwhile, the operating system transparently keeps as much of the mapped data cached in RAM as possible, only going to disk to retrieve data when necessary.

When MMAP receives a request for a page that is not cached, a Page Fault occurs, indicating that the OS had to read the page from disk into memory.

What do Page Faults mean for my cluster?

The frequency of Page Faults indicates how often the OS goes to disk to read data. Operations that cause Page Faults are slower because they necessarily incur disk latency.

Page Faults are one of the most important metrics to look at when diagnosing poor database performance because they suggest the cluster does not have enough RAM for what you’re trying to do. Analyzing Page Faults will help you determine if you need more RAM, or need to use RAM more efficiently.

How does Telemetry help me interpret Page Faults?

Select a deployment and then look back through Telemetry over months or even years to determine the normal level of Page Faults. In instances where Page Faults deviate from that norm, check application and database logs for operations that could be responsible. If these deviations are transient and infrequent they may not pose a practical problem. However, if they are regular or otherwise impact application performance you may need to take action.

A burst in Page Faults corresponding to an increase in database activity.

If Page Faults are steady but you suspect they are too high, consider the ratio of Page Faults to Operations. If this ratio is high it could indicate unindexed queries or insufficient RAM. The definition of “high” varies across deployments and requires knowledge of the history of the deployment, but consider taking action if any of the following are true:

  • The ratio of Page Faults to Operations is greater than or equal to 1.
  • Effective Lock % is regularly above 15%.
  • Queues are regularly above 0.
  • The app seems sluggish.

Note: Future Telemetry blog posts will cover additional metrics, such as Effective Lock % and Queues. See MongoDB’s serverStatus documentation for more information.
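As a rough illustration of the faults-to-operations ratio described above, the sketch below samples serverStatus twice and compares the page-fault delta to the opcounters delta. The connection string and sampling interval are hypothetical placeholders, and extra_info.page_faults is only reported on some platforms (notably Linux).

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string; substitute your own deployment's URI.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

// Sum all opcounters (insert, query, update, delete, getmore, command).
const totalOps = (status: any): number =>
  Object.values(status.opcounters as Record<string, number>).reduce(
    (sum, n) => sum + n,
    0
  );

async function main() {
  await client.connect();
  const admin = client.db("mydb").admin();

  const first = await admin.command({ serverStatus: 1 });
  await new Promise((resolve) => setTimeout(resolve, 60_000)); // wait one minute
  const second = await admin.command({ serverStatus: 1 });

  // extra_info.page_faults is a cumulative counter, so compare two samples.
  const faults = second.extra_info.page_faults - first.extra_info.page_faults;
  const ops = totalOps(second) - totalOps(first);
  console.log(`page faults per operation: ${(faults / Math.max(ops, 1)).toFixed(2)}`);

  await client.close();
}

main().catch(console.error);
```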

How do I reduce Page Faults?

How you reduce Page Faults depends on their source. There are three main reasons for excessive Page Faults.

  1. Not having enough RAM for the dataset. In this case, the solution is to add more RAM to the deployment by scaling either vertically to machines with more RAM, or horizontally by adding more shards to a sharded cluster.
  2. Inefficient use of RAM due to a lack of appropriate indexes. The most inefficient queries are those that cause collection scans. When a collection scan occurs, the database iterates over every document in a collection to identify the result set for a query. During the scan, the whole collection is read into RAM, where it is inspected by the query engine. Page Faults are generally acceptable when obtaining the actual results of a query, but collection scans cause Page Faults for documents that won’t be returned to the app. Worse, these unnecessary Page Faults are likely to evict “hot” data, resulting in even more Page Faults for subsequent queries. (See the indexing sketch after this list.)
  3. Inefficient use of RAM due to excess indexes. When the indexed fields of a document are updated, the indexes that include those fields must be updated. When a document is moved on disk, all indexes that contain the document must be updated. These affected indexes must be brought into RAM to be updated. As above, this can lead to memory thrashing.
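The sketch below, referenced in item 2, shows one way to confirm and fix a collection scan using the Node.js driver: explain the query, look for a COLLSCAN stage, and add an index on the queried fields. The collection, fields, and connection string are hypothetical, and the explain output shape shown assumes MongoDB 3.0 or later.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string, collection, and fields.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

async function main() {
  await client.connect();
  const orders = client.db("mydb").collection("orders");

  // Explain the query; a "COLLSCAN" stage means every document in the
  // collection is read into RAM just to answer it.
  const plan = await orders.find({ customerId: "c-42", status: "open" }).explain();
  console.log(JSON.stringify(plan.queryPlanner.winningPlan, null, 2));

  // A compound index on the queried fields lets the same query touch only
  // the matching documents (an "IXSCAN" stage) on subsequent explains.
  await orders.createIndex({ customerId: 1, status: 1 });

  await client.close();
}

main().catch(console.error);
```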

Note: For assistance determining what indexes your deployment needs, MongoLab offers a Slow Query Analyzer that provides index recommendations to Shared and Dedicated plan users.

Have questions or feedback?

We’d love to hear from you as this Telemetry blog series continues. What topics would be most interesting to you? What types of performance problems have you struggled to diagnose?

Email us at support@mongolab.com to let us know your thoughts, or to get our help tuning your MongoLab deployment.

{ "comments": 1 }

MongoDB version 3.0 now GA on MongoLab

We’re excited to announce that MongoDB 3.0 is now available on all MongoLab plans. Since the release of version 3.0 was announced in March, we’ve done extensive testing to ensure that it is production-ready for MongoLab users. If you’re looking to upgrade an existing deployment or create a new MongoDB 3.0 deployment, you can do so through our self-service UI. Version 3.0 offers several valuable improvements, including collection-level locking; a new, more secure user authentication mechanism (SCRAM-SHA-1); and the WiredTiger storage engine. Each of these three improvements is described in detail below.

There are two important items to note, as you consider upgrading to version 3.0:

1) A driver upgrade may be required when upgrading your database to 3.0. You can find a matrix of 3.0 compatible drivers in the MongoDB 3.0 release notes.

2) Our release of support for version 3.0 comes with the default MMAPv1 storage engine. Support for the new WiredTiger storage engine will come later, most likely with the release of MongoDB 3.2, where it is expected to become the default storage engine for MongoDB. For more information about our support for storage engines, please read the section entitled “WiredTiger storage engine” below.

Collection-level locking

The default storage engine in MongoDB 3.0 is MMAPv1, which experienced MongoDB users may recognize as the same storage engine underlying previous versions of MongoDB. Although the name has stayed the same, MongoDB now offers collection-level locking; in prior versions of MongoDB, the database-level lock was the finest-grained lock.

How will this impact you?  In versions of MongoDB prior to 3.0, database-level locking would lock the entire database any time an operation that required the write lock (e.g. insert, update, delete) was issued. With collection-level locking, a write operation on one collection will not block the database from servicing reads and writes on other collections.

The effects of collection-level locks on your database deployment will vary depending on your data model, but generally you should see performance improvements, particularly in write-heavy workloads that target more than one collection.
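To make the workload pattern concrete, here is a hypothetical sketch of concurrent writes against two different collections, which under collection-level locking no longer contend for the same database-level lock. The collections and connection string are placeholders.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string and collections; substitute your own.
const client = new MongoClient(
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb"
);

async function main() {
  await client.connect();
  const db = client.db("mydb");

  // Under collection-level locking, these writes contend only with other
  // writes to their own collection rather than blocking the whole database.
  await Promise.all([
    db.collection("events").insertOne({ type: "click", at: new Date() }),
    db.collection("sessions").updateOne(
      { sessionId: "abc-123" },
      { $set: { lastSeen: new Date() } }
    ),
  ]);

  await client.close();
}

main().catch(console.error);
```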

SCRAM-SHA-1 authentication

In MongoDB 3.0, SCRAM-SHA-1 has now replaced MONGODB-CR as the default authentication mechanism. For the security buffs, MongoDB has written an interesting blog post that speaks to the advantages of SCRAM (short for “Salted Challenge Response Authentication Mechanism”). Two notable benefits include improved security against “malicious servers,” and heightened resistance to “replay attacks.”

Depending on your driver version, you may need to upgrade your driver to a 3.0- (or SCRAM-) compatible version. If you’re unsure whether your current driver version supports SCRAM, be sure to check MongoDB’s release notes. Make sure you double-check your driver version before you upgrade; otherwise, your driver may start throwing authentication errors and you could experience downtime.
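As a quick sanity check after upgrading, a SCRAM-capable driver can authenticate with the mechanism pinned explicitly in the connection string, as in the hypothetical sketch below. Normally the driver negotiates the mechanism on its own, so the option is optional.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical URI. The authMechanism option pins SCRAM-SHA-1 explicitly;
// a SCRAM-capable driver will normally negotiate this on its own.
const uri =
  "mongodb://dbuser:dbpassword@ds012345.mongolab.com:35787/mydb?authMechanism=SCRAM-SHA-1";

async function main() {
  const client = new MongoClient(uri);
  await client.connect();

  // A successful ping confirms the driver connected and authenticated.
  console.log(await client.db("mydb").command({ ping: 1 }));

  await client.close();
}

main().catch(console.error);
```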

WiredTiger storage engine

MongoDB 3.0 ships with two storage engines: the default MMAPv1 engine (with collection-level locking) and the new WiredTiger storage engine (with document-level locking). We’re very excited about WiredTiger and have already begun testing it internally. We look forward to supporting WiredTiger for MongoLab production plans around the release of version 3.2, when it is expected to become the default storage engine.

Questions?

As you will discover, there are numerous changes and enhancements in 3.0. We recommend that you explore the full list of changes and improvements in the MongoDB 3.0 release notes. If you have any questions along the way, drop us a line at support@mongolab.com and we’d be happy to help! For example, if your MongoLab deployment experiences high write loads and you would like to discuss how best to leverage collection-level locking to enhance your performance, we’d love to hear from you.

MongoLab Telemetry supports custom MongoDB metric alerts

We’re excited to announce that you can now use MongoLab Telemetry to configure per-metric alerts for your MongoLab deployments! These custom alerts allow you to stay updated on your database’s performance even when you’re not actively working with the database.

For each metric in your Telemetry dashboard you may define custom threshold values and alerting methods (email, PagerDuty, etc.).

For a Quick-start Guide and full docs, visit our documentation on Telemetry Alerts.