Thursday, January 5, 2017

Why Does Caching Matter?

Recall from our previous post that embedding can be used to either contain or cache data.

Now consider the following JSON object (or document):

var product = {
  _id: "P_00001",
  Name: "Baseball Bat",
  Price: 20,
  Category: {
    _id: "C_00001",
    Name: "Sporting Goods",
    URL: "./Sporting_Goods/Index.html"
  }
};


How do you know whether the information embedded in the "Category" field is contained or cached?

As a human, you can infer that the Category of "Sporting Goods" is probably relevant for many Products, so the "Category" field probably represents cached data.

The real Category data probably exists in a collection of Categories elsewhere, but for performance or simplicity, the data was duplicated within the Product.

It is important to recognize this distinction, because caching is intricately connected to one of the major reasons often cited for switching to NoSQL: storing larger objects that reduce the amount of server-side processing necessary to render a given page.


What's The Problem?

Consider that you want to change the URL for the Category "Sporting Goods". How do you accomplish this?

Without Schema

Without schema, you must face the fact that this information could be copied anywhere in your database. Are you going to query every document to find the cached copies?

Doing so would probably not be a very effective use of resources...
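
To make that concrete, here is a rough sketch of what such a brute-force update could look like using the MongoDB Node.js driver. The database name and the one-level "Category" field path are assumptions taken from the example above, not anything prescribed by a schema:

const { MongoClient } = require("mongodb");

// Brute-force propagation: with no schema, any collection might cache the
// Category, so every collection has to be scanned and touched.
async function bruteForceUpdateCategoryUrl(uri, newUrl) {
  const client = await MongoClient.connect(uri);
  const db = client.db("shop"); // hypothetical database name
  const collections = await db.listCollections().toArray();
  for (const { name } of collections) {
    await db.collection(name).updateMany(
      { "Category._id": "C_00001" },
      { $set: { "Category.URL": newUrl } }
    );
  }
  await client.close();
}

And even this only finds copies cached one level deep under a field named "Category"; copies nested elsewhere would require still more scanning.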


With Implied Schema

With implied schema, you can assume which fields store cached data.

For example, you can assume that the "Category" field of any Product caches a Category document.

For each database that you build, you could write code to manage updates to cached information.

But you would need to build this code flexibly enough to handle special cases...

For instance, if you cached Product data inside a Sales Order, you would not want to update that cached Price value, even if the Product's price changes, because the data would not match what the Customer actually paid.

Complicated cases require more update management code, which can introduce even more errors.
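
As a rough sketch of what such hand-written update management might look like, here is one function for the Sales Order special case, assuming the MongoDB Node.js driver and hypothetical collection and field names:

// Hand-rolled propagation based on implied schema: we "know" that SalesOrders
// cache Product data under a "Product" field.
async function propagateProductChange(db, product) {
  await db.collection("SalesOrders").updateMany(
    { "Product._id": product._id },
    // Price is intentionally omitted: the cached Price must keep reflecting
    // what the Customer actually paid when the order was placed.
    { $set: { "Product.Name": product.Name } }
  );
}

Every special case like the Price rule ends up hard-coded somewhere in functions like this one.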


With ODM Libraries

ODM libraries take an entirely different approach. Instead of managing cached data stored in the database, they dynamically traverse references as data is read into memory on the web server.

One benefit of this approach is that it should be familiar to most software developers, since many SQL ORMs take the same approach.
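
For example, with the Mongoose ODM the Product stores only an ObjectID reference, and the Category is pulled in at read time via populate(). The schemas below are a minimal sketch of that style, not 1Schema output:

const mongoose = require("mongoose");

const categorySchema = new mongoose.Schema({ Name: String, URL: String });
const productSchema = new mongoose.Schema({
  Name: String,
  Price: Number,
  // Only an ObjectID is stored; no Category data is cached inside the Product.
  Category: { type: mongoose.Schema.Types.ObjectId, ref: "Category" }
});

const Category = mongoose.model("Category", categorySchema);
const Product = mongoose.model("Product", productSchema);

// At read time, the reference is traversed on the web server:
async function loadProduct(name) {
  return Product.findOne({ Name: name }).populate("Category").exec();
}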

However, if you use this approach, you are making a conscious decision to store smaller, more row-like objects, and you will lose out on some of the promised advantages of moving to NoSQL.

Specifically, this style of usage precludes the benefits of storing larger objects that contain lots of cached data.

Also, this approach has limitations when it comes to referencing sub-documents from other collections.

Overall, this is a safe approach for converting from SQL to NoSQL, so we use this as the default in our 1Schema Database Converter.

But if you want to take advantage of storing large documents, keep reading...


With 1Schema's Experimental Caching

Perhaps you are unconvinced of the value of the ODM approach described above, or perhaps your compelling reason for switching is tied to caching.

You may be scratching your head right now, wondering what the point of switching to NoSQL is. Many of the other scalability benefits of NoSQL can be achieved with modern SQL databases.

If you fall into this category, you should try our in-house approach to caching!

As mentioned above, if we choose to cache data within the database, we need to make sure that we know 1) what is cached where and 2) how to update it.

To handle the first point, we introduce conventions that allow us to clearly mark cached data and configure how this data is updated.

As for the second point, we auto-generate change propagation functions for each collection that caches data, so you do not need to do any extra work.

We seek to provide the power of caching without the headache, so you can unleash the full power of NoSQL databases.

How Our Conventions Work

For caching data, we embed ID references within the root document, similar to how you would use ID references in SQL or with the Mongoose ODM.

However, at the point of the ID reference, instead of embedding the ObjectID directly, we wrap an extra object around the ObjectID, so that we have a place to configure how the reference is used to cache data.

This extra level of embedding also provides a clear visual indication as to which data is cached.

For example, here is how 1Schema would export our original example:

var product = {
  _id: "P_00001",
  Name: "Baseball Bat",
  Price: 20,
  Category: {
    _id: "C_00001",
    OS_LAST_UPDATE_DATE: "2017-01-04",
    OS_MAX_CACHING_DEPTH: 1,
    OS_CACHED_DOC: {
      _id: "C_00001",
      Name: "Sporting Goods",
      URL: "./Sporting_Goods/Index.html"
    }
  }
};


Note the special fields "OS_LAST_UPDATE_DATE", "OS_MAX_CACHING_DEPTH", and "OS_CACHED_DOC". These fields configure how changes to the cached document are handled.
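
Purely for intuition (this is not actual 1Schema output), a propagation function for the example above could conceptually do something like the following, assuming the MongoDB Node.js driver:

// Conceptual sketch: push a changed Category into every Product that caches it,
// refreshing the cached document and its last-update date.
async function propagateCategoryChange(db, category) {
  await db.collection("Products").updateMany(
    { "Category._id": category._id },
    {
      $set: {
        "Category.OS_CACHED_DOC": category,
        "Category.OS_LAST_UPDATE_DATE": new Date().toISOString().slice(0, 10)
      }
    }
  );
}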

Additionally, the "OS_DO_NOT_UPDATE" field could be used to instruct our update management code to never update the cached value (see Price example above).
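
For instance, a Sales Order that caches a Product could be marked like this (the SalesOrder structure here is illustrative, not 1Schema output), so the cached Price is never overwritten:

var salesOrder = {
  _id: "SO_00001",
  Product: {
    _id: "P_00001",
    OS_LAST_UPDATE_DATE: "2017-01-04",
    OS_MAX_CACHING_DEPTH: 1,
    OS_DO_NOT_UPDATE: true,  // freeze this cache: keep the Price the Customer actually paid
    OS_CACHED_DOC: {
      _id: "P_00001",
      Name: "Baseball Bat",
      Price: 20
    }
  }
};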

Furthermore, we are working on adding more special fields that will let you configure caching behavior based on dynamic constraints.


The Benefit to You

So by using the 1Schema approach, you get the following benefits:
  1. Easily differentiate between Contained Data and Cached Data
  2. Automatically update Cached Data without needing to write change handling code
  3. Leverage the full benefits of NoSQL to decrease the amount of processing necessary to render a page
