Tuesday, December 6, 2016

Why NoSQL?

Traditional SQL databases are row-centric, and they derive their strengths of transactionality and consistency from this focus. Data is managed so that each individual row is always in a consistent state.

For applications where data integrity is critical, this is a good perspective to take on the system.

However, for web applications, often performance and scalability are more important than complete transactionality.

More simply put, it is often better to return data that is mostly up-to-date quickly than data that is 100% up-to-date slowly.

So NoSQL databases were created to address this need.

They store bigger chunks of data together so that less server processing is necessary to render a complete page view. However, some of that data may be duplicated from other places within the database, and different values in different places may differ at any given time.

With 1Schema.com, our goal is to provide a tool that can help you store your system's data in whole-page data models, without needing to worry about what data is up-to-date and what data has been previously duplicated.


Embedding Data in NoSQL

From a design perspective, the most noticeable difference between SQL databases and NoSQL databases is the ability in NoSQL to "embed" (or "nest") complex values within a parent row (or document).

Embedded information can either take the form of a nested object or a nested array.

Embedding can be used to either:

  1. Contain data that exists locally within the parent document, or
  2. Cache data that has been duplicated from elsewhere in the database
For this post, we are focused on the first use for embedding... containing data within a parent document.

To explain this, we will use the following example of a Sales Order for a bike shop:

Sales Order #001
  • Sales Agent                         = "Bob"
  • Customer                            = "Chris"
  • 1 x "Speedy Bike" at $60 each       = $60
  • 2 x "Protecto Helmet" at $20 each   = $40

SQL Perspective

In SQL, contained data is stored within sub-tables, connected to the main table via relationships.

So for the example above, we would need at least 2 tables:

  1. Sales_Order (parent table)
  2. Sales_Order_Item (child table)
These tables would be connected using a foreign key on the "SalesOrderID" column.

By configuring the settings for the foreign key, you could configure whether UPDATE and DELETE statements cause cascading changes.

However, the rows in these 2 tables are fundamentally different pieces of information and have separate lifespans.


NoSQL Perspective

NoSQL embraces the Object-Oriented perspective that the lifespan of a contained object is existentially tied to that of its root object (or document).

In the example above, it would be wise to use containment to store the "Name", "Quantity" and "Price" of each Item in the Sales Order within the Sales Order itself.

So the NoSQL document for this data might look like:

var salesOrder_001 =

  _id: "001",
  Name: "Order #001",
  Sales Agent: "Bob",
  Customer: "Chris",
  Items: [
    { Product_Name: "Speedy Bike", Quantity: 1, Price: 60 },
    { Product_Name: "Protecto Helmet", Quantity: 2, Price: 20 }
  ]


As opposed to the SQL case, the Sales Order Items are existentially tied to the Sale Order, since they exist as contained data.

If you delete the Sale Order, the Sales Order Items will automatically be deleted, as they exist within the Sales Order.

Since this follows the way you would manage objects in an OOP language, many developers prefer this situation. In fact, NoSQL databases actually follow the Aggregate pattern of Domain-Driven Design.


Importance & Implications

In this post, we explained how to use embedding to contain information within a parent document.

For containment, the embedded information actually exists within the parent document, so no data duplication was required.

However, to achieve the design goals of using a NoSQL database, you will sometimes need to cache data that actually exists elsewhere within your data model.

Caching by-definition requires data duplication.

We will talk more about this in our next post... How to use embedding to cache duplicated information.

For more, see our next post...

No comments:

Post a Comment