Tuesday, December 6, 2016

Why NoSQL?

Traditional SQL databases are row-centric, and they derive their strengths of transactionality and consistency from this focus. Data is managed so that each individual row is always in a consistent state.

For applications where data integrity is critical, this is a good perspective to take on the system.

However, for web applications, performance and scalability are often more important than complete transactionality.

More simply put, it is often better to return data that is mostly up-to-date quickly than data that is 100% up-to-date slowly.

So NoSQL databases were created to address this need.

They store bigger chunks of data together so that less server processing is necessary to render a complete page view. However, some of that data may be duplicated from other places within the database, and the copies may hold different values at any given time.

With 1Schema.com, our goal is to provide a tool that can help you store your system's data in whole-page data models, without needing to worry about what data is up-to-date and what data has been previously duplicated.


Embedding Data in NoSQL

From a design perspective, the most noticeable difference between SQL databases and NoSQL databases is the ability in NoSQL to "embed" (or "nest") complex values within a parent row (or document).

Embedded information can either take the form of a nested object or a nested array.

Embedding can be used to either:

  1. Contain data that exists locally within the parent document, or
  2. Cache data that has been duplicated from elsewhere in the database

For this post, we are focused on the first use for embedding: containing data within a parent document.

To explain this, we will use the following example of a Sales Order for a bike shop:

Sales Order #001
  • Sales Agent                         = "Bob"
  • Customer                            = "Chris"
  • 1 x "Speedy Bike" at $60 each       = $60
  • 2 x "Protecto Helmet" at $20 each   = $40

SQL Perspective

In SQL, contained data is stored within sub-tables, connected to the main table via relationships.

So for the example above, we would need at least 2 tables:

  1. Sales_Order (parent table)
  2. Sales_Order_Item (child table)

These tables would be connected using a foreign key on the "SalesOrderID" column.

By configuring the foreign key, you can control whether UPDATE and DELETE statements on the parent row cascade to the child rows.

However, the rows in these 2 tables are fundamentally different pieces of information and have separate lifespans.
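
To make the separate lifespans concrete, here is a minimal JavaScript sketch that models the two tables as plain arrays of rows. The array names, the two helper functions, and every column name other than "SalesOrderID" are assumptions made for illustration. A real SQL database would reassemble the order with a JOIN, and a foreign key constraint would either reject the delete or cascade it, depending on its settings.

```javascript
// Two "tables" modeled as arrays of rows (an illustrative stand-in for SQL tables).
const salesOrderTable = [
  { SalesOrderID: "001", SalesAgent: "Bob", Customer: "Chris" }
];

const salesOrderItemTable = [
  { SalesOrderID: "001", ProductName: "Speedy Bike", Quantity: 1, Price: 60 },
  { SalesOrderID: "001", ProductName: "Protecto Helmet", Quantity: 2, Price: 20 }
];

// Hypothetical helper: reassemble one order by matching on the key column,
// which is roughly what a JOIN on SalesOrderID does.
function loadSalesOrder(orderId) {
  const order = salesOrderTable.find(row => row.SalesOrderID === orderId);
  if (!order) return null;
  const items = salesOrderItemTable.filter(row => row.SalesOrderID === orderId);
  return { ...order, Items: items };
}

// Hypothetical helper: remove only the parent row. This sketch has no
// constraint and no cascade, so the child rows are left untouched.
function deleteSalesOrderWithoutCascade(orderId) {
  const index = salesOrderTable.findIndex(row => row.SalesOrderID === orderId);
  if (index !== -1) salesOrderTable.splice(index, 1);
}
```

Calling deleteSalesOrderWithoutCascade("001") removes the order row, but both item rows are still sitting in salesOrderItemTable afterwards. That is the sense in which the parent and child rows have separate lifespans.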


NoSQL Perspective

NoSQL embraces the Object-Oriented perspective that the lifespan of a contained object is existentially tied to that of its root object (or document).

In the example above, it would be wise to use containment to store the "Name", "Quantity" and "Price" of each Item in the Sales Order within the Sales Order itself.

So the NoSQL document for this data might look like:

var salesOrder_001 = {
  _id: "001",
  Name: "Order #001",
  "Sales Agent": "Bob",
  Customer: "Chris",
  Items: [
    { Product_Name: "Speedy Bike", Quantity: 1, Price: 60 },
    { Product_Name: "Protecto Helmet", Quantity: 2, Price: 20 }
  ]
};


In contrast to the SQL case, the Sales Order Items are existentially tied to the Sales Order, since they exist as contained data.

If you delete the Sales Order, the Sales Order Items will automatically be deleted, as they exist within the Sales Order.
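
As a rough sketch of what containment buys you, the JavaScript below uses a Map as a stand-in for a document collection. The Map, the orderTotal helper, and the variable names are assumptions for illustration and not the API of any particular NoSQL database.

```javascript
// A Map keyed by _id stands in for a document collection (illustrative only).
const salesOrders = new Map();

salesOrders.set("001", {
  _id: "001",
  Name: "Order #001",
  "Sales Agent": "Bob",
  Customer: "Chris",
  Items: [
    { Product_Name: "Speedy Bike", Quantity: 1, Price: 60 },
    { Product_Name: "Protecto Helmet", Quantity: 2, Price: 20 }
  ]
});

// Hypothetical helper: the total is computed from the embedded items alone,
// with no lookup in any other collection.
function orderTotal(order) {
  return order.Items.reduce((sum, item) => sum + item.Quantity * item.Price, 0);
}

const total = orderTotal(salesOrders.get("001")); // 1 * 60 + 2 * 20 = 100

// Deleting the document removes its embedded items along with it.
salesOrders.delete("001");
```

After the delete there is no separate item data left to clean up, because the items never existed outside the order document.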

Since this mirrors the way you would manage objects in an OOP language, many developers prefer this approach. In fact, document-oriented NoSQL databases map closely onto the Aggregate pattern of Domain-Driven Design.


Importance & Implications

In this post, we explained how to use embedding to contain information within a parent document.

For containment, the embedded information actually exists within the parent document, so no data duplication is required.

However, to achieve the design goals of using a NoSQL database, you will sometimes need to cache data that actually exists elsewhere within your data model.

Caching by definition requires data duplication.

We will talk more about this in our next post: how to use embedding to cache duplicated information.


Monday, December 5, 2016

Why Schema?

Modern NoSQL offerings commonly promote themselves as being "schema-less", as if this is a great feature.

For most use cases, though, there is at the very least an "implied" schema for the information stored. Just because the schema is not enforced does not mean that it does not exist.

From the standpoint of 1Schema.com, we look at modern NoSQL offerings as instead providing "optional enforcement of schema".

For example, think of using MongoDB out of the box (no enforcement) vs. using MongoDB with Mongoose ODM (almost SQL-like enforcement).

From a flexibility standpoint, the strength of NoSQL lies more in its ability to embed sets of nested information (which we will discuss in future posts). Still, this embedded information probably also has an implied schema.
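
To illustrate "optional enforcement", here is a small JavaScript sketch that writes an implied schema down and checks a document against it only when you choose to. The field names (_id, Customer, Items), the schema format, and the checkAgainstSchema helper are assumptions for illustration. This is not how Mongoose or 1Schema.com define or enforce schemas.

```javascript
// An implied schema for a Sales Order, written down explicitly.
// The field names and the "type name" format are assumptions for this sketch.
const salesOrderSchema = {
  _id: "string",
  Customer: "string",
  Items: "array"
};

// Hypothetical helper: returns a list of problems.
// An empty list means the document fits the schema.
function checkAgainstSchema(doc, schema) {
  const problems = [];
  for (const [field, expected] of Object.entries(schema)) {
    const value = doc[field];
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (value === undefined) {
      problems.push(`missing field: ${field}`);
    } else if (actual !== expected) {
      problems.push(`${field} should be ${expected}, got ${actual}`);
    }
  }
  return problems;
}
```

The database would happily store a document with a numeric _id and no Items. Running the check before saving is what turns the implied schema into an enforced one.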


Figure: a 1Schema schema for Sales Orders.


So next time you start a NoSQL project, ask yourself:
  • Do I care about collaborating with other engineers on the project?
  • Do I care about communicating what data is supposed to be stored?
  • Do I expect certain values to exist when I use the data?
  • Do I care about managing how data is duplicated within the database?

For the last point, check back to see how 1Schema.com can help clarify how and why data is duplicated within your NoSQL database.