1Schema.com

Monday, January 9, 2017

Convert from SQL to NoSQL in 60 Seconds

Today, we are launching the 1Schema Database Converter, which you can use to easily convert your existing SQL schema and data to a NoSQL database technology of your choosing.

We built this converter to ease the transition when switching between database technologies.

We want to help you find the technology that works best for your particular application!

What you need to get started:

Before you begin, you will need to export the SQL code for your existing database as a plain text (not binary).

This file must contain all the CREATE TABLE and ALTER TABLE statements necessary to create your Tables, Primary Keys, Unique Constraints, and Foreign Keys.

Also, the file may include data, so long as all data is encoded as INSERT statements and the total size of your file remains under 5 MB. Your data will be transformed and included in your converted database script.

If you need to convert more data than 5 MB will allow, please contact us for help.

If you want to use the same database that we use for this tutorial, download our SQL file:
SQL Server Demo DB
PostgreSQL Demo DB

What to think about next:

So you have your SQL schema file... you can now proceed to the 1Schema Converter and convert your file.

However, before doing so, it is worth thinking about how you want to configure the 1Schema Converter so that you get the desired result.

So for each SQL Table, consider how you want the Foreign Keys to be mapped during conversion:

Does the FKEY indicate the "Parent Table", inside which the Current Table should be embedded (aka nested)?
For all other FKEYs, should the list of related IDs from the Current Table be stored in the Parent Table?
EXPERIMENTAL: Should IDs be traversed and cached within the database?

The answer to these questions will directly determine the settings that you should select in the 1Schema Converter.

If these questions are not immediately clear, do not worry, as the default conversion settings will provide you with a solid starting point.

How to use the 1Schema Converter:

You are now ready to use the 1Schema Converter to convert your SQL database:

1) Go to "http://www.1schema.com/DatabaseConverter" and you should see the following page:

Select the "SQL Database Type" that corresponds to the source of your input file (for our demo, we will use "Microsoft SQL Server ®")
Select the "NoSQL Database Type" that corresponds to the type of output you want (currently, only "MongoDB ®" is implemented)
Select the "Schema Conversion Mode" depending on whether you want to use the experimental code to store cached data in the database
Select the SQL file that contains your original schema

2) Wait until the grid below populates

3) Use the grid to select your desired settings

4) Scroll to the bottom of the page and click the "Convert Database" button.

5) Download your converted file.

NOTE: Your browser may warn you about downloading JavaScript files. Please download your converted file anyways.

5) Load your converted file into your NoSQL database.

Sunday, January 8, 2017

Thoughts on Converting from SQL to NoSQL

From our previous posts, you know that while we do support conversion to NoSQL using ODM libraries, we invested significant effort building our own customized approach to managing cross-collection references.

We want to take this opportunity to clarify some important points, so you understand why we developed our own approach and how this approach can help you achieve your goals.

We also explain how you can use the 1Schema Database Converter to help perform this process quickly, easily, and correctly.

The Use-Case for Consideration:

We are focused on the use case where an engineering team is trying to optimize performance and scalability of a website or app by switching to a NoSQL database technology.

In this case, the promise of NoSQL technologies is that by storing larger clusters of information (including nested information), the amount of server processing necessary to render a page view can be minimized.

Ideally, you could render a page view with only 1 read from the database (although this goal is often not possible in practice).

The cost of this approach is that more disk space is used to store these data structures, as information is often duplicated in multiple places.

However, since disk space is relatively cheap, it can be extremely beneficial to sacrifice disk space to improve processing time.

If you believe in the benefits of this approach, you should try converting your SQL database using our custom document caching techniques, which solve the problems laid out below.

The Problem with this Use-Case:

Often, when SQL-based engineers start learning about NoSQL technologies, they see the ability to embed (or nest) information within a given document (i.e. NoSQL equivalent of a row) as a really promising feature.

They come from a world of flat rows, where storing nested information requires creating a separate child table and creating a foreign key constraint between the two tables.

The idea of just adding nested information is often quite appealing.

However, the shine usually wears off when they start to realize that they need to nest the same information in multiple places.

To prevent the confusion caused by duplication, they often change their NoSQL design to use a more normalized approach to avoid duplication.

BUT THIS DEFEATS THE PURPOSE OF SWITCHING TO NOSQL!!!

This lack of clarity really hurts NoSQL adoption, because here is where many engineers revert to using normalized schemas in traditional SQL databases.

So the question becomes, how do we

Take advantage of NoSQL technologies to reduce server processing and increase performance and scalability AND
Still know what the heck is going on

How 1Schema Solves This:

At 1Schema, we built our custom NoSQL converter from the ground-up to help you differentiate between data that is truly embedded within your root document (contained data) and data that you are copying there to improve performance (cached data).

NOTE: For a more detailed explanation of the difference between contained data and cached data, see this post...

All contained data actually exists within its parent document, so you know to go to the parent document to update it. In the 1Schema Converter, you specify a single "Parent Table" if you want that your table to be contained within the parent table. The options for "Parent Table" will be obtained by examining the Foreign Key Constraints specified for a particular table.

Conversely, all cached data is copied locally by our update management code, but actually exists elsewhere. All Foreign Key Constraints other than those used to specify the "Parent Table" can be used to configure caching.

Caching actually occurs when our change management code traverses an ID reference and copies the relevant information locally. Furthermore, the change management code will automatically update this data when the document in the foreign collection changes.

By using caching, you can accomplish the goal of larger clusters of information that reduce the processing necessary to render a page view, but prevent the confusion that duplication often causes within a data model.

The catch is that you are not guaranteed this cached data is always up-to-date, since it is copied from the place that it is actually stored.

However, our converter provides you with code that automatically updates your cached data, and you can configure this code to run as you desire.

Also, please note that to simplify the number of settings in the converter, you can either specify to use caching or not for a particular table, but you cannot choose individual foreign keys to use. It is all or nothing.

If you need more granularity on which ID references are used for caching, you should sign-up for the beta release of our visual Schema Designer.

Concluding Thoughts

At 1Schema, we do not advocate either normalization or denormalization... instead, we want to give you control of this choice on a case-by-case basis.

In fact, it can actually be quite beneficial to start with a normalized SQL schema and proceed from there.

Then you can intelligently choose when to denormalize specific parts of your data model, depending on the criteria important to you.

Thursday, January 5, 2017

Why Caching Matters?

Recall from our previous post that embedding can be used to either contain or cache data.

Now consider the following JSON object (or document):

var product =
{
  _id: "P_00001",
  Name: "Baseball Bat",
  Price: 20,
  Category: {
    _id: "C_00001",
    Name: "Sporting Goods",
    URL: "./Sporting_Goods/Index.html"
  }
}

How do you know whether the information embedded in the "Category" field is contained or cached?

As a human, you can infer that the Category of "Sporting Goods" is probably relevant for many Products, so the "Category" field probably represents cached data.

The real Category data probably exists in a collection of Categories elsewhere, but for performance or simplicity, the data was duplicated within the Product.

It is important to realize this distinction, because caching is intricately connected to one of the major reasons often cited for switching to NoSQL: storing larger objects that reduce the amount of server-side processing necessary to render a given page.

What's The Problem?

Consider that you want to change the URL for the Category "Sporting Goods". How do you accomplish this?

Without Schema

Without schema, you must face the fact that this information could be copied anywhere in your database? Are you going to query every document to find the cached copies?

Doing so would probably not be a very effective use of resources...

With Implied Schema

With implied schema, you can assume which fields store cached data.

For example, you can assume that the "Category" field of any Product caches a Category document.

For each database that you build, you could write code to manage updates to cached information.

But you would need to build this code flexibly enough to handle special cases...

For instance, if you cached Product data inside a Sales Order, you would not want to update that cached Price value, even if the Product's price changes, because the data would not match what the Customer actually paid.

Complicated cases require more update management code, which can introduce even more errors.

With ODM Libraries

ODM libraries take an entirely different approach. Instead of managing cached data stored in the database, they dynamically traverse references as data is read into memory on the web server.

One benefit of this approach is that it should be familiar to most software developers, since many SQL ORMs take the same approach.

However, if you use this approach, you are making a conscious decision to store smaller, more row-like objects, and you will then lose-out on some of the promised advantages of moving to NoSQL.

Specifically, this style of usage precludes the benefits of storing larger objects that contain lots of cached data.

Also, there are limitations for how this approach can be used in terms of referencing sub-documents from other collections.

Overall, this is a safe approach for converting from SQL to NoSQL, so we use this as the default in our 1Schema Database Converter.

But if you want to take advantage of storing large documents, keep reading...

With 1Schema's Experimental Caching

Perhaps you are either unconvinced of the value of using the ODM approach described above or your compelling reason for switching is tied to caching.

You may be scratching your head right now, wondering what the point of switch to NoSQL is. Many of the other scalability benefits of NoSQL can be achieved with modern SQL databases.

If you fall into this category, you should try our in-house approach to caching!

As mentioned above, if we choose to cache data within the database, we need to make sure that we know 1) what is cached where and 2) how to update it.

To handle the first point, we introduce conventions that allow us to clearly mark cached data and configure how this data is updated.

As for the second point, we auto-generate change propagation functions for each collection that caches data, so you do not need to do any extra work.

We seek to provide the power of caching without and fo the headache, so you can unleash the full power of NoSQL databases.

How Our Conventions Work

For caching data, we embed ID references within the root document, similar to how you use ID refereneces in SQL or Mongoose ODM.

However, at the point of the ID reference, instead of embedding the ObjectID directly, we embed an extra object around this ObjectID, so that we have a place where we can configure how the reference used to cache data.

This extra level of embedding also provides a clear visual indication as to which data is cached.

For example, here is how 1Schema would export our original example:

var product =
{
  _id: "P_00001",
  Name: "Baseball Bat",
  Price: 20,
  Category: {
    _id: "C_00001",
    OS_LAST_UPDATE_DATE: "2017-01-04",
    OS_MAX_CACHING_DEPTH: 1,
    OS_CACHED_DOC: {
      _id: "C_00001",
      Name: "Sporting Goods",
      URL: "./Sporting_Goods/Index.html"
    }
  }
}

Note the special fields "OS_LAST_UPDATE_DATE", "OS_MAX_CACHING_DEPTH", and "OS_CACHED_DOC". These fields configure how changes to the cached document are handled.

Additionally, the "OS_DO_NOT_UPDATE" field could be used to instruct our update management code to never update the cached value (see Price example above).

Furthermore, we are working on adding more special fields that allow you to further configure caching behavior based on dynamic constraints.

The Benefit to You

So by using the 1Schema approach, you get the following benefits:

Easily differentiate between Contained Data and Cached Data
Automatically update Cached Data without needing to write change handling code
Leverage the full benefits of NoSQL to decrease the amount of processing necessary to render a page

Tuesday, December 6, 2016

Why NoSQL?

Traditional SQL databases are row-centric, and they derive their strengths of transactionality and consistency from this focus. Data is managed so that each individual row is always in a consistent state.

For applications where data integrity is critical, this is a good perspective to take on the system.

However, for web applications, often performance and scalability are more important than complete transactionality.

More simply put, it is often better to return data that is mostly up-to-date quickly than data that is 100% up-to-date slowly.

So NoSQL databases were created to address this need.

They store bigger chunks of data together so that less server processing is necessary to render a complete page view. However, some of that data may be duplicated from other places within the database, and different values in different places may differ at any given time.

With 1Schema.com, our goal is to provide a tool that can help you store your system's data in whole-page data models, without needing to worry about what data is up-to-date and what data has been previously duplicated.

Embedding Data in NoSQL

From a design perspective, the most noticeable difference between SQL databases and NoSQL databases is the ability in NoSQL to "embed" (or "nest") complex values within a parent row (or document).

Embedded information can either take the form of a nested object or a nested array.

Embedding can be used to either:

Contain data that exists locally within the parent document, or
Cache data that has been duplicated from elsewhere in the database

For this post, we are focused on the first use for embedding... containing data within a parent document.

To explain this, we will use the following example of a Sales Order for a bike shop:

Sales Order #001

Sales Agent = "Bob"
Customer = "Chris"
1 x "Speedy Bike" at $60 each = $60
2 x "Protecto Helmet" at $20 each = $40

SQL Perspective

In SQL, contained data is stored within sub-tables, connected to the main table via relationships.

So for the example above, we would need at least 2 tables:

Sales_Order (parent table)
Sales_Order_Item (child table)

These tables would be connected using a foreign key on the "SalesOrderID" column.

By configuring the settings for the foreign key, you could configure whether UPDATE and DELETE statements cause cascading changes.

However, the rows in these 2 tables are fundamentally different pieces of information and have separate lifespans.

NoSQL Perspective

NoSQL embraces the Object-Oriented perspective that the lifespan of a contained object is existentially tied to that of its root object (or document).

In the example above, it would be wise to use containment to store the "Name", "Quantity" and "Price" of each Item in the Sales Order within the Sales Order itself.

So the NoSQL document for this data might look like:

var salesOrder_001 =
{
  _id: "001",
  Name: "Order #001",
  Sales Agent: "Bob",
  Customer: "Chris",
  Items: [
    { Product_Name: "Speedy Bike", Quantity: 1, Price: 60 },
    { Product_Name: "Protecto Helmet", Quantity: 2, Price: 20 }
  ]
}

As opposed to the SQL case, the Sales Order Items are existentially tied to the Sale Order, since they exist as contained data.

If you delete the Sale Order, the Sales Order Items will automatically be deleted, as they exist within the Sales Order.

Since this follows the way you would manage objects in an OOP language, many developers prefer this situation. In fact, NoSQL databases actually follow the Aggregate pattern of Domain-Driven Design.

Importance & Implications

In this post, we explained how to use embedding to contain information within a parent document.

For containment, the embedded information actually exists within the parent document, so no data duplication was required.

However, to achieve the design goals of using a NoSQL database, you will sometimes need to cache data that actually exists elsewhere within your data model.

Caching by-definition requires data duplication.

We will talk more about this in our next post... How to use embedding to cache duplicated information.

For more, see our next post...

Monday, December 5, 2016

Why Schema?

Modern NoSQL offerings commonly promote themselves as being "schema-less", as if this is a great feature.

For most use cases, though, there is at very least an "implied" schema for the information stored. Just because the schema is not enforced does not mean that it does not exist.

From the standpoint of 1Schema.com, we look at modern NoSQL offerings as instead providing "optional enforcement of schema".

For example, think of using MongoDB out of the box (no enforcement) vs. using MongoDB with Mongoose ODM (almost SQL-like enforcement).

From a flexibility stand-point, the strength of NoSQL is actually more based on its ability to embed sets of nested information (which we will discuss in future posts). Still, this embedded information probably also has an implied schema.

This is a 1Schema schema for Sales Orders.

So next time you start a NoSQL project, as yourself:

Do I care about collaborating with other engineers on the project?

Do I care about communicating what data is supposed to be stored?

Do I expect certain values to exist when I use the data?

Do I care about managing how data is duplicated within the database?

... For the last point, check back to see how 1Schema.com can help clarify how and why data is duplicated within your NoSQL database.