5 Relational (TE 1.0) vs. NoSQL (TE 2.0)

Introduction: Relational Is No Longer the Default MO

Whereas the coding (programming) side of system development has witnessed more or less constant innovation over the years, the data management side of things has long been dominated by the relational model, originally developed by E.F. Codd in the 1970s while at IBM (Codd 1970, 1982). One (imperfect) indicator of the dominance and penetration of the relational model in business IT systems is the ratio of coverage that relational vs. NoSQL database technologies receive in (Business) Information System Analysis and Design textbooks. Table 1 contains an inventory of a few of those books, namely the ones sitting on the shelves of one of us. Not a scientific sample, but telling nonetheless.

Table 1: Coverage of relational vs. NoSQL in some System Analysis and Design textbooks.
Textbook Pages relational Pages NoSQL
Valacich, J.S., George, J.F., Hoffer, J.A. (2015) Essentials of Systems Analysis and Design. Pearson. Chapter 9: 45 pp. 0 pp.
Satzinger, J., Jackson, R., Burd, S. (2016) Systems Analysis and Design in a Changing World. CENGAGE Learning. Chapter 9: 35 pp. 0 pp.
Tilley, S., Rosenblatt, H. (2017) Systems Analysis and Design. CENGAGE Learning. Chapter 9: 45 pp. 0 pp.
Dennis, A., Wixom, B.H., Roth, R.M. (2014) Systems Analysis and Design. 6th ed. John Wiley & Sons. Chapters 6 & 11: 40 pp. 1 p.
Dennis, A., Wixom, B.H., Tegarden, D. (2012) Systems Analysis & Design with UML Version 2.0. John Wiley & Sons. Chapter 9: 40 pp. 2 pp.
Dennis, A., Wixom, B.H., Tegarden, D. (2015) Systems Analysis & Design: An Object-Oriented Approach with UML. 5th ed. John Wiley & Sons. Chapter 9: 22 pp. 2 pp.

To be clear, pointing out the almost total absence of NoSQL coverage in these texts is not meant as a critique. The relational database remains the dominant workhorse of transaction processing and hence remains at the heart of business computing. But whereas for a very long time it was essentially the only commonly available, viable option for all of business computing—transaction processing or not—new, equally viable and commonly available non-relational alternatives for non-transaction processing are making quick inroads.

Figure 1: Lookup structure of a key-value store (source: https://en.wikipedia.org/wiki/File:KeyValue.PNG)

Of the six texts listed in Table 1, only Dennis, Wixom & Tegarden (2012, 2015) give explicit attention to these non-relational or NoSQL options. They recognize three types: key-value stores, document stores and columnar stores. Key-value stores are databases which function like lookup tables in which values are indexed by a key, as indicated in Figure 1. As the Wikipedia page on key-value stores shows, quite a few implementations are available these days. One of the better-known ones is the open-source implementation Riak. Columnar (or column-oriented) databases transpose the rows and columns of relational databases: rows become columns and vice versa. This representation is meant to provide faster processing for what in the relational world would be column-oriented operations, which in the columnar database become row-oriented operations. Wikipedia contains a list of implementations of this type of database.
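
At its core, the key-value access pattern is simply that of a dictionary: opaque values stored and retrieved by a unique key. A minimal sketch in Python (not tied to any particular key-value product such as Riak; the key naming scheme is made up):

```python
# A key-value store behaves much like a persistent dictionary: values are
# written and read by key, and the key is the only query handle available.
store = {}

# 'put' a value under a key (the "user:42" naming convention is illustrative)
store["user:42"] = '{"name": "Ada"}'

# 'get' it back by key; without the key, the store offers no way to search
# inside the opaque value
value = store["user:42"]
print(value)
```

Real key-value stores add persistence, replication and distribution, but the programming model remains essentially this simple.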

Which brings us to the document store, the type of NoSQL database most relevant for us because it is what TE 2.0 uses. Document stores, aka Document Management Systems, have quite a history. Already in the 1980s several systems were commercially available to manage an organization’s paper documents, followed by systems for managing electronic documents. However, whereas these older systems typically either enforced their own binary representation on documents or stored the documents in their native formats and kept metadata for sorting and searching, many modern NoSQL document stores store documents as text in a standard format such as XML or JSON. RavenDB and MongoDB, for example, store documents in JSON format. The fact that there typically are no constraints on the data other than these text-formatting ones is important because it implies that as long as a text complies with the format, it is considered a valid ‘document’ regardless of its information content. Hence, from the document store’s perspective, the JSON text {"foo": "bar"} is as much a ‘document’ as a complex JSON structure representing a TeachEngineering lesson.

This notion of a document as a structured text shares characteristics with the notion of classes and objects in object-oriented programming (OOP) but also with tables in a relational database. But whereas OOP objects typically live in program memory and are stored in CPU-specific or byte-compiled binary formats, NoSQL documents exist as text. And whereas in relational databases we often distribute the information of an entity or object over multiple tables in order to satisfy normal form, in NoSQL document stores —as in key-value stores— we often duplicate data and neither maintain nor worry about the database integrity constraints so familiar from relational databases.
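
This duplication can be made concrete with a small sketch. The field names below are illustrative, not the actual TeachEngineering schema; the point is that each document carries its own copy of data which a normalized relational design would store exactly once:

```python
# One lesson and two activities. In a normalized relational design the lesson
# title would live in a single row; in a document store each activity document
# may carry its own copy so it can be rendered without a second lookup.
lesson = {"id": "lessons/1", "title": "Visual Art and Writing"}
activities = [
    {"id": "activities/1", "title": "Heat Flow", "parent_title": lesson["title"]},
    {"id": "activities/2", "title": "Convection", "parent_title": lesson["title"]},
]

# The price of duplication: a change to the lesson title must be propagated
# to every document holding a copy.
lesson["title"] = "Visual Art & Writing"
for activity in activities:
    activity["parent_title"] = lesson["title"]
```

If such corrections are rare, as they are for curriculum titles, this trade-off favors the duplicated representation.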

So, What Gives?

One might ask why these NoSQL data stores came to the fore in the first place. What was so wrong with the relational model that made NoSQL alternatives attractive? And if the NoSQL databases are indeed ‘no SQL,’[1] how does one interact with them?

Let us first say that on the issue of which of these alternatives is better, the dust has by no means settled. The web is rife with articles, statements and blog posts which offer some angle, often highly technical, to sway the argument one way or the other. Yet a few generally accepted assessments can be made (refer to Buckler (2015) for a similar comparison).

  • There are few if any functions which can be fulfilled by one that cannot be fulfilled by the other. Most if not everything that can be accomplished with a NoSQL database can be accomplished with a relational database and vice versa. However…
  • Modern, industry-strength relational databases are very good at providing so-called ‘data consistency;’ i.e., the property that everyone looking at the data sees the same thing. Whereas such consistency is particularly important in transaction processing —a bank balance should not be different when looked at by different actors from different locations— consistency is less important in non-transactional data processing. A good example would be pattern-finding applications such as those used in business intelligence or business analytics. A pattern is not stable—and hence, not a real pattern—if a single observation changes it. For example, when analyzing the 30-day log of our web server, any single request taken from the log should not make a difference. And if it does, we did not have a stable pattern anyway.

    NoSQL databases are not always equipped with the same consistency-enforcing mechanisms as the relational ones such as locking and concurrency control. For example, in the NoSQL database RavenDB, operations on individual documents are immediately consistent. However, queries in RavenDB at best guarantee ‘eventual consistency;’ i.e., the notion that, on average or eventually, different observers see the same thing. Although eventual consistency is insufficient for transaction processing, it is often sufficient in non-transactional contexts.

  • Because, unlike relational databases, NoSQL databases are not encumbered with 40 years of engineering for consistency, their licenses are a lot less expensive, especially in cases where lots of data must be distributed over several or many different machines and locations. Whereas that does not help when strict consistency is needed, it can weigh significantly if eventual consistency is good enough.
  • Relational databases enforce relational integrity (read: primary and foreign key constraints). NoSQL databases do not. But what if we are willing to sacrifice some automatic integrity checking for simpler application code or faster-running programs? As we mentioned in the previous chapter, the TE 2.0 JSON representation of documents contains a lot(!) of duplicated data when compared with the TE 1.0 representation, and yet the 2.0 system runs faster and the code is less complicated. Of course, making a correction to all that duplicated data would require more work, but if these corrections are rare and can be accomplished without service interruption…
  • NoSQL databases tend to be lightweight; i.e., no complicated and distributed client-server systems must be set up (hence their cost advantage). However, this does not imply that complicated systems cannot be built with them. On the contrary, NoSQL databases are often used in highly distributed systems with multiple levels of data duplication, and complex algorithms must often be devised to retrieve information from these various locations and then aggregate it locally.
  • Relational databases have predefined schemas meaning that only specific types of data can be stored in predefined tables. NoSQL databases do not have this constraint. Referring to the JSON databases such as RavenDB and MongoDB again, one is free to store the {“foo”: “bar”} document alongside a complex TeachEngineering lesson. One can easily conceive of some (dis)advantages either way.
  • So-called joins; i.e., queries which combine data from multiple tables in a single SQL statement, are the bread and butter of relational databases. NoSQL databases do not have joins. Of course, not having joins while keeping a database in normal form would mean a lot of extra programming, but since in a NoSQL database we are more than welcome to ignore normal form, not having joins does not have to be a problem.[2]
  • Relational databases traditionally scale vertically by adding more CPUs, memory and storage space. NoSQL databases scale horizontally by adding more machines.
  • Querying. Whereas SQL is the universal language for interacting with relational databases, no such language is available for NoSQL databases. Since NoSQL is not a standard, there is no standard querying protocol. Therefore, most NoSQL implementations have their own syntax and API. However, because of SQL’s installed base and popularity, some NoSQL databases offer SQL-like querying protocols. Moreover, since many software applications —databases or not— offer some sort of data retrieval mechanism, efforts to develop supra or ‘über’ query languages are underway. We have already seen one of these in the previous chapter, namely Yahoo Query Language (YQL), a SQL-like language which can be used to query across Yahoo services. Similarly, Microsoft has developed its Language INtegrated Query (LINQ), a SQL-like language for programmatic querying across its .NET products.
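
The joins point above can be made concrete with a small Python sketch (using the built-in sqlite3 and json modules; the table and field names are made up), contrasting a relational join with the join-free lookup a document store affords:

```python
import json
import sqlite3

# Relational: answering "which keywords does lesson 1 have?" requires joining
# two normalized tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lesson (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE keyword (lesson_id INTEGER, word TEXT)")
db.execute("INSERT INTO lesson VALUES (1, 'Heat Flow')")
db.executemany("INSERT INTO keyword VALUES (?, ?)", [(1, "heat"), (1, "energy")])
rows = db.execute(
    "SELECT k.word FROM lesson l JOIN keyword k ON k.lesson_id = l.id WHERE l.id = 1"
).fetchall()
print(sorted(r[0] for r in rows))

# Document store: no join is needed because the keywords are embedded
# (denormalized) inside the document itself.
doc = json.loads('{"id": 1, "title": "Heat Flow", "keywords": ["heat", "energy"]}')
print(doc["keywords"])
```

Both snippets answer the same question; the second simply trades normal form for direct access.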

We must also stress that as relational and NoSQL technologies evolve, the lines between them are blurring. For instance, PostgreSQL, a mature and popular open-source relational database, supports JSON as a first-class data type. This gives developers the option of using both relational and NoSQL patterns where appropriate in the same application without having to select two completely separate database management systems.

TE 1.0: XML/Relational

In this section we discuss the XML/relational architecture used in TE 1.0. If you just want to learn about the TE 2.0 JSON/NoSQL version, feel free to skip this section. However, this section contains some concepts and ideas which you might find interesting, and it supports a better understanding of the distinctions between the relational and NoSQL approaches.

When we designed TE 1.0 in 2002, NoSQL databases were just a research topic, and using a relational database as our main facility for storing and retrieving data was pretty much a foregone conclusion. What was also nice at the time was the availability of MySQL, a freely available and open-source relational database.

We had also decided to use XML as the means for storing TeachEngineering content; not so much for fast querying and searching, but as a means to specify document structure and to manage document validity (refer to Chapter 3 for details on XML and validity checking). Consequently, we needed a way to index the content of the documents as written by the curriculum authors into the MySQL relational database.

At the time that we were designing this indexing mechanism, however, the structure of the TeachEngineering documents was still quite fluid. Although we had settled on the main document types (curricular units, lessons and activities) and most of their mandatory components such as title, summary, standard alignments, etc., we fully expected that components would change or be added in the first few years of operations. Moreover, we considered it quite likely that in the future entirely new document types might have to be added to the collection.[3] As mentioned above, however, relational databases follow a schema, and once that schema is implemented in a table structure with its associated integrity constraints, and once the tables fill with data and application code is written, database schema changes become quite burdensome. This created a problem. On the one hand we knew that a relational database would work fine for storing and retrieving document data; yet the documents’ structure would remain subject to change for the foreseeable future, and relational databases are not very flexible in accommodating such changes.

After some consideration, it was decided to implement an auto-generating database schema; i.e., a basic database schema which, augmented with some basic application code, could auto-generate the full schema (Reitsma et al., 2005). By this auto-generation we mean that database construction, i.e., the actual generation of the tables, integrity constraints and indexes, occurs in two steps:

  • The first step consists of a traditional, hardwired schema of meta (master) tables which are populated with instructions on how the actual document database must be constructed.
  • In a second step, a program extracts the instructions from the meta tables, builds the data tables accordingly and then processes the documents, one by one, to index them into those data tables.

This approach, although a little tricky to set up, has the advantage that the entire document database schema, other than a small set of never-changing meta tables, is implemented by a program and requires no human intervention; i.e., no SQL scripts for schema generation have to be written, modified or run. Better still, when structural changes to the database are needed, all we have to do is change a few entries in the meta tables and let the program generate a brand-new database while the existing production database and system remain operational. When the application code that relies on the new database structure is ready, we release both that code and the new database, and business continues uninterrupted with a new database and indexing structure in place.
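
The two steps can be sketched as follows. The Python below is a toy stand-in for the TE 1.0 generator (which was not written in Python), with made-up meta rows, but it shows the idea: the meta rows fully determine the CREATE TABLE statements, so no schema SQL is ever written by hand.

```python
# Step 1: meta (master) rows in the style of the Relation and Types tables
# shown later in this chapter. (Values are illustrative.)
relation = [
    ("activity", "title"),
    ("activity", "grade_target"),
    ("activity", "grade_lowerbound"),
]
types = {
    "title": ("varchar(250)", "no"),        # (datatype, nullable)
    "grade_target": ("int(10)", "no"),
    "grade_lowerbound": ("int(10)", "yes"),
}

# Step 2: generate the data-table schema from the meta rows.
def create_table_sql(groupname, relation, types):
    cols = ", ".join(
        component + " " + types[component][0]
        + ("" if types[component][1] == "yes" else " NOT NULL")
        for group, component in relation
        if group == groupname
    )
    return "CREATE TABLE " + groupname + " (" + cols + ")"

print(create_table_sql("activity", relation, types))
```

Running this prints a complete CREATE TABLE statement for the activity table; adding a new document type amounts to adding rows to the meta structures and re-running the generator.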

Data Tables

The following discusses a few of the data tables in the TE 1.0 database:

  • Although the various document types have certain characteristics in common; e.g., they all have a title and a summary, the differences between them gave sufficient reason to set up document-type specific data tables. Hence, we have a table which stores activity data, one which stores lesson data, one which stores sprinkle data, etc. However, since we frequently must retrieve data across document types; e.g., ‘list all documents with target grade level x,’ it can be beneficial to have a copy of the data common to these document types in its own table. This, of course, implies violating normal form, but such denormalization can pay nice dividends in programming productivity as well as code execution speed.
  • K-12 standards have a table of their own.
  • Since the relationship between TeachEngineering documents and K-12 educational standards is a many-to-many one, a document-standard associative table is used to store those relationships. Foreign key constraints point to columns in the referenced tables.
  • Since the TeachEngineering collection consists of several hierarchies of documents; for instance, a curricular unit document referring to several lessons and each lesson referring to several activities (see Chapter 1), we must keep track of this hierarchy if we want to answer questions such as ‘list all the lessons and activities of unit x.’ Hence, we keep a table which stores these parent-child relationships.
  • TeachEngineering documents contain links to other TeachEngineering documents and support materials as well as links to other pages on the web. In order to keep track of what we are pointing to and of the status of those links, we store all of these document-link relationships in a separate table.
  • Several auxiliary tables for keeping track of registered TeachEngineering users, curriculum reviews, keywords and vocabulary terms, etc. are kept as well.

Meta tables

To facilitate automated schema generation, two meta tables, Relation and Types, were defined.

Relation table (definition)
Field Type Null Key Default
id int(10) unsigned NO PRI NULL
groupname varchar(100) NO
component varchar(100) NO
Relation table (sample records)
id groupname component
22 Activity cost_amount
6 Activity edu_std
72 Activity engineering_connection
18 Activity grade_target
21 Activity keywords
5 Activity summary
70 Activity time_in_minutes
4 Activity title
52 child_document link_text
50 child_document link_type
49 child_document link_url
46 Vocabulary vocab_definition
45 Vocabulary vocab_word
Types table (definition)
Field Type Null Key Default
id int(10) unsigned NO PRI NULL
name varchar(100) NO
expression varchar(250) NO
cast enum('string','number','group','root') NO string
nullable enum('yes','no') YES NULL
datatype varchar(50) YES NULL
Types table (sample records)
id name expression cast nullable datatype
19 child_document /child_documents/link group NULL NULL
24 cost_amount /activity_cost/@amount number yes float
25 cost_unit /activity_cost/@unit string yes varchar(100)
6 edu_std /edu_standards/edu_standard group NULL NULL
1 edu_std_id /@identifier string yes varchar(8)
33 engineering_category /engineering_category_TYPE/@category string yes varchar(250)
14 grade_lowerbound /grade/@lowerbound number yes int(10)
13 grade_target /grade/@target number no int(10)
15 grade_upperbound /grade/@upperbound number yes int(10)
26 keywords /keywords/keyword group NULL NULL
27 keyword string yes varchar(250)

Records in the Relation table declare the nesting (hierarchy) of components. For instance, an activity has a title, a summary, etc. Similarly, any reference to a child document has a link_text, a link_type and a link_url. Note that this information is similar to that contained in the documents’ XML Schema (XSD). Essentially, the Relation table declares all the needed data tables (groupname) and those tables’ columns (component).

The Types table in turn declares, for each component, its datatype and nullability, as well as an XPath expression used to extract its content from the XML document. For example, the cost associated with an activity (cost_amount) is a float, can be null (yes) and can be extracted from the XML document with the XPath expression /activity_cost/@amount. Similarly, the grade_target of a document is an int(10), cannot be null (no) and can be extracted with the XPath expression /grade/@target. Note that array-like datatypes such as edu_standards are not stored as a column in a table. Instead, they represent a list of individual standards, just as keywords is a list of individual keywords. They have an XPath expression but no datatype.[4]

Between the Relation and the Types tables, the entire data schema of the database is declared and as a side effect, for each column in any of the tables, we have the XPath expression to find its document-specific value. Hence, it becomes relatively straightforward to write a program which reads the content from these two tables, formulates and runs the associated SQL create table statements to generate the tables, uses the XPath expressions to extract the values to be inserted into these tables from the XML documents, and then uses SQL insert statements to populate the tables.
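
The core loop of such a program can be sketched in a few lines. The Python below is a stand-in for the actual TE 1.0 code, with a deliberately minimal evaluator that handles only the two XPath shapes shown above (element text and attribute access); the real indexer would use a full XPath engine:

```python
import xml.etree.ElementTree as ET

# Types-table-style metadata: component -> XPath expression (illustrative subset).
types = {
    "title": "/title",
    "grade_target": "/grade/@target",
}

# A heavily abbreviated, made-up activity document.
doc = ET.fromstring("<activity><title>Heat Flow</title><grade target='4'/></activity>")

def extract(root, expr):
    # Handles only the two expression shapes used above: /elem and /elem/@attr.
    path, _, attr = expr.partition("/@")
    node = root.find("." + path)
    return node.get(attr) if attr else node.text

# Extract each component's value and build a parameterized INSERT statement.
values = {name: extract(doc, expr) for name, expr in types.items()}
columns = ", ".join(values)
placeholders = ", ".join(["%s"] * len(values))
insert_sql = "INSERT INTO activity (" + columns + ") VALUES (" + placeholders + ")"
print(insert_sql, list(values.values()))
```

The same metadata thus drives both schema generation and document indexing, which is what makes the whole pipeline hands-off.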

What is particularly nice about this approach is that all that is needed to integrate a brand-new document type into the collection is to add a few rows to the Types and Relation tables. The auto-generation process takes care of the rest. Hence, when in 2014 a new so-called sprinkle document type was introduced—essentially an abbreviated activity—all that had to be done was to add the following rows to the Types and Relation tables:

Records added to the Relation table to store sprinkle document data
id groupname component
84 sprinkle title
86 sprinkle total_time
87 sprinkle sprinkle_groupsize
88 sprinkle total_time_unit
89 sprinkle grade_target
90 sprinkle grade_lowerbound
91 sprinkle grade_upperbound
93 sprinkle sprinkle_cost_amount
96 sprinkle link
97 sprinkle time_in_minutes
98 sprinkle engineering_connection
103 sprinkle summary
104 sprinkle dependency
105 sprinkle translation
Records added to the Types table for storing sprinkle document data
id name expression cast nullable datatype
34 sprinkle /sprinkle root NULL NULL
35 sprinkle_groupsize /sprinkle_groupsize number yes int(10)
36 sprinkle_cost_amount /sprinkle_cost/@amount number yes float

Adding these few records resulted in the automatic generation of a sprinkle table with the requisite columns, their declared datatypes and nullabilities. Extraction of sprinkle information from the sprinkle XML documents to be stored in the tables was done automatically through the associated XPath expressions. Not a single manual SQL create table or alter table statement had to be issued, and no changes to the program which generates and populates the database had to be made.

TE 2.0: JSON/NoSQL

The process described in the previous section worked fine for almost 13 years, during which time it accommodated numerous changes to the documents’ structure such as the addition of the sprinkle document type. During those years the collection grew from a few hundred to over 1,500 documents. Yet when the decision was made to rebuild the system, the architectural choice of having a separate store of XML-based documents to be indexed into a relational database was questioned in light of the newly available NoSQL document stores. Why not, instead of having two more or less independent representations of the documents (XML and relational database), have just one, namely the document database; i.e., a database which houses the documents themselves? If that database of documents can be flexibly and efficiently searched, it eliminates a lot of backend software needed to keep the document repository and the database in sync with each other. Better still, when rendering documents in a web browser one no longer has to retrieve data from both sources and stitch it all together; it can all come from a single source.

To illustrate the latter point, consider the way a document was rendered in TE 1.0. Figure 2 shows a section of an activity on heat flow. In the Related Curriculum box the activity lists its parent, the Visual Art and Writing lesson. This ‘parental’ information, however, is not stored on the activity itself because in TeachEngineering, documents declare their descendants but not their parents. Hence, whereas in TE 1.0 most of the rendered information came directly from the XML document, the document’s parent-child information had to be retrieved from the database. Two independent sources of information had to be independently queried and retrieved, each across networks and different computers, only to then be stitched together into a single HTML web page.

We could have, in TE 1.0, stored not just the typical lookup information of documents in the database; e.g., title, target grade, required time, summary, keywords, etc., but also the entire text of documents. Had we done that we would have only had to access a single source of information for rendering. Except… we did not. In hindsight, perhaps, we should have?

Fast forward to 2015, TE 2.0 and the availability of NoSQL document databases such as MongoDB and RavenDB. Now, we no longer have to separate document content from document searching since the basic data of these databases are the documents themselves. Hence, we have everything we want to know about these documents in a single store. In addition, these databases, partly because their data structures are so simple (no multi-table normalized data structures and no referential constraints), are really fast!

Figure 2: Heat flow activity lists Related Curriculum

Earlier in this chapter we discussed how RavenDB provides eventual consistency for document queries. Clearly, consistency is something that requires careful consideration when designing an application. In a highly transactional application, receiving out-of-date data from queries could be quite problematic. For instance, a query on a bank account or credit card balance, or on the number of available seats on a future airplane ride, must be correct at all times. But for a system such as TeachEngineering, eventual consistency is just fine. Curriculum documents do not change that often, and even if they do, it is perfectly acceptable if queries return a previous version for a limited period of time. Similarly, when a new document is made available, it is perfectly acceptable if that document is not immediately available everywhere in the world. Moreover, due to the relatively small number of documents stored in TE 2.0’s RavenDB database, the ‘eventual’ in ‘eventual consistency’ means in practice that queries return up-to-date results within seconds after a change is made.

Some Practice with two JSON Document Stores: RavenDB and MongoDB

In the remainder of this chapter we work through some practice examples running against two well-known JSON document stores: RavenDB and MongoDB. We will run the same examples against both databases so that you acquire a feeling for how interacting with them differs, yet how the concepts behind them, in terms of non-relational JSON storage, are quite similar if not identical. For both document stores we first use a very simple example of four very simple JSON documents. The example (swatches) is taken from MongoDB’s documentation pages. Next, we run a more realistic (but still very small) example of six TeachEngineering so-called ‘sprinkle’ documents. Sprinkles are abbreviated versions of TeachEngineering activities.

Exercise 5.1: A Little RavenDB

One of the better-known and freely available JSON document databases is RavenDB. RavenDB was written for .NET[5] but provides other ways for interacting with it as well.

We will install RavenDB locally. Since RavenDB (as of version 3.5.3) is available for Windows only, we assume a Windows machine.

  • Download the installer from https://ravendb.net/downloads and run it. This installs RavenDB and starts the RavenDB server (the RavenDB server will be restarted at subsequent reboots as well).

For our example, take a look at the following JSON array containing four swatch documents[6]. Each document represents a fabric swatch with the following characteristics:

  • name
  • qty (quantity)
  • tags: an array of strings describing the material; e.g., cotton or wool
  • size, containing:
    • h (height)
    • w (width)
    • uom (unit of measurement)
[
{ "name": "cotton_swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "name ": "wool_swatch", "qty": 200, "tags": ["wool"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "mame ": "linen_swatch", "qty": 300, "tags": ["linen"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "name ": "cotton_swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 50, "w": 50, "uom": "cm" } }
]

These are the four ‘documents’ which we will programmatically store and query in RavenDB.

We will be using Visual Studio and the C# language.[7][8] Our program will communicate with RavenDB over HTTP at http://localhost:8080.[9]

For this example we will be using the standard (de)serialization approach; i.e., we will hold the documents in objects of the Swatch class in our C# program and then programmatically store them in RavenDB. RavenDB will store them as JSON strings (documents). After they are stored we run a few queries against the database.

In order to hold the swatches as objects, we need a class definition for them. We could, of course, write this ourselves, but let us use the services of others and have http://json2csharp.com/ do it for us. Simply enter the above JSON array and click Generate. The result:

public class Size
{
  public int h { get; set; } //we will use a 'decimal' instead
  public double w { get; set; } //we will use a 'decimal' instead
  public string uom { get; set; }
}

public class RootObject //we will use 'Swatch' instead
{
  public string name { get; set; }
  public int qty { get; set; }
  public List<string> tags { get; set; }
  public Size size { get; set; }
}

With these class definitions we can write our Raven program. However, in order for our program to interact with RavenDB, we must first install the required RavenDB Client package on our machine:

  • Project –> Manage NuGet Packages…
  • Find RavenDB.Client
  • Click Install
  • Click OK

Here is the (console) C# program:

using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Indexes;
using Raven.Client.Document;

namespace RavenDB
{
  public class Swatch //Define the Swatch class
  {
    public string Id { get; set; }
    public string Name { get; set; }
    public int Qty { get; set; }
    public List<string> Tags { get; set; }
    public Size Size { get; set; }
  }

  public class Size //Define the Size class (every Swatch has a Size)
  {
    public decimal W { get; set; }
    public decimal H { get; set; }
    public string Uom { get; set; }
  }

  public class SwatchIndex : AbstractIndexCreationTask<Swatch>
  {
    // Method for creating the index. The index is RavenDB’s catalog of items it uses to conduct retrievals/searches
    public SwatchIndex()
    {
      Map = items => from item in items
        select new
        {
          item.Name,
          item.Qty,
          item.Tags,
          item.Size.H,
          item.Size.W,
          item.Size.Uom
        };
    }
  }

  //Program which interacts with RavenDB
  class Program
  {
    //inputs or retrieves JSON data from Raven
    static void Main(string[] args)
    {
      var Items = new List<Swatch> //Create a list of Swatches
      {
        new Swatch { Id = "items/1", Name = "cotton_item", Qty = 100, Tags = new List<string> { "cotton" }, Size = new Size { H = 28, W = 35.5m, Uom = "cm"}},
        new Swatch { Id = "items/2", Name = "wool_item", Qty = 200, Tags = new List<string> { "wool" }, Size = new Size { H = 28, W = 35.5m, Uom = "cm" }},
        new Swatch { Id = "items/3", Name = "linen_item", Qty = 300, Tags = new List<string> { "linen" }, Size = new Size { H = 28, W = 35.5m, Uom = "cm" }},
        new Swatch { Id = "items/4", Name = "cotton_item", Qty = 100, Tags = new List<string> { "cotton" }, Size = new Size { H = 50, W = 50, Uom = "cm" }}
      };

      var store = new DocumentStore //Make a new document store
      {
        Url = "http://localhost:8080", //We will communicate with Raven over HTTP at localhost:8080
        DefaultDatabase = "foo" //We will communicate with database ‘foo’
      };

      store.Initialize(); //Initialize the store
      // This next statement is not appropriate for production use, but is used to ensure that the index is up to date before returning query results
      store.Conventions.DefaultQueryingConsistency = ConsistencyOptions.AlwaysWaitForNonStaleResultsAsOfLastWrite;

      new SwatchIndex().Execute(store); //Run index creation

      using (var session = store.OpenSession()) //Open the store's session
      {
        foreach (var item in Items) //Loop through the Items
        {
          session.Store(item); //Store the item in the session
        }
        session.SaveChanges(); //Save all pending changes

        //Query for all swatches named "cotton_item" and store them in a List
        var cottonSwatches = session
          .Query<Swatch, SwatchIndex>()
          .Where(x => x.Name == "cotton_item")
          .ToList();

        Console.WriteLine("Swatches with name 'cotton_item':");
        foreach (var swatch in cottonSwatches) //Write them out
        {
          Console.WriteLine(swatch.Id);
        }

        //Query for all swatches with a Quantity > 100 and store them in a List
        var highQuantitySwatches = session
          .Query<Swatch, SwatchIndex>()
          .Where(x => x.Qty > 100)
          .ToList();

        Console.WriteLine("Swatches with quantity > 100:");
        foreach (var swatch in highQuantitySwatches) //Write them out
        {
          Console.WriteLine(swatch.Id);
        }

        Console.WriteLine("Hit a key");
        Console.ReadKey();
      }
    }
  }
}

So how does all this work?

  • We first define the classes Swatch and Size in accordance with the class structures returned from http://json2csharp.com/.
  • We then define an index for swatches: SwatchIndex. In RavenDB, an index associates selected properties of documents with those documents so that we can query on those properties later. For instance, if we ever want to find documents that have a certain Name or Qty, we must build an index covering those properties. Generating the index directs RavenDB to extract the specified properties from each document and to store them for later querying. Internally, RavenDB uses Lucene.Net, an open-source search engine library, to do its indexing and querying.
  • We then load up a list of Swatch objects (called Items).
  • Each of the Swatches in the Items list is then sent to RavenDB for storage.
    • Note that we do not(!) send the Swatches as JSON structures to RavenDB. We could do that too, but in this case we let RavenDB figure out how to make JSON structures from the Swatch objects and store them.
      foreach (var item in Items) //Loop through the Items
      {
        session.Store(item); //Store the item in the session
      }
  • After storing the Swatches in RavenDB, we formulate a few queries and send them to RavenDB for retrieval. The results are printed to the console.

    The queries take a form which may look a little odd:

    var cottonSwatches = session
      .Query<Swatch, SwatchIndex>()
      .Where(x => x.Name == "cotton_item")
      .ToList();

    and

    var highQuantitySwatches = session
      .Query<Swatch, SwatchIndex>()
      .Where(x => x.Qty > 100)
      .ToList();

    These queries use RavenDB’s Query method, which supports .NET’s Language INtegrated Query or LINQ. LINQ has a conceptual similarity with SQL (select … from … where …), but it is, of course, not at all relational. Instead, it is a general-purpose language for querying a great variety of data structures. In Microsoft’s own words: “.NET Language-Integrated Query defines a set of general purpose standard query operators that allow traversal, filter, and projection operations to be expressed in a direct yet declarative way in any .NET-based programming language.”
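To see that LINQ is indeed not tied to any particular database, here is a minimal, self-contained sketch (plain LINQ-to-Objects; the class and names mirror our Swatch example but the program itself is hypothetical) which applies the very same Where/ToList pattern to an ordinary in-memory list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Swatch
{
  public string Name { get; set; }
  public int Qty { get; set; }
}

public class LinqDemo
{
  // The same Where/ToList pattern as the RavenDB queries, applied to a plain list
  public static List<Swatch> HighQuantity(List<Swatch> items)
  {
    return items.Where(x => x.Qty > 100).ToList();
  }

  public static void Main()
  {
    var items = new List<Swatch>
    {
      new Swatch { Name = "cotton_item", Qty = 100 },
      new Swatch { Name = "wool_item",   Qty = 200 },
      new Swatch { Name = "linen_item",  Qty = 300 }
    };

    foreach (var s in HighQuantity(items))
      Console.WriteLine(s.Name); // prints wool_item, then linen_item
  }
}
```

The only thing that changes between this in-memory query and the RavenDB query is the data source behind the Query/Where chain; the query expression itself is identical.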

Note, by the way, that when you run this program several times in a row, the number of swatches stored in RavenDB remains the same. This is because we explicitly specify a value for the Id property of each swatch. By convention, RavenDB treats the Id property as the unique identifier of each document (similar to a primary key in a relational database table). Hence, each time you run the program, the existing documents are overwritten. If, alternatively, the Id property were not specified, RavenDB would automatically generate one for each document. Each subsequent run of the program would then add new copies of the documents to the database, each with a unique value for the Id property.

Exercise 5.2: A Little MongoDB

Another NoSQL/JSON document database we can practice with is the open-source MongoDB.

We will install the MongoDB server locally. We assume a Windows machine, but installs for other platforms are available as well.

  • Download the MongoDB Server from https://www.mongodb.com/download-center?jmp=nav#community
    • Run the downloaded installation script (Select the Complete (not the Custom) version and uncheck the Install Compass option).
  • Find the folder/directory where MongoDB has been installed (typically c:\Program Files\MongoDB) and navigate to its \Server\<version_no>\bin folder where the executables mongod.exe and mongo.exe are stored.
    • mongod.exe starts the server instance.
    • mongo.exe starts a mongo command shell through which we can send commands to mongod.
  • By default, MongoDB relies on being able to store data in the c:\data\db folder. Either create this folder using the Windows File Browser, or pop up a Windows command line and run the command:
    mkdir c:\data\db
  • If you have not already done so, run mongod.exe to start the MongoDB server.

We will communicate with MongoDB in two modes: first, by sending it commands directly using its command shell and then programmatically using C#.

Exercise 5.3: MongoDB Command Shell

  • Run mongo.exe to start a MongoDB command shell. Enter all commands mentioned below in this command shell.

As in RavenDB, MongoDB structures its contents hierarchically in databases, collections within databases and records (called ‘documents’) within collections.

  • Start a new database foo and make it the current database:
    use foo
  • Create a collection fooCollection in database foo:
    db.createCollection("fooCollection")

Now that we have a collection, we can add documents to it. We will use the swatch documents again (they come directly from MongoDB’s own documentation pages).

  • Add four documents to fooCollection:
db.fooCollection.insertOne( { "swatch": "cotton_swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 28, "w": 35.5, "uom": "cm" } })
db.fooCollection.insertOne( { "swatch": "wool_swatch", "qty": 200, "tags": ["wool"], "size": { "h": 28, "w": 35.5, "uom": "cm" } })
db.fooCollection.insertOne( { "swatch": "linen_swatch", "qty": 300, "tags": ["linen"], "size": { "h": 28, "w": 35.5, "uom": "cm" } })
db.fooCollection.insertOne( { "swatch": "cotton_swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 50, "w": 50, "uom": "cm" } })

Now let us start querying foo.fooCollection.

  • Let us see what we have in the collection. Since we pass no criteria to the find() method, all documents in fooCollection are returned:
    db.fooCollection.find()
  • Which swatches made of cotton do we have?
    db.fooCollection.find( { "tags": "cotton" } )
  • Of which swatches do we have more than 100?
    db.fooCollection.find( { "qty": { $gt: 100 } } )

Exercise 5.4: MongoDB Programmatic Interaction

Now that we have played with command-driven MongoDB, we can try our hand at having a program issue the commands for us. We will use C# as our language, but MongoDB can be programmatically accessed from other languages as well.

A note before we start. When you google for C# examples of MongoDB interactions, you will find a lot of code with the async and await keywords. Such code is typically written for applications which process MongoDB queries asynchronously. By ‘asynchronous’ we mean that these applications do not sequentially queue up commands, executing them in order and waiting for one to come back before the next one is sent. Instead, they send commands to MongoDB whenever they need to and process the replies in whatever order they come back. This approach makes a lot of sense in production environments where many queries must be issued and different queries take different amounts of time to complete. For our simple examples, however, we will use the more traditional synchronous communication: we issue a single command and wait until it returns before we send the next one.
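For completeness, here is what the asynchronous style looks like. This is only a sketch (the async/await keywords and the driver’s InsertOneAsync method are real, but the surrounding method is hypothetical and assumes an already-connected collection); we will not use this style in our exercises:

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

class AsyncSketch
{
  // Hypothetical helper: inserts one document without blocking the caller
  static async Task InsertSwatchAsync(IMongoCollection<BsonDocument> collection)
  {
    var doc = BsonDocument.Parse("{ \"swatch\": \"cotton_swatch\", \"qty\": 100 }");

    // Control returns to the caller while MongoDB processes the insert;
    // 'await' resumes this method when the reply comes back
    await collection.InsertOneAsync(doc);
  }
}
```

Compare this with the synchronous collection.InsertOne(doc) calls used in the programs below, which simply block until MongoDB replies.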

Instead of hardcoding the JSON documents, we will pick them up from a file. Store the following JSON in the file swatches.json:

[
{ "swatch": "cotton_ swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "swatch": "wool_ swatch", "qty": 200, "tags": ["wool"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "swatch": "linen_swatch", "qty": 300, "tags": ["linen"], "size": { "h": 28, "w": 35.5, "uom": "cm" } },
{ "swatch": "cotton_swatch", "qty": 100, "tags": ["cotton"], "size": { "h": 50, "w": 50, "uom": "cm" } }
]

To validate the syntax of the above JSON segment, check it with an on-line JSON parser such as http://jsonparseronline.com.

We once again will be using Visual Studio and a console app to run these examples. However, in order for our program to interact with MongoDB, we must first install the required MongoDB and Json.Net packages:

  • Project –> Manage NuGet Packages
  • Find and select Newtonsoft.Json, MongoDB.Driver and MongoDB.Bson
  • Click Install.
  • Click OK.

Here is the (console) C# program for inserting the four documents stored in the swatches.json file into the swatchesCollection:

using System;
using System.IO;
using MongoDB.Bson;
using MongoDB.Driver;
using Newtonsoft.Json.Linq;

namespace mongodb_c_sharp
{
  class Program
  {
    // This code assumes that the swatches.json file is stored at C:\swatches.json.
    // If you stored it elsewhere, just change the jsonFilePath string below
    const string jsonFilePath = "C:\\swatches.json";

    // We will use the “swatches” database
    const string dbName = "swatches";

    // We will use the “swatchesCollection” collection
    const string collectionName = "swatchesCollection";

    static void Main(string[] args)
    {
      // Open a connection to mongoDB. We are assuming localhost
      string connectString = "mongodb://localhost";

      MongoClient client = null;
      try { client = new MongoClient(connectString); }
      catch
      {
        Console.WriteLine("Error connecting to MongoDB. Is mongod.exe running?");
        Environment.Exit(1); // Exit on failed connection
      }

      // Attach to the swatches database (create it if it does not yet exist)
      IMongoDatabase database = client.GetDatabase(dbName);

      // Get the swatchesCollection collection
      var collection = database.GetCollection<BsonDocument>(collectionName);

      // Read the JSON file into a string
      StreamReader reader = new StreamReader(jsonFilePath);
      string json_str = reader.ReadToEnd();

      // Load the JSON into a JArray
      JArray arr = JArray.Parse(json_str);

      // Loop through the JArray and insert each of its documents into the database
      foreach (JObject obj in arr)
      {
        BsonDocument doc = BsonDocument.Parse(obj.ToString());
        collection.InsertOne(doc);
      } // end of foreach

    } // end of Main()
  } // end of Program class
} // end of Namespace

If you still have your MongoDB command shell open (if not, just start it again; see the previous section on how to do this), let us see what the program accomplished:

use swatches

db.swatchesCollection.find()

Next, we run the program which queries swatchesCollection for those swatches of which we have more than 100 (qty > 100). Here is the (console) program:

using System;
using MongoDB.Bson;
using MongoDB.Driver;

namespace mongodb_c_sharp
{
  class Program
  {
    // We will use the "swatches" database
    const string dbName = "swatches";

    // We will use the "swatchesCollection" collection
    const string collectionName = "swatchesCollection";
    static void Main(string[] args)
    {
      // Open a connection to mongoDB. We are assuming localhost
      string connectString = "mongodb://localhost";

      MongoClient client = null;
      try { client = new MongoClient(connectString); }
      catch
      {
        Console.WriteLine("Error connecting to MongoDB. Is mongod.exe running?");
        Environment.Exit(1); // Exit on failed connection
      }

      // Attach to the swatches database
      IMongoDatabase database = client.GetDatabase(dbName);

      // Get the swatchesCollection collection
      var collection = database.GetCollection<BsonDocument>(collectionName);

      // Find swatches with “qty” > 100
      // First, build a filter
      var filter = Builders<BsonDocument>.Filter.Gt("qty", 100);
      // Then apply the filter to querying the collection
      var resultList = collection.Find(filter).ToList();

      // Write the results to the console
      foreach (var result in resultList)
        Console.WriteLine(result);

      Console.WriteLine("Hit a key");
      Console.ReadKey();

    } // end of Main()
  } // end of Program class
} // end of Namespace
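The Builders<BsonDocument>.Filter class offers more operators than Gt alone. As a sketch (Eq, Lt, and the & shorthand for And are part of the driver’s filter-definition API; the query itself is a hypothetical variation on the program above), here is how we could combine two criteria to find cotton swatches of which we have fewer than 200:

```csharp
var builder = Builders<BsonDocument>.Filter;

// Equality test on a top-level field
var isCotton = builder.Eq("swatch", "cotton_swatch");

// Range test, combined with & (shorthand for builder.And(...))
var filter = isCotton & builder.Lt("qty", 200);

var resultList = collection.Find(filter).ToList();
```

This mirrors the shell query style of Exercise 5.3: each builder call corresponds to one clause of the criteria document.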

Exercise 5.5: MongoDB Programmatic Interaction: TeachEngineering Example – Sprinkles

Now that we have run through a basic (swatches) example, we can apply the same approach to a small sample of more complex but conceptually identical TE 2.0 JSON documents. Sprinkles are abbreviated versions of TeachEngineering activities. Our sample contains six TE 2.0 sprinkle documents and is stored at http://faculty.bus.oregonstate.edu/reitsma/sprinkles.json.

Download the content of http://faculty.bus.oregonstate.edu/reitsma/sprinkles.json and store it in a file called sprinkles.json.[10] Please take a moment to study the content of the file. Notice that it is a JSON array containing six elements; i.e., six sprinkles. Also notice that although the elements are all sprinkle documents and they all have the same structure, their elements are not always specified in the same order. For instance, whereas the third and following documents start with a Header element followed by a Dependencies element, the first two documents have their Header specified elsewhere.

First, just like in the swatches case, we will programmatically import the data. We use the exact same program as we used for loading the swatches data, except for some changes to the following constants:

  • Set the const dbName to sprinkles.[11]
  • Set the const collectionName to sprinklesCollection.
  • Set the const jsonFilePath to c:\\sprinkles.json.

Assuming that your MongoDB server (mongod.exe) is running, running the program will load the six sprinkles into sprinklesCollection.

Now the reverse: programmatically finding the number of sprinkles for which the required time >= 50 minutes (Time.TotalMinutes >= 50). Again, we use pretty much the exact same program as we used for retrieving swatches data, except for a few simple changes:

  • Set the const dbName to sprinkles.
  • Set the const collectionName to sprinklesCollection.
  • Change the line which defines the query filter as follows:
    var filter = Builders<BsonDocument>.Filter.Gte("Time.TotalMinutes", 50);
  • Remove (comment out) the foreach() loop and replace it with the following:
    Console.WriteLine(resultList.Count);
  • The result should be 5 (check it manually against the sprinkles.json file).
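Put together, the changed lines look like this (the rest of the program is identical to the swatches query example above):

```csharp
const string dbName = "sprinkles";
const string collectionName = "sprinklesCollection";

// Gte: 'greater than or equal to'
var filter = Builders<BsonDocument>.Filter.Gte("Time.TotalMinutes", 50);

// Print only the number of matching documents
Console.WriteLine(resultList.Count);
```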

Exercise 5.6: RavenDB Programmatic Interaction: TeachEngineering Example — Sprinkles

Let us now repeat the sprinkle example we just ran against MongoDB, but this time run it against RavenDB. When you look through the program you will see that, unlike in our earlier (swatches) example, this time we do not generate a search index prior to searching. Instead, we follow the MongoDB example above and search through the documents in real time without such an index. Of course, in a production environment, generating the indexes ahead of time would save (significant) search time, but here we have tried to keep the MongoDB and RavenDB examples as similar as possible.

using System;
using System.IO;
using System.Linq; // needed for the Count() extension method used in the query below
using Raven.Client.Document; // Do not forget to add the RavenDB NuGet package!
using Raven.Json.Linq;

namespace RavenDB
{
  // Program which interacts with RavenDB using a few TeachEngineering 'sprinkle' documents
  class Program
  {
    //inputs or retrieves JSON data from Raven
    static void Main(string[] args)
    {
      // This code assumes that the sprinkles.json file is stored at C:\sprinkles.json
      // If you stored it elsewhere, change the jsonFilePath string below
      const string jsonFilePath = "C:\\sprinkles.json";

      var store = new DocumentStore // Make a new document store
      {
        Url = "http://localhost:8080", // We will communicate with Raven over HTTP at localhost:8080
        DefaultDatabase = "sprinklesCollection" // We will communicate with database ‘sprinklesCollection’
      };

      store.Initialize(); //Initialize the store

      // This next statement is not appropriate for production use,
      // but is used to ensure that the index is up to date before returning query results
      store.Conventions.DefaultQueryingConsistency =
            ConsistencyOptions.AlwaysWaitForNonStaleResultsAsOfLastWrite;

      // Read the JSON file into a string
      StreamReader reader = new StreamReader(jsonFilePath);
      string json_str = reader.ReadToEnd();

      // Load the JSON into a JArray
      var arr = RavenJArray.Parse(json_str);

      // Loop through the JArray and insert each of its documents into the database
      foreach (RavenJObject obj in arr)
      {
        store.DatabaseCommands.Put(null, null, obj, null);
      } // end of foreach

      // Query for sprinkles greater than or equal to 50 minutes in length
      // Note that unlike in the swatches example, we do not create an explicit index ahead of time.
      // RavenDB will automatically create an index for us. This works for ad-hoc queries,
      // but in a typical application it is best to create the indexes ahead of time.
      using (var session = store.OpenSession())
      {
        var results =
        session.Advanced.DocumentQuery<object>().WhereGreaterThanOrEqual("Time.TotalMinutes", 50);
        Console.WriteLine(results.Count());
      }

      Console.WriteLine("Hit a key");
      Console.ReadKey();
    }
  }
}

Once again, running this program should produce a search result of ‘5.’

Summary and Conclusion

In this chapter we took a look at how TeachEngineering evolved from a system with a relational database at its core and XML representing documents (TE 1.0), to a system running off of a NoSQL database with JSON representing documents (TE 2.0). Whereas the relational/XML model worked fine during the 12 years of TE 1.0, the new NoSQL/JSON alternative provides the advantage that JSON serves as both the document and the database format. This unification of representation significantly reduces system complexity from a software engineering point of view.

We also argued that whereas the property of ‘consistency,’ which is well entrenched in industrial-strength relational databases, is of crucial importance in transactional settings, ‘eventual consistency’ is plenty good enough for an application such as TeachEngineering. As such, we can forgo much of the concurrency control facilities built into relational databases and instead rely on a less advanced but also much less expensive and easier-to-maintain NoSQL database.

Similarly, since the data footprint for a system such as TeachEngineering is quite small (the current size of the RavenDB database is a mere 170 MB), replicating some data in a controlled and managed way is quite acceptable. This again implies that we can forgo the mechanisms for adhering to and maintaining strict normal form, and that in turn implies that we do not need sophisticated data merge and search methods such as relational table joins.

None of the NoSQL material we have discussed and practiced here detracts from the value and utility of relational databases. For systems which require ‘consistency,’ relational databases with their sophisticated built-in concurrency controls continue to be a good, and most often the best, choice. But if ‘eventual consistency’ is good enough, then you should at least consider a modern NoSQL database as a possible candidate for storing your data.

References

Buckler, C. (2015) SQL vs. NoSQL: The Differences. Available: https://www.sitepoint.com/sql-vs-nosql-differences/. Accessed: 12/2016.

Codd, E. F. (1970) A Relational Model of Data for Large Shared Data Banks. Communications of the ACM. 13. 377-387.

Codd, E. F. (1982) Relational database: A Practical Foundation for Productivity. Communications of the ACM. 25. 109-117.

Dennis, A., Wixom, B.H., Roth, R.A. (2014) System Analysis and Design. 6th ed. John Wiley & Sons.

Dennis, A., Wixom, B.H., Tegarden, D. (2012) System Analysis & Design with UML Version 2.0. John Wiley & Sons.

Dennis, A., Wixom, B.H., Tegarden, D. (2015) System Analysis & Design. An Object-Oriented Approach with UML. 5th ed. John Wiley & Sons.

Reitsma, R., Whitehead, B., Suryadevara, V. (2005) Digital Libraries and XML-Relational Data Binding. Dr. Dobbs Journal. 371. 42-47.

Satzinger, J., Jackson, R., Burd, S. (2016) Systems Analysis and Design in a Changing World. CENGAGE Learning.

Tilley, S., Rosenblatt, H. (2017) Systems Analysis and Design. CENGAGE Learning.

Valacich, J.S., George, J.F., Hofer, J.A. (2015) Essentials of Systems Analysis and Design. Pearson.


  1. Some interpret the term NoSQL as ‘No SQL.’ Others interpret it as ‘Not Only SQL.’
  2. Although standard instructional texts on relational database design emphasize design for normal form, practitioners frequently denormalize (parts of) a relational database for the same reason; namely to write easier and/or faster code.
  3. Although much later than originally expected, in 2014 a brand-new document type, the so-called ‘sprinkle’ was indeed added.
  4. XPath is a language for extracting content from XML pages.
  5. At the time of this writing RavenDB 4.0 is in beta and supports Linux.
  6. Swatch: a sample or example item; typically a sample piece of fabric or other type of material.
  7. Visual Studio is Microsoft’s .NET Integrated Development Environment (IDE). Although it is available in several configurations, for these examples we will be using the freely available Community Edition (2015/17).
  8. There is a public RavenDB playground available for testing and experimentation. It can be accessed at http://live-test.ravendb.net/. For more information, visit https://ravendb.net/docs/article-page/3.5/csharp/start/playground-server.
  9. The RavenDB server has its own HTTP server listening at localhost:8080. You do not have to install a separate web server. You can visit this address in your web browser to see the management interface for RavenDB.
  10. We could, of course, write code which retrieves the JSON over HTTP (from the given URL) rather than first storing the JSON on the local file system. However, for the sake of being able to use pretty much the exact same code as in the previous examples, we will read the JSON from a local file and insert it into the database.
  11. We can, of course, use the swatches database to store sprinkle data, but in order to keep the examples independent of each other, we use a new database.