~~Title: StreamScape | Not Only SQL~~
~~NOTOC~~

<html><font color=#990000 size="+2"><b>Dataspaces: Not-Only SQL</b></font></html>

The Application Data Engine features a data management technology called Application Dataspaces™ that allows users to query and process massive amounts of data in-memory using a network of commodity hardware machines. Dataspaces support familiar query processing mechanisms like SQL, DDL and Composable User Defined Functions (UDF) as well as Declarative Schema, Referential Integrity Constraints, Projections, JOINs, Cross-products and Set operations (UNION, Set-difference) based on relational algebra and relational set theory.

Application Dataspaces also support a super-set of capabilities not found in classic relational databases, such as Event Triggers, Actors, Stream-sets, Stream Query, Reference Links and non-relational User Defined Data Types (Semantic Types) that can be queried using Dataspace Query Language extensions to standard ANSI SQL.

=== Dataspaces are a Not-Only SQL Store ===

No-SQL and NOSQL databases are characterized by their capacity to store non-tabular (document-oriented) data and the ability to do so without first defining a database schema. Application Dataspaces™ allow you to store arbitrary data such as key-value pairs ((Also referred to as Tuples)), Objects, XML or JSON documents. Data does not need to conform to a tabular, pre-defined schema from the perspective of the database management system. Due to this schema-less nature, StreamScape supports a scale-out approach to growth, increasing data and I/O capacity by adding more servers to a cluster, without any change to application software. By contrast, traditional relational database management systems scale up by adding more capacity, including CPU, memory and disk, to accommodate growth.

Pure SQL is declarative: it lets users describe what they want rather than how to compute it. This simplifies the query language at the cost of flexibility. Data is stored in a fixed data structure (schema) and the result comes out in a fixed schema (always tabular).

Relational databases store information in relations, which must be defined, or modified, before data can be stored. A relation is simply a table of rows (also known as a Tuple Set), where each row in a given relation has a fixed set of columns. These columns are consistent across every row in a relation. Tables can be further connected through cross-table references (FOREIGN KEYS) and combined at query time with JOINs. For example, one table could hold a row for each individual citizen residing in a town. Another table could have rows consisting of parent, child and relationship fields. The first two fields could be references to rows in the citizens table, while the third field describes the parental relationship between the persons in the first two fields, such as father or mother.
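The citizens example above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the classic relational model, not of Dataspaces itself; the table names, column names and sample data are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema for the town example; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The relations must be defined before any data can be stored.
conn.executescript("""
    CREATE TABLE citizens (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE parentage (
        parent_id    INTEGER NOT NULL REFERENCES citizens(id),
        child_id     INTEGER NOT NULL REFERENCES citizens(id),
        relationship TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO citizens (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)
conn.executemany(
    "INSERT INTO parentage (parent_id, child_id, relationship) VALUES (?, ?, ?)",
    [(1, 3, "mother"), (2, 3, "father")],
)

# Join the reference table back to citizens twice: once per referenced row.
rows = conn.execute("""
    SELECT p.name, c.name, r.relationship
    FROM parentage AS r
    JOIN citizens AS p ON p.id = r.parent_id
    JOIN citizens AS c ON c.id = r.child_id
    ORDER BY p.name
""").fetchall()

for parent, child, relationship in rows:
    print(f"{parent} is the {relationship} of {child}")
```

Note that both the stored data and the query result are tables with a fixed set of columns, which is the constraint the following sections contrast against.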

In contrast to pure SQL, which follows ''Relational Set Theory'', Not-Only SQL (NOSQL) databases often allow users to store data of arbitrary structure, with no fixed-schema Tuple Sets. One row may contain a discrete set of 3 entities, whereas another row may contain a discrete set of 4, 5, or more entities. The exact means of storing and referencing the elements are typically implementation-specific.

In the case of Dataspaces, StreamScape's engine stores elements as discrete types in a row, conforming to Relational Set Theory, but allows the specific type elements to be arbitrary schema objects. A data type of Assets may therefore be a list that contains an undetermined sequence of elements, or elements that are completely dissimilar. For example, a list of Assets can contain assets of type Desk, Computer and Automobile, each of which is a completely different type in itself. This allows Dataspaces to organize data in a familiar way and lets users access and query such data using traditional Relational Algebra techniques (like ''SELECT'', ''JOIN'' or ''UNION'') and apply aggregate functions like SUM, AVERAGE or COUNT to similar elements. Users also have full access to dissimilar elements and can query and process them using Relational Algebra techniques where appropriate. For example, a user can ''SELECT'' a row that contains a list of Asset objects by several Key Columns, then filter similar elements from the Asset List into a Transient Table collection and compute a standard deviation on the filtered objects.
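The Asset example can be sketched in plain Python. This is not Dataspace Query Language, whose syntax is not shown on this page; the row layout, the key columns (''owner'', ''region''), the asset fields and the helper functions are all hypothetical, and a population standard deviation is assumed for the final step.

```python
from dataclasses import dataclass
from statistics import pstdev

# Three completely different asset types (fields are illustrative).
@dataclass
class Desk:
    value: float
    material: str

@dataclass
class Computer:
    value: float
    cpu_cores: int

@dataclass
class Automobile:
    value: float
    mileage: int

# Each row has fixed key columns plus an "assets" column that holds
# a list of dissimilar objects.
rows = [
    {"owner": "ACME", "region": "EU", "assets": [
        Desk(300.0, "oak"),
        Computer(1200.0, 8),
        Computer(1800.0, 16),
        Automobile(25000.0, 40000),
    ]},
    {"owner": "Globex", "region": "US", "assets": [
        Computer(900.0, 4),
        Desk(150.0, "steel"),
    ]},
]

def select_rows(table, **keys):
    """Select rows whose key columns match all the given values."""
    return [row for row in table if all(row[k] == v for k, v in keys.items())]

def assets_of_type(row, asset_type):
    """Filter similar elements out of a row's asset list (a transient collection)."""
    return [asset for asset in row["assets"] if isinstance(asset, asset_type)]

selected = select_rows(rows, owner="ACME", region="EU")
computers = assets_of_type(selected[0], Computer)

# Aggregate over the similar elements only.
spread = pstdev([asset.value for asset in computers])
print(f"{len(computers)} computers, value spread {spread}")
```

The row is still addressed by its key columns in the relational way, while the list column keeps the dissimilar objects intact until a query narrows them down to one type.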

For more on these topics, have a look at prevailing [[http://nosql-database.org/|NoSQL and NOSQL]] technologies. A more formal definition can be found [[https://en.wikipedia.org/wiki/NoSQL|here]].

\\

=== Dataspaces support Schema-On-Read ===

Schema-on-read means the data structure (schema) is declarative: the software takes a data blob and lets you declare at read (SELECT) time how you want the data to look. Is it 2 columns or 1 column? Is it a tree structure or a list? The result does not have to follow strict relational calculus rules. Users can describe the shape of what they want in addition to the content, and they can also describe how to interpret the content. To some extent, then, the user tells the software how to compute the results.
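As a rough sketch of the idea, the Python below stores a blob without any schema and lets each reader declare the shape and interpretation it wants. It is a stand-in for a SELECT-time declaration, not Dataspaces syntax; the blob contents, the ''read'' helper and the column names are assumptions made for the example.

```python
import json

# Stored as-is: no schema was declared when the blob was written,
# and every value is still an uninterpreted string.
BLOB = """[
    {"sym": "ABC", "qty": "100", "px": "10.5"},
    {"sym": "XYZ", "qty": "20",  "px": "99.0"}
]"""

def read(blob, schema):
    """Apply a reader-supplied schema: output column -> function of the raw record."""
    return [
        {column: extract(record) for column, extract in schema.items()}
        for record in json.loads(blob)
    ]

# One reader wants 2 columns and interprets the strings as numbers.
two_columns = read(BLOB, {
    "symbol": lambda r: r["sym"],
    "notional": lambda r: int(r["qty"]) * float(r["px"]),
})

# Another reader wants 1 column from the very same blob.
one_column = read(BLOB, {"symbol": lambda r: r["sym"]})

# A third wants a plain list rather than rows.
as_list = [row["symbol"] for row in one_column]

print(two_columns)
print(one_column)
print(as_list)
```

The stored data never changes; only the declaration supplied at read time does.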

An SQL-only solution cannot, by definition, declare a new schema on the fly. What it can do is create computed columns and omit others, perform joins and apply projections. Given a database schema, it can join tables and give back another table. Every RDBMS can do that, and none of them claims to support schema-on-read, because they are SQL-only datastores: to change the data schema you have to dump and re-load the data.

In our case, Dataspaces support File Tables, Record Types and non-columnar data types such as a Trade object or an Account. We are a Not-Only SQL datastore. When we SELECT these objects we can return them as complex (non-tabular) types, sub-types or super-sets. We can flatten a tree structure, turn a table into an array of columns, or create multiple object types out of a single file, and we can do all of that as part of the SELECT syntax at query time.
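The three reshaping operations named above can be illustrated in Python. This only shows what each transformation produces; in Dataspaces they are expressed in the SELECT syntax, which is not reproduced here. The ''Account'' and ''Trade'' fields, the sample document and the helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:
    id: str
    owner: str

@dataclass
class Trade:
    account: str
    symbol: str
    quantity: int

# One stored document (think of a single file) holding a small tree.
TREE = {
    "account": {"id": "A1", "owner": "Alice"},
    "trades": [
        {"symbol": "ABC", "quantity": 100},
        {"symbol": "XYZ", "quantity": 20},
    ],
}

def flatten(node, prefix=""):
    """Flatten nested dicts and lists into one {dotted.path: value} mapping."""
    flat = {}
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def to_columns(table):
    """Turn a list of row dicts into a dict of column arrays."""
    return {name: [row[name] for row in table] for name in table[0]}

def to_objects(tree):
    """Create multiple object types (one Account, many Trades) from one document."""
    account = Account(**tree["account"])
    trades = [Trade(account=account.id, **trade) for trade in tree["trades"]]
    return account, trades

flat = flatten(TREE)
columns = to_columns(TREE["trades"])
account, trades = to_objects(TREE)

print(flat)
print(columns)
print(account, trades)
```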
