~~Title: What is Reactive Data?~~

~~CLOSETOC~~
<html><font color=#990000 size="+2"><b>What is Reactive Data?</b></font></html>

{{gallery> wiki:general_moc:reactive-data-layer-lg.jpg?559x450&859X667crop&lightbox}}

So what is **<color purple>Reactive Data</color>** and why do we need an RDP((Reactive Data Processing)) platform?  **<color purple>Reactive Data Processing</color>** is a <wrap hi>technique for data management and integration</wrap> that uses the [[wp>Reactive programming | Reactive Programming]] paradigm.  What makes the **<color purple>reactive</color>** approach different is the so-called [[wp> Actor model | Actor]] model, wherein computation is triggered by changes in data or in response to data streams coming from other applications.  Reactive Data Processing means that <wrap hi>the changing state of data and external systems drives application logic</wrap>.  

Why the paradigm shift? As enterprise architectures evolved, critical data used for business decisions became fragmented, voluminous, noisy((Data that often includes non-relevant information)) and distributed; constantly on the move and scattered across multiple systems or data centers.  Increasingly, subsets of information were shipped between systems and imported from cloud or external apps to <wrap hi>enable reactive and collaborative decision making</wrap>.  This changing landscape invalidated many prior assumptions about system design, driving the need for new data processing techniques and a simpler way to work with [[ http://en.wikipedia.org/wiki/Data_at_Rest#mediaviewer/File:3_states_of_data.jpg | data in-motion]]; allowing users to model, normalize and query data as it moves through an organization.

Reactive data provides critical context for business decisions and process automation by responding to streams of information from a variety of sources such as geospatial data, appliance sensors, financial transactions or changes in customer and prospect status.  It allows users to apply rules and filters to a much broader volume and variety of information, extracting relevant content and reducing the noise of the enterprise data deluge.  The reactive data layer <wrap hi>shields application developers from the technical aspects of enterprise data management</wrap> and <wrap hi>makes it easy to integrate, aggregate and consume information from disparate sources</wrap>; allowing applications or devices to query, analyze, receive and interact with such data in real-time on a global scale.

==== The Actor Model ====

Processing in-flight data flowing between enterprise applications and databases has been made difficult by the limitations of multi-tiered, client/server architecture.  Legacy systems relied on powerful, centralized servers to process client requests and host application data.  It was expected that business logic would not change often and that critical data would reside in one place along with application logic.  Over time, <wrap hi>as information became fragmented across applications and data silos, the volume and variety of data in-motion increased</wrap>. Enterprise architecture evolved to include messaging, routing and transformation capabilities, at the cost of increased complexity.  Sharing data between systems and making changes to logic became difficult and error-prone.  The resulting systems became brittle, relying on expensive hardware to scale.

The [[wp>Actor model | actor model]] addressed issues of complexity, scale and cost by getting rid of the central server.  Instead, groups of smaller programs would be developed incrementally to handle specific tasks.  The programs would communicate asynchronously by passing sequences of data((Also called event streams)) and run on low-cost commodity hardware.  In contrast to [[wp>Client–server model | Client-Server]] systems, where apps sent requests to a server and waited to receive a response, **<color purple>actors</color>** could simultaneously consume and produce data, assuming the role of client or server as needed, freeing the apps to perform other tasks while requests were processed.  Actors may be deployed in the same application or scattered across a network of processes and machines; invoked directly or as part of a process flow. Users could declare dependencies between actors, making it possible for one application to **<color purple>observe</color>** data changes or events produced by another and **<color purple>select</color>** the bits that it wanted to process (act on).  This <wrap hi>simplified the task of working with in-flight data and made implementation of logic changes easy and cost-effective</wrap>. Most importantly, the actor model <wrap hi>allowed users to build scalable, concurrent data processing systems that took advantage of modern [[wp>Multi-core processor|multi-core]] computer architectures</wrap>. 
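To make these ideas concrete, here is a minimal sketch of the actor pattern in Python. This is an illustration only, not any particular framework's API; the ''Actor'', ''tell'' and ''observers'' names are hypothetical. Each actor owns a mailbox, processes messages asynchronously on its own thread, and pushes results to the actors that declared a dependency on it:

```python
# Minimal actor sketch: each actor runs on its own thread and reacts to
# messages arriving in its mailbox (a queue) -- no central server involved.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler          # logic invoked once per message
        self.observers = []             # dependent actors, declared by the user
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:             # poison pill stops the actor
                break
            result = self.handler(msg)
            for observer in self.observers:
                observer.tell(result)   # push derived data downstream

    def tell(self, msg):
        self.mailbox.put(msg)           # asynchronous: returns immediately

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

results = []
doubler = Actor(lambda n: n * 2)            # produces derived data
sink = Actor(lambda n: results.append(n))   # observes the doubler's output
doubler.observers.append(sink)              # dependency declared once

for n in (1, 2, 3):
    doubler.tell(n)                         # caller is free to do other work
doubler.stop()
sink.stop()
```

Note that the producer never waits on the consumer; both sides could just as easily live in separate processes or machines with the mailbox replaced by a network channel.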

As reactive application development matured, diverse implementations followed, ranging from UI frameworks such as [[http://elm-lang.org/ | Elm]] and [[https://baconjs.github.io/ | Bacon.js]] to full-featured reactive programming languages like [[http://www.scala-lang.org/ | Scala]] and [[http://www.erlang.org/ | Erlang]].  Regardless of implementation, <wrap hi>reactive programming concepts are radically transforming the industry</wrap>.  Large companies such as [[http://techblog.netflix.com/2013/02/rxjava-netflix-api.html | Netflix]], [[https://msdn.microsoft.com/en-us/data/gg577609.aspx|Microsoft]] and [[http://facebook.github.io/react/index.html|Facebook]], as well as start-ups like [[https://typesafe.com/ | Typesafe]], are betting big on going reactive.  As of this writing, more than 9,000 individuals from various organizations have signed the [[http://www.reactivemanifesto.org/|Reactive Manifesto]], validating this emerging paradigm.  

==== The Data Broker ====

Modern applications are chatty((In constant communication with one or more networked applications)) and data-driven, with more than half of the app logic often devoted to building queries and formatting results. Transient((The terms <color brown>transient</color>, <color brown>in-motion</color> and <color brown>in-flight</color> data are used here interchangeably)) <wrap hi>application data has the greatest variety of structure and format, as it tends to represent current system state</wrap>, changing frequently in response to external events.  Enterprise data managed by storage systems like relational databases, [[wp> NoSQL]] or distributed file frameworks (e.g. Hadoop) is classified as [[wp> Data at Rest | data at-rest]] and typically holds system-of-record or historical data, representing reference information. Enterprise data is typically gathered from a variety of sources and rarely used in its entirety by an application.  The size and complexity of reference data often require the use of multiple techniques to filter and [[wp> Data curation | curate]] the information, leading to [[wp> big data]] challenges.

Integrating at-rest and in-motion data requires a new type of data processing engine; a **<color purple>data broker</color>** that can access multiple types of data through a variety of channels and protocols. A data broker would apply the actor model to data management and integration, providing **<color purple>application data services</color>** that can serve up structured or unstructured data, navigate database systems and allow users to define the protocol and format for application communications.  This unified view of information, sometimes referred to as [[wp>Bitemporal data|bitemporal data]], is a critical aspect of modern decision support and process automation systems. 
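As a rough illustration of what such a broker might look like, the sketch below (all names hypothetical, not an actual product API) puts a ''query'' call for data at rest and a ''subscribe''/''publish'' pair for data in motion behind a single facade:

```python
# Hypothetical data-broker sketch: one facade over data at rest (query)
# and data in motion (subscribe/publish), so applications need not know
# where information lives or how it travels.
class DataBroker:
    def __init__(self):
        self.sources = {}        # name -> callable returning stored records
        self.subscribers = {}    # topic -> list of callbacks

    def register_source(self, name, query_fn):
        self.sources[name] = query_fn

    def query(self, name, predicate=lambda r: True):
        # data at rest: pull matching records from a registered source
        return [r for r in self.sources[name]() if predicate(r)]

    def subscribe(self, topic, callback):
        # data in motion: react to records as they are published
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, record):
        for cb in self.subscribers.get(topic, []):
            cb(record)

broker = DataBroker()
broker.register_source("customers", lambda: [{"id": 1, "tier": "gold"},
                                             {"id": 2, "tier": "basic"}])
gold = broker.query("customers", lambda r: r["tier"] == "gold")

seen = []
broker.subscribe("orders", seen.append)          # reactive consumer
broker.publish("orders", {"id": 99, "amount": 250})
```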

Although reactive programming makes large-scale data analysis possible and reduces cost, writing functional reactive programs is not easy and requires a specialized skill set. It's time for a new way to write data-driven applications. This new way should be radically simple. It should make it possible to build a prototype in a day or two, and a real production app in a few weeks. It should make everyday things easy, even when those everyday things involve hundreds of servers, millions of users, and integration with dozens of other systems. It should be built on collaboration, specialization, and division of labor, and it should be accessible to the maximum number of people.
   
\\
   
<WRAP group>
<WRAP third column>
<WRAP round box 500px center>

<html><a x-desc="x" title= 
         "sample explanation">   
    <font color=#990000 style="font-weight:bold;">Asynchronous Logic Execution</font><br>
    </a>
    </html>

Asynchronous execution is typically implemented via Future function calls.  This allows a program to easily produce events without concern for how data emitted by the producer will eventually be processed.
   
Simplicity is the primary benefit of asynchronous processing. Separation of concerns between data producers and data consumers makes for a simpler design and allows several developers to work on the same problem, accelerating development.
   
=> How do you handle this in networked systems?
  
Another significant benefit of asynchrony is concurrency: the ability to use multiple threads in a single logic flow in order to take advantage of multi-core systems and achieve so-called hardware synchrony, allowing for much more efficient use of multi-core hardware.
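A minimal Python sketch of this idea, using standard-library futures (the ''enrich'' function is a hypothetical stand-in for application logic):

```python
# Asynchronous execution with futures: the producer submits work and
# moves on; the consumer decides later how (and when) to use the result.
from concurrent.futures import ThreadPoolExecutor

def enrich(record):
    # hypothetical business logic applied to each record
    return {**record, "score": record["value"] * 10}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(enrich, {"value": v}) for v in (1, 2, 3)]
    # the submitting thread is free to do other work here ...
    scores = [f.result()["score"] for f in futures]
```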
</WRAP> 
</WRAP>

<WRAP third column>
<WRAP round box 500px center>

<html><a y-desc="x" title= 
         "sample explanation">   
    <font color=#990000 style="font-weight:bold;">Declarative Dependencies</font><br>
    </a>
    </html>

Declarative dependencies (as opposed to procedural ones) mean that you specify a relationship between data elements once and the software takes care of the rest.  Contrast that with the procedural dependencies of an Enterprise Service Bus: you have to write code or use Aspects to trigger some logic based on changes in data, create a message, specify a data structure and send the message.  The dependent entity then has to subscribe to the message, parse its contents based on some data model and apply them to its local data.
    
A key challenge lies in synchronizing data models between referenced data and dependent systems (actors), where the schema is asymmetric between source and target.
    
How do you handle this in networked systems?
    
Reactive data systems let computers take care of data dependency management rather than leaving that work to the programmers.
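The contrast can be sketched in a few lines of Python (the ''Cell''/''DerivedCell'' names are hypothetical, not a real framework): the dependency is declared once, and recomputation is triggered automatically whenever a source changes:

```python
# Declarative dependency sketch: derived values are declared once, and
# recomputation happens automatically when a source cell changes.
class Cell:
    def __init__(self, value=None):
        self._value = value
        self._dependents = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for dep in self._dependents:
            dep.recompute()              # propagate the change

class DerivedCell(Cell):
    def __init__(self, fn, *sources):
        super().__init__()
        self._fn = fn
        self._sources = sources
        for s in sources:
            s._dependents.append(self)   # dependency declared once
        self.recompute()

    def recompute(self):
        self._value = self._fn(*(s.value for s in self._sources))

price = Cell(100)
qty = Cell(3)
total = DerivedCell(lambda p, q: p * q, price, qty)  # total follows price * qty
price.set(120)   # no messages built or parsed by hand; total updates itself
```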
</WRAP>
</WRAP>

<WRAP third column>
<WRAP round box 500px center>

<html><a y-desc="x" title= 
         "sample explanation">   
    <font color=#990000 style="font-weight:bold;">Native Stream Processing</font><br>
    </a>
    </html>

Actors essentially work on streams of events generated by referenced (watched) components, much the same way that Complex Event Processing engines and Messaging Application Subscribers do.  A Reactive Programming language presents an abstraction layer above messaging, making it easier to work with data streams, but does not provide structured query or data definition facilities.
 
This makes it difficult to write actors that perform correlation or aggregation of data and requires additional components or frameworks for handling projections and data relationships.  This is perhaps the most crippling limitation of Reactive Programming frameworks and one that makes widespread adoption by the business community very difficult.
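A small Python sketch of why aggregation needs extra machinery: an actor sees one event at a time, so even a per-key sum requires explicit buffering over a window (the ''windowed_sum'' helper below is hypothetical):

```python
# Stream-side aggregation sketch: events arrive one at a time, so
# correlating or aggregating them requires explicit buffering (a window).
from collections import defaultdict

def windowed_sum(events, window_size):
    # group events by key inside fixed-size windows, emit per-window sums
    buffer = defaultdict(int)
    for i, (key, value) in enumerate(events, start=1):
        buffer[key] += value
        if i % window_size == 0:
            yield dict(buffer)
            buffer.clear()
    if buffer:                       # flush a final partial window
        yield dict(buffer)

events = [("sensor-a", 1), ("sensor-b", 5), ("sensor-a", 2), ("sensor-b", 7)]
windows = list(windowed_sum(events, window_size=2))
```

None of this buffering logic comes for free in a bare reactive framework; it is exactly the kind of structured query facility the text says is missing.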
</WRAP>
</WRAP>
</WRAP>   
   
==== The Theory of Reactivity ====

**<color brown>A=VC<sup>2</sup></color>** 

Okay, so this is not a real formula, but it should serve as a reminder of the relationship: Actionable Information = Data Velocity × (Data Content)<sup>2</sup>

Einstein's moment of inspiration was the realization that light is both a particle and a wave; in other words, light has mass and velocity which can be measured.  We have the same realization with data.  Data is both a wave and a particle, meaning that to process information effectively you have to engage in a series of steps (or stages, for SEDA folks) that stream and buffer the data.  For example, [[wp>MapReduce | MapReduce]] is a popular data processing pattern in which programmers read chunks of data and send them to a group of applications for processing; the applications in turn send their results to the consuming application, which aggregates all the bits and presents them to the user.  This is a classic Buffer->Stream->Buffer pattern, wherein data are collected (buffered) during the mapping phase, then passed to the reducers (streamed), and the results are again collected (buffered) and sorted.  Like light, modern data is constantly changing state, from a collection at rest that can be queried (a particle) to a stream in motion that can be filtered (a wave).
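The Buffer->Stream->Buffer pattern described above can be sketched with a toy word count in Python (a simplified illustration of the pattern, not Hadoop's API):

```python
# Buffer -> Stream -> Buffer: a toy word count in the MapReduce style.
# Map output is buffered, streamed to reducers by key, then collected.
from collections import defaultdict
from itertools import chain

def map_phase(chunks):
    # buffer: each chunk yields (word, 1) pairs
    return [[(w, 1) for w in chunk.split()] for chunk in chunks]

def shuffle(mapped):
    # stream: route each pair to the reducer responsible for its key
    grouped = defaultdict(list)
    for word, count in chain.from_iterable(mapped):
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    # buffer again: aggregate and sort the final result
    return dict(sorted((w, sum(c)) for w, c in grouped.items()))

counts = reduce_phase(shuffle(map_phase(["to be or", "not to be"])))
```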

Data in Motion is the realization that all information has mass (content) and velocity.  This means that data velocity can be measured to gain new insight, and technologies that allow a business to analyze data on the move and measure its velocity give companies new, real-time insight into the state of their business.

Actionable Information is Data Velocity times Content squared, meaning that there is an unavoidable increase in data volume that occurs as part of data analysis.

As such, technologies that can process data without storing it are becoming extremely valuable to organizations across all business domains.

**<color brown>Data Change Drives the Business</color>** 

At a high level, data processing networks allow organizations to 'democratize their data' by moving critical sets of information out of departmental silos, closer to the analytical applications used by employees.  Rather than attempting to draw insights from everything sitting in departmental silos, decision makers can query a network that spans multiple data sources, combining the information in real-time based on application needs.
 
A data processing network also facilitates organizational transparency.
 
With RDP, how data is queried also changes.  You are no longer limited to the simple request and response of the client/server model.  Users can declare FUTURE queries and specify what system or application the results will be delivered to.  Or they may define Event Triggers on data collections in the network that push results to the user only when certain conditions are met.  For example, data from system A will only be pushed to the user's application when corresponding information appears in system B.
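The system-A/system-B trigger described above can be sketched as follows (a hypothetical ''TriggerNetwork'', not an actual RDP API): records from A are held until the corresponding key has been seen in B, and only then pushed to the consumer:

```python
# Hypothetical event-trigger sketch: results from system A are delivered
# only when corresponding information has appeared in system B.
class TriggerNetwork:
    def __init__(self):
        self.seen_in_b = set()    # keys already observed in system B
        self.pending_a = {}       # system-A records held back by the trigger
        self.delivered = []       # records pushed to the user's application

    def on_b_event(self, key):
        self.seen_in_b.add(key)
        if key in self.pending_a:                 # condition now met
            self.delivered.append(self.pending_a.pop(key))

    def on_a_event(self, key, record):
        if key in self.seen_in_b:
            self.delivered.append(record)         # push immediately
        else:
            self.pending_a[key] = record          # hold until B confirms

net = TriggerNetwork()
net.on_a_event("order-7", {"order": 7, "status": "shipped"})  # held back
net.on_b_event("order-7")                                     # releases it
```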
 
As organizations become increasingly data-driven, the need to aggregate information from disparate data silos and to react to changes in data is driving the next generation of analytics technologies.
  
[[http://bpm.com/bpm-today/podcast/688-nathaniel-palmer-interviews-jeffrey-g-mcmillan-managing-director-morgan-stanley-wealth-management | Nathaniel Palmer interviews Jeffrey G. McMillan, Managing Director, Morgan Stanley Wealth Management]]
  
This is one of the reasons why Apache projects such as Storm and Spark are becoming so interesting.  Such technologies are intended to bridge the gap between general-purpose reactive languages such as Scala and the ability to query and work with data structures.

==== The Network is the Database ====

Putting it all together, a Reactive Data Platform allows Data Analysts, Architects and Application Developers to solve a set of new problems presented by Big Data, addressing the challenges of Data Fragmentation and the analysis of Data in Motion by making use of Reactive Programming concepts.

The resulting data processing network makes it radically simple to query and manage in-flight data.  StreamScape's powerful query capabilities allow users to work with structured data, objects or documents and ask complex questions of such data without restriction on data size or structure.

A reactive data layer sits on top of enterprise data, cloud silos and applications providing a real-time abstraction layer for applications and devices that need to analyze, share and integrate information from such disparate sources.

Instead of moving data into a central warehouse for analysis, the Reactive Data Platform allows users to move computation logic to the data and seamlessly merge the results of asynchronous and parallel computations into a single result without programming.

With Reactive Data Processing the network becomes the database.


<html>
<head>
  <script>
  jQuery(function() {
    jQuery( "a[x-desc]" ).tooltip({
      position: {
        my: "center bottom-20",
        at: "center top",
        using: function( position, feedback ) {
          jQuery( this ).css( position );
          jQuery( "<div>" )
            .addClass( "arrow" )
            .addClass( feedback.vertical )
            .addClass( feedback.horizontal )
            .appendTo( this );
        }
      }
    });
  });
  </script>
  
  <script>
  jQuery(function() {
    jQuery( "a[y-desc]" ).tooltip({
      position: {
        my: "center bottom-20",
        at: "center top",
        using: function( position, feedback ) {
          jQuery( this ).css( position );
          jQuery( "<div>" )
            .addClass( "arrow" )
            .addClass( feedback.vertical )
            .addClass( feedback.horizontal )
            .appendTo( this );
        }
      }
    });

    jQuery( "a[y-desc]").tooltip({ content: '<img src="/dokuwiki/_media/icons_large/ds.png" />' });
  });
  </script> 
  
  <script>
  jQuery(function() {
    jQuery( "a[z-desc]" ).tooltip({
      position: {
        my: "center bottom-20",
        at: "center top",
        using: function( position, feedback ) {
          jQuery( this ).css( position );
          jQuery( "<div>" )
            .addClass( "arrow" )
            .addClass( feedback.vertical )
            .addClass( feedback.horizontal )
            .appendTo( this );
        }
      }
    });

    jQuery( "a[z-desc]").tooltip({ content: '<img src="/dokuwiki/_media/icons_large/ds.png" />' });
  });
  </script>   
  
  <style>
  .ui-tooltip, .arrow:after {
    background: gray;
    border: 2px solid white;
  }
  .ui-tooltip {
    padding: 10px 20px;
    padding-top: 50px;
    padding-bottom: 30px;
    color: white;
    border-radius: 10px;
    font: bold 12px "Helvetica Neue", Sans-Serif;
  /* text-transform: uppercase; */
    box-shadow: 0 0 7px black;
    background-image: url(/dokuwiki/_media/icons_large/cep.png);   
    background-repeat: no-repeat;
    background-size:   32px 32px;
    background-position: 16px 10px; 
    white-space: pre-line;
    min-width:330px;
    max-width:330px;
  }
  .arrow {
    width: 70px;
    height: 16px;
    overflow: hidden;
    position: absolute;
    left: 50%;
    margin-left: -35px;
    bottom: -16px;
  }
  .arrow.top {
    top: -16px;
    bottom: auto;
  }
  .arrow.left {
    left: 20%;
  }
  .arrow:after {
    content: "";
    position: absolute;
    left: 20px;
    top: -20px;
    width: 25px;
    height: 25px;
    box-shadow: 6px 5px 9px -9px black;
    -webkit-transform: rotate(45deg);
    -moz-transform: rotate(45deg);
    -ms-transform: rotate(45deg);
    -o-transform: rotate(45deg);
    transform: rotate(45deg);
  }
  .arrow.top:after {
    bottom: -20px;
    top: auto;
  }
  </style>
  
</head>

<img src="/dokuwiki/_media/icons_large/bowlerhat-transp.png" alt="Smiley face" height="46" width="46" style="margin-left:-6px;">
<a href="/dokuwiki/start" style="margin-left:-1em; font-weight:bold; color:#990000">Back</a>
</html>