Originally published November 7, 2013
A few months ago, David Torres wrote about emerging data systems architectures supporting today’s enterprise data management landscape. Companies have access to more data, arriving faster, than ever before. The Internet of Things, wearable computing devices, and new customer data from social media bring both opportunities and challenges for businesses trying to harness that data to drive business value. Organizations must update the architectures underpinning their data management programs to meet this tidal wave of data head-on and deliver relevant, rapid business insights.
While the promise of big data is great, data without discovery is useless. Companies need a data architecture that can respond to and scale with the ever-growing volume of data coming into the organization, and that supports exploration so that real insights can be drawn from that data.
To facilitate this information discovery, and to prevent data loss, next-generation data architectures need a foundational data store (on-premises, cloud-based, or hybrid) that is both fault tolerant and scalable. This storage model enables data refinement and exploration scenarios, letting companies derive insights in ways that were not possible with a purely structured, relational storage model.
For example, within an enterprise, this new storage model can serve as a persistent event store, freeing companies from the schema-then-capture constraint of relational database systems (a schema must be defined before any data can be stored). As the enterprise canonical data model evolves, a foundational data store implemented on Hadoop HDFS does not have to evolve with it. We can keep adding daily aggregated message logs without any operational impact on the data persistence model, because the storage is not schema-constrained.
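The schema-on-read idea behind this kind of event store can be sketched in a few lines of Python. This is a minimal sketch, not an HDFS client: it assumes a local JSON Lines file as a stand-in for a file in HDFS, and `append_events` and `read_with_schema` are hypothetical helper names. Writers append whatever fields an event carries; a schema is applied only when the data is read.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List


def append_events(log_path: Path, events: Iterable[Dict[str, Any]]) -> int:
    """Append events as JSON lines. No schema is checked on write."""
    written = 0
    with log_path.open("a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
            written += 1
    return written


def read_with_schema(log_path: Path, fields: List[str]) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: keep the requested fields, None if absent."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield {name: record.get(name) for name in fields}


# Demo: the canonical model gains a "channel" field between two daily batches.
# Older records are never rewritten; readers simply see None for the new field.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.jsonl"  # hypothetical local stand-in for an HDFS file
    append_events(log, [{"customer_id": 1, "action": "view"}])
    append_events(log, [{"customer_id": 2, "action": "buy", "channel": "mobile"}])
    rows = list(read_with_schema(log, ["customer_id", "channel"]))

print(rows)
# [{'customer_id': 1, 'channel': None}, {'customer_id': 2, 'channel': 'mobile'}]
```

Because nothing is validated on write, adding a field to the model is a reader-side change only; the cost is that data quality checks move to the point of consumption.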
Additionally, an HDFS-based event store can feed downstream data exploration, using compatible data analysis and visualization tools to surface the relevant data and draw out additional insights. These insights can then drive data refinement, in which data is moved from the foundational data store into the existing relational data warehouse so that it can be distributed through the enterprise’s existing data publishing and visualization capabilities.
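The refinement step can be illustrated with a small, hedged sketch. It assumes Python’s built-in `sqlite3` module as a stand-in for the relational data warehouse and an in-memory list as a stand-in for events read from HDFS; the table name `customer_activity` and the helpers `refine` and `load_into_warehouse` are hypothetical. The point is the shape of the flow: loosely structured events are aggregated and then loaded into a fixed relational schema that existing reporting tools can query.

```python
import sqlite3
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

# Raw, schema-free events as they might sit in the foundational store.
# Not every event carries the same fields.
raw_events = [
    {"customer_id": 1, "action": "view"},
    {"customer_id": 1, "action": "buy", "channel": "web"},
    {"customer_id": 2, "action": "view", "channel": "mobile"},
    {"customer_id": 1, "action": "view"},
]


def refine(events: Iterable[Dict[str, Any]]) -> List[Tuple[int, str, int]]:
    """Aggregate raw events into sorted (customer_id, action, event_count) rows."""
    counts = Counter((e["customer_id"], e["action"]) for e in events)
    return sorted((cid, action, n) for (cid, action), n in counts.items())


def load_into_warehouse(conn: sqlite3.Connection, rows: List[Tuple[int, str, int]]) -> None:
    """Load refined rows into a fixed relational schema (one row per key)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer_activity ("
            " customer_id INTEGER NOT NULL,"
            " action TEXT NOT NULL,"
            " event_count INTEGER NOT NULL,"
            " PRIMARY KEY (customer_id, action))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO customer_activity VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load_into_warehouse(conn, refine(raw_events))
result = conn.execute(
    "SELECT customer_id, action, event_count FROM customer_activity"
    " ORDER BY customer_id, action"
).fetchall()
print(result)  # [(1, 'buy', 1), (1, 'view', 2), (2, 'view', 1)]
```

Only the fields the warehouse schema needs are carried over; the extra `channel` attribute stays in the foundational store, available for later exploration.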
So the question becomes: how do we migrate from the “old” approach to a next-generation data architecture with as little disruption to the business as possible? The message to end users should be that we are augmenting the data systems environment with new storage models and new mechanisms for interpreting and moving data. We are not changing the existing relational data warehouse; rather, we are providing additional capabilities that let users better tap into data arriving in both existing and new formats (e.g., deriving customer sentiment from unstructured social media data).
It’s also worth noting that foundational storage is not designed to address data velocity. There isn’t just more data; it is now arriving at lightning speed. For instance, IDC estimates the installed base of Internet-connected things will reach 212 billion by the end of 2020. That is potentially hundreds of billions of new data streams, many of them emitting data continuously. This calls for a new approach to data processing.
To that end, Nathan Marz and James Warren are working on an interesting book on scalable, real-time data systems (expected in early 2014) that advocates a “need for speed” layer to meet these demands. Companies should think about the right combination of tools for building this speed layer into their architecture. In my opinion, some combination of NoSQL storage models and existing complex event processing (CEP) solutions will likely play a part in implementing this layer.
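In the architecture Marz has described (often called the Lambda Architecture), a batch layer periodically recomputes views over the complete, immutable dataset, while a speed layer incrementally maintains views over recent data the batch layer has not yet processed; queries merge the two. The Python sketch below is a deliberately simplified, single-process illustration of that idea, not an implementation from the book: the class `LambdaCounter` and its methods are hypothetical, and in-memory `Counter` objects stand in for the NoSQL stores and CEP engines mentioned above.

```python
from collections import Counter
from typing import Hashable


class LambdaCounter:
    """Toy batch + speed layer: counts events per key, merged at query time."""

    def __init__(self) -> None:
        self.master_log: list = []               # append-only master dataset (batch input)
        self.batch_view: Counter = Counter()     # rebuilt from scratch by run_batch()
        self.realtime_view: Counter = Counter()  # updated incrementally per event

    def ingest(self, key: Hashable) -> None:
        """Record an event in the master dataset and in the speed layer."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        """Recompute the batch view over all data, then drop speed-layer state it covers."""
        self.batch_view = Counter(self.master_log)
        # Single-threaded toy: every event so far is now in the batch view, so the
        # real-time view can be reset. Real systems must handle events that
        # arrive while a batch run is in progress.
        self.realtime_view = Counter()

    def query(self, key: Hashable) -> int:
        """Answer = batch view (complete but stale) + real-time view (recent only)."""
        return self.batch_view[key] + self.realtime_view[key]


lc = LambdaCounter()
for k in ["sensor-a", "sensor-a", "sensor-b"]:
    lc.ingest(k)
lc.run_batch()
lc.ingest("sensor-a")  # arrives after the batch run; only the speed layer has it
print(lc.query("sensor-a"))  # 3
```

The design choice worth noticing is that the speed layer only has to be correct for the window since the last batch run; any errors there are overwritten the next time the batch view is recomputed.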
To close, it’s clear that we will have to keep challenging ourselves and the way we think about enterprise data management, while upgrading existing programs and governance models to account for big data. For example, how do we handle privacy scenarios that did not exist before, and which architectures will support these new realities?
To David’s earlier point, as the hype surrounding big data deflates, we will have to evolve data systems architectures to accommodate technology platforms that are still catching up on the capabilities needed to enable new business scenarios. Data quality will remain a challenge, and we will continue to have frank discussions with business leaders about ways to accelerate insights and keep up with the ever-growing data landscape.
Recent articles by Timur Mehmedbasic