
Stream Processing: Conclusion of a 5-Part Series on Technologies Changing Our View of the Data Warehouse

Originally published April 5, 2012

The information management paradigm of the past decade has been to transfer data into databases to make it accessible to decision makers and knowledge workers. To reduce query loads and processing demands on operational systems, data from those systems is typically transferred into data warehouses, where it becomes part of the information value chain.

In the value chain (data → information → knowledge → action → results) that leads from data availability to business results, making data available is just a small part of the cycle. One of the goals of information management is to use the information gleaned from the data to achieve business results. One technology that has made that information more accessible, albeit in a different paradigm, is stream processing. With stream processing, and other forms of operational business intelligence (some covered in this series), data is processed before it is stored (if it is ever stored), not after.

Data velocity can severely tax a store-then-process solution, thereby limiting the availability of that data for business intelligence and business results. Stream processing brings the processing into real-time and eliminates data load cycles and at least some of the manual intervention. Stream processing is often the only way to process high velocity and high volume data.

Since all data can contribute to the information management value chain and speed is essential, many companies are turning to a process-first approach, analyzing data directly on the stream. Processing the data takes priority over storing it. It has become very common to spread operational workload by processing different transactions of the same type in multiple systems; with stream processing, all of these streams can be analyzed at the same time. The added complexity of correlating multiple streams is sometimes referred to as complex event processing (CEP).
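To make the multi-stream idea concrete, here is a minimal sketch in Python of interleaving several time-ordered transaction streams into a single processing timeline, as a CEP engine would before applying rules. The stream names and event shapes are illustrative assumptions, not any vendor's API.

```python
import heapq

def merged_streams(*streams):
    """Process several transaction streams as one, ordered by timestamp.

    Each stream yields (timestamp, payload) tuples already sorted by time,
    as they would arrive from independent operational systems handling the
    same transaction type. heapq.merge interleaves them lazily, without
    buffering everything first -- a minimal stand-in for a multi-stream
    processor's input stage.
    """
    yield from heapq.merge(*streams, key=lambda event: event[0])

# Two point-of-sale systems handling the same transaction type (illustrative).
pos_a = iter([(1, "sale:A1"), (4, "sale:A2")])
pos_b = iter([(2, "sale:B1"), (3, "sale:B2")])
timeline = [payload for _, payload in merged_streams(pos_a, pos_b)]
# timeline is ["sale:A1", "sale:B1", "sale:B2", "sale:A2"]
```

Because the merge is lazy, this scales to streams far larger than memory, which matters at the velocities discussed here.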

Every organization has data streams. It’s data “in play” in the organization, having been entered into an input channel and in the process of moving to its next organizational landing spot, if any. Two fundamental approaches are taken with stream processing:

  1. Gather the intelligence from the transaction

  2. Use the transaction to trigger a business activity

Every sales transaction increases company sales, reduces inventory and indicates a marketing success. Updating these indicators, perhaps in a master data management (MDM) store, immediately codifies business drivers and prepares the organization for more successful strategic decisions. However, many companies start stream processing with a need for the immediacy of related business actions. For example, financial institutions can look across streams to immediately determine illegal or high-risk activity and stop the transaction. Healthcare companies can analyze procedures and symptoms and bring useful feedback into the network in real time. Retailers can make smart next-best offers to their customers, and manufacturers can detect anomalous activity on the production line. Every organization that uses sensor technology is generating high-velocity, actionable data that is a good candidate for stream processing. Successful real-time intervention in any of these processes, made possible by stream processing, can easily translate into savings of millions of dollars per year for large companies with high-velocity, high-volume transaction data.
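The financial-risk case above can be sketched as a single stream rule: flag a card seen in two different cities within a short window. This is a toy, one-rule version of the cross-stream checks described here; the field names, the 60-second window, and the single-rule policy are all illustrative assumptions.

```python
def flag_rapid_reuse(transactions, window_seconds=60):
    """Flag a card used in two different cities within `window_seconds`.

    `transactions` is a time-ordered stream of (timestamp, card, city)
    tuples. Only the last sighting of each card is kept in memory, so
    state stays small no matter how long the stream runs.
    """
    last_seen = {}   # card -> (timestamp, city)
    flagged = []
    for ts, card, city in transactions:
        prev = last_seen.get(card)
        if prev and city != prev[1] and ts - prev[0] <= window_seconds:
            flagged.append((ts, card))   # point to stop/escalate the transaction
        last_seen[card] = (ts, city)
    return flagged
```

In a real deployment the flag would trigger an action (decline, hold, alert) before the transaction completes, which is the point of processing before storing.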

Candidate workflows for data stream processing should be analyzed through the lens of the need for real-time processing. There are limitations as to what “other” data can be brought to bear on the transaction and used in the analysis. Integration cannot be physical. After all, these streams have high velocity, and real-time analysis has to be done on a continuous basis. As the worlds covered in this series (MDM, Hadoop, data virtualization, columnar databases, and stream processing) collide, I expect that the MDM hub, containing highly summarized business subject-area information, will be the entity added to stream processing.

From a risk perspective, organizations need to consider whether a transaction should be allowed. While some transactions are genuinely risky, many others appear suspicious only because the customer profile is out of date, a profile that increasingly will live in an MDM store. Regardless, the point is that it is entirely possible to combine stream processing with master data through data virtualization.

Stream processing typically offers a SQL-like programming interface, extending SQL with time-series and pattern-matching syntax that allows analysis of “the last n minutes” or “the last n rows” of data. Processing a “moving window” of cross-system, low-latency, high-velocity data to deliver microsecond insight is what stream processing is all about.

These technologies changing our view of the data warehouse are increasingly found across vendor portfolios. The trick will be deploying them effectively for business results and getting them to “play nice.” Conversations over the next decade should be about workload allocation and acceptance of new ways to manage information, such as stream processing for real-time demands.

  • William McKnight
    William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

