The information management paradigm of the past decade has been to transfer data into databases to make it accessible to decision makers and knowledge workers. To reduce query loads and processing demands on operational systems, data from those systems is typically transferred into data warehouses, where it becomes part of the information value chain.
In the value chain (data → information → knowledge → action → results) that leads from data availability to business results, making data available is just a small part of the cycle. One of the goals of information management is to use the information gleaned from the data to achieve business results. One technology that has made that information more accessible, albeit in a different paradigm, is stream processing. With stream processing, and other forms of operational business intelligence (some covered in this series), data is processed before it is stored (if it is ever stored), not after.
Data velocity can severely tax a store-then-process solution, limiting the availability of that data for business intelligence and business results. Stream processing moves processing into real time, eliminating data load cycles and at least some manual intervention. It is often the only way to process high-velocity, high-volume data.
Since all data can contribute to the information management value chain and speed is essential, many companies are turning to processing directly on the data stream with a process-first approach. It’s more important to process data than store it; and with data stream processing, multiple streams can be analyzed at once. It has become very common to spread the operational workload around by processing different transactions of the same type in multiple systems. With stream processing, all of these streams can be processed at the same time. The added complexity of multiple streams is sometimes referred to as complex event processing (CEP).
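The idea of processing several shards of the same transaction type as one logical stream can be sketched simply: merge the time-ordered feeds into a single ordered sequence that a CEP rule can inspect. The event shape (timestamp, system, amount) is an illustrative assumption, not any product's format.

```python
import heapq

# Two shards carrying the same transaction type, each already time-ordered:
# (timestamp, originating system, amount)
stream_a = [(1, "sys-a", 100), (4, "sys-a", 250)]
stream_b = [(2, "sys-b", 75), (3, "sys-b", 300)]

# Merge into one time-ordered stream so a correlation rule sees all events together.
merged = list(heapq.merge(stream_a, stream_b))
print(merged)  # events from both systems, ordered by timestamp
```

`heapq.merge` consumes the sorted inputs lazily, which matters when the "inputs" are unbounded streams rather than small lists.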
Every organization has data streams. A data stream is data “in play” in the organization: it has entered an input channel and is moving toward its next organizational landing spot, if any. Two fundamental approaches are taken with stream processing:
- Gather the intelligence from the transaction
- Use the transaction to trigger a business activity
Every sales transaction increases company sales, reduces inventory, and indicates a marketing success. Updating these indicators, perhaps in a master data management (MDM) store, immediately codifies business drivers and prepares the organization for more successful strategic decisions. However, many companies start stream processing with a need for the immediacy of related business actions. For example, financial institutions can look across streams to immediately determine illegal or high-risk activity and stop the transaction. Healthcare companies can analyze procedures and symptoms and bring useful feedback into the network in real time. Retailers can make smart next-best offers to their customers, and manufacturers can detect anomalous activity on the production line. Every organization that uses sensor technology is generating high-velocity actionable data that is a good candidate for stream processing. The successful real-time intervention that stream processing makes possible in any of these processes easily translates into savings of millions of dollars per year in large companies with high-velocity/high-volume transaction data.
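The transaction-stopping pattern described above can be sketched as a process-first check that runs before anything is stored. The threshold, window length, and `Transaction` shape here are illustrative assumptions, not any vendor's rule engine.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class Transaction:
    account: str
    amount: float
    ts: float  # epoch seconds

class RiskScreen:
    """Block a transaction in-flight when recent spend in a moving window is too high."""
    def __init__(self, window_secs: float = 60.0, max_total: float = 10_000.0):
        self.window_secs = window_secs
        self.max_total = max_total
        self.recent = defaultdict(deque)  # account -> transactions still in the window

    def allow(self, txn: Transaction) -> bool:
        q = self.recent[txn.account]
        # Evict events that have fallen out of the moving window.
        while q and txn.ts - q[0].ts > self.window_secs:
            q.popleft()
        q.append(txn)
        # Block if the account's spend inside the window would exceed the limit.
        return sum(t.amount for t in q) <= self.max_total

screen = RiskScreen()
print(screen.allow(Transaction("a1", 4000, 0.0)))   # True
print(screen.allow(Transaction("a1", 7000, 10.0)))  # False: 11,000 within 60 seconds
```

The point of the sketch is the ordering: the decision is made on the stream itself, before (and regardless of whether) the transaction ever lands in a store.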
Candidate workflows for data stream processing should be analyzed through the lens of the need for real-time processing. There are limitations as to what “other” data can be brought to bear on the transaction and used in the analysis. Integration cannot be physical; these streams have high velocity, and real-time analysis has to be done on a continuous basis. As the worlds covered in this series (MDM, data virtualization, columnar databases, and stream processing) collide, I expect that the MDM hub, containing highly summarized business subject-area information, will be the entity added to stream processing.
From a risk perspective, organizations need to consider whether a transaction should or should not be allowed. While some transactions may genuinely be risky, many others could appear suspicious simply for lack of an up-to-date customer profile, which increasingly will reside in an MDM store. Regardless, the point is that it is very possible to combine stream processing with master data through data virtualization.
Stream processing offers a SQL-like programming interface that extends SQL with time-series and pattern-matching syntax, allowing analysis over “the last n minutes” or “the last n rows” of data. Processing a “moving window” of cross-system, low-latency, high-velocity data to deliver microsecond insight is what stream processing is all about.
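As a rough illustration of the “last n rows” idea, a running aggregate over a fixed-size moving window can be maintained in constant time per event. The class and its names are illustrative, not any vendor's streaming SQL engine.

```python
from collections import deque

class MovingAverage:
    """Maintain an average over the last n rows of a stream, O(1) work per event."""
    def __init__(self, n: int):
        self.n = n
        self.window = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        self.window.append(value)
        self.total += value
        if len(self.window) > self.n:
            self.total -= self.window.popleft()  # oldest row fell out of the window
        return self.total / len(self.window)

ma = MovingAverage(3)
for v in (10, 20, 30, 40):
    print(ma.update(v))  # 10.0, 15.0, 20.0, 30.0
```

A time-based window (“the last n minutes”) works the same way, except rows are evicted by timestamp rather than by count.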
These technologies, which are changing our view of the data warehouse, are increasingly found in vendor portfolios. The trick will be deploying them effectively for business results and getting them to “play nice.” Conversations over the next decade should be about workload allocation and acceptance of new ways to manage information, such as stream processing for real-time demands.
SOURCE: Stream Processing