But, back to DataRush. DataRush was originally conceived as a redesign of Pervasive's data integration tool, acquired from Data Junction in 2003. However, it was soon recognized that the underlying function could be applied to other data-intensive tasks such as analytics. Pervasive CTO Mike Hoskins described DataRush as a toolkit and engine that lets ordinary programmers create parallel-processing applications simply and easily. They design the applications using data flow techniques and do not have to worry about the complexities of parallel-processing design, such as timing and synchronization between parallel tasks.
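To make the data flow idea concrete, here is a minimal sketch in Python. It is not DataRush's API; every name in it (`run_pipeline`, `source`, `transform`, `sink`) is hypothetical and chosen only for illustration. The point is that the programmer supplies ordinary functions as stages, while the plumbing handles the hand-off and synchronization between concurrently running tasks.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def source(items, out_q):
    """Feed every input item into the first queue, then signal completion."""
    for item in items:
        out_q.put(item)
    out_q.put(_DONE)


def transform(fn, in_q, out_q):
    """Apply fn to each item as it arrives and pass the result downstream."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)  # propagate end-of-stream to the next stage
            return
        out_q.put(fn(item))


def sink(in_q, results):
    """Collect the final results in arrival order."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(item)


def run_pipeline(items, *stages):
    """Run each stage in its own thread, connected by bounded queues.

    The queues do all the synchronization; the stage functions themselves
    contain no locking or timing logic. (Hypothetical helper, not a real
    library call.)
    """
    queues = [queue.Queue(maxsize=64) for _ in range(len(stages) + 1)]
    results = []
    threads = [threading.Thread(target=source, args=(items, queues[0]))]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=transform, args=(fn, queues[i], queues[i + 1]))
        )
    threads.append(threading.Thread(target=sink, args=(queues[-1], results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), lambda x: x + 1, lambda x: x * 2))
```

Note that in CPython, threads show the structure rather than a real multi-core speedup for CPU-bound work; an engine aimed at multi-core servers would schedule stages across cores itself.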
Now, of course, there's nothing new about parallel processing or the inherent difficulties it presents to programmers. It's been at the heart of large-scale data warehousing, particularly through the use of MPP (massively parallel processing) systems, for a number of years. Mike's point, however, was that parallel processing is about to go mainstream. The technology shift enabling that has been underway for a few years now--the growing availability of multi-core processors and servers since the mid-2000s. Four-core processors are already common on desktop machines, while processors with 32 cores and more are already available for servers. Multiply that by the number of sockets in a typical server, and you have massive parallelism in a single box--if you can use it. The problem is that, with existing applications designed for serial processing, the only benefit to be gained from such multi-core servers at present is in supporting multiple concurrent users or tasks, or in what are known as "embarrassingly parallel" applications, where there are no inter-task dependencies. DataRush's claim to fame is that it moves data-intensive parallel processing from high-end, expensive and complex MPP clusters and specialist programmers to commodity, inexpensive and simple SMP multi-core servers and ordinary developers.
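For readers unfamiliar with the term, an "embarrassingly parallel" job can be sketched as follows. This is an illustrative Python example, not anything from Pervasive; the function names (`count_words`, `split`, `parallel_word_count`) are made up for the purpose. Each chunk of input is processed with no reference to any other chunk, so the only coordination needed is adding up the partial results at the end.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count the words in one chunk; needs nothing from any other chunk."""
    return sum(len(line.split()) for line in chunk)


def split(lines, n):
    """Divide lines into at most n roughly equal, contiguous chunks."""
    size = max(1, -(-len(lines) // n))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, workers=None):
    """Count words across all lines using one task per chunk.

    workers defaults to the number of logical CPUs the operating system
    reports, which on a multi-socket server covers every socket.
    """
    workers = workers or os.cpu_count() or 1
    chunks = split(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tasks share no state, so the partial counts can simply be summed.
        return sum(pool.map(count_words, chunks))


if __name__ == "__main__":
    print(parallel_word_count(["a b c", "d e", "f"], workers=2))
```

A thread pool keeps the sketch simple; for CPU-bound work in CPython one would normally swap in `ProcessPoolExecutor` to occupy multiple cores. The harder case, and the one the paragraph above is concerned with, is work whose tasks do depend on each other.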
Of course, Pervasive is not alone in trying to tackle the issues involved in software development for parallel-processing environments. But its approach, coming from the large-scale data integration environment, makes a lot of sense in BI.
However, to see the really significant implications, we need to see this development in the context of other technological advances. Solid-state disks (SSDs) are emerging, and main memory is growing in size and falling in cost; together these remove or reduce the traditional disk I/O bottleneck. The decades-old supremacy of traditional relational databases is being challenged by a variety of different structures, some broadly relational and others distinctly not. Add to this the explosive growth of data volumes, especially soft or "unstructured" information. Pervasive, along with other small and medium-sized software vendors, is pushing information processing to an entirely new level.
Posted August 18, 2010 7:58 AM