Blog: Lou Agosta

Lou Agosta

Greetings and welcome to my blog focusing on reengineering healthcare using information technology. The commitment is to provide an engaging mixture of brainstorming, blue sky speculation and business intelligence vision with real world experiences – including those reported by you, the reader-participant – about what works and what doesn't in using healthcare information technology (HIT) to optimize consumer, provider and payer processes in healthcare. Keeping in mind that sometimes a scalpel, not a hammer, is the tool of choice, the approach is to be a stand for new possibilities in the face of entrenched mediocrity, to do so without tilting at windmills and to follow the line of least resistance to getting the job done – a healthcare system that works for us all. So let me invite you to HIT me with your best shot at

About the author >

Lou Agosta is an independent industry analyst, specializing in data warehousing, data mining and data quality. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, business and information technology. He is currently focusing on the challenge of transforming America’s healthcare system using information technology (HIT). He can be reached at

Editor's Note: More articles, resources, and events are available in Lou's BeyeNETWORK Expert Channel. Be sure to visit today!

April 2010 Archives

Datameer takes its name from the sea - the sea of data - as in the French la mer or the German das Meer.


I caught up with Ajay Anand, CEO, and Stefan Groschupf, CTO. Ajay earned his stripes as Director of Cloud Computing and Hadoop at Yahoo. Stefan is a long-time open source consultant and advocate, and a cloud computing architect from EMI Music.


Datameer is aligning with the two trends of Big Data and Open Source. You do not need an industry analyst to tell you that data volumes continue to grow, with unstructured data growing at a rate of almost 62% CAGR and structured data at a lower, but still substantial, 22% (according to IDC). Meanwhile, open source has never looked better as a cost-effective enabler of infrastructure.


The product beta launches in April with McAfee, nurago, a leading financial services company, and a major telecommunications service provider. The summer promises to deliver early adopters, with the gold product shipping in the autumn. (Schedule is subject to change without notice.)


The value proposition of Datameer Analytics Solution (DAS) is to help users perform advanced analytics and data mining with no more expertise than that of a reasonably competent user of an Excel spreadsheet.


As is often the case, the back story is the story. The underlying technology is Hadoop, an open source framework for storing and processing data across highly distributed systems. It includes both storage technology and execution capabilities, making it a kind of distributed operating system that provides a high level of virtualization. Unlike a relational database, where search typically requires chasing up and down a tree-structured index, Hadoop performs some of the work upfront, sorting the data and performing streaming data manipulation. This is definitely not efficient for small volumes of data in the gigabyte range. But when the data gets big - really big - like multiple terabytes and petabytes, then the search and data manipulation functions enjoy an order of magnitude performance improvement.

The search and manipulation are enabled by the MapReduce algorithm. MapReduce has been made famous by the Google implementation as well as the Aster Data implementation of it; Hadoop's implementation, of course, is open source. MapReduce takes a user-defined map function and a user-defined reduce function and handles the exchange of key-value pairs between them, executing a process of grouping, reducing, and aggregation at a low level that you do not want to have to code yourself.

Hence the need for, and value in, a tool such as DAS. It generates the low-level, "assembly level" code required to answer the business and data mining questions that the business wants to ask of the data. In this regard, DAS functions rather like a Cognos or BusinessObjects front-end in that it presents a simple interface in comparison to all the work being done "under the hood". Clients who have to deal with a sea of data now have another option for boiling the ocean without getting steamed up over it.
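To make the map, group, and reduce steps described above concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and not what DAS generates; the names map_reduce, count_words, and sum_counts are hypothetical, chosen only for illustration, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy, in-memory illustration of the MapReduce flow (not Hadoop itself)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped under their key.
            groups[key].append(value)
    # Reduce phase: each key's values are aggregated, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def count_words(line):
    """Hypothetical user-defined map function: emit (word, 1) per word."""
    for word in line.lower().split():
        yield word, 1


def sum_counts(key, values):
    """Hypothetical user-defined reduce function: total the counts for a key."""
    return sum(values)


lines = ["sea of data", "data in the sea", "big data"]
word_counts = map_reduce(lines, count_words, sum_counts)
print(word_counts)
```

The user writes only the two small functions; the grouping and exchange of key-value pairs in between is the plumbing that a framework, or a tool layered on top of it, takes off the user's hands.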

Posted April 15, 2010 9:21 AM

I caught up with Ben Werther, Director of Product Marketing, for a conversation about business developments at Greenplum and Greenplum's major new release.


According to Ben, Greenplum has now surpassed 100 enterprise customers and is enjoying revenue growth of about 100%, albeit from a revenue base that befits a company of relatively modest size. They also claim to be adding new enterprise customers faster than either Teradata or Netezza.


What is particularly interesting to me is that with its MAD methodology Greenplum is building an agile approach to development that directly addresses the high performance of its massively parallel processing capabilities. This is an emerging trend in high end parallel databases that is receiving new impetus. More on this shortly. Meanwhile, release 4.0 includes enterprise class DBMS functionality such as:

-        Complex query optimization

-        Data loading

-        Workload Management

-        Fault-Tolerance

-        Embedded languages/analytics

-        3rd Party ISV certification

-        Administration and Monitoring

From the perspective of real world data center operations, the workload management features are often neglected but are on the critical path for successful operations and growth. Dynamic query balancing is a method used on mainframes for the most demanding workloads, and Greenplum has innovated in this area, with its solution now being "patent pending".


Just in case scheduling does not turn you on, a sexier initiative is to be found in fault tolerance. Given that Greenplum is an elephant hunter, favoring large and high end installations, this is news you can use. Greenplum Database 4.0 enhances fault tolerance using a self-healing physical block replication architecture. Key benefits of this architecture are:

-        Automatic failure detection and failover to mirror segments

-        Fast differential recovery and catchup (while fully online / read-write)

-        Improved write performance and reduced network load

Greenplum has also made it easier to update single rows against on-going queries. While data warehouses are mostly inquiry-intensive, it has been a well-known secret that update activity is common in many data warehousing scenarios, driven by business changes to dimensions and hierarchies.


At the same time, Greenplum is announcing a new product - Chorus - aimed at the enterprise data cloud market. Public cloud computing has the buzz. What is less well appreciated is that much of the growth is in enterprise cloud computing - clouds of networked data stores with (relatively) user friendly frontends within the (virtual) four walls of a global enterprise such as a telecommunications company, bank, or related firm.



[Figure: Enterprise Data Cloud]

This shows the Enterprise Data Cloud schematically, with the Greenplum database on top of the virtualized commodity hardware, operating system, public Internet tunnel, and Chorus abstraction layer. Chorus aims at being the source of all the raw data (often 10X the size of the EDW); providing a self-service infrastructure to support multiple marts and sandboxes; and, finally, furnishing rapid analytic iteration and a business-led solution. Chorus enables security, providing extensive, granular access control over who is authorized to view and subscribe to data within Chorus, and collaboration, facilitating the publishing, discovery, and sharing of data and insight using a social computing model that appears familiar and easy to use. Chorus takes a data-centric approach, focusing on the tooling necessary to manage the flow and provenance of data sets as they are created and shared within a company.

One more thing. Even given the blazingly fast performance of massively parallel processing data warehousing, heterogeneous data requires management. It is becoming an increasingly critical skill to surround one's data and make it accessible with a usable, flexible method of data management. Without a logical, rational method of organizing data, the result is just more proliferating, disconnected islands of information. Greenplum's solution to this challenge? Get MAD!

Of course, this is a pun, standing for a platform capable of supporting the magnetic, agile, and deep principles of MAD Skills. "Magnetic" does not refer to disk, though there is plenty of that; it refers to attracting all the data into one place. This conforms to data warehousing orthodoxy in one respect only - it agrees to get all the data into one repository; but it does not subscribe to the view that it must all be conformed or rendered consistent. This is where the "agile" comes in - deploying a flexible, stage-by-stage process, and doing so in parallel. A laboratory approach to data analysis is encouraged, with cleansing and structuring being staged within the same repository. Analysts are given their own "sandbox" in which to explore and test out hypotheses about buying behavior, trends, and so on. Successful solutions are generalized as best practices. In effect, given the advances in technology, the operational data store is a kludge that is no longer required.

Regarding the "deep," advanced statistical methods are driven close to the data. For example, one Greenplum customer had to calculate an ordinary least squares regression (OLS is a method of fitting a line or curve to data) by exporting the data into the statistical language R for calculation and then importing the results back, a process that required several hours. This regression was moved into the database thanks to the capability of Greenplum and ran significantly faster due to much less data movement.

In another example involving highly distributed data assembled by Chorus, T-Mobile assembled data from a number of large untapped sources (cell phone towers, etc.), as well as data in the EDW and other source systems, to build a new analytic sandbox; ran a number of analyses, including generating a social graph from call detail records and subscriber data; and discovered that T-Mobile subscribers were seven times more likely to churn if someone in their immediate network left for another service provider. This work would ordinarily require months of effort just to provision databases and discover and assemble the data sources, but was completed within two weeks while deploying a one petabyte production instance of Greenplum Database and Greenplum Chorus.

As the performance bar goes up, methodologies and architectures (such as Chorus) are required to sprint ahead in order to keep up. As already noted and in summary, with its MAD methodology, Greenplum is building an agile approach to development that promises to keep up with the high performance bar of its massively parallel processing capabilities. An easy prediction to make is that the competitors already know about it and are already doing it. Really!?
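For readers wondering why the OLS regression mentioned above benefits from running next to the data: a straight-line fit needs only a handful of running sums, and those sums can be computed separately on each slice of the data and then added together. The Python sketch below illustrates that idea under stated assumptions. It is not Greenplum's implementation; the names Sums, partial_sums, and fit_line are hypothetical, and the two "segments" are made-up sample points.

```python
from dataclasses import dataclass


@dataclass
class Sums:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # Partial sums from different data slices simply add together.
        return Sums(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                    self.sxx + other.sxx, self.sxy + other.sxy)


def partial_sums(rows):
    """Compute the sums for one slice of (x, y) rows; runs where the data lives."""
    s = Sums()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit_line(s):
    """Closed-form ordinary least squares for a single predictor."""
    denom = s.n * s.sxx - s.sx ** 2
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


# Hypothetical data split across two segments; every point lies on y = 2x + 1.
segments = [[(1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]

total = Sums()
for rows in segments:
    total = total.merge(partial_sums(rows))

slope, intercept = fit_line(total)
print(slope, intercept)
```

Only five numbers per segment travel over the network, rather than every row, which is the kind of reduction in data movement the example above credits for the speedup.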


Posted April 14, 2010 7:46 AM

There are so many challenges that it is hard to know where to begin. For those providers (hospitals and large physician practices) that have already attained a basic degree of automation there is an obvious next step - performance improvement. For example, if an enterprise is operating eClinicalWorks (ECW) or a similar run-your-provider EHR system, then it makes sense to take the next step and get one's hands on the actual levers and dials that drive revenues and costs.

Hospitals (and physician practices) often do not understand their actual costs, so they are struggling to control and reduce the costs of providing care. They are unable to say with assurance what services are the most profitable, so they are unable to concentrate on increasing market share in those services. Often, when the billing system drives provider performance management, the data, which is adequate for collecting payments, is totally unsatisfactory for improving the cost-effective delivery of clinical services. If the billing system codes the admitting doctor as responsible for the revenue, and it is the attending physician or some other doctor who performs the surgery, then accurately tracking costs will be a hopeless data mess. The amount of revenue collected by the hospital may indeed be accurate overall; but the medical, clinical side of the house will have no idea how to manage the process or improve the actual delivery of medical procedures.


Into this dynamic enters River Logic's Integrated Delivery System (IDS) Planner. The really innovative thing about the offering is that it models the causal relationship between activities, resources, costs, revenues, and profits in the healthcare context. It takes what-if analyses to new levels, using its custom algorithms based on the theory of constraints, delivering forecasts and analyses that show how to improve performance (i.e., revenue, as well as other key outcomes such as quality) based on the trade-offs between relevant system constraints. For example, at one hospital, the operating room was showing up as a constraint, limiting procedures and related revenues; however, careful examination of the data showed that the operating room was not being utilized between 1 PM and 3 PM. The way to bust through this constraint was to charge less for the facility, thereby giving physicians an incentive to use it at what was for them not an optimal time in comparison with golf or late lunches or siesta time. Of course, this is just an over-simplified tip of the iceberg.
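The operating room example above comes down to finding idle capacity in a resource that looks fully constrained. The short Python sketch below shows one simple way to spot such gaps from booking data. It is only an illustration and makes no claim about how IDS Planner works; the function idle_hours and the booking times are hypothetical.

```python
def idle_hours(bookings, day_start, day_end):
    """Return the hours in [day_start, day_end) with no booking at all.

    bookings is a list of (start_hour, end_hour) pairs on a 24-hour clock,
    where each booking covers start_hour up to but not including end_hour.
    """
    busy = set()
    for start, end in bookings:
        busy.update(range(start, end))
    return [hour for hour in range(day_start, day_end) if hour not in busy]


# Hypothetical schedule: procedures run 7-10, 9-13, and 15-19.
bookings = [(7, 10), (9, 13), (15, 19)]
idle = idle_hours(bookings, day_start=7, day_end=19)
print(idle)
```

With this made-up schedule the gap falls at hours 13 and 14, that is, between 1 PM and 3 PM. A real analysis would also have to weigh staffing, turnover time, and pricing, which is where the trade-offs between constraints come in.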


IDS Planner enables physician-centric coordination, where costs, resources, and activities are tracked and assessed in terms of the workflow of the entire, integrated system. This creates a context of physician decision-making and its relationship to costs and revenues. Doctors appreciate the requirement to control costs, consistent with sustaining and improving quality, and they are eager to do so when shown the facts. When properly configured and implemented, IDS Planner delivers the facts. According to River Logic, this enabled the Institute for Musculoskeletal Health and Wellness at the Greenville Hospital System to improve profit by more than $10M a year by identifying operational discrepancies, increase physician-generated revenue by over $1,700 a month, and reduce accounts receivable from 62 down to 44 days (and still falling), which represents the top 1% of the industry. Full disclosure: this success was made possible through a template approach with some upfront services that integrated the software with the upstream EHR system, solved rampant data quality issues, and obtained physician "buy in" by showing this constituency that the effort was win-win.

The underlying technology for IDS Planner is based on the Microsoft SQL Server (2008) database and SharePoint for web-enabled information delivery.

In my opinion, there is no tool on the market today that does exactly what IDS Planner does in the area of optimizing provider performance. River Logic's IDS Planner has marched ahead of the competition, including successfully getting the word out about its capabilities. The obvious question is for how long? The evidence is that this is a growth area based on the real and urgent needs of hospitals and large provider practices. There is no market unless there is competition; and an overview of the market indicates offerings such as Mediware's InSight; Dimensional Insight, with a suite of the same name; and Vantage Point HIS, once again with a product of the same name. It is easy to predict that sleeping giants such as Cognos (IBM), Business Objects (SAP), and Hyperion (Oracle) are about to reposition the existing performance management capabilities of these products in the direction of healthcare providers. Microsoft is participating, though mostly from a data integration perspective (but that is another story), with its Amalga Life Science offering with a ProClarity frontend.

It is a buyer talking point whether and how these offerings are able to furnish usable software algorithms that implement a robust approach to identifying and busting through performance constraints. In every case, all the usual disclaimers apply. Software is a proven method of improving productivity, but only if properly deployed and integrated into the enterprise so that professionals can work smarter. Finally, given market dynamics in this anemic economic recovery, for those end-user enterprises with budget, it is a buyer's market. Drive a hard bargain. Many sellers are hungry for it and are willing to go the extra mile in terms of extra training, services, or payment terms.

Posted April 5, 2010 11:33 AM