Blog: Lou Agosta

Lou Agosta

Greetings and welcome to my blog focusing on reengineering healthcare using information technology. The commitment is to provide an engaging mixture of brainstorming, blue sky speculation and business intelligence vision with real world experiences – including those reported by you, the reader-participant – about what works and what doesn't in using healthcare information technology (HIT) to optimize consumer, provider and payer processes in healthcare. Keeping in mind that sometimes a scalpel, not a hammer, is the tool of choice, the approach is to be a stand for new possibilities in the face of entrenched mediocrity, to do so without tilting at windmills and to follow the line of least resistance to getting the job done – a healthcare system that works for us all. So let me invite you to HIT me with your best shot at

About the author >

Lou Agosta is an independent industry analyst, specializing in data warehousing, data mining and data quality. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, business and information technology. He is currently focusing on the challenge of transforming America's healthcare system using information technology (HIT). He can be reached at

Editor's Note: More articles, resources, and events are available in Lou's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in data quality Category

Amid its 39th consecutive quarter of profitability, Pervasive has launched a new SaaS version of its flagship Data Integrator (DI) product called DI "Cloud Edition". In short, as part of a process described by CTO Mike Hoskins as "innovating below the water line," the Pervasive software development team has service-enabled the platform for SOA. This enables the DI product to bring its extensive connectivity to the cloud, either on the Pervasive DataCloud on Amazon's EC2 or on any other cloud.


At the same time the architecture was undergoing a major upgrade, innovation was also occurring above the waterline. Significant functionality was added in the areas of data profiling and data matching. Both profiling and matching now exploit the DataRush parallel processing engine. According to Jim Falgout (DataRush Chief Technologist), DataRush continues to set records for high performance. The latest record was broken on September 27, 2010, when it delivered an order of magnitude greater throughput than earlier biological algorithm results, performing 986 billion cell updates/second on 10 million protein sequences using a 384 core SGI Altix UV 1000 in 81.1 seconds.[1] Wow! This enables the entire platform to deliver the basics for a complete data governance infrastructure to those enterprises that know the answer to the question "What are your data quality policies and procedures?" When combined with the open source data mining offering called "KNIME" (Konstanz Information Miner), which featured prominently in numerous use cases at the conference, Pervasive is now a triple threat across data integration, quality, and predictive analytics.


In addition to software innovation, end-user enterprises will be interested to learn that Pervasive is also delivering pricing innovations. Client (end user) enterprises will be unambiguously pleased to hear about this one, unlike some uses of the phrase "pricing innovations" that have meant gaming the system to implement price increases. Instead of per connector pricing, the classic approach for data integration vendors, according to which each individual connector costs an additional fee, Pervasive provides all of its available connectors in DI v10 Cloud Edition for a single reasonable monthly charge. The figure that was given to me was $1000 a month for the software, including all the available connectors. And remember, Pervasive has been famous since its days doing business as Data Junction for the diversity of its connectors, across the entire spectrum from xbase class to high end enterprise adapters. For those enterprises confronting the dual challenges of a short timeline to results and a large number of heterogeneous data sources, the recommendation is clear: check this out.

[1] For further details see "Pervasive DataRush on SGI® Altix® UV 1000 Shatters Smith-Waterman Throughput Record by 43 Percent":

Posted November 4, 2010 10:23 AM
Permalink | No Comments |

Healthcare and healthcare information technology (HIT) continue to be a data integration challenge for payers, providers, and consumers of healthcare services. This post will explore the role of Pervasive's Data Integrator software in the solution envisioned by one of the partners presenting at the iNExt, Misys Open Source Solutions (MOSS). However, first we need to take a step back and get oriented to the opportunity and the challenge.

Enabling legislation by the federal government in the form of the American Recovery and Reinvestment Act (ARRA) of 2009 provides financial incentives to healthcare providers (hospitals, clinics, group practices) that install and demonstrate the meaningful use of software to (1) capture and share healthcare data by the year 2011, (2) enable clinical decision support using the data captured in (1) by 2013, and (3) improve healthcare outcomes (i.e., cause patients to get well) against the track record laid down in (1) by 2015. Although the rules are complex and subject to further refinement by government regulators, one thing is clear. Meaningful use requires the ability of the electronic medical record (EMR/EHR) systems to exchange significant volumes of data and do so in a way that preserves the semantics (i.e., the meaning of the data).

Misys Open Source Solutions (MOSS) is jumping with both feet into the maelstrom of competing requirements, standards, and technologies with a plan to make a difference. MOSS first merged with Allscripts and then separated from them as Allscripts merged with Eclipsys. MOSS is now a standalone enterprise on a mission to harmonize medical records. Riding on Apache 2.0 and leveraging the Open Health Tools (OHT), MOSS is inviting would-be operators and participants in health information exchanges (HIE) to download its components and make the integrated healthcare enterprise (IHE) a reality. As of this writing, the need is great and so is the vision. However, the challenges are also formidable and span the requirements for patient identification, document sharing, record location, audit trail production and authentication, subscription services, clinical care documentation and persistence in a repository. Since the software is open source and comes at no additional cost, MOSS's revenue model relies on fees earned for providing maintenance and support.

Whether the HIE is public or private, the operator confronts the challenge of translating between a dizzying array of healthcare standards - HIPAA, HL7, X12, HCFA, NCPDP, and so on. With literally hundreds of data formats in legacy as well as modern systems out there, those HIEs that are able to provide a platform for interoperability are the ones that will survive and prosper by value-added services rather than just forwarding data. The connection with Pervasive is direct since the incoming data formats may be in HL7 version 2.3 and the outbound format in version 2.7. Pervasive is cloud-enabled and, not to put too fine a point on it, has more data conversion formats than you can shake a stick at. Pervasive is a contributor to the solution at the data integration, data quality, and data mining levels of the technology stack being envisioned by MOSS. Now and in the future, an important differentiator between winners and runners-up amongst HIEs will be the ability to translate between diverse data formats. In addition, this work must be done with velocity and in significant volume, while preserving the semantics. This approach will add value to the content rather than just acting as a delivery service.

As noted in the companion post to this one, Pervasive has a cloud-enabled version of its data integration product, Version 10, Cloud Edition. DataRush, the parallel processing engine, continues to set new records for high performance parallel processing across large numbers of cores. Significant new functionality in data profiling and data matching is now available as well, making Pervasive a triple threat across data integration, data quality, and, when open source data mining from KNIME is included, data mining.

[For reference: NCPDP = National Council for Prescription Drug Programs; HCFA = Health Care Financing Administration (the precursor to the Centers for Medicare and Medicaid Services); HIPAA = Health Insurance Portability and Accountability Act; HL7 = Health Level Seven; X12 = a national standard format for electronic data interchange, including the exchange of health insurance data.]

Posted November 4, 2010 10:04 AM

There are so many challenges that it is hard to know where to begin. For those providers (hospitals and large physician practices) that have already attained a basic degree of automation there is an obvious next step - performance improvement. For example, if an enterprise is operating eClinicalWorks (ECW) or a similar run-your-provider EHR system, then it makes sense to take the next step and get one's hands on the actual levers and dials that drive revenues and costs.

Hospitals (and physician practices) often do not understand their actual costs, so they struggle to control and reduce the costs of providing care. They are unable to say with assurance which services are the most profitable, so they are unable to concentrate on increasing market share in those services. Oftentimes when the billing system drives provider performance management, the data, which is adequate for collecting payments, is totally unsatisfactory for improving the cost-effective delivery of clinical services. If the billing system codes the admitting doctor as responsible for the revenue, and it is the attending physician or some other doctor who performs the surgery, then accurately tracking costs will be a hopeless data mess. The amount of revenue collected by the hospital may indeed be accurate overall; but the medical, clinical side of the house will have no idea how to manage the process or improve the actual delivery of medical procedures.


Into this dynamic enters River Logic's Integrated Delivery System (IDS) Planner. The really innovative thing about the offering is that it models the causal relationship between activities, resources, costs, revenues, and profits in the healthcare context. It takes what-if analyses to new levels, using its custom algorithms in the theory of constraints, delivering forecasts and analyses that show how to improve performance (i.e., revenue, as well as other key outcomes such as quality) based on the trade-offs between relevant system constraints. For example, at one hospital, the operating room was showing up as a constraint, limiting procedures and related revenues; however, careful examination of the data showed that the operating room was not being utilized between 1 PM and 3 PM. The way to bust through this constraint was to charge less for the facility, thereby incenting physicians to use it at what was for them not an optimal time in comparison with golf or late lunches or siesta time. Of course, this is just an over-simplified tip of the iceberg.
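To make the operating room example concrete, here is a minimal sketch of the kind of utilization check that surfaces idle capacity in a resource that looks like a constraint. This is not River Logic's algorithm, which was not described to me in detail; the function name `idle_hours`, the booking data, and the 7 AM to 5 PM operating day are all assumptions made up for illustration.

```python
def idle_hours(bookings, open_hour=7, close_hour=17):
    """Return the hours (24-hour clock) in the operating day with no booking.

    `bookings` is a list of (start_hour, end_hour) pairs; the end hour is exclusive.
    """
    busy = set()
    for start, end in bookings:
        busy.update(range(start, end))
    return [hour for hour in range(open_hour, close_hour) if hour not in busy]


# Hypothetical schedule for one operating room (illustrative data only).
bookings = [(7, 10), (10, 13), (15, 17)]

# Hours 13 and 14 come back idle, i.e. the room is unused from 1 PM to 3 PM.
print(idle_hours(bookings))
```

A real planning tool would go further and weigh what each idle hour costs against the discount needed to fill it; the sketch only covers the first step, finding the gap.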


IDS Planner enables physician-centric coordination, where costs, resources, and activities are tracked and assessed in terms of the workflow of the entire, integrated system. This creates a context of physician decision-making and its relationship to costs and revenues. Doctors appreciate the requirement to control costs, consistent with sustaining and improving quality, and they are eager to do so when shown the facts. When properly configured and implemented, IDS Planner delivers the facts. According to River Logic, this enabled the Institute for Musculoskeletal Health and Wellness at the Greenville Hospital System to improve profit by more than $10M a year by identifying operational discrepancies, increase physician-generated revenue by over $1,700 a month, and reduce accounts receivable from 62 down to 44 days (and still falling), which represents the top 1% of the industry. Full disclosure: this success was made possible through a template approach with some upfront services that integrated the software with the upstream EHR system, solved rampant data quality issues, and obtained physician "buy in" by showing this constituency that the effort was win-win.

The underlying technology for IDS Planner is based on the Microsoft SQL Server (2008) database and SharePoint for web-enabled information delivery.

In my opinion, there is no tool on the market today that does exactly what IDS Planner does in the area of optimizing provider performance. River Logic's IDS Planner has marched ahead of the competition, including successfully getting the word out about its capabilities. The obvious question is for how long? The evidence is that this is a growth area based on the real and urgent needs of hospitals and large provider practices. There is no market unless there is competition; and an overview of the market indicates offerings such as Mediware's InSight, Dimensional Insight with a suite of the same name, and Vantage Point HIS, once again with a product of the same name. It is easy to predict that sleeping giants such as Cognos (IBM) and Business Objects (SAP) and Hyperion (Oracle) are about to reposition the existing performance management capabilities of these products in the direction of healthcare providers. Microsoft is participating, though mostly from a data integration perspective (but that is another story), with its Amalga Life Science offering with a ProClarity frontend. It is a buyer talking point whether and how these offerings are able to furnish usable software algorithms that implement a robust approach to identifying and busting through performance constraints. In every case, all the usual disclaimers apply. Software is a proven method of improving productivity, but only if properly deployed and integrated into the enterprise so that professionals can work smarter. Finally, given market dynamics in this anemic economic recovery, for those end-user enterprises with budget, it is a buyer's market. Drive a hard bargain. Many sellers are hungry for it and are willing to go the extra mile in terms of extra training, services, or payment terms.

Posted April 5, 2010 11:33 AM

I had a chance to talk with Yves de Montcheuil, VP of Marketing, about current events at Talend and its vision of the future.

Talend addresses data integration across a diverse array of industry verticals. Its inroads in healthcare will be of interest to readers of this blog. As noted elsewhere, healthcare is a data integration challenge. For example, at Children's Hospital and Medical Center of Omaha (NE), heterogeneous systems are the order of the day. The ambulatory EMR generates tons of documents. These need to be added to its legal medical record system, MedPlus Chartmaxx. On occasion, some of those documents error out before being captured to the patient's chart in Chartmaxx. This clinical information impacts clinician decision making, and must be filed to the appropriate patient's record in a timely manner, supporting patient care quality. Talend synchronizes such processes across clinical systems. It provides data transformations, notifications and, in this case, exception processing, furnishing a level of functionality that previously required a larger and more expensive ETL tool from a larger and more expensive software vendor. This is the tip of the iceberg; and Talend is now the standard at the enterprise for data integration and data quality. This is obviously also the process in which to perform data quality activities - data profiling, data validation, and data correction. Data validation occurs inside the data stream, and any suspect data is flagged and included in a report that is then processed for reconciliation. The ability to perform data quality controls and corrections across these systems makes the processing of data faster and smoother. It should be noted that, although I drilled down on this example, Talend has numerous high profile wins in healthcare (accessible on its web site).
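For readers who want a feel for what validation inside the data stream with exception reporting looks like, here is a minimal sketch in Python. It is not Talend code and does not use any Talend API; the field names (`patient_id`, `document_type`), the sample documents, and the function names are hypothetical, chosen only to mirror the filing scenario described above.

```python
def validate(record, required_fields=("patient_id", "document_type")):
    """Return a list of problems found in one record (empty list means clean)."""
    return [f"missing {field}" for field in required_fields if not record.get(field)]


def process_stream(records):
    """Split a stream of records into those safe to file and those needing reconciliation."""
    filed, exceptions = [], []
    for record in records:
        problems = validate(record)
        if problems:
            # Suspect data is flagged and held for the reconciliation report.
            exceptions.append({"record": record, "problems": problems})
        else:
            filed.append(record)
    return filed, exceptions


# Hypothetical documents coming out of an ambulatory EMR (illustrative data only).
documents = [
    {"patient_id": "P001", "document_type": "progress_note"},
    {"patient_id": "", "document_type": "lab_result"},
    {"patient_id": "P003"},
]

filed, exceptions = process_stream(documents)
for entry in exceptions:
    print(entry["problems"], entry["record"])
```

The point of the design is that a bad record never silently disappears: it either reaches the chart or lands in a report that a person works through.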


Taking a strategy from the playbook of its larger competitors, but without the pricing mark up, Talend is developing a platform that includes data quality in the form of Talend Data Profiler and Talend Data Quality, the latter, of course, actually able to validate and correct the errors surfaced. The obvious question is what is the next logical step?

Several possibilities are available. However, the one engaged by Talend - and it's a good one - is the announcement of the acquisition of a master data management (MDM) software firm, Amalto Technologies, and plans to make it a part of its open source distribution in 2010. This is a logical move for several reasons. First, data integration and data quality (rationalization) are on the critical path to a consistent, unified view of customers, products, providers, and whatever master data dimensions turn you on. The data warehouse is routinely referred to as a single version of the truth. Now it turns out that there is no single version of data warehousing truth without a single version of customer, product, location, and calendar (and so on) truth to support the data warehouse. (This deserves a whole post in itself, so please stand by for an update on that.)

While the future is uncertain, I am betting on the success of Talend for several reasons. First, the approach at Talend - and open software in general - simplifies the software acquisition process (and this regardless of any price consideration). Instead of having to negotiate with increasingly stressed out (and scarce) sales staff, who need to qualify you as a buyer with $250K or $500K to invest, the prospect sets its own agenda, downloading the software and building a prototype at its own pace. If you like the result and want to scale up - and comments about the quality of the software are favorable, though, heaven knows, like any complex artifact, there is a list of bug fixes - then a formal open source distribution is available - for a fee, of course - with a rigorous, formal service level agreement and support. Second, according to Gartner's November 25, 2009 Magic Quadrant for Data Integration, available on the Talend web site for a simple registration, Talend has some 800 customers. I have not verified the accuracy of this data, though there are logos aplenty on the Talend web site, including many in healthcare, and all the usual disclaimers apply. Talend is justifiably proud and is engaging in a bit of boasting here as open source gets the recognition it has for some time deserved. Third, Talend is turning the crank - in a positive sense of the word - with a short cycle for enhancements, currently every six months or so. With a relatively new and emerging product, this is most appropriate, though I expect that to slow as functionality reaches a dynamic equilibrium a couple of years from now. There are some sixty developers in China - employees of Talend, not outsourced developers - reporting to a smaller design/development team of some 15 architects in France.

Leaving aside the formal development of the defined distribution of the software for the moment, the open source community provides the largest focus group you can imagine, collecting and vetting requests and requirements from the community. As in so many areas of the software economy, Talend is changing the economics of data integration - and soon MDM - in a way that benefits end-user enterprises. Watch for the success of this model to propagate itself virally - and openly - in other areas of software development. Please let me hear from you about your experiences with Talend, data integration, and open source in all its forms.

Posted December 11, 2009 9:47 AM


I had the opportunity to sit down with Steve Meister, President, and Paul Henkins, Director, of AMB and get the update on the data quality software innovations in the pipeline - and in production at client sites. Although these guys have been in the data quality market for about ten years, a few years ago the company bet on new development using web services and C# when the market was still skeptical about the long-term viability of the language. Today they are reaping the rewards in terms of leap-frog capabilities that deliver comprehensive, integrated metadata-driven instream discovery (profiling), pattern analysis, probabilistic (fuzzy) matching, drill back to source, repository-based reporting, and anomaly correction.

AMB has been making solid inroads in the public sector and healthcare in both the provider and payer markets. Insurance companies, including healthcare payers, have been clients from the start. In addition, its partnering relationships have gotten it "out there" on a global basis through internal applications at large software providers. For example, AMB is "under the hood" with Satori Software's postal and list cleansing services, which gets it out to over 5000 clients without their necessarily knowing about it. PDM is integrated with Microsoft SQL Server Integration Services (SSIS).

Any data quality project quickly surfaces a wealth of tactical details about valid values, tolerances, and metadata. However, the payoff occurs at the mission critical business level. Business people "get" data quality when it is expressed as policy-based governance - it is about all the key master data dimensions - customers, patients, providers, products, services, diagnoses, procedures, and all the related transactional details.

In any market - including the challenging ones we face today - information is that which reduces uncertainty. Data whose quality is suspect increases uncertainty, and that is a situation incompatible with enterprise success. Much of the AMB product demo is necessarily about features and functions that make the technology easy to use and powerfully productive for stakeholders such as data stewards and business analysts. However, the management at AMB understands that it will succeed by automating data governance and information management at the level of the enterprise. One lesson learned? Quality attracts buyers and prospects, and one of the challenges faced by AMB will be to manage its growth, choose its clients wisely, and build for long-term success, even while hitting the number quarterly.

Based on a live, interactive demo, this is one of the most usable interfaces and products that I have seen in years. Fuzzy, probabilistic matching is now mainstream and the product delivers it in high performance algorithms that meet the needs of real time and near real time web services. Note this runs as a web services engine, not an API, which means performance benefits and a flexible architecture that accommodates the demanding interactive environment. Of course, it also works well for less demanding batch processing.
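For readers new to fuzzy matching, the basic idea can be sketched in a few lines of Python. This is not AMB's algorithm, which I have not seen at the code level. It uses the standard library's `difflib.SequenceMatcher` as a simple stand-in that scores string similarity rather than computing true match probabilities, and the names, master list, and 0.85 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and collapse punctuation and extra whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def similarity(a, b):
    """Similarity score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(candidate, master_records, threshold=0.85):
    """Return (record, score) for the closest master record, or (None, score) below threshold."""
    scored = [(similarity(candidate, record), record) for record in master_records]
    score, record = max(scored)
    return (record, score) if score >= threshold else (None, score)


# Hypothetical master patient list (illustrative data only).
master = ["Jonathan Smith", "Maria Gonzalez", "Wei Chen"]

# A misspelled incoming name still resolves to the intended master record.
match, score = best_match("Jonathon Smith", master)
print(match, round(score, 3))
```

The threshold is where policy meets technology: set it too low and distinct patients get merged, set it too high and duplicates slip through, which is why suspect matches usually go to a data steward for review.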

Although the current pricing is aimed at "getting a footprint" in the site, PDM supports enterprise environments (and enterprise pricing where applicable) for direct sourcing for profiling and quality, including DB2/UDB, DB2/400, DB2/z-OS Mainframe, flat files, MS SQL, Oracle, and Teradata. PDM is one of the first supported products for SQL Server 2008 (as source, repository, and SSIS) with Version 6.0.5. PDM delivers the Profiling/Analysis results as a standard relational database repository rather than a vendor proprietary database. Repository database options include the de facto open options - MS SQL Server, DB2/UDB and Oracle. An actual open source option such as MySQL would be nice to have as the open source movement marches on.

The "predictive" aspect of the "PDM" branding is based on the results of profiling that builds a rule to flag outliers that exceed the tolerance from a base-line. One healthcare client - call them Acme Dental for the time being - discovered a highly suspect $400k reimbursement that in itself would pay the cost of the entry level starter-kit 40 times over. Whether an inadvertent error or something more sinister, such an outlier still requires follow up by a human auditor. So stand by for an update.
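The idea of a profiling-derived rule that flags outliers beyond a tolerance can be sketched as follows. This is an illustration under stated assumptions, not PDM's actual rule engine: here the baseline is summarized by its mean and standard deviation, the tolerance is three standard deviations, and the dollar amounts are invented.

```python
from statistics import mean, stdev


def build_outlier_rule(baseline, tolerance=3.0):
    """Profile the baseline and return a rule that flags values outside the tolerance.

    A value is flagged when it lies more than `tolerance` standard deviations
    from the baseline mean.
    """
    center = mean(baseline)
    spread = stdev(baseline)

    def is_outlier(value):
        if spread == 0:
            return value != center
        return abs(value - center) > tolerance * spread

    return is_outlier


# Hypothetical reimbursement amounts in dollars (illustrative data only).
baseline_claims = [120.0, 95.0, 210.0, 180.0, 150.0, 130.0, 175.0, 160.0]
new_claims = [140.0, 400000.0, 199.0]

rule = build_outlier_rule(baseline_claims)
flagged = [amount for amount in new_claims if rule(amount)]
print(flagged)
```

The rule only flags; as with the Acme Dental case, deciding whether the flagged amount is an error or something worse remains a job for a human auditor.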

Posted November 24, 2009 1:33 PM