Healthcare and Big Data – Drive Insights, Optimize Costs, Innovate Care

Originally published February 2, 2012

In the last three quarters of 2011, a significant debate emerged over healthcare and its cost. Almost 80 million Baby Boomers are approaching retirement, and economists forecast that this trend will likely bankrupt Medicare and Medicaid in the near future. While healthcare reform and its new laws set a number of important changes in motion, the core issues are not resolved. It is critical that we fix our system now, or else our $2.6 trillion in annual healthcare spending will grow to $4.6 trillion by 2020, one-fifth of our gross domestic product (GDP).

Data Rich and Information Poor

Healthcare has always been data rich. Medicine has developed so quickly in the past 30 years that along with preventive and diagnostic developments, we have generated a lot of data: clinical trials, doctors’ notes, patient therapies, pharmacists’ notes, medical literature and, most importantly, structured analysis of the datasets in analytical models.

On the payor side, while insurance rates are skyrocketing, insurance companies are competing hard for wallet share, and the most interesting observation here is the strong influence of social media.

On the provider side, the small number of physicians and specialists available relative to the growing need for them is becoming a larger problem. Additionally, the practice of obtaining second and third expert opinions to avoid medical malpractice lawsuits has created a need for sharing knowledge and seeking advice. At the same time, however, several laws are being passed to protect patient privacy and data security.

On the therapy side, we have several smart machines capable of sending readings to multiple receivers, including doctors’ mobile phones. We have become successful in reducing or eliminating latencies and have many treatment alternatives, but we do not know where best to apply them. Treatments that work well for some do not work well for others. We do not have statistics that can point to successful interventions and show where and on whom they have worked, or that can predict how and where to apply them as a suggestion or recommendation to a physician.

There is a lot of data available, but not all of it is being harnessed into powerful information. Healthcare remains one of our nation’s data-rich-yet-information-poor industries. We must start producing better information, at a faster rate and on a larger scale.

Before we can reduce costs and deliver meaningful improvements in outcomes, we must have meaningful information. The challenge is that while the data is available today, the systems needed to harness it have not been.

Big Data and Healthcare

Big data here includes both traditionally available information (doctors’ notes, clinical trials, insurance claims data, drug information) and new data generated from social media, forums, hosted sites (e.g., WebMD) and machines. The characteristics of big data in this setting are:
  • Volume – the data sizes are varied and range from megabytes to multiple terabytes

  • Velocity – machine data, doctors’ notes, nurses’ notes and clinical trial data are all produced at different speeds, and those speeds are highly unpredictable

  • Variety – the data is available or produced in a variety of formats, and not all formats are based on similar standards

Over the past five years, we have had a number of technology innovations to handle Web 2.0-based data environments, including Hadoop, NoSQL, data warehouse appliances (iteration 3.0 and more) and columnar databases. Several analytical models have been made available, and late last year the Apache Software Foundation released a collection of statistical algorithms called Mahout. With so many cool innovations, we can definitely create a powerful information processing architecture that will address multiple issues that face data processing in healthcare today:
  • Solving complexity

  • Reducing latencies

  • Agile analytics

  • Scalable and available systems

  • Usefulness (the right information available to the right resource at the right time)

  • Improving collaboration

Potential Solutions

How will big data solutions fix the healthcare situation? A prototype solution flow is shared here. While this is not a complete production system flow, there are several organizations working on such models in small and large footprints.

Figure 1: Prototype Solution Flow

In an integrated system, we can intelligently harness different types of data using architectures like those of Facebook or Amazon to create a scalable solution. Using a textual processing engine like FRT Textual ETL (extract, transform, load), we can enable small and medium enterprises (SMEs) to write business rules in English. The textual data, image and video data can be processed using any of the open source foundation tools. Data output from all these integrated processors will produce a rich dataset and also generate an enriched column-value pair output. We can use the output along with existing enterprise data warehouse (EDW) and analytical platforms to create a strong set of models utilizing analytical tools and leveraging Mahout algorithms.
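To make the idea of rule-driven textual processing more concrete, here is a minimal sketch of turning a free-text clinical note into column-value pairs. It does not represent how FRT Textual ETL or any other product works. The rule names, the regular expressions and the sample note are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical business rules: each maps an output column name to a pattern
# that captures the value from free-text notes. A real textual ETL engine
# would let domain experts express these in a friendlier form.
RULES = {
    "blood_pressure": re.compile(r"\bBP\s*(\d{2,3}/\d{2,3})\b", re.IGNORECASE),
    "heart_rate": re.compile(r"\bHR\s*(\d{2,3})\b", re.IGNORECASE),
    "medication": re.compile(r"\bprescribed\s+([A-Za-z]+)", re.IGNORECASE),
}


def extract_pairs(note):
    """Return a dict of column-value pairs for every rule that matches the note."""
    pairs = {}
    for column, pattern in RULES.items():
        match = pattern.search(note)
        if match:
            pairs[column] = match.group(1)
    return pairs


# Hypothetical sample note, not real patient data.
note = "Patient stable. BP 130/85, HR 72. Prescribed lisinopril for hypertension."
print(extract_pairs(note))
```

Output in this shape can be loaded alongside structured warehouse tables, which is what makes the downstream modeling step possible. Real clinical text would need far more robust handling than a few regular expressions can provide.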

When we use metadata-based integration of data to create different types of solutions, including evidence-based statistics, clinical trial vs. clinical diagnosis insights, and patient dashboards for disease state management based on machine output, we can create information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors, and create more confidence in sharing data with physicians in a social media outlet, thus providing more insights and opportunities. We can convert doctors’ research notes that have been dormant into real data and insights, and create a global search database that enables more collaboration and opens possibilities for sharing research on gene therapies for several diseases.

When we can provide better cures and improve the quality of care, we can manage patient health in a more agile manner. Such a solution will be a huge step in reducing healthcare costs and fixing a broken system.

Eventually this integrated data can also provide the lineage needed to build auditing systems for a patient based on insurance claims, Medicaid and Medicare. It will also help isolate fraud, which is a big revenue leak, and will make it possible to predict the population-based spend required in each state from that state’s disease information. Additionally, integrated data will help drive metrics and goals to improve efficiency and ratios.
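As a rough illustration of what integrated claims data makes possible, the sketch below totals spend by state and flags claims that are unusually large for their state. It is a toy example under stated assumptions, not a fraud detection or spend prediction method. The claim records, the field layout and the five-times-median threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical claim records: (claim_id, state, provider_id, amount_usd).
CLAIMS = [
    ("c1", "IL", "p1", 120.0),
    ("c2", "IL", "p1", 135.0),
    ("c3", "IL", "p2", 110.0),
    ("c4", "IL", "p2", 2400.0),
    ("c5", "TX", "p3", 300.0),
    ("c6", "TX", "p3", 310.0),
]


def spend_by_state(claims):
    """Sum claim amounts per state."""
    totals = defaultdict(float)
    for _, state, _, amount in claims:
        totals[state] += amount
    return dict(totals)


def flag_outliers(claims, factor=5.0):
    """Return ids of claims exceeding `factor` times the median amount in their state.

    The threshold is an arbitrary assumption chosen for illustration only.
    """
    amounts = defaultdict(list)
    for _, state, _, amount in claims:
        amounts[state].append(amount)
    medians = {state: median(values) for state, values in amounts.items()}
    return [cid for cid, state, _, amount in claims if amount > factor * medians[state]]


print(spend_by_state(CLAIMS))
print(flag_outliers(CLAIMS))
```

A flagged claim is only a candidate for review. Separating genuine fraud from legitimately expensive care would require the richer, integrated context the article describes.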

While all of these are lofty goals, big data-based solution approaches will help create a foundational step toward solving the healthcare crisis. There are several issues to confront in the data space, such as quality of data, governance, electronic health record (EHR) implementation, compliance, safety and regulatory reporting. If a consortium following an open source type of approach can be formed at the US Department of Health and Human Services to tackle this, a lot of the associated bureaucracy can be minimized. More vendor-led solution development from the private and public sectors will create unified platforms that can be leveraged to create this blueprint.


While big data cannot fix healthcare on its own, it can provide the foundational platform toward creating a holistic solution. In my personal experience, my team presented a health consortium with a feasible solution. Perhaps in the future, we will have a global health platform where we can solve much more than costs for healthcare.
  • Krish Krishnan
    Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

    With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

    Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with the BeyeNETWORK.com where he leads the Data Warehouse Appliances and Architecture Expert Channel.

    Editor's Note: More articles and resources are available in Krish's BeyeNETWORK Expert Channel. Be sure to visit today!
