Blog: Lou Agosta

Lou Agosta

Greetings and welcome to my blog focusing on reengineering healthcare using information technology. The commitment is to provide an engaging mixture of brainstorming, blue sky speculation and business intelligence vision with real world experiences – including those reported by you, the reader-participant – about what works and what doesn't in using healthcare information technology (HIT) to optimize consumer, provider and payer processes in healthcare. Keeping in mind that sometimes a scalpel, not a hammer, is the tool of choice, the approach is to be a stand for new possibilities in the face of entrenched mediocrity, to do so without tilting at windmills and to follow the line of least resistance to getting the job done – a healthcare system that works for us all. So let me invite you to HIT me with your best shot at LAgosta@acm.org.

About the author >

Lou Agosta is an independent industry analyst, specializing in data warehousing, data mining and data quality. A former industry analyst at Giga Information Group, Agosta has published extensively on industry trends in data warehousing, business and information technology. He is currently focusing on the challenge of transforming America’s healthcare system using information technology (HIT). He can be reached at LAgosta@acm.org.

Editor's Note: More articles, resources, and events are available in Lou's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in Data Warehousing Category

The answer is clinical data warehousing, decision support, and analytics. What's the question? Wellpoint (one of the leading Blue Cross branded health insurance companies) is reportedly contracting to use IBM's computing grand challenge system nicknamed "Watson" (after IBM's founder) to address a list of clinical issues in medical diagnosis, treatment, and (potentially) cost. In the spirit of Jeopardy!, the question is: will it advance in the direction of enabling comparative effectiveness research (CER) and pay for performance (P4P) while enhancing the quality of medical outcomes? Healthcare consumers tend to get a tad nervous when they suspect that insurance companies are going to deploy a new computer system as part of the physician payment approval process, though (let us be clear) no one has actually said that will happen in this case.

The diagnosis of a disease is part science, part intuition and artistry. The medical model trains doctors and healthcare specialists using an apprentice system (in addition, of course, to long schooling and lab work). The hierarchical nature of disease diagnosis has long invited automation using computers and databases. Early expert medical systems such as MYCIN at Stanford or CADUCEUS at the University of Pittsburgh were initially modest-sized arrays of if-then rules or semantic networks that grew explosively in resource consumption, time-to-manage, and cost and complexity of usability. They were compared in terms of accuracy and speed with the results generated by real world physicians. The matter of accountability and error was left to be worked out later. Early results were such that automated diagnosis was as much work, slower, and not significantly better - though the automation would occasionally be surprisingly "out of the box" with something no one else had imagined. One lesson learned? Any computer system is better managed and deployed like an automated co-pilot rather than a primary locus of decision making or responsibility.

Work has been ongoing at universities and research labs over the decades and new results are starting to emerge based on orders of magnitude improvements in computing power, reduced storage costs, ease of administration, and usability enhancements. The case in point is IBM's Watson, which has been programmed to handle significant aspects of natural language processing, play Jeopardy! (it beat the humans), and, as they say in the corporate world, other duties as assigned.

Watson generates and prunes back hypotheses in a way that simulates what human beings do in formulating a differential diagnosis. However, the computer system does so in an explicit, verbose, and even clunky way using massive parallel processing whereas the human expert distills the result out of experience, years of training, and unconscious pattern matching. Watson requires about eight refrigerator-sized cabinets for its hardware. The human brain still occupies a space about the size of a shoe box.

Still, the accomplishment is substantial. An initial application being considered is having Watson scan the vast medical literature on treatments and procedures to match evidence-based outcomes to individual persons or cohorts with the disease in question. This is where Watson's strengths in natural language processing, formulating hypotheses, and pruning them back based on confidence level calculations - the same strengths that enabled it to win at Jeopardy! - come into play. In addition, oncology is a key initial target area because of the complexity of the underlying disorder as well as the sheer number of individual variables. Be ready for some surprises as Watson percolates up innovative approaches to treatment that are expensive and do not necessarily satisfy anyone's cost containment algorithm. Meanwhile, there are literally a million new medical articles published each year, though only a tiny fraction of them are relevant to any particular case. M.D.s are human beings and have been unable to "know everything" there is to know about a specialty for at least thirty years. In short, Watson just could be the optimal technology for finding that elusive needle in a haystack - and doing so cost effectively.

A differential diagnosis in medicine is a set of hypotheses that have to be first exploded, then pruned, and finally combined based on confidence and prior probability to yield an answer. This corresponds to the so-called Deep Question Answering (DeepQA) architecture implemented in Watson. Within five years, similar technologies will have been licensed and migrated to clinical decision support systems from standard EMR/EHR vendors.
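To make the "explode, prune, combine" idea concrete, here is a minimal sketch in Python. It is a plain naive-Bayes ranking, not Watson's DeepQA pipeline; the condition names, findings, priors, and likelihoods are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    prior: float  # prior probability of the condition (made-up numbers)


def rank_hypotheses(hypotheses, likelihoods, findings, keep=0.05):
    """Score each hypothesis as prior x likelihood of every finding,
    normalize the scores to confidences, and prune the weak candidates."""
    scores = {}
    for h in hypotheses:
        score = h.prior
        for finding in findings:
            # Pairs missing from the table get a small default likelihood.
            score *= likelihoods.get((h.name, finding), 0.01)
        scores[h.name] = score
    total = sum(scores.values())
    if total == 0:
        return []
    confidences = {name: s / total for name, s in scores.items()}
    kept = [(name, c) for name, c in confidences.items() if c >= keep]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical candidates and evidence table, purely illustrative.
candidates = [
    Hypothesis("condition_a", 0.6),
    Hypothesis("condition_b", 0.3),
    Hypothesis("condition_c", 0.1),
]
likelihoods = {
    ("condition_a", "fever"): 0.5, ("condition_a", "rash"): 0.1,
    ("condition_b", "fever"): 0.8, ("condition_b", "rash"): 0.7,
    ("condition_c", "fever"): 0.2,
}
ranked = rank_hypotheses(candidates, likelihoods, ["fever", "rash"])
for name, confidence in ranked:
    print(f"{name}: {confidence:.2f}")
```

The point of returning a ranked list rather than a single winner is that the differential itself stays visible for medical review.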

While your clinical data warehouse may not be running 3,000 Power 750 cores and terabytes of self-contained data in a physical footprint about the size of eight refrigerators, some key lessons learned are available even for a modest implementation of clinical data warehousing decision support:

  • Position the clinical data warehouse as a physician's assistant (think: co-pilot) to answer questions, provide a "sanity check," and fill in the gaps created by explosively growing treatments.
  • Plan on significant data preparation (and attention to data quality) to get data down to the level of granularity required to make a differential diagnosis. ICD-10 (currently mandated for 10/2013 but likely to slip) will help a lot, but may still have gaps.
  • Plan on significant data preparation (and more attention to data quality) to get data down to the level of granularity required to make a meaningful financial decision about the effectiveness of a given treatment or procedure. Pricing and cost data are dynamic, changing over time. New treatments start out expensive and become less costly. Time series pricing data will be on the critical path. ICD-10 (currently mandated for 10/2013 but likely to slip) will help but will need to be augmented significantly with new pricing data structures, and even then may still have gaps.
  • Often there is no one right answer in medicine - it is called a "differential diagnosis" - prefer systems that show the differential (few of them today do, though reportedly Watson can be so configured) and trace the logic at a high level for medical review.
  • Continue to lobby for tort and liability reform as computers are made part of the health care team, even in an assistant role. Legal issues may delay, but will not stop implementation in the service of better quality care.
  • Look to natural language interfaces to make the computing system a part of the health care team, but be prepared to work with a print out to a screen till then.
  • Advanced clinical decision support, rare in the market at this time, is like a resident in psychiatry, in that it learns from its right and wrong answers using machine learning technologies as well as "hard coded" answers from a database or semantic network.
  • This will take "before Google (BG)" and "after Google (AG)" in medical training to a new level. Watson-like systems will be available on a smart phone or tablet to residents and attendings at the bedside.

Finally, for the curious: for the hardware and customized software for some 3,000 Power 750 cores (commercially available "off the shelf"), terabytes of data, and the time and effort of a development team of some 25 people with Ph.D.s working for four years (the latter being the real expense), my back-of-the-envelope pricing (after all, this is a blog post!) weighs in at least in the ballpark of $100 million. This is probably low, but I am embarrassed to price it higher. This does not include the cost of preparing the videos and marketing. One final thought. The four year development time of this project is about the length of time to train a psychiatrist in a standard residency program.

Bibliography

  1. "Wellpoint's New Hire. What is Watson?" The Wall Street Journal. September 13, 2011. http://online.wsj.com/article_email/SB10001424053111903532804576564600781798420-lMyQjAxMTAxMDEwMzExNDMyWj.html?mod=wsj_share_email

  2. IBM: "The Science Behind an Answer": http://www-03.ibm.com/innovation/us/watson/

 


Posted September 14, 2011 4:06 PM

The cruel economy strikes again, and Mark Hurd is out as CEO at Hewlett-Packard Co. (H-P). First lesson learned here? Always be sure that your expense reports are accurate. If the participant in the event was "Jodie Fisher," then the documentation should say "Jodie Fisher". And, of course, while $20K - the alleged amount of overpayment for marketing services reportedly not performed - is a tad north of dinner for 25 people at Charlie Trotter's (Chicago), we are not talking Enron (fortunately). H-P said its investigation showed that Mr. Hurd did not violate H-P's sexual harassment policy, but other anomalies were uncovered.[1] There is no excuse for it, nor will any be suggested here. Zero tolerance strikes again - as well it should. And yet ...

 

Lesson number two, and this is a tough one. As Presidential candidate Jimmy Carter said in a notorious interview with Playboy magazine (in 1976), "In my heart I have lusted after women." Much as the tabloids might miss the details of a juicy sexual adventure, this is not one. If this is not the case of a sexual phantasy gone astray, then I would not know one. Make no mistake. Mark Hurd is no Bill Clinton, who after all got to keep his job (along a strict party line vote). This is a severe penalty to pay for the "crime" of having a sexual phantasy. The cynic in me momentarily says Mr. Hurd ought to have been fired for failing to score, but that is an idea for which I apologize in advance to him, knowing that he is a family man. Leave it to a numbers guy to try to impress a girl with ... well, numbers. No doubt he will schedule some time off to repair the damage done, and that is as it should be.

 

The third lesson? Life imitates art? Not quite. Life imitates Reality TV. And this is the sad part. Not to dump on the would-be femme fatale in this case, who, after all, was literally a Hollywood star and a contestant in a Reality TV show: she anticipated a settlement and reportedly has one. Not having a single fact in the matter, I infer that this is the governance paradigm in reality TV. Write a letter; get even more publicity (and maybe a settlement or at least another TV show); no harm done. Not quite. Now she says she's sorry that Mr. Hurd lost his job because he turned over her letter alleging sexual harassment to the H-P Board. I believe it. Rumors that she has reportedly invited him to appear with her on the next episode of The Apprentice have been debunked. While this has rich comic possibilities, I fear that I have crossed the line between reality and phantasy (again), just like this entire episode.

 

And that is the final lesson here. Strict corporate governance is appropriate. Corporate governance is different than The Lives of the Rich and Famous. The pendulum swings back-and-forth between wild parties and zero tolerance. Mr. Hurd's mistake was momentarily to confuse the non-reality of Reality TV with trust and integrity between human beings. Yet he ought to have known that, in many contexts, a mere conversation about sex (or anything remotely sexual) is a method of gaining power and maximizing revenue (through a settlement), especially among those who traditionally have lacked power. Poor judgment in choosing friends? You bet! The unintended result? Corporate America has lost a powerful and articulate leader, albeit an imperfect one, at a time when leadership is in short supply. We eagerly await his reflections (next book?) on life in the corporate jungle, business leadership, and the cruel economy; and if Mr. Hurd requires an editor or a foil for his ideas, then I hope he will reach out. He has a listening here.



[1] WSJ.com: August 8, 2010, "Mark Hurd Failed to Follow HP Code," BEN WORTHEN and JOANN S. LUBLIN, http://online.wsj.com/article/SB10001424052748704268004575417800832885086.html?mod=WSJ_article_MoreIn_Business


Posted August 9, 2010 9:54 AM

What happens when an irresistible force meets an immovable object? We are about to find out. The irresistible force of BI, eDiscovery, compliance, fraud detection, governance, risk management, and other analytic and regulatory mandates is heading straight toward the immovable rock of year-to-year 10% reductions in information technology budgets.


The convergence of the markets for structured and unstructured data has been heralded many times, but maybe the time has come. We think that the new generation of solutions with increasing overlap of structured and unstructured data and multi-functionality will emerge and that BMMsoft EDMT® Server is the pioneer in that space. Looking into the crystal ball, what will happen is that an increasing overlap already underway will disrupt incumbents across these diverse markets.

The worldwide BI market as defined by Gartner is sized at $8.8B.[1] Realistically that includes a lot of Business Objects (SAP), SAS applications, and IBM solutions so the database part of that is probably closer to $6 billion.[2] The document management software market is estimated at nearly $3 billion.[3] While email archiving is relatively new and growing rapidly due to new federal regulations, it has now reached the $1 billion "take off" point. In short, at nearly $10 billion total, a product that addressed requirements across all three of these markets with a reasonable prospect of response from even one third of the enterprises would have an outside boundary of over $3 billion. This is a substantial market under any interpretation.
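For readers who want to check the back-of-the-envelope sizing, the arithmetic can be written out as a short Python sketch. The three inputs are the rounded estimates quoted above; the one-third response rate is the assumption stated in the text, not a measured figure.

```python
# Rounded market estimates from the paragraph above, in billions of US dollars.
segments = {
    "data warehousing (database share of BI)": 6.0,
    "document management": 3.0,
    "email archiving": 1.0,
}

combined_market = sum(segments.values())

# Assumed share of enterprises that would respond to a combined product.
response_rate = 1 / 3
outside_boundary = combined_market * response_rate

print(f"Combined market: ${combined_market:.1f}B")
print(f"Outside boundary at one third: ${outside_boundary:.2f}B")
```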

 

In the meantime, the existing markets for these three classes of products are fragmented into silos of the traditional data warehousing vendors, email archiving, and document management, the latter sometimes including compliance and governance software. The first are well known in the market - extending from such stalwarts as HP, IBM, Oracle, Microsoft, SAP, to data warehousing appliances and column-oriented databases - and will not be repeated here (though one new development will be noted below). Document management systems include IBM FileNet Business Process Manager (www.ibm.com), EMC Documentum (www.emc.com), OpenText LiveLink ECM (www.opentext.com), and Autonomy Cardiff Liquid Office (www.cardiff.com). Strictly speaking, risk management is considered a separate market from document management. Risk management and compliance offerings include Aventis (www.aventis.com), BWise (www.bWise.com), Cura (www.curasoftware.com), Protiviti (www.protiviti.com), Compliance 360 (www.compliance360.com) and IBM, which has at least two offerings, one based on Lotus Notes and one based on FileNet. This list is partial and could easily be expanded with many best of breed offerings. The result? Fragmentation. Diversity, though not in a positive sense. Many offerings instead of a comprehensive approach to unified access and unified analysis.

 

Five years from now data will be as heterogeneous as ever and the uses of data even more so, but individual products - single instance products, not solutions - will characterize a transformed market for database management that traverses the boundaries between email archiving, document management, and data warehousing with agility that is only dreamt about in today's world. Video clips are now common on social networking sites such as Facebook and YouTube. Corporate sponsorship of such opportunities for viral marketing is becoming more common. The requirement to track and manage product brands and images will necessitate the archiving of such material, so multi-media (image/video/audio) is being added to the mix.

 

This future is being driven and realized by the imperative for business transparency, risk management and compliance, and growing regulatory requirements layered on top of existing business intelligence and document management requirements. Still, document management is distinct from workflow. If an enterprise needs workflow, then it will continue to require a special purpose document management system. Workflow was invented in 1985 by FileNet, which was acquired by IBM in 2006 and continues to lead the pack where detailed step-by-step process engineering is required. Elaborate rules-engines for enterprise decision management are different than compliance. If an enterprise requires a rule-engine for compliance and governance, then it will need a special purpose compliance, risk management, and governance system. Such solutions would be overkill for those enterprises that require email archiving for eDiscovery, document management for first order compliance, and cross references to transactional data in the data warehouse. While the future is uncertain, one of the vendors to watch is BMMsoft.

 

Innovation Happens

BMMsoft has put together a product delivering functionality across these three previously unrelated silos - data warehousing, eDiscovery (e-mail), and document management - and able to be purchased as an EDMT® Server - a single part number from BMMsoft (EDMT stands for "E-Mail, Documents, Media, Transactions"). The database "under the hood" is Sybase IQ, a column-oriented data store with a proven track record and several large objectively audited benchmarks. The latest of these weighs in at 1000 terabytes - a petabyte - and was audited by Francois Raab, the same professional who audits the TPC.org benchmarks.[4] The business need is real and based on customer acceptance. So is the product.

 

The three keys to connect and make intelligible the data from the three different sources are:

 

1) extreme scalability to handle the data volumes - this is where a column-oriented database would come in handy since the storage compaction is intrinsic and prior to the additional compression that could be applied;

2) parallel, real-time high performance ETL functionality to load all the data; and finally

3) search capabilities that enable high performance inquiries against the data.

Such unified access to diverse data types, intelligently connected by metadata, is also sometimes described as a "data mashup."

 

A part of the challenge that a start up - and up start - such as BMMsoft will face is building credibility, which BMMsoft has already solved with numerous client installations in production and success stories. In the case of BMMsoft EDMT® Server there is another consideration: metadata is an underestimated and underdeveloped opportunity. Innovations in metadata make possible many applications that require cross referencing emails, documents, and the transactional data. For example, fraud detection, threat identification, enhanced customer relations - all require navigating across the different data types. Metadata makes that possible. That is not an easy problem to solve; and BMMsoft has demonstrated significant progress with it. Second, the column-oriented database is intrinsically skinny in terms of data storage in comparison with the standard relational database, which continues to be challenged by database obesity. As data warehouses scale up, the cost of storage technology becomes a disproportionately large part of the price of the entire system. Note that for the column-oriented approach proportional cost savings come into view and are realized. Third, this also has significant performance implications, since if there is less data - in terms of volume points - to manage, then it is faster to do so. So when all the reasons are considered, the claims are quite modest, or at least in line with common sense. The wonder is that no one thought of it sooner.

 

When you think about it for a minute, there is every reason that an underlying database should be capable of storing a variety of different data types and doing so intelligently. The latter intelligence is the "secret sauce" that differentiates BMMsoft. The relationships between the different types of data are built as the data is being loaded by BMMsoft using multiple software technology patents. The column-orientation of the underlying data store - Sybase IQ - intrinsically condenses the amount of space required to persist the information, yielding up to an order of magnitude - more typically a factor of two or three - in storage savings, even prior to the application of formal compression algorithms. This fights database obesity across all segments - email, document, media, transactional (structured) data warehousing information. This means that the application that lives off of the underlying data is able to take advantage of performance improvements since less data is being stored and more being fetched with every data retrieval. For those enterprises with a commitment to installed Oracle or MySQL infrastructure, BMMsoft provides investment protection. The EDMT® Server also runs on Oracle, Netezza and MySQL and can be easily ported to any other relational database.
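Why a column orientation condenses storage can be shown with a toy Python sketch. It uses simple run-length encoding on invented records; this illustrates the general idea only and is not a description of how Sybase IQ actually stores data.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats in a column into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


# Hypothetical archive records: (item type, country), sorted by item type.
rows = [
    ("email", "US"), ("email", "US"), ("email", "US"),
    ("document", "US"), ("document", "US"),
    ("media", "US"),
]

# Pivot the rows into columns: each column holds values of one kind,
# so repeats sit next to each other and collapse easily.
type_column, country_column = (list(col) for col in zip(*rows))

encoded_types = run_length_encode(type_column)
encoded_countries = run_length_encode(country_column)

print(encoded_types)
print(encoded_countries)
```

Stored row by row, the same six records interleave types and countries, so the repeats are never adjacent and there is nothing comparable to collapse.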

 

Thus, BMMsoft is a triple threat and is able to function as a standalone product addressing data warehousing, email archiving, and document management requirements as separate silos. But just as importantly, for those enterprises that need or want to compete with advanced applications in fraud detection, security threat assessment, customer data mining beyond structured data, BMMsoft offers the infrastructure and application to do so. For example, the ability to perform cross-analysis between securities traded on the stock market and those companies named in email and voice mail (remember multimedia handling) will immediately provide a short list for follow up detection of on-going insider trading or other fraudulent schemes. While hindsight is 20-20, a similar method of identifying emerging patterns through cross-analysis would have been useful in surfacing the 8 billion dollar Societe Generale fraud, Madoff's nonexistent options plays at the basis of the pyramid, the Georgia Tech shooter, and relevant chatter that shows up prior to terrorist attacks. Going forward, this technology is distinct in that it can be deployed on a small, medium, or large scale to highlight emerging hot spots that require attention.
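The cross-analysis of trades against companies named in messages can be sketched in a few lines of Python. Everything here is hypothetical: the record layouts, the company names, and the five-day window are assumptions for illustration, and a plain substring match stands in for whatever metadata linking a real product performs at load time.

```python
from datetime import date, timedelta


def cross_reference(trades, messages, window_days=5):
    """Pair each trade with messages that name the traded company
    in the `window_days` leading up to the trade date."""
    window = timedelta(days=window_days)
    hits = []
    for trade in trades:
        for message in messages:
            names_company = trade["company"].lower() in message["text"].lower()
            lead_time = trade["date"] - message["date"]
            if names_company and timedelta(0) <= lead_time <= window:
                hits.append((trade["id"], message["id"]))
    return hits


# Invented sample data standing in for transactional and email archives.
trades = [
    {"id": "T1", "company": "Acme", "date": date(2010, 8, 6)},
    {"id": "T2", "company": "Globex", "date": date(2010, 8, 6)},
]
messages = [
    {"id": "M1", "date": date(2010, 8, 4), "text": "Heads up on Acme before the announcement"},
    {"id": "M2", "date": date(2010, 7, 1), "text": "Globex lunch menu"},
    {"id": "M3", "date": date(2010, 8, 5), "text": "Quarterly report attached"},
]

short_list = cross_reference(trades, messages)
print(short_list)
```

The output is only a short list for follow up, in keeping with the text: a match is a reason to look closer, not evidence of wrongdoing.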

 

One may object - but won't the competition be able to reverse engineer the functionality and provide something similar using different methods? Of course, eventually every innovation will be competitively attacked by some more-or-less effective "work around." Read the prospectus - new start ups and existing software laboratories at HP, IBM, etc. will eventually produce innovations that challenge the contender. However, that could require three to five years. Then there is the matter of bringing it to market. IBM provides an example, based on publicly available news reports. IBM went out and purchased FileNet for about $1.6 billion. FileNet is a great company, which virtually invented workflow, and if one requires advanced workflow capabilities, it is hands down a good choice. However, it does not do data warehousing or email archiving. Because FileNet is an IBM subsidiary that delivers substantial revenue to the "mother ship," the executives in charge will set a high bar on any IBM innovations which combine email archiving and structured data warehousing with document management. In short, IBM is faced with the classic innovator's dilemma.[5] The price points that interest it - both internally and externally - are further up on the curve than the deals that BMMsoft will be able to complete. Given that BMMsoft has established presence in the market, it has a good chance of marching up market, displacing the installed, legacy solutions as it goes. This happened before in the client server revolution when IBM mainframe deals at the several million dollar price point were undercut by a copy of PowerBuilder and a copy of Sybase, albeit a different version of the database. Given that BMMsoft has a head-start, it is exploiting first mover advantages and building an installed base that will be challenged only with great difficulty. The relevance of such technology in the context of healthcare information technology (HIT) will be explored in a pending post.
Please stand by for update - and keep in touch!



[2] For an alternative point of view see an IDC forecast (published 2007) that pegs the Data Warehouse management/platforms market as approx $8.97B in 2010

http://www.information-age.com/channels/information-management/news/1052367/business-intelligence-market-grows-22-says-gartner.thtml

[3] Cited in "Document Management Systems Market: 2007 - 2010,":  http://www.researchandmarkets.com/research/c3917a/document_management

 

[4] http://www.sybase.com/guinness.

[5] Clayton Christensen, The Innovator's Dilemma. Cambridge, MA: Harvard Business School Press, 1998.


Posted August 9, 2010 8:31 AM

Datawatch provides an ingenious solution to information management, integration, and synthesis by working from the outside inwards. Datawatch's Monarch technology reverse engineers the information in the text files that would otherwise be sent to be printed as a hardcopy, using the text file as input to drive further processing, aggregation, calculation, and transformation of data into usable information. The text files, PDFs, spreadsheets, and related printer input become new data sources. With no rekeying of data and no programming, business analysts have a new data source to build bridges between silos of data in previously disparate systems and attain new levels of data integration and cohesion.

 


For those enterprises running an ERP system for back office billing such as SAP or a hospital information system (HIS) such as Meditech, the task of getting the data out of the system using proprietary SAP coding or the native MUMPS data store can be a high bar, requiring custom coding. Datawatch intelligently zooms through the existing externalization of the data in the reports, making short work of opening up otherwise proprietary systems.
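The general idea of report mining, treating a print-ready text report as a data source, can be sketched in Python. The report layout, field names, and regular expression below are invented for illustration; Monarch itself is configured without programming, so this shows only the concept, not how the product works internally.

```python
import re

# A made-up print-style report: title line, column header, detail lines, total.
REPORT = """\
SURGERY SCHEDULE REPORT            PAGE 1
ROOM   SURGEON        MINUTES
OR-1   SMITH              90
OR-1   JONES             120
OR-2   SMITH              45
                    TOTAL 255
"""

# Only detail lines match; titles, headers, and totals are skipped.
DETAIL_LINE = re.compile(
    r"^(?P<room>OR-\d+)\s+(?P<surgeon>[A-Z]+)\s+(?P<minutes>\d+)\s*$"
)


def mine_report(text):
    """Turn the detail lines of a text report into structured records."""
    records = []
    for line in text.splitlines():
        match = DETAIL_LINE.match(line)
        if match:
            records.append({
                "room": match["room"],
                "surgeon": match["surgeon"],
                "minutes": int(match["minutes"]),
            })
    return records


def minutes_by_room(records):
    """Aggregate scheduled minutes per operating room."""
    totals = {}
    for record in records:
        totals[record["room"]] = totals.get(record["room"], 0) + record["minutes"]
    return totals


records = mine_report(REPORT)
print(minutes_by_room(records))
```

Once the report lines are records, they can be joined with data mined from other systems' reports, which is where the bridge between silos comes from.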

 

Note that a trade-off is implied here. If your reporting is a strong point, Datawatch can take an installation to the next level, enabling coordination and collaboration, breaking down barriers between reporting silos that were previously impossible to bridge and doing so with velocity. Programming is not needed, and the level of difficulty is comparable to that of managing an Excel spreadsheet, well within reach of a smart business analyst. However, if the reports are inaccurate or even junk, even Datawatch cannot spin the straw into gold. You will still have to fix the data at its source.

 

Naturally, cross functional report mining works well in most verticals extending from finance to retail, from manufacturing to media, from the public sector to not for profit organizations. However, what makes healthcare a particularly inviting target is the relatively late and still on-going adoption of data warehousing combined with the immediate need to report on numerous clinical, quality and financial metrics such as the pending "Meaningful Use" metrics created via the HITECH Act. This is not a tutorial on meaningful use; however, further details can be found in a related article entitled "Game on! Healthcare IT Proposed Criteria on 'Meaningful Use' Weigh in at 556 Pages." One of the goals of "meaningful use" in HIT is to combine clinical information with financial data in order to drive improvements in quality care, patient safety and operational efficiency while simultaneously optimizing cost control and reduction. The use of report mining and integration of disparate sources also allows the healthcare industry to migrate towards a pay-for-performance model, whereby providers will be reimbursed based on the quality and efficiency of care provided. However, financial, quality, clinical metrics and the evolving P4P models all require cross functional reporting from multiple systems. Even for many modern hospital information systems (HIS) that is a high bar. For those enterprises without an enterprise-wide data warehousing solution, no one is proposing to wait three to five years for a multi-step installation prior to learning the needed data still requires customization. In the interim, Datawatch has a feasible approach worth investigating.

 

In conversations with Datawatch executives John Kitchen (SVP Marketing) and Tom Callahan (Healthcare Product Manager), I learned that Datawatch has more than 1,000 organizations in the healthcare sector using Datawatch technology. Datawatch is surely a well kept secret, at least up until now. This is a substantial resource for best practices, methods and models, and lessons learned in the healthcare area. Datawatch can leverage these resources to its advantage and the benefit of its clients. While this is not a recommendation to buy or sell any security (or product), as a publicly traded firm, Datawatch is well positioned to benefit as the healthcare market continues its expansion. Datawatch provides a compelling business case with favorable ROI from the time of installation to the delivery of problem-solving value for the end user client. The level of IT support required by Datawatch is minimal, and sophisticated client departments have sometimes gone directly to Datawatch to get the job done.

 

Let's end with a client success story in HIT. Michele Clark, Hospital Revenue Business Analyst, Los Angeles-based Good Samaritan Hospital, comments on the application of Datawatch's Monarch Pro: "We simply run certain reports from MEDITECH's scheduling module, containing data for surgeries already scheduled, by location, by surgeon. We then bring those reports into Monarch Pro. Then, in conjunction with its powerful calculated fields, Monarch allows us to report on room utilization, block time usage and estimated times for various surgical procedures. The flexibility of Monarch to integrate data from other sources results in a customized, consolidated dataset in Monarch. We can then analyze, filter and summarize the data in a variety of ways to maximize the efficiency of our operating room resources. Thanks to Monarch, we have dramatically improved the utilization of our operating rooms, can more easily match available surgeons with required upcoming procedures, and better manage surgeon time and resources. Our patients are receiving the outstanding standard of care they expect, while we make the most of our surgical resources. This kind of resource efficiency is talked about a lot in the healthcare community. With Monarch, we are achieving it." This makes Datawatch one to watch.


Posted July 1, 2010 9:59 AM

Datameer takes its name from the sea - the sea of data - as in the French la mer or German, das Meer.

 

I caught up with Ajay Anand, CEO, and Stefan Groschupf, CTO. Ajay earned his stripes as Director of Cloud Computing and Hadoop at Yahoo. Stefan is a long-time open source consultant and advocate, and a cloud computing architect from EMI Music.

 

Datameer is aligning with the two trends of Big Data and Open Source. You do not need an industry analyst to tell you that data volumes continue to grow, with unstructured data growing at a rate of almost 62% CAGR and structured less, but a still substantial 22% (according to IDC). Meanwhile, open source has never looked better as a cost effective enabler of infrastructure.

 

The product beta is launching in April with McAfee, nurago, a leading financial services company, and a major telecommunications service provider. The summer promises to bring early adopters, with the gold product shipping in the autumn. (Schedule is subject to change without notice.)

 

The value proposition of Datameer Analytics Solution (DAS) is helping users perform advanced analytics and data mining with the same level of expertise required for a reasonably competent user of an Excel spreadsheet.

 

As is often the case, the back story is the story. The underlying technology is Hadoop. Hadoop is an open source framework for highly distributed systems of data. It includes both storage technology and execution capabilities, making it a kind of distributed operating system, providing a high level of virtualization. Unlike a relational database where search requires chasing up and down a B-tree index, Hadoop performs some of the work upfront, sorting the data and performing streaming data manipulation. This is definitely not efficient for small gigabyte volumes of data. But when the data gets big - really big - like multiple terabytes and petabytes, then the search and data manipulation functions enjoy an order of magnitude performance improvement. The search and manipulation are enabled by the MapReduce algorithm. MapReduce has been made famous by the Google implementation as well as the Aster Data implementation of it. Of course, Hadoop is open source. MapReduce takes a user defined mapping function and a user defined reduce function and performs key-value pair exchange, executing a process of grouping, reducing, and aggregation at a low level that you do not want to have to code yourself. Hence, the need for and value in a tool such as DAS. It generates the low-level code required to answer the business and data mining questions that the business wants to ask of the data. In this regard, DAS functions rather like a Cognos or BusinessObjects front-end in that it presents a simple interface in comparison to all the work being done "under the hood". Clients who have to deal with a sea of data now have another option for boiling the ocean without getting steamed up over it.
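For those who want to see the shape of MapReduce, here is a single-process Python sketch of the map, group, and reduce steps using a word count. It does not use the Hadoop API and runs on one machine; the function names are my own, chosen for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a user defined mapper and reducer over the records.

    The mapper emits (key, value) pairs, the pairs are grouped by key
    (the "shuffle"), and the reducer folds each group into one result.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_words(line):
    """Mapper: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def add_up(word, counts):
    """Reducer: sum the counts collected for one word."""
    return sum(counts)


lines = ["big data big", "open source data"]
word_counts = map_reduce(lines, emit_words, add_up)
print(word_counts)
```

In a real cluster the grouping step moves data between machines, which is the low-level plumbing a front-end tool spares the business user.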


Posted April 15, 2010 9:21 AM