Blog: Mark Madsen

Mark Madsen

Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.

About the author

Mark, President of Third Nature, is a former CTO and CIO with experience working both in IT and at vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and TDWI. He is co-author of Clickstream Data Warehousing and lectures and writes about data integration, business intelligence and emerging technology.


March 2009 Archives

I uploaded the slides from last week's webcast on operational data integration and open source. They're embedded below for online viewing.

This is an overview of the difference between application integration and data integration, the differences in use and requirements for DI between business intelligence and OLTP, some integration architecture discussion, and why open source is an even better fit in the operational DI arena than it is for BI projects.

If you want to download a PDF of the slides or listen to a replay, you can find this talk under "How to Use the Right Tools for Operational Data Integration" on Talend's webcast page. There's no direct link to the presentation page so you have to click through.
More detailed description of the webcast
Data integration tools were once used solely in support of data warehousing, but that has been changing over the past few years. The fastest growing area today for data integration is outside the data warehouse, whether it's one-time data movement for a MySQL upgrade, application consolidation, or real-time data synchronization for master data management projects.

Data integration tools have proven to be faster, more flexible and more cost effective for operational data integration than the common practice of hand-coding or using application integration technologies. The developer focus of these technologies also makes them a prime target for open source commoditization.

During the presentation you will learn about the differences between analytical and operational data integration, technology patterns and options, and recommendations for how to begin using tools for operational data integration.

Key points:
  • How to map common project scenarios to integration architectures and tools
  • The technology and market changes that favor use of tools for operational data integration
  • The differing requirements for operational vs. analytic data integration
  • Advantages of open source for data integration tasks

Posted March 23, 2009 5:00 AM
I'm doing a research survey on open source data warehouse and BI adoption that takes about 5 minutes to fill out. There's an almost complete lack of data specific to the business intelligence and data warehouse market - all the open source studies I read are generic, and at best they extrapolate what's happening based on the general IT market. I want to change that.

If you have evaluated open source tools in any area of the business intelligence stack - databases, ETL tools, reporting, visualization - please consider filling out the survey whether or not the tools passed your evaluation, so we can begin to understand where and how people are using open source. It's as important to understand what's wrong with open source tools as what's right.

The results of this research will be summarized in a keynote at the MySQL Conference on April 23, but we'll be extending the research throughout this year.

This first survey we're running is about adoption so we can answer some basic questions:
  • What industries, departments or functional areas are using open source?
  • What countries are leading the adoption?
  • What software categories are being used: reporting, OLAP, ETL, data mining, databases?
  • Why are people choosing or deciding against open source in this segment of the market?

Thanks to Infobright (open source columnar database) and JasperSoft (open source BI stack), who have been kind enough to donate a TomTom One XL portable GPS for a prize drawing after the survey is done. If you complete the survey and provide your information, you'll be entered to win. We'll do the drawing and announce the winner's name at the MySQL Conference.

Whether your open source evaluation led you to think of it as the holy grail or the devil's chalice, please take 5 minutes to fill out the survey.

Posted March 20, 2009 10:18 AM
A useful attribute of all open source tools is the ability to download and start evaluating the software immediately to see if it fits the requirements. There is no vendor involvement slowing the process of evaluation.

Bloor Research published a report comparing costs of various data integration products, but one of the more interesting items isn't about cost - it's the average time required for a company to evaluate various data integration products. Open source is the clear winner here.

Figure: Person-weeks required for evaluation. Source: Bloor Research

When working with proprietary software vendors, trials and proofs of concept require management involvement and multiple levels of approval. The legal department is often involved since there's usually a trial license agreement. The process is not under the developer's control: the schedule is governed by vendor terms, and every step adds extra work.

Beyond the ease of evaluation, it's easier to get started with a project. With open source, time spent evaluating tools that might never be used can instead be spent on a proof of concept that is reusable in production.

If the proof of concept fails, no more time has been spent than would have been spent evaluating any other software. If the proof of concept succeeds, it can be moved directly into production without the up-front commitments that traditional vendors require.

Speed to deliver is one of the open source advantages I described in more detail in the open source data integration paper I wrote for Talend. Also, if you'd like to read the Bloor report, you can download a full copy from the Pervasive web site.



Posted March 3, 2009 11:55 AM