
Blog: Mark Madsen

Mark Madsen

Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.

About the author

Mark, President of Third Nature, is a former CTO and CIO with experience working in both IT and vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institute and TDWI. He is co-author of Clickstream Data Warehousing and lectures and writes about data integration, business intelligence and emerging technology.

I'm doing a research survey on open source data warehouse and BI adoption that takes about 5 minutes to fill out. There's an almost complete lack of data specific to the business intelligence and data warehouse market - all the open source studies I read are generic, and at best they extrapolate what's happening based on the general IT market. I want to change that.

If you have evaluated open source tools in any area of the business intelligence stack - databases, ETL tools, reporting, visualization - please consider filling out the survey whether it passed your evaluation or not, so we can begin to understand where and how people are using open source. It's as important to understand what's wrong with open source tools as what's right.

The results of this research will be summarized in a keynote at the MySQL Conference on April 23 but we'll be extending it throughout this year.

This first survey we're running is about adoption so we can answer some basic questions:
What industries, departments or functional areas are using open source?
What countries are leading the adoption?
What software categories are being used: reporting, OLAP, ETL, data mining, databases?
Why are people choosing or deciding against open source in this segment of the market?

Thanks also to Infobright (open source columnar database) and JasperSoft (open source BI stack) who are kind enough to donate a TomTom One XL portable GPS for a prize drawing after the survey is done. If you complete the survey and provide your information, you'll be entered to win. We'll do the drawing and announce the name at the MySQL conference.

Whether your open source evaluation led you to think of it as the holy grail or the devil's chalice, please take 5 minutes to fill out the survey.

Posted March 20, 2009 10:18 AM
A useful attribute of all open source tools is the ability to download and start evaluating the software immediately to see if it fits the requirements. There is no vendor involvement slowing the process of evaluation.

Bloor Research published a report comparing the costs of various data integration products, but one of the more interesting items isn't about cost - it's the average time required for a company to evaluate those products. Open source is the clear winner here.

Figure: Person-weeks required for evaluation. Source: Bloor Research

When working with proprietary software vendors, trials and proofs of concept require management involvement and multiple levels of approval. The legal department is often involved since there's usually a trial license agreement. The process is not under the developer's control, the schedule is governed by vendor terms and the process requires extra work.

Beyond the ease of evaluation, it's easier to get started with a project. With open source, time spent evaluating tools that might never be used can instead be spent on a proof of concept that is reusable in production.

If the proof of concept fails, the same time has been spent as it would with any other software. If the proof of concept succeeds, it can be moved directly into production without the required up-front commitments that traditional vendors need.

Speed to deliver is one of the open source advantages I described in more detail in the open source data integration paper I wrote for Talend. Also, if you'd like to read the Bloor report, you can download a full copy from the Pervasive web site.

Posted March 3, 2009 11:55 AM
Below are the slides from the presentation I gave yesterday on open source BI adoption. The talk is a brief overview of the rationale and benefits, some of the situations appropriate for use, and a few thoughts on internal barriers to use.

This is part of a webcast done jointly with Actuate on the Business Intelligence Network. You can listen to the archived presentation and see Actuate's presentation on BIRT by going to the webcast registration page.

Posted February 20, 2009 4:20 AM

Open source database adoption for BI and data warehousing appears to lag the open source BI and ETL tools. There are lots of reasons for this documented elsewhere, but one reason becoming less valid is performance.

An IDC survey of data warehouse size reported that ~60% of data warehouses are less than a terabyte in size. Several other surveys over the past few years reported similar findings. This tells us that the industry focus on scale-out options is overkill for the majority of people deploying data warehouses. What's needed is cost-effective performance at a scale of less than a terabyte. There are interesting vendors of both closed and open source databases and appliances that work well in this size range.

Gartner recently gave some recommendations on open source databases and data warehousing that I think are inappropriate. They suggest MySQL as the only viable option. Part of their rationale is sound: commercial support and company viability. Most of the open source databases come from smaller vendors, or the projects are community supported rather than commercial, making them less suitable for enterprise use.

Where Gartner goes wrong is that MySQL isn't as good for BI workloads. It's easy to find information on basic MySQL performance, but not for data warehouse workloads. Maybe that's why Gartner overlooked this MySQL performance test. MySQL couldn't complete the 100GB scale tests, and part of the reason is obvious: missing features for large-scale queries.

These are some of the reasons companies have stepped in to offer new storage engines and appliances that are MySQL compatible. Infobright is delivering a MySQL-compatible, BI-focused product - it's hard to get proper scaling and performance with standard MySQL as a data warehouse database. Kickfire offers a different option for performance in an appliance package. Postgres and Ingres offer better features for both querying and manageability with data warehouse workloads. EnterpriseDB delivers commercial support for Postgres as well as providing a scale-out option, removing another of the Gartner criticisms.

Jos van Dongen did a small-scale TPC-H benchmark with a group of open source databases and one of the major vendors (name withheld since they don't allow third-party publication of benchmarks). What's most interesting is how well the (relatively new) MySQL 5.1 release performed. Even more amazing is how well MonetDB and LucidDB performed relative to the others. Maybe it shouldn't be a surprise since we're talking about columnar engines, query workloads and a small-scale test. He's got a nice chart showing the BI-related features in these open source databases.

When you grow a dataset to the 10GB and 100GB scales (which Jos is doing), the results will surely change. The maturity of a database is really seen when you have to do three things: manage larger volumes of data, optimize complex queries on that volume, and deal with concurrent users querying this data. I suspect there will be a reshuffling of his benchmark results at larger sizes.

Other interesting performance information is the benchmark Josh Berkus wrote about in his blog post on a Postgres benchmark run at Sun, where he notes that Postgres is almost as fast as Oracle on equivalent hardware, at significantly lower cost. (I know this is old info for those of you who follow Postgres more closely.) While not a DW-specific benchmark, it does demonstrate equivalent performance levels - the key point. A similar benchmark was done with MySQL, DB2, Oracle and Microsoft a few years ago and showed similar results.

Posted February 8, 2009 12:19 PM

Actuate is sponsoring a webcast about open source BI on February 19 at 11:00 Pacific / 2:00 Eastern. My portion of the presentation will cover topics like who (in general) is using open source BI, reasons for doing so, and some of the challenges faced. I'm not completely done yet but the goal is to help people who want to use open source BI better frame the discussion as they look to sell the idea.

For example, several times I've had to get involved with legal departments when buying support or pro versions. Corporate review of contracts is relatively straightforward, but if the legal department has never seen an open source license they may need some help. They're used to seeing standard clauses, and many times these are not present with open source since they don't apply. To a contract lawyer that can be a red flag, so you need to be prepared to help educate them about what is and isn't present in an open source license and explain what's going on. Otherwise you run the risk of having them reject the contract.

Once I'm done, Actuate is going to talk about embedding BI into applications with BIRT and then adding scalability and availability features. I previewed their talk and it will be interesting and much more technical than mine. Worth sticking around for, in other words.

Posted February 6, 2009 5:00 AM
