Blog: Mark Madsen

Mark Madsen

Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network, where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.

About the author

Mark, President of Third Nature, is a former CTO and CIO with experience in both IT organizations and vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and TDWI. He is co-author of Clickstream Data Warehousing and lectures and writes about data integration, business intelligence and emerging technology.


April 2009 Archives

I thought it would be useful to share some data on database size from the open source business intelligence / data warehouse adoption survey we've been running. Database size is a popular topic, so some real numbers might help if you're planning a deployment.

The question we asked was "How much raw data (in gigabytes) is being stored or accessed?" The chart below shows the results (with some annotation).

[Chart: survey responses to "How much raw data (in gigabytes) is being stored or accessed?"]

Not all of the databases in use are open source; the sizes reported cover every database type. The only restriction is that respondents use open source somewhere in the data warehouse stack, so an open source BI tool accessing an Oracle database would be included. Even so, most respondents are using open source databases like MySQL and Postgres.

The general pattern follows what we see in the commercial data warehouse market, with most installations (82%) under a terabyte in size. Sizes are smaller overall than in the purely commercial market, where the share of installations under a terabyte is roughly 65%.

The truth is that for many organizations, size is not a critical factor relative to other concerns. At the same time, query performance is still a challenge for most. The difficulty of getting good query performance is one of the major factors driving people to look at appliances, columnar databases and other data warehouse platforms.

In the open source market there are quite a few options, some of which I listed a while ago. Two notable companies in the MySQL market are Infobright, maker of a columnar storage engine, and Kickfire, maker of a hardware-based, MySQL-compatible appliance. Both target the largest segment of the market, installations under 10 terabytes, at significantly lower cost than one expects in the data warehouse platform market.

I'll be doing a live webcast to preview some of the other data from the survey on Wednesday, April 29 at 10:00 AM Pacific. Also speaking will be Miriam Tuerk, CEO of Infobright, and Brian Gentile, CEO of Jaspersoft. After our respective talks we'll be taking questions online.

Also, the survey is running through May, so you can still add your stats to the picture.


Posted April 28, 2009 6:03 PM