I thought it would be nice to share some data on database size from the open source business intelligence / data warehouse adoption survey we've been running. Database size is a popular topic so some real data on size might be helpful if you're planning a deployment.
The question we asked was "How much raw data (in gigabytes) is being stored or accessed?" The chart below shows the results (with some annotation).

The databases in use are not all open source. This is the size regardless of database type. The restriction is that people are using open source in some part of the data warehouse stack, so an open source BI tool accessing an Oracle database would be included. Even so, the bulk of the respondents are using open source databases like MySQL and Postgres.
The general pattern follows what we see in the commercial data warehouse market, with the bulk of installations (82%) less than a terabyte in size. Sizes do skew smaller than in the completely commercial market, where the comparable figure is roughly 65%.
The truth is that for many organizations, size is not a critical factor relative to other concerns. At the same time, query performance is still a challenge for most. The difficulty of getting good query performance is one of the major factors driving people to look at appliances, columnar databases and other data warehouse platforms.
In the open source market there are quite a few options, some of which I listed a while ago. Two notable companies in the MySQL market are Infobright, makers of a columnar storage engine, and Kickfire, a hardware-based MySQL-compatible appliance. Both are targeting the largest part of the market, the under 10 terabyte space, at significantly lower cost than one expects in the data warehouse platform market.
I'll be doing a live webcast to preview some of the other data from the survey on Wednesday, April 29 at 10:00 AM Pacific. Also speaking will be Miriam Tuerk, CEO of Infobright, and Brian Gentile, CEO of Jaspersoft. After our respective talks we'll be taking questions online.
Also, the survey is running through May, so you can still add your stats to the picture.