The question we asked was "How much raw data (in gigabytes) is being stored or accessed?" The chart below shows the results (with some annotation).

The databases in use are not all open source. This is the size regardless of database type. The restriction is that people are using open source in some part of the data warehouse stack, so an open source BI tool accessing an Oracle database would be included. Even so, the bulk of the respondents are using open source databases like MySQL and Postgres.
The general pattern follows what we see in the commercial data warehouse market, with the bulk of installations (82%) less than a terabyte in size. We do see a lower overall size relative to the completely commercial market - the number there is roughly 65%.
The truth is that for many organization, size is not a critical factor relative to other concerns. At the same time, the query performance is still a challenge for most. The difficulty of getting good query performance is one of the major factors driving people to look at appliances, columnar databases and other data warehouse platforms.
In the open source market there are quite a few options, some of which I listed a while ago. Two notable companies in the MySQL market are Infobright, makers of a columnar storage engine, and Kickfire, a hardware-based MySQL-compatible appliance. Both are aiming at the largest part of the market with products that are aimed at the under 10 terabyte space and with significantly lower costs than one expects in the data warehouse platform market.
I'll be doing a live webcast to preview some of the other data from the survey on Wednesday, April 29 at 10:00 AM Pacific. Also speaking will be Miriam Tuerk, CEO of Infobright, and Brian Gentile, CEO of Jaspersoft. After our respective talks we'll be taking questions online.
Also, the survey is running through May, so you can still add your stats to the picture.
Posted April 28, 2009 6:03 PM
Permalink | No Comments |