Blog: Mark Madsen

Mark Madsen

Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.

About the author

Mark, President of Third Nature, is a former CTO and CIO with experience working for both IT organizations and vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and TDWI. He is co-author of Clickstream Data Warehousing and lectures and writes about data integration, business intelligence and emerging technology.


Open source master data management got a boost on Monday when Talend announced that they acquired Xtentis MDM from Amalto. This product was geared towards creation of repository-style MDM applications, for example a product master data repository or a customer key cross-reference hub.
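For readers who haven't worked with one, a customer key cross-reference hub is a small idea at its core: each source system keeps its own customer key, and the hub ties those keys to one shared master identifier. Here is a minimal sketch in Python to make that concrete. It is an illustration only and has nothing to do with how Xtentis or Talend implement it; the class name, method names and the "M1"-style identifiers are all made up for the example.

```python
from collections import defaultdict


class KeyCrossReferenceHub:
    """Hypothetical hub mapping (source_system, local_key) pairs to a master ID."""

    def __init__(self):
        self._to_master = {}                    # (system, local_key) -> master_id
        self._from_master = defaultdict(dict)   # master_id -> {system: local_key}
        self._next_id = 1

    def register(self, system, local_key, master_id=None):
        """Link a source key to a master ID, minting a new ID if none is given.

        Registering a key that is already known returns its existing master ID.
        """
        pair = (system, local_key)
        if pair in self._to_master:
            return self._to_master[pair]
        if master_id is None:
            master_id = f"M{self._next_id}"     # invented ID format
            self._next_id += 1
        self._to_master[pair] = master_id
        self._from_master[master_id][system] = local_key
        return master_id

    def translate(self, system, local_key, target_system):
        """Return the target system's key for the same customer, or None."""
        master_id = self._to_master.get((system, local_key))
        if master_id is None:
            return None
        return self._from_master[master_id].get(target_system)


hub = KeyCrossReferenceHub()
customer = hub.register("crm", "C-1001")              # mints master ID "M1"
hub.register("billing", "88213", master_id=customer)  # same customer, other system
print(hub.translate("crm", "C-1001", "billing"))      # prints 88213
```

A real MDM product adds matching, survivorship rules, stewardship workflow and persistence on top of this, which is where most of the effort goes.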

                     

Xtentis was a Java and XML-based product with an Eclipse UI, so it's a reasonably good technical fit with Talend's tools. While the product information links have been removed from their web site, you can still access the Xtentis product data sheet if you're interested in the functionality and user interface.

 

Talend's goal is to provide a generic MDM application that can be used for different subject areas. They will take over the application from Amalto and are already working on open-sourcing the base code with a planned product release date of January, 2010. It's not clear yet what the differences will be between the community edition and the subscription version. If their ETL tools are any indication, the differences will likely be in ease of use for multiple developers, manageability and more complete product line integration.

 

The development plan Talend described involves integration with their ETL and real-time integration tools. Integration is typically a weak point with MDM products on the market. Most MDM software, whether transaction-oriented or analytical, still requires the use of an ETL or real-time data integration product.

 

Talend claims this is, or will be, the first open source MDM product. That depends on how you define MDM, as the Sun Mural MDM project was announced in May of 2008. I lean toward Talend's claim of "first" because the Mural project was more of a data interchange and index system aimed at Java developers. Most IT people think of master data management as something broader and deeper, with more functionality.

 

Mural is also unlikely to see much adoption. The project is still in a base state and the last official Mural announcement was over a year ago, showing how little has been going on internally. With Oracle owning multiple data integration and MDM products, it's hard to imagine that Mural will see any budget or staff dedicated to maintenance.

Posted October 1, 2009 7:30 AM

I hear fairly often that consulting firms and systems integrators are more likely to use open source tools than IT because it allows them to be more competitive. They gain an edge by saving customers money on software licenses, or by having more customizable tools for projects, thus pricing themselves under competitors or providing a better fit with client needs. The other hope is that freeing project budget from software licenses could translate into more money spent on work with the consultants.

While these points are all valid, the survey data on adoption seems to disprove the belief. An interesting pattern in the data is that consultants are generally less likely than IT professionals to use open source tools in this space (10% for consultants versus 36% for IT). The usage by respondent role is shown in the chart.

SI_use_OSS.gif

It is notable that 49% of the consultants and systems integrators are evaluating open source software today, signaling a possible shift. What this also says is that, far from leading the technology market, SIs and consultants seem to trail it, following the money rather than leading their customers in the market.

Even with the sudden rise in evaluation, consultants and SIs significantly trail IT departments. If you are in an IT organization that relies heavily on consultants for project work, then using open source tools will require that you find qualified consultants ahead of time. Given these statistics, they are likely to be rarer than you expect.

 

We'd love to have your input on open source BI/DW software you're using and the challenges you faced. If you have 10 minutes, take our online survey. It will be open through September 22.


Posted September 16, 2009 4:30 AM

The keynote videos and presentation files from the MySQL conference in May are all posted, so you can download the ones you're interested in now. Embedded below are the video and slide deck for my keynote on Thursday.

The gist of this presentation is that business intelligence and analytics are the #1 IT spending priority, BI technology is becoming a commodity, and open source BI and DW tools are maturing. It also covers the supporting stats about open source BI and DW adoption.


If you want to look at the slides at your own pace, they're embedded below:
The open source stats are from a survey on open source BI adoption I've been running for a couple of months, sponsored by Infobright and Jaspersoft. You can see a recap of this keynote plus some more stats and short talks by the CEOs of Infobright and Jaspersoft in "The State of Open Source BI and Data Warehousing" webcast at the MySQL web site.

We'll have a paper discussing the results of the adoption survey available for download soon. Look for it some time next month.

Links (includes case studies from Monolith Software and Consorte Media):
Keynote video
Slides (PDF available via Slideshare)

Posted May 22, 2009 12:37 PM

I thought it would be nice to share some data on database size from the open source business intelligence / data warehouse adoption survey we've been running. Database size is a popular topic so some real data on size might be helpful if you're planning a deployment.

The question we asked was "How much raw data (in gigabytes) is being stored or accessed?" The chart below shows the results (with some annotation).

db-size-graph.gif

The databases in use are not all open source. This is the size regardless of database type. The restriction is that people are using open source in some part of the data warehouse stack, so an open source BI tool accessing an Oracle database would be included. Even so, the bulk of the respondents are using open source databases like MySQL and Postgres.

The general pattern follows what we see in the commercial data warehouse market, with the bulk of installations (82%) less than a terabyte in size. We do see a lower overall size relative to the completely commercial market, where the number is roughly 65%.

The truth is that for many organizations, size is not a critical factor relative to other concerns. At the same time, query performance is still a challenge for most. The difficulty of getting good query performance is one of the major factors driving people to look at appliances, columnar databases and other data warehouse platforms.

In the open source market there are quite a few options, some of which I listed a while ago. Two notable companies in the MySQL market are Infobright, makers of a columnar storage engine, and Kickfire, makers of a hardware-based MySQL-compatible appliance. Both are targeting the largest part of the market, the under 10 terabyte space, with significantly lower costs than one expects in the data warehouse platform market.

I'll be doing a live webcast to preview some of the other data from the survey on Wednesday, April 29 at 10:00 AM Pacific. Also speaking will be Miriam Tuerk, CEO of Infobright, and Brian Gentile, CEO of Jaspersoft. After our respective talks we'll be taking questions online.

Also, the survey is running through May, so you can still add your stats to the picture.


Posted April 28, 2009 6:03 PM

I uploaded the slides from last week's webcast on operational data integration and open source. They're embedded below for online viewing.

This is an overview of the difference between application integration and data integration, the differences in use and requirements for DI between business intelligence and OLTP, some integration architecture discussion, and why open source is an even better fit in the operational DI arena than it is for BI projects.

If you want to download a PDF of the slides or listen to a replay, you can find this talk under "How to Use the Right Tools for Operational Data Integration" on Talend's webcast page. There's no direct link to the presentation page so you have to click through.
More detailed description of the webcast
Data integration tools were once used solely in support of data warehousing, but that has been changing over the past few years. The fastest growing area today for data integration is outside the data warehouse, whether it's one-time data movement for a MySQL upgrade, application consolidation, or real-time data synchronization for master data management projects.

Data integration tools have proven to be faster, more flexible and more cost effective for operational data integration than the common practice of hand-coding or using application integration technologies. The developer focus of these technologies also makes them a prime target for open source commoditization.

During the presentation you will learn about the differences between analytical and operational data integration, technology patterns and options, and recommendations for how to begin using tools for operational data integration.

Key points:
  • How to map common project scenarios to integration architectures and tools
  • The technology and market changes that favor use of tools for operational data integration
  • The differing requirements for operational vs. analytic data integration
  • Advantages of open source for data integration tasks

Posted March 23, 2009 5:00 AM