Blog: Mark Madsen
http://www.b-eye-network.com/blogs/madsen/
Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network, where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.
Copyright 2013
Thu, 01 Oct 2009 07:30:24 -0700

Open Source MDM To Get a Boost From Talend
Open source master data management got a boost on Monday when Talend announced that it had acquired Xtentis MDM from Amalto. This product was geared toward the creation of repository-style MDM applications, for example a product master data repository or a customer key cross-reference hub.
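To make the "customer key cross-reference hub" idea more concrete, here is a minimal Python sketch of the core structure such a hub maintains: a mapping from each source system's local key to a single master ID. It is a hypothetical illustration, not Xtentis or Talend code; the class name `KeyXrefHub`, its methods and the system names are invented, and real MDM products add matching rules, data stewardship workflows and persistence on top of something like this.

```python
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch only; not Xtentis or Talend code.
SourceRef = Tuple[str, str]  # (source system name, key used in that system)


class KeyXrefHub:
    """Toy customer key cross-reference hub: many source keys -> one master ID."""

    def __init__(self) -> None:
        self._ids = count(1)                             # master ID generator
        self._by_source: Dict[SourceRef, int] = {}       # (system, key) -> master ID
        self._by_master: Dict[int, Set[SourceRef]] = {}  # master ID -> source keys

    def register(self, system: str, key: str) -> int:
        """Return the master ID for a source key, minting a new one if it is unseen."""
        ref = (system, key)
        if ref not in self._by_source:
            master = next(self._ids)
            self._by_source[ref] = master
            self._by_master[master] = {ref}
        return self._by_source[ref]

    def link(self, system: str, key: str, known_system: str, known_key: str) -> int:
        """Attach a source key to the master ID of an already-registered key.

        In a real hub this decision would come from matching rules or a data steward.
        """
        master = self._by_source[(known_system, known_key)]  # KeyError if unknown
        ref = (system, key)
        current = self._by_source.get(ref)
        if current is not None and current != master:
            raise ValueError(f"{ref} is already mapped to master ID {current}")
        self._by_source[ref] = master
        self._by_master[master].add(ref)
        return master

    def master_id(self, system: str, key: str) -> Optional[int]:
        return self._by_source.get((system, key))

    def source_keys(self, master: int) -> List[SourceRef]:
        return sorted(self._by_master.get(master, set()))


if __name__ == "__main__":
    hub = KeyXrefHub()
    cust = hub.register("crm", "C-1001")
    hub.link("billing", "88731", "crm", "C-1001")
    print(cust, hub.source_keys(cust))
```

Keeping both directions of the mapping lets the hub answer "which master record is this?" and "where does this customer live?" without scanning.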

                     

Xtentis was a Java and XML-based product with an Eclipse UI, so it's a reasonably good technical fit with Talend's tools. While the product information links have been removed from their web site, you can still access the Xtentis product data sheet if you're interested in the functionality and user interface.

 

Talend's goal is to provide a generic MDM application that can be used for different subject areas. They will take over the application from Amalto and are already working on open-sourcing the base code, with a planned product release date of January 2010. It's not yet clear what the differences will be between the community edition and the subscription version. If their ETL tools are an indication, the differences will likely be in ease of use for multiple developers, manageability and more complete product line integration.

 

The development plan Talend described involves integration with their ETL and real-time integration tools. This is typically a weak point with MDM products on the market. Most MDM software, whether transaction-oriented or analytical, still requires the use of an ETL or real-time data integration product.

 

Talend claims this is, or will be, the first open source MDM product. That depends on how you define MDM, as the Sun Mural MDM project was announced in May of 2008. I lean toward Talend's claim of "first" because the Mural project was more of a data interchange and index system aimed at Java developers. Most IT people think of master data management as something broader and deeper, with more functionality.

 

Mural is also unlikely to see much adoption. The project is still at an early stage, and the last official Mural announcement was over a year ago, which suggests how little has been going on internally. With Oracle owning multiple data integration and MDM products, it's hard to imagine that Mural will see any budget or staff dedicated to maintenance.
http://www.b-eye-network.com/blogs/madsen/archives/2009/10/open_source_mdm.php Thu, 01 Oct 2009 07:30:24 -0700
A Myth About Consultant Use of Open Source Dispelled
I hear fairly often that consulting firms and systems integrators are more likely than IT departments to use open source tools because it allows them to be more competitive. They gain an edge by saving customers money on software licenses or by having more customizable tools for projects, thus pricing themselves under competitors or providing a better fit with client needs. The other hope is that budget freed from software licenses could translate into more money spent on work with the consultants.

[Chart: open source use by respondent role]
While these points are all valid, the survey data on adoption seems to disprove the belief. An interesting pattern in the data is that consultants are generally less likely than IT professionals to use open source tools in this space (10% for consultants versus 36% for IT). The usage by respondent role is shown in the chart.

It is notable that 49% of the consultants and systems integrators are evaluating open source software today, signaling a possible shift. What this also says is that, far from leading the technology market, SIs and consultants seem to trail it, following the money rather than leading their customers in the market.

Even with the sudden rise in evaluation, consultants and SIs significantly trail IT departments. If you are in an IT organization that relies heavily on consultants for project work, then using open source tools will require that you find qualified consultants ahead of time. Given these statistics, they are likely to be rarer than you expect.

 

We'd love to have your input on open source BI/DW software you're using and the challenges you faced. If you have 10 minutes, take our online survey. It will be open through September 22.

http://www.b-eye-network.com/blogs/madsen/archives/2009/09/a_myth_about_co.php Wed, 16 Sep 2009 04:30:00 -0700
Open Source BI in the Real World - MySQL Keynote Slides and Video
The keynote videos and presentation files are all posted, so you can download the ones you're interested in now. Embedded below is the video and slide deck for my keynote on Thursday.

The gist of this presentation is that business intelligence and analytics are the #1 IT spending priority, BI technology is becoming a commodity, and open source BI and DW tools are maturing. It also presents the supporting stats about open source BI and DW adoption.


If you want to look at the slides at your own pace, they're embedded below:
The open source stats are from a survey on open source BI adoption I've been running for a couple of months, sponsored by Infobright and Jaspersoft. You can see a recap of this keynote plus some more stats and short talks by the CEOs of Infobright and Jaspersoft in "The State of Open Source BI and Data Warehousing" webcast at the MySQL web site.

We'll have a paper discussing the results of the adoption survey available for download soon. Look for it some time next month.

Links (includes case studies from Monolith Software and Consorte Media):
Keynote video
Slides (PDF available via Slideshare)
http://www.b-eye-network.com/blogs/madsen/archives/2009/05/open_source_bi_1.php Fri, 22 May 2009 12:37:40 -0700
Size of Data Warehouses: Peek at Open Source Survey Results
The question we asked was "How much raw data (in gigabytes) is being stored or accessed?" The chart below shows the results (with some annotation).

[Chart: raw data stored or accessed, in gigabytes, with annotations]

The databases in use are not all open source. This is the size regardless of database type. The restriction is that people are using open source in some part of the data warehouse stack, so an open source BI tool accessing an Oracle database would be included. Even so, the bulk of the respondents are using open source databases like MySQL and Postgres.

The general pattern follows what we see in the commercial data warehouse market, with the bulk of installations (82%) under a terabyte in size. Sizes do skew smaller than in the completely commercial market, where the number is roughly 65%.

The truth is that for many organizations, size is not a critical factor relative to other concerns. At the same time, query performance is still a challenge for most. The difficulty of getting good query performance is one of the major factors driving people to look at appliances, columnar databases and other data warehouse platforms.

In the open source market there are quite a few options, some of which I listed a while ago. Two notable companies in the MySQL market are Infobright, maker of a columnar storage engine, and Kickfire, maker of a hardware-based MySQL-compatible appliance. Both are aiming at the largest part of the market, with products designed for the under-10-terabyte space and at significantly lower cost than one expects in the data warehouse platform market.

I'll be doing a live webcast to preview some of the other data from the survey on Wednesday, April 29 at 10:00 AM Pacific. Also speaking will be Miriam Tuerk, CEO of Infobright, and Brian Gentile, CEO of Jaspersoft. After our respective talks we'll be taking questions online.

Also, the survey is running through May, so you can still add your stats to the picture.

http://www.b-eye-network.com/blogs/madsen/archives/2009/04/size_of_data_wa.php Tue, 28 Apr 2009 18:03:04 -0700
Operational Data Integration and Open Source Slides
This is an overview of the difference between application integration and data integration, the differences in use and requirements for DI between business intelligence and OLTP, some integration architecture discussion, and why open source is an even better fit in the operational DI arena than it is for BI projects.

If you want to download a PDF of the slides or listen to a replay, you can find this talk under "How to Use the Right Tools for Operational Data Integration" on Talend's webcast page. There's no direct link to the presentation page so you have to click through.
More detailed description of the webcast
Data integration tools were once used solely in support of data warehousing, but that has been changing over the past few years. The fastest growing area today for data integration is outside the data warehouse, whether it's one-time data movement for a MySQL upgrade, application consolidation, or real-time data synchronization for master data management projects.
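As a rough illustration of the "one-time data movement" case, here is a hand-coded Python sketch that uses the standard library's `sqlite3` module to copy and lightly clean rows from a legacy table into a new schema. The table and column names (`cust`, `customer`, `first_nm` and so on) and the function name `migrate_customers` are hypothetical. This is the kind of per-project hand-coding that a data integration tool is meant to replace; it is not how any particular tool works internally.

```python
import sqlite3


def migrate_customers(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy rows from a hypothetical legacy `cust` table into a new `customer` table.

    Cleans names and e-mail addresses on the way and returns the number of rows
    loaded. Assumes first and last names are never NULL in the legacy table.
    """
    dst.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        " id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, email TEXT)"
    )
    rows = src.execute("SELECT cust_no, first_nm, last_nm, email FROM cust")
    cleaned = (
        (
            cust_no,
            f"{first.strip()} {last.strip()}",
            (email or "").strip().lower() or None,  # blank e-mail becomes NULL
        )
        for cust_no, first, last, email in rows
    )
    with dst:  # one transaction: commits on success, rolls back on error
        cur = dst.executemany(
            "INSERT INTO customer (id, full_name, email) VALUES (?, ?, ?)", cleaned
        )
    return cur.rowcount  # for executemany, sqlite3 sums the rows inserted


if __name__ == "__main__":
    legacy = sqlite3.connect(":memory:")
    legacy.execute(
        "CREATE TABLE cust (cust_no INTEGER, first_nm TEXT, last_nm TEXT, email TEXT)"
    )
    legacy.executemany(
        "INSERT INTO cust VALUES (?, ?, ?, ?)",
        [(1, " Ann ", "Lee", " ANN@EXAMPLE.COM "), (2, "Bo", "Chan", None)],
    )
    target = sqlite3.connect(":memory:")
    print(migrate_customers(legacy, target))
```

Even this small job needs schema mapping, cleansing rules and transaction handling, which is part of why tools tend to be faster than hand-coding once jobs multiply.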

Data integration tools have proven to be faster, more flexible and more cost effective for operational data integration than the common practice of hand-coding or using application integration technologies. The developer focus of these technologies also makes them a prime target for open source commoditization.

During the presentation you will learn about the differences between analytical and operational data integration, technology patterns and options, and recommendations for how to begin using tools for operational data integration.

Key points:
  • How to map common project scenarios to integration architectures and tools
  • The technology and market changes that favor use of tools for operational data integration
  • The differing requirements for operational vs. analytic data integration
  • Advantages of open source for data integration tasks
http://www.b-eye-network.com/blogs/madsen/archives/2009/03/operational_dat.php Mon, 23 Mar 2009 05:00:00 -0700
Fill Out the Open Source Adoption Survey, Enter to Win a TomTom GPS
The research survey on open source data warehouse and BI adoption takes about 5 minutes to fill out. There's an almost complete lack of data specific to the business intelligence and data warehouse market: all the open source studies I read are generic, and at best they extrapolate what's happening based on the general IT market. I want to change that.

If you have evaluated open source tools in any area of the business intelligence stack - databases, ETL tools, reporting, visualization - please consider filling out the survey whether it passed your evaluation or not, so we can begin to understand where and how people are using open source. It's as important to understand what's wrong with open source tools as what's right.

The results of this research will be summarized in a keynote at the MySQL Conference on April 23 but we'll be extending it throughout this year.

This first survey we're running is about adoption, so we can answer some basic questions:
  • What industries, departments or functional areas are using open source?
  • What countries are leading the adoption?
  • What software categories are being used: reporting, OLAP, ETL, data mining, databases?
  • Why are people choosing or deciding against open source in this segment of the market?

Thanks also to Infobright (open source columnar database) and Jaspersoft (open source BI stack), who are kindly donating a TomTom One XL portable GPS for a prize drawing after the survey is done. If you complete the survey and provide your information, you'll be entered to win. We'll do the drawing and announce the winner at the MySQL conference.

Whether your open source evaluation led you to think of it as the holy grail or the devil's chalice, please take 5 minutes to fill out the survey.
http://www.b-eye-network.com/blogs/madsen/archives/2009/03/fill_out_the_op.php Fri, 20 Mar 2009 10:18:31 -0700
Open Source Speeds Delivery of Data Integration
Bloor Research published a report comparing the costs of various data integration products, but one of the more interesting items isn't about cost: it's the average time required for a company to evaluate various data integration products. Open source is the clear winner here.

Figure: Person-weeks required for evaluation. Source: Bloor Research

When working with proprietary software vendors, trials and proofs of concept require management involvement and multiple levels of approval. The legal department is often involved since there's usually a trial license agreement. The process is not under the developer's control, the schedule is governed by vendor terms and the process requires extra work.

Beyond the ease of evaluation, it's easier to get started with a project. With open source, time spent evaluating tools that might never be used can instead be spent on a proof of concept that is reusable in production.

If the proof of concept fails, the same time has been spent as it would with any other software. If the proof of concept succeeds, it can be moved directly into production without the up-front commitments that traditional vendors require.

Speed to deliver is one of the open source advantages I described in more detail in the open source data integration paper I wrote for Talend. Also, if you'd like to read the Bloor report, you can download a full copy from the Pervasive web site.


http://www.b-eye-network.com/blogs/madsen/archives/2009/03/open_source_spe.php Tue, 03 Mar 2009 11:55:11 -0700
Open Source Business Intelligence Adoption and Use
This is part of a webcast done jointly with Actuate on the Business Intelligence Network. You can listen to the archived presentation and see Actuate's presentation on BIRT by going to the webcast registration page.
http://www.b-eye-network.com/blogs/madsen/archives/2009/02/open_source_bus.php Fri, 20 Feb 2009 04:20:26 -0700
Open Source Database Performance for BI and DW
Open source database adoption for BI and data warehousing appears to lag behind the open source BI and ETL tools. There are lots of reasons for this documented elsewhere, but one reason that is becoming less valid is performance.

An IDC survey of data warehouse size reported that ~60% of data warehouses are less than a terabyte in size. Several other surveys over the past few years reported similar findings. This tells us that the industry focus on scale-out options is overkill for the majority of people deploying data warehouses. What's needed is cost-effective performance at a scale of less than a terabyte. There are interesting vendors of both closed and open source databases and appliances that work well in this size range.

Gartner recently gave some recommendations on open source databases and data warehousing that I think are inappropriate. They suggest MySQL as the only viable option. Part of their rationale is sound: commercial support and company viability. Most of the open source databases come from smaller vendors or are community-supported projects rather than commercial ones, making them less suitable for enterprise use.

Where Gartner goes wrong is that MySQL isn't as well suited to BI workloads. It's easy to find information on basic MySQL performance, but not for data warehouse workloads. Maybe that's why Gartner overlooked this MySQL performance test. MySQL couldn't complete the 100GB scale tests, and part of the reason is obvious: missing features for large-scale queries.

These are some of the reasons companies have stepped in to offer new storage engines and appliances that are MySQL compatible. Infobright is delivering a MySQL-compatible, BI-focused product, since it's hard to get proper scaling and performance with standard MySQL as a data warehouse database. Kickfire offers a different option for performance in an appliance package. Postgres and Ingres offer better features for both querying and manageability with data warehouse workloads. EnterpriseDB delivers commercial support for Postgres as well as providing a scale-out option, removing another of the Gartner criticisms.

Jos van Dongen did a small-scale TPC-H benchmark with a group of open source databases and one of the major vendors (name withheld since they don't allow third-party publication of benchmarks). What's most interesting is how well the (relatively new) MySQL 5.1 release performed. Even more amazing is how well MonetDB and LucidDB performed relative to the others. Maybe it shouldn't be a surprise, since we're talking about columnar engines, query workloads and a small-scale test. He's got a nice chart showing the BI-related features in these open source databases.
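For readers wondering why columnar engines such as MonetDB and LucidDB do well on query workloads, the toy Python sketch below contrasts a row layout with a column layout for the same data. It only illustrates the layout idea, that an aggregate over one or two attributes needs to read only those columns in a column store; in-memory Python lists say nothing about real disk I/O, compression or the engines named above, and the `order_id`/`region`/`amount` schema and function names are invented.

```python
from typing import Dict, List, Tuple

# Invented schema for illustration: (order_id, region, amount)
Row = Tuple[int, str, float]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column (a toy column store)."""
    if not rows:
        return {"order_id": [], "region": [], "amount": []}
    order_ids, regions, amounts = (list(col) for col in zip(*rows))
    return {"order_id": order_ids, "region": regions, "amount": amounts}


def total_for_region_rows(rows: List[Row], region: str) -> float:
    # Row layout: every whole record is visited even though only two fields matter.
    return sum(amount for _, r, amount in rows if r == region)


def total_for_region_columns(cols: Dict[str, list], region: str) -> float:
    # Column layout: only the 'region' and 'amount' columns are read;
    # 'order_id' is never touched.
    return sum(a for r, a in zip(cols["region"], cols["amount"]) if r == region)


if __name__ == "__main__":
    rows = [(1, "west", 10.0), (2, "east", 5.5), (3, "west", 2.5)]
    cols = to_columns(rows)
    print(total_for_region_rows(rows, "west"), total_for_region_columns(cols, "west"))
```

Both functions return the same answer; the difference a real column store exploits is how much data has to be read from storage to get it, which matters more as tables widen.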

When you grow a dataset to the 10GB and 100GB scales (which Jos is doing), the results will surely change. The maturity of a database really shows when you have to do three things: manage larger volumes of data, optimize complex queries on that volume, and deal with concurrent users querying this data. I suspect there will be a reshuffling of his benchmark results at larger sizes.

Other interesting performance information comes from the benchmark Josh Berkus wrote about in his blog post on a Postgres benchmark run at Sun, where he notes that Postgres is almost as fast as Oracle on equivalent hardware, at significantly lower cost. (I know this is old news for those of you who follow Postgres more closely.) While it's not a DW-specific benchmark, it does demonstrate equivalent performance levels, which is the key point. A similar benchmark was done with MySQL, DB2, Oracle and Microsoft a few years ago and showed similar results.

http://www.b-eye-network.com/blogs/madsen/archives/2009/02/open_source_dat_1.php Sun, 08 Feb 2009 12:19:31 -0700
Leveraging Open Source BI Webcast on February 19
Actuate is sponsoring a webcast about open source BI on February 19 at 11:00 Pacific / 2:00 Eastern. My portion of the presentation will cover topics like who (in general) is using open source BI, their reasons for doing so, and some of the challenges they face. I'm not completely done yet, but the goal is to help people who want to use open source BI better frame the discussion as they look to sell the idea.

For example, several times I've had to get involved with legal departments when buying support or pro versions. Corporate review of contracts is relatively straightforward, but if the legal department has never seen an open source license they can require some help. They're used to seeing standard clauses, and many times these are not present with open source since they don't apply. To a contract lawyer that can be a red flag, so you need to be prepared to help educate them about what is and isn't present in an open source license and explain what's going on. Otherwise you run the risk of having them reject the contract.

Once I'm done, Actuate is going to talk about embedding BI into applications with BIRT and then adding scalability and availability features. I previewed their talk and it will be interesting and much more technical than mine. Worth sticking around for, in other words.

http://www.b-eye-network.com/blogs/madsen/archives/2009/02/leveraging_open.php Fri, 06 Feb 2009 05:00:00 -0700
Open Source and Low-Cost ETL at the Las Vegas TDWI Event
The Data Warehousing Institute conference in Las Vegas is just a few weeks away. While there aren't any presentations directly on open source, the Evaluating ETL Tools and Technology course I'm teaching on Tuesday will include two open source vendors in the "vendors in action" session.

We're running a "low-cost ETL" theme for the afternoon. The vendors demonstrating their data integration tools will be Microsoft, Pentaho and Talend. This is a rare chance to see all three doing their work side by side.

The exhibit hall will have several open source or open source-derived vendors as well. Be sure to check them out if you're going to the event.

http://www.b-eye-network.com/blogs/madsen/archives/2009/02/open_source_and_1.php Thu, 05 Feb 2009 11:52:13 -0700
October Rules Fest Almost Over
It's too bad I missed the October Rules Fest, as it looks terrific and I'm evaluating open source rules engines. From their description:
October Rules Fest is a three day gathering of the best and brightest in the rules engine industry, October 22nd-24th, 2008. This conference on business rules technology features the inventors and scientists behind advanced rulebased technology and leading business rule management systems.

We will be bringing together, for the first time the founders and inventors of rules technologies and methodologies.

They mean what they say, too. Presenters cover the range from academic researchers to folks from Ilog and Fair Isaac to the creators of some of the key algorithms and code behind today's rules engines. Luckily for those of us not there, many of the presentations are posted at the conference site.

http://www.b-eye-network.com/blogs/madsen/archives/2008/10/october_rules_f.php Thu, 23 Oct 2008 18:59:34 -0700
Slides From Webcast on BI Adoption
I did a webcast on open source for TDWI and Pentaho last week (registration required). The topic was open source BI adoption, which I believe has gathered speed this year. If you go through the registration link you can download a PDF of the slides from the webcast page.

If you're already familiar with open source BI, there were probably not a lot of surprises in my part of the presentation. The general outline was a little about open source and the market to answer the "why now?" question, followed by some information about categories of organizations adopting it and why they are adopting, and a few notes about challenges people run into when adopting. It was only 35 minutes so there's not as much depth as I would have liked.

I think one fundamental point should be made about open source: it's just software. When people state that they are resistant to open source, what they really mean is that they are resistant to the unfamiliar. A product should be evaluated on the combination of its ability to meet requirements and its cost relative to other options. It's that simple.

Overcoming someone's resistance to open source in your organization means that you probably need to educate them, given that they use open source every day without thinking about it. It's in everything from cars to cell phones, as well as almost all the commercial BI tools shipping today. More likely, they are resistant because they (a) are threatened in some way by the change you propose, (b) face organizational obstacles like educating the legal department about licenses or (c) face political consequences you aren't aware of. It's often their personal situation that is the biggest factor, given that most objections are easily refuted as myths.

http://www.b-eye-network.com/blogs/madsen/archives/2008/10/slides_from_web.php Thu, 23 Oct 2008 12:00:00 -0700
Cloudera Provides Commercial Support for Hadoop
I spoke yesterday with Mike Olson and Amr Awadallah about their new startup, and the appliance and BI markets. Mike is the former CEO of Sleepycat, and Amr was until recently a VP of engineering at Yahoo focused on BI for search. They’re joined by Christophe Bisciglia from Google and Jeff Hammerbacher, previously manager of Facebook's data team where Hive was developed.

They and several other founders created Cloudera to provide commercial support for Hadoop, an open source implementation of map-reduce (used for programmatically processing large volumes of data on a compute cluster).
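A minimal single-process Python sketch of the map-reduce pattern, using the classic word-count example, may help show what Hadoop distributes across a cluster: a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step for each key. This illustrates the programming model only and is not Hadoop's API (Hadoop's native interfaces are Java `Mapper` and `Reducer` classes); the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle and reduce in one process (Hadoop spreads these over a cluster)."""
    # Map: each input record emits zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group all values that share a key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.lower().split()]


def sum_reducer(word: str, counts: List[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick fox", "The lazy dog", "the end"]
    print(map_reduce(lines, word_mapper, sum_reducer))
```

Because the mapper and reducer only see one record or one key at a time, a framework can run many copies of each in parallel, which is what makes the model attractive for analyses that are awkward to express in SQL.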

They said there are enough companies using Hadoop in a commercial context that they believe there’s a market for commercial support, both for internal installations and on Amazon’s Elastic Compute Cloud (EC2). Hadoop is still complex enough, and the skills for deployment uncommon enough, that many companies need help. Cloudera is setting its sights on larger problems than just Hadoop support, though Amr and Mike were not yet ready to talk details.

There are plenty of analytical problems that are difficult to do in SQL. MPP database vendors are trying different approaches, such as marrying map-reduce to the database (Greenplum) or building analytical functions into the database (Teradata and SAS). Their approaches may not work as well as Hadoop, though, because the processing is still constrained by SQL and the data still has to be managed from within the database.

Cloudera has an impressive starting lineup. It will be interesting to see where they take the business.

http://www.b-eye-network.com/blogs/madsen/archives/2008/10/cloudera_provid.php Wed, 22 Oct 2008 19:28:37 -0700
Open Source Principles for OEMs Webcast With Actuate/BIRT
Next week I'll be doing a webcast with Paul Clenahan on how software developers and OEMs can leverage open source principles (not just open source). Paul is from Actuate, the company behind BIRT development and the BIRT Exchange.

It's not often that people talk about the community side of commercial software development. The principles behind open source, like open and modular architecture or user/developer extensibility, are often ignored by commercial software firms. Some are starting to embrace the practices that have made open source successful without necessarily opening their source code. This is part of an ongoing shift in the software industry as it struggles to cope with commoditization, increased competitiveness and (in some sectors) pressure from open source projects.

There is no single open source business model. There are many, and they range from relatively open to relatively closed business models. Traditional software suppliers are adjusting to the new market realities, so I expect to see more and more blending of practices. This is good and bad for open source providers, because they will lose some of the differentiation they've had in the market. It's also good and bad for the traditional vendors.

I'm eager to hear Paul's take on this as it relates to embedded reporting and operational BI, the areas BIRT is most focused on. The description of the webcast is listed below.

Using Open Source Principles to Supplement OEM Applications

Online communities have proved to be a very important and necessary part of today’s Open Source and Web 2.0 world. The concept of accessing resources, services, support and products through online sites is especially beneficial to OEMs. Through the use of an online community, OEMs can retain customers, increase adoption of their product, enable users to recommend new ideas, help developers with product support and gain awareness in the marketplace for their offerings.

http://www.b-eye-network.com/blogs/madsen/archives/2008/08/open_source_pri.php Fri, 22 Aug 2008 17:39:52 -0700