Blog: Mark Madsen

Mark Madsen

Open source is becoming a required option for consideration in many enterprise software evaluations, and business intelligence (BI) isn't exempt. This blog is the interactive part of my Open Source expert channel for the Business Intelligence Network where you can suggest and discuss news and events. The focus is on open source as it relates to analytics, business intelligence, data integration and data warehousing. If you would like to suggest an article or link, send an e-mail to me at open_source_links@ThirdNature.net.

About the author

Mark, President of Third Nature, is a former CTO and CIO with experience on both the IT and vendor sides, including a stint at a company used as a Harvard Business School case study. Over the past decade, Mark has received awards for his work in data warehousing, business intelligence and data integration from the American Productivity & Quality Center, the Smithsonian Institution and TDWI. He is co-author of Clickstream Data Warehousing and lectures and writes about data integration, business intelligence and emerging technology.

 

August 2008 Archives

Next week I'll be doing a webcast with Paul Clenahan on the topic of how software developers and OEMs can leverage open source principles (not just open source). Paul is from Actuate (the company behind BIRT development and the BIRT Exchange).

It's not often that people talk about the community side of commercial software development. The principles behind open source, such as open, modular architecture and user/developer extensibility, are often ignored by commercial software firms. Some are starting to embrace the practices that have made open source successful without necessarily opening their source code. This is part of an ongoing shift in the software industry as it struggles to cope with commoditization, increased competitiveness, and (in some sectors) the pressure from open source projects.

There is no single open source business model. There are many, and they range from relatively open to relatively closed business models. Traditional software suppliers are adjusting to the new market realities, so I expect to see more and more blending of practices. This is good and bad for open source providers, because they will lose some of the differentiation they've had in the market. It's also good and bad for the traditional vendors.

I'm eager to hear Paul's take on this as it relates to embedded reporting and operational BI, the areas BIRT is most focused on. The description of the webcast is listed below.

Using Open Source Principles to Supplement OEM Applications

Online communities have proved to be a very important and necessary part of today’s Open Source and Web 2.0 world. The concept of accessing resources, services, support and products through online sites is especially beneficial to OEMs. Through the use of an online community, OEMs can retain customers, increase adoption of their product, enable users to recommend new ideas, help developers with product support and gain awareness in the marketplace for their offerings.


Posted August 22, 2008 5:39 PM
Permalink | No Comments |

Talend announced an open source data quality offering this week at the TDWI conference in San Diego. The company is rapidly filling out the basic components needed in a complete data integration suite. In June they added change data capture (CDC) features to Open Studio, their ETL tool. They also added Talend Open Profiler for data profiling. While Talend doesn’t offer a complete suite yet, these new offerings are a big expansion of functionality in a short time. The ETL and data profiling tools are available today, but Data Quality won’t be ready for download until September.

Talend Open Profiler offers many of the features you would expect, similar to what you find in the tools from Oracle and Microsoft that ship with their databases.
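To make the feature set concrete, here is a minimal sketch of the kind of column analysis a data profiling tool performs: row counts, null counts, distinct values, min/max, and value-pattern frequencies. The sample data and pattern rules are my own illustration, not Open Profiler's implementation.

```python
# Sketch of per-column data profiling statistics, the core of what
# profiling tools compute. Pattern analysis maps digits to "9" and
# letters to "A" so inconsistent formats stand out.
import re
from collections import Counter

def profile_column(values):
    """Summarize one column the way a profiler's column analysis would."""
    non_null = [v for v in values if v not in (None, "")]
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_null
    )
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top_patterns": patterns.most_common(3),
    }

phone_col = ["541-555-1234", "5415550000", "", "541-555-9876"]
stats = profile_column(phone_col)
print(stats["nulls"], stats["distinct"], stats["top_patterns"][0])
# → 1 3 ('999-999-9999', 2)
```

The pattern histogram is usually the most useful output in practice: two phone formats in one column is exactly the kind of problem that feeds the downstream data quality work.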

The data quality product will offer basic functionality for data de-duplication, standard formatting requirements (as with phone numbers and addresses) and address validation. I spoke with Yves de Montcheuil, Talend’s VP of marketing, before the announcement and he indicated that they are still working on partnerships to provide more advanced features via external data cleansing products and data providers. Expect some partnerships to be announced in the next few months as they work out the details.
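The two basic steps mentioned above can be sketched in a few lines: standardize phone numbers into one canonical format, then de-duplicate records on the standardized value. The formats, field names and defaults here are illustrative assumptions, not Talend's implementation.

```python
# Sketch of basic data quality processing: standardization followed by
# de-duplication on the standardized key.
import re

def standardize_phone(raw, default_area="541"):
    """Reduce a US phone number to digits, then reformat as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 7:                          # no area code supplied
        digits = default_area + digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                       # drop country code
    if len(digits) != 10:
        return None                               # flag for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def dedupe(records, key_field="phone"):
    """Keep the first record seen for each standardized key."""
    seen, out = set(), []
    for rec in records:
        key = standardize_phone(rec[key_field] or "")
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

Real products do fuzzy matching across multiple fields rather than exact matching on one key, but the standardize-then-match pipeline is the same shape.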

Since Open Studio / Integration Suite can make web service calls, you can also use third-party services. StrikeIron, for example, offers a number of commercial data cleansing services, as well as reference data services.

Talend Data Quality will follow the same licensing model as Talend Integration Suite, with an open edition and a commercial edition provided as a subscription. No word yet on what the feature differences are between the two. Personally, I don’t like these feature holdback models. I understand the rationale, but I still believe that it can lead to conflicts with contributors and generates the perception that a product is crippleware.

While this isn’t the first open source data profiling or data quality project available, it’s the first that is integrated into a single suite and, more important to many IT shops, commercially supported. It’s also arguably the most functional. There are a handful of other open source projects in this area, so if you’re not concerned about commercial support, it can’t hurt to explore the following:
Open Source Data Quality
DataCleaner
Mural standardization and match engines

I’ll do a more detailed look at Mural another time. It’s relatively new and focuses on all of the technical capabilities underlying master data management (MDM). My reading of the documentation indicates that there’s a lot of interesting stuff here, but there may be some pretty big problems as well.

A few people have asked me about InfoSolve over the past year. They offer OpenDQ, which, despite claims to the contrary, is not open source. I call this “fauxpen source”. The source isn’t available unless you do a project with them, and it’s unsupported. In essence, you are buying a source code license as part of a project - where you get to support the product you purchase. If that’s the case, you may as well go buy a regular commercial product since that offers more value for the price. If the capital cost is high, then look to database offerings or open source, where you do have support and you also have a community of users and developers.


Posted August 20, 2008 9:00 AM
Permalink | No Comments |

A year and a half ago I was playing around with data visualization toolkits. What I found is that they fell into two buckets: those good for making graphs, and those good for art projects. The ones being used by design and art students were much more interesting and seemed to have more possibilities for those of us delivering data visualization rather than fixed graphs.

The problem is that the toolkits / libraries are generally not as usable in a commercial setting. The most interesting one I found is Processing, and I'm happy to see that there are now books out on it. I had trouble digging through the original documentation and doing anything interesting. The thing I often find is that the most interesting work is being done well outside the BI market. Here's a nice example of a Processing visualization from Robert Hodgin:

Magnetic Ink, Process video from flight404 on Vimeo.

Here are a few other open source visualization tools and libraries I found when I was casting about last year:
Prefuse, now ported to ActionScript as Flare - something I can actually use! It beats the pants off of what I was doing in Circos, which, now that I've looked again, has better tutorials
OpenDX
GGobi
GraphViz
ESOM
GNUplot
VTK
SCIGraphica

There are tons of others, but I don't have everything in one place for easy reference. That's what Google is for.


Posted August 5, 2008 1:20 AM
Permalink | 2 Comments |