Business Intelligence Network
business intelligence resources

Blog: Pete Loshin


April 18, 2008

Clarifying the MySQL "Closed-Sourcing" brouhaha

Remember yesterday? Well, I was reading that post again and realized that it's not entirely clear what Sun is actually doing with MySQL. There's another article about the whole thing that you can check out, MySQL Not Going Closed Source?, but the gist of it is this:

MySQL Server is still (and always was) open source. The difference is not (as I might have implied yesterday) that the Enterprise product was going to be different. What's actually happening is that if you are an Enterprise customer (meaning, you're paying the big bucks for the Enterprise license), you get some extra "add-ons".

Somehow, calling them "add-ons" made a big difference (for me, anyway) in understanding what's going on: Sun is giving their customers another reason to pay extra for Enterprise MySQL. The add-on in question, this time, is online backup. If you want to do online backup with MySQL at this point, you have two options:

  1. Buy the Enterprise edition.
  2. Program your own online backup add-on, or hire someone else to do it.

Sun Senior VP and former MySQL CEO Marten Mickos spelled it out, pretty much in those words.

I believe in free and open software as much as the next person. I also think that companies "selling" open source software have not just the right but the obligation (to their shareholders) to find a business model that allows them to keep publishing open source software. And enterprise customers have special needs (needs that generally don't intersect with those of most individuals or small groups using the software), so it makes sense for enterprise customers to get the extra add-ons as part of their licensing fees.

Am I wrong?

April 17, 2008

Is Sun/MySQL selling out, or just selling?

Here's some news: MySQL, Sun's still-shiny new open source database acquisition, will be adding new features to its Enterprise (that is, paid) version that won't be added to the Community (free and open) version.

Here's the story at ComputerWorld: MySQL reserves features for paying customers; open-source community up in arms.

Oddly enough, though, the story seems to have originated on Jeremy Cole’s blog: Just announced: MySQL to launch new features only in MySQL Enterprise.

There were no press releases, and the news articles I've seen so far seem to point to this blog entry (and MySQL honcho Marten Mickos' response/confirmation to the entry) as their primary source.

That tells me one of two things is happening: either Sun/MySQL is trying to pull a fast one and sneak this new development under everyone's radar, or else this is just business as usual and not anything to get upset about (or at least, nothing to be surprised about).

You can read about "user outrage" in the ComputerWorld article, as well as on Slashdot (Sun to Begin Close Sourcing MySQL). But Sun has to find a way to make that MySQL acquisition pay off, somehow.

On the other hand, as Dana Blankenhorn points out here (Did Sun just make mySQL closed source?), MySQL started limiting source code access to its Enterprise version last year. If you want to see the code, you've got to be a paying customer. That's fine: if you've paid for an Enterprise license, you get to see the source code (and do what you want with it).

It shouldn't surprise anyone when a company that runs an open source project tries to make it pay off. The good news is that there is a huge--and strong--open source MySQL community, and that (as Mickos pointed out) anyone who likes could develop their own, free and open, version of the features that aren't going to make it into the community version.

I'm sure we'll be hearing more interesting news from the open source database players in the days and weeks to come (including from me!). One of the exciting aspects of this development is that it illustrates and illuminates some of the most critical issues facing paying and non-paying users of open source software, as well as the vendors who are trying to build their businesses on free software.

November 5, 2007

Opening Up the Internet: Craigslist + Yahoo! Pipes = Better Data Searching

We've really come a long way with the web and the Internet over the past dozen years or so. Back then, it was kind of a big deal to run screen-scraping software that could pull data off websites, or access corporate legacy mainframe systems through a webified front end.

Now, we're seeing more and more of the web instantiated in some seriously big data stores, and more and more of the owners of those data stores making data processing tools and APIs available to anyone who wants them, so we can have some nice little mashup applications combining, for example, maps with data that has a geographical component.

But here's something sort of new: a way to combine an already popular, useful and generally great website (in this case, Craigslist) with another popular, useful and great website, Yahoo! Pipes. The result is even better than either one.

Yahoo! Pipes is kind of like a web version of UNIX piping: a way to take the results of one command (output) and "pipe" it into another command as input. What you get is a very handy way to create very specific and powerful searches, and turn the results into useful information.
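To make the UNIX analogy concrete, here is a minimal Python sketch of the same idea using generators: each stage consumes the output of the previous one, much as `grep` consumes the output of `cat` in a shell pipeline. The stage names and sample listings are hypothetical; this illustrates the piping concept, not how Yahoo! Pipes itself is implemented.

```python
from typing import Iterable, Iterator

def source(lines: Iterable[str]) -> Iterator[str]:
    """First stage: emit raw items, like `cat` at the head of a shell pipeline."""
    for line in lines:
        yield line.strip()

def grep(items: Iterable[str], word: str) -> Iterator[str]:
    """Pass through only items containing `word` (case-insensitive), like `grep -i`."""
    for item in items:
        if word.lower() in item.lower():
            yield item

def head(items: Iterable[str], n: int) -> Iterator[str]:
    """Stop after the first n items, like `head -n`."""
    for i, item in enumerate(items):
        if i >= n:
            break
        yield item

# Hypothetical listing titles standing in for a web feed.
listings = [
    "Librarian wanted, part-time  ",
    "Used bike for sale",
    "Library assistant position",
    "Reference librarian (temp)",
]

# Equivalent in spirit to: cat listings | grep -i librar | head -n 2
result = list(head(grep(source(listings), "librar"), 2))
print(result)  # ['Librarian wanted, part-time', 'Library assistant position']
```

Because each stage is lazy, nothing is fetched or filtered until the last stage asks for it, which is also what makes a pipeline cheap to rearrange.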

So, here's the article that got me hooked: How to Actually Search Craigslist. As great as Craigslist is, it has some drawbacks. James Aaron, who wrote the article, is a student at San Jose State's School of Library and Information Science, and is looking for a job currently. He likes Craigslist, but, as he explains, it could be even more helpful if there were ways to search better:

There is no way to truncate searches, such as "librar*" to include librarian, library, libraries, etc. There is no way to perform Boolean AND, OR, NOT searches. There is no way to remove frequently occurring irrelevant items. There is no way to search two sub-regions at once. So, unless I want to perform 20 searches a day and receive MANY completely irrelevant hits, I basically have to browse.

The answer, he tells us, is Yahoo! Pipes, and he explains just how to use Pipes with Craigslist to make Craigslist that much more useful.
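Each gap James Aaron lists is really a small filtering step. The Python sketch below, using made-up listings, shows truncation (`librar*`), a Boolean AND/OR/NOT query that also drops a noise term, and two sub-regions searched as one result set. The listing data, region names, and the `matches` helper are hypothetical; this is not Craigslist's or Yahoo! Pipes' API, just the kind of logic a Pipe would apply.

```python
import re

def term_pattern(term):
    """Compile one search term; a trailing '*' means truncation (any word starting with the stem)."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean query: every term in all_of (AND), at least one of any_of (OR), none of none_of (NOT)."""
    def found(term):
        return term_pattern(term).search(text) is not None
    return (all(found(t) for t in all_of)
            and (not any_of or any(found(t) for t in any_of))
            and not any(found(t) for t in none_of))

# Hypothetical listing titles from two sub-regions, searched together.
regions = {
    "south bay": ["Librarian, full-time", "Library page needed", "Law library clerk"],
    "peninsula": ["Libraries seek archivist", "Librarian, full-time volunteer"],
}

# librar* AND (full-time OR archivist OR page) NOT volunteer
hits = [
    (region, title)
    for region, titles in regions.items()
    for title in titles
    if matches(title,
               all_of=["librar*"],
               any_of=["full-time", "archivist", "page"],
               none_of=["volunteer"])
]
print(hits)
```

"Law library clerk" is dropped because it matches none of the OR terms, and the full-time volunteer posting is dropped by the NOT term.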

In other words, more evidence of just how much the entire web is evolving into the world's biggest ever data store, with the most powerful ever set of tools for extracting business intelligence.

How could you use this kind of capability to extract actionable knowledge from the web?

April 18, 2007

Open Source and Best Tech Products of All Time?

When PC World reported on The 50 Best Tech Products of All Time, Slashdot counted up the results: Apple holds down seven places on the list, Microsoft two, and open source software (Red Hat Linux) one.

That sounded strange to me, so I took a look at the list myself. Lots of old and new hardware, but also more open source than you'd imagine based on Slashdot's report:

Number 1, Netscape Navigator, was open-sourced in 1998 (see Netscape Communicator Open Source Code White Paper), but from the start Netscape's success was based on their use of an open standard protocol for web commerce.

Number 3, TiVo, is another open source success story, based on GNU/Linux.

Then, there's Number 20, Microsoft Windows 95. Yes, Windows 95 incorporated a bunch of open source code from BSD in order to provide TCP/IP support.

You could say that Number 30, Mac OS X, is an Apple product, but to be fair, you should at least acknowledge the debt Apple owes to open source projects including but not limited to OpenBSD and X Windows.

And finally, while shareware is not the same as open source, it is certainly a related category into which Eudora (Number 32) and PC-Talk (Number 47) both fit.

January 29, 2007

Endlessly Free and Libre Images

Regardless of your political leanings, you've got to love this Compendium of Public-Domain Image Links posted on Daily Kos the other day.

It's always nice to find that someone else has cared enough about a topic to do a comprehensive job of gathering everything you need to know, or all the resources you could possibly want, about it. Well, someone did just that for public domain and other open-ish sources of free images.

Check it out, and be sure to check the links as well, as they may have even more information and links than the original post.

Meanwhile, here are some good opportunities for anyone in the database management business who wants to do something interesting with great big piles of images that can be used free of copyright (more or less, anyway).

January 16, 2007

Stop Talking to Machines!

The gethuman project is a consumer movement to improve the quality of phone support in the US. This free website is run by volunteers and is powered by over one million consumers who demand high quality phone support from the companies that they use.

Masochists may prefer navigating the endless and aggravating automated customer support systems, but sometimes you just want, no, NEED to talk to a human. Sure, those systems save businesses lots of money, both by deterring customers from getting what they paid for and by forcing the burden of performing customer service onto the customer.

But there are ways around most systems, and you can find the secret passphrases that get you a person at any of hundreds of businesses and government agencies in the gethuman 500 database, a very nice little website that takes information collected from a variety of sources and turns it into nicely actionable knowledge that anyone with a web browser can use.

Of course, this is a long-overdue shot over the bow of corporations cutting customer service corners, and you can bet your bottom dollar that corporations will use the information in this database to help them "streamline" (that is, cut out customer short-cuts) customer service.

In the meantime, enjoy!

December 19, 2006

Linux Equivalents Websites

I love simple ideas, and The Linux Equivalent Project is delightfully simple: a single-page website that lists Linux alternatives to Windows software, with links to each project's home page. A super resource if you're hesitant about Linux and the availability of critical software applications.

All software cited is end-user software, but I'd love to see the scope expanded to include development, back-end, server, and other kinds of enterprise software, as well as Windows equivalents to Linux programs.

And the day after I discovered The Linux Equivalent Project, I ran across The table of equivalents / replacements / analogs of Windows software in Linux.

The link is to an English translation of a Russian webpage that has been up and running since at least 2003, and that does what it says: lists software function categories from desktop apps to games to servers, developer tools and scientific apps, with lists of approximately equivalent software for Windows and Linux, as well as links to most of them. Bigger and broader coverage, yes, but also a bit messier and with more holes/bugs (and the authors actively solicit feedback to fix those).

Finally, check out this Foogazi blog entry, about Alternatives to Windows Programs. It's a little chattier, but another nice little roundup.

November 2, 2006

The Small Print Project

Ever wonder just exactly what all that EULA legal stuff you have to click on to get at software or services is all about? Check out The Small Print Project, a new website that solicits and publishes particularly egregious EULAs along with plain-language explanations of why no sane person would accept them.

One of the first submissions is this Amazon.com agreement under which Amazon reserves the right to delete any and all movies it sells you.

It's still early days, but already you can find some excellent insights into what you can infer just from reading the small print. For instance, here's an entry about "hair removal" scams; the scammer apparently attempts to restrict its content to prevent any official oversight.

Check it out!

October 19, 2006

Google Gadgets Gladden Guys and Gals

Call it what you will: part of the Web 2.0 explosion, or just a handy way to put new features on your website. Google Gadgets are mini-applications that grab information from Google.com or any website and let you plaster it onto your website.

Right now, there are over 1,500 Google Gadgets for websites (and growing rapidly) that you can just drop onto one of your own web pages so any browser (on any OS) capable of handling the Gadget HTML/Javascript code can access the applet.

Though many bad web sites will undoubtedly get badder, savvy BI professionals will also undoubtedly view Google's new web application programming interface as an opportunity to integrate enterprise information and gather (or distribute) business intelligence.

Unlike Google Gadgets for Desktops, of which there are over 100,000 examples, the Google Gadget API lets you generate content that can be interpreted by anyone, not just Windows users.

October 13, 2006

Google Resource for Searching Open Source Code

It's been almost a year since I found and wrote about a cool Resource for Searching Open Source Code; it's just a week or so since Google Labs released Google Code Search.

Google Code Search is more or less Google's version of the C/C++ Source Code Search Engine, except that it covers all "public" source code, not just C/C++ source code. According to the Google Code Search discussion group, it's already proving popular and useful, though some have taken it as a challenge to find programming languages that are not included.

September 22, 2006

When Can We Open Source Voting Machines?

Some more interesting news about Diebold voting hi-jinks in Rolling Stone magazine. Read all about it in Will The Next Election Be Hacked?.

I won't rehash the history and controversy over electronic voting issues; I'll just point to two recent reports about just how secure Diebold's voting machines are. First, this research paper titled Security Analysis of the Diebold AccuVote-TS Voting Machine. The bottom line: not good.

And second, one of the co-authors of that paper blogged that “Hotel Minibar” Keys Open Diebold Voting Machines. The kind of key you, or anyone, can order over the Internet.


August 15, 2006

Ready, or Really Ready?

Compare this with this.

The first one, Ready.gov, "...is a national public service advertising campaign produced by The Advertising Council in partnership with Homeland Security. The Ready Campaign is designed to educate and empower Americans to prepare for and respond to emergencies, including natural disasters and potential terrorist attacks."

The first cost millions and took six months, according to this blog entry by Dr. Michael Stebbins, Director of Biology Policy for the Federation of American Scientists.

The second, according to Stebbins, was developed in two months by FAS intern Emily Hesaltine. It corrects errors and omissions, and includes an "analysis" (that is, a critique) of Ready.gov.

Check it out and let me know which one you think does the job better.

July 17, 2006

Universal Database of Software Titles

Remember way back when you first started to use a computer? Maybe you were just a kid, fooling around with a TRS-80 or Apple ][ or Commodore or whatever. Did you have a favorite program? If you've ever had a yen to show your kids what it was like back in the day, or just felt a little nostalgic about the old times, you just might be able to track it down--if you're lucky.

The good news is that many of these old programs and platforms can be emulated on modern computers, so the original installation disks may still be usable. Of course, you won't get any support from the publishers, but the worse news is that many of these old programs are in a legal limbo: originally published under proprietary licenses by companies that have since been acquired, sold, or liquidated after going bankrupt.


June 13, 2006

Hypo-Allergenic Cats, Adding Value, and Proprietary Technology

One hypo-allergenic kitten: US$3,950

Processing and delivery (by private jet): US$995

Premium Placement fee (to reduce waiting time from two years or more to a few months): US$1,950

A business model that guarantees total monopoly over product distribution: Priceless.


February 7, 2006

What Would YOU Pay to Link to a News Story?

Last week I commented on how Microsoft wasn't planning to publish a patch for the Kama Sutra/Blackworm/MyWife worm until next week; it turned out not to be that big a deal.

But imagine my surprise when I noticed that the news source for the original article was playing some games: they'll email the article to all your friends for you, collecting all of their email addresses in the process. Or, they'll sell you a "license" to email the URL for as little as $5.00. If you prefer, you can pay a measly $2.50 to "license" the link on your own website (a better deal, because if you wanted to email the URL to 200 people you'd have to pay $50.00).

The costs go up even faster if you want to license an article, or even just excerpt an article, to be used in a book or newsletter; the whole thing is done through a third-party clearance company and presumably the publisher and the clearance company split the proceeds and leave the original author out in the cold.

Rather than increasing profits, this whole thing tends to reduce the likelihood that anyone would want to link to this publisher's articles, or that other authors would cite their articles. Why bother with the cost and nuisance of this "license", or even worse, worry about legal action resulting from what would normally be considered "fair use"?

January 26, 2006

Money Games, Ad Hoc Databasing, and Serious Science

Look in your pocket. No, the other one, where you stash your cash. That's right. Now, pull out your currency and go check out Where's George?, a website that's a front end for entering US bills into a database and figuring out where they came from, where they are, and where they are going.

Every note has a serial number, a denomination, a series (year) and a few other bits of information that remain static; the only thing that changes is the location. So, you enter the information, along with your zip code. Then, you go on with your life. Chances are good that you'll get the most bang for your bucks by spreading them around, maybe down at the corner deli or maybe buying a paper at the airport across the country. Eventually, someone else will check out Where's George, too, and enter a new location for what once was your currency.
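The data model here is simple: a note's identifying fields never change, and each new entry just adds a (date, location) sighting. A minimal Python sketch of that idea follows; the class and field names, serial number, and zip codes are invented for illustration and are not Where's George?'s actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class Note:
    """Static facts about a bill; frozen, so a Note can serve as a dictionary key."""
    serial: str
    denomination: int
    series: int

@dataclass
class Tracker:
    """Maps each note to its list of (date, zip code) sightings, kept in date order."""
    sightings: dict = field(default_factory=dict)

    def enter(self, note: Note, zip_code: str, when: date) -> None:
        """Record a new sighting; only the location and date vary between entries."""
        self.sightings.setdefault(note, []).append((when, zip_code))
        self.sightings[note].sort()

    def path(self, note: Note) -> list:
        """Zip codes the note has passed through, oldest first."""
        return [zip_code for _, zip_code in self.sightings.get(note, [])]

tracker = Tracker()
bill = Note(serial="B12345678C", denomination=1, series=2003)
tracker.enter(bill, "02139", date(2005, 11, 1))
tracker.enter(bill, "94103", date(2006, 1, 10))
tracker.enter(bill, "60614", date(2005, 12, 5))   # entered late, but sorted by date
print(tracker.path(bill))  # ['02139', '60614', '94103']
```

Time-ordered paths like this one, accumulated over many bills, are the kind of movement data that makes the collection interesting beyond the game itself.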

The whole idea is kind of interesting, in a kind of geeky, weird way. I'm not sure I'd bother looking up or registering my cash, unless I had a lot of time on my hands. It all started out in 1998 as a sort of fun thing to do, for no other good reason (though maybe as part of the dot-com boom, when anything that could generate traffic was thought to be a potential money-maker).

Oddly enough, though, all that data turned out to be useful after all. Scientists are now reporting that tracking currency is a valuable tool for modeling the spread of infectious diseases.

What a cool example of how something that's really a game, built on openly available information, can return a worthwhile result when you look at it as a database.

January 21, 2006

Putting a Pricetag on Computer Crime

According to the FBI, and as reported on C|Net, "Dealing with viruses, spyware, PC theft and other computer-related crimes costs U.S. businesses a staggering $67.2 billion a year".

Wow. Here's what I found staggering, as I read the article:

  • Though the headlines tended to say "computer crime", the survey covered viruses, spyware, trojan horses, and other malware in addition to other computer-related crimes. So, if you could eliminate viruses and other malware from your systems, you could eliminate significant portions of these costs. With 98.4% of respondents claiming to use antivirus software, maybe they should reconsider how effective that stuff is?
  • A whopping 64% of respondents to the FBI's survey reported a financial loss in the preceding 12 months; but the statistical experts figured there would be a skewing of data: you'd be more likely to respond to the survey if you'd actually suffered a loss, so it would be absurd to claim that 64% of all US businesses had suffered financially in the prior 12 months. So they applied a fudge factor and, somehow, determined that only 20% of US businesses had likely suffered financially. Where did that number come from?
  • On average, each affected business lost $24,000 over the 12-month period in question, so the FBI multiplied it all out to get the precise sum of $67.2 billion in losses for that period. Given the degree of imprecision in their original numbers, I'm staggered by the precision of their result.
  • How much of that loss, the greatest part of which is attributed to "virus-type" incidents, could have been mitigated by using Linux or other open source operating systems that are less susceptible to catastrophic losses due to viruses and other malware?
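The FBI's numbers can be run backwards to see what they imply. The count of U.S. businesses used in the calculation isn't given here, so the sketch below back-computes it rather than citing it, assuming the total was figured as (number of businesses) × 20% × $24,000. It then shows how far the headline total moves if the 20% fudge factor is off by five points either way.

```python
AVG_LOSS = 24_000                 # dollars per affected business, as reported
TOTAL = 67_200_000_000            # headline total, in dollars
SHARE_PCT = 20                    # estimated percent of businesses affected

# Assumption: TOTAL = businesses * (SHARE_PCT / 100) * AVG_LOSS.
# Integer arithmetic keeps the back-calculation exact.
affected = TOTAL // AVG_LOSS                  # implied businesses with a loss
businesses = affected * 100 // SHARE_PCT      # implied total businesses in the model

print(f"{affected:,} affected out of {businesses:,} businesses")

# Sensitivity: same model, but with a 15% or 25% fudge factor instead of 20%.
for pct in (15, 20, 25):
    total = businesses * pct * AVG_LOSS // 100
    print(f"{pct}% affected -> ${total / 1e9:.1f} billion")
```

Under that assumption the model implies 2.8 million affected firms out of 14 million, and a five-point change in the estimated share moves the total by $16.8 billion in either direction ($50.4 billion to $84.0 billion), which is a lot of wiggle room behind a figure quoted to a tenth of a billion.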

Given all that money the FBI says is being spent on this kind of criminal activity, I wonder when we'll start to see software companies that sell insecure products be held accountable.

January 16, 2006

Get Ready for GPL, Version 3

You may not have realized it, but the current version of the GNU General Public License (GPL) is version 2; even if you realized it, you may not have been aware that the Free Software Foundation is hard at work on Version 3.

It's a first draft, and with millions of programmers and others in the open source "community", expect to hear plenty of commentary on what's right and what's wrong with it.

Not having yet had the opportunity to study it (and not being a lawyer in any case), I can't really comment on the ramifications, but the ever-expanding GPL may soon address issues such as Digital Rights Management (DRM), patents, and maybe more.

Stay tuned!

January 9, 2006

ODF Still a Go for Massachusetts

More news about the Open Document Format (ODF) and Massachusetts in the wake of Peter Quinn's resignation. Andy Updegrove reports: New Acting MA CIO Appointed, and Full Speed Ahead with ODF.

January 4, 2006

Data Mining for the Masses

If you read any of the same blogs I do, you're likely to have seen a link to this article by Tom Owad on Applefritter.com, Data Mining 101: Finding Subversives with Amazon Wishlists. There's nothing particularly revolutionary about the actual technologies used, other than the fact that they are all easily and cheaply (or freely) accessible to just about anyone with the time and interest to seek them out.

In a nutshell, Tom writes about how he went about mining Amazon.com's wishlists to find anyone who might be interested in the kinds of things that subversives might be interested in. Then he shows how to extract their locations and map them. All in all, a tidy demonstration of just how much information (too much, really) some people are more than willing to put out there.

A fascinating read for the details as well as this comment: "It used to be you had to get a warrant to monitor a person or a group of people. Today, it is increasingly easy to monitor ideas. And then track them back to people."

January 2, 2006

Peter Quinn Resigns

Sadly, the man who was going to be bringing the Open Document Format (ODF) to Massachusetts, Peter Quinn, is resigning. Andy Updegrove reported last week that Quinn would be stepping down effective 1/9/06.

This story bothers me for a couple of reasons, not least that it is due in large part to "reporting" by the Boston Globe.

Having just got back into my office after a vacation, I don't have all the details on this story yet, but you can get more from Andy Updegrove's standards blog.

December 18, 2005

Wikipedia vs Encyclopedia

Just because you pay for it doesn't mean you're getting your money's worth, as a report, Internet encyclopaedias go head to head from Nature.com demonstrates.

Wikipedia is a free, online encyclopedia that anyone can edit. If you read an article in there and find an error, you can fix it. You can also do mischief, but Nature reporter Jim Giles reports that at least as far as science goes, Encyclopaedia Britannica Online, averaging about 3 errors per article, is only marginally better than Wikipedia, averaging roughly 4 errors per article.

Wikipedia is even easier to use, since you don't have to log in.

December 6, 2005

Resource for Searching Open Source Code

The C/C++ Source Code Search Engine (csourcesearch.net) is quite something: some guy (I think) who goes by the nom de code "Sembiance" decided it would be a good idea to build a searchable database of open C/C++ source code. So he did it.

It's an interesting open source story for a lot of reasons:

  1. You can only do this kind of thing with open source code.
  2. It's actually pretty useful for anyone planning to use open source code in their enterprise, or for anyone who wants to make sure that their non-open code doesn't actually come from the open source code world. You might also want to try one of the commercial services, like those from Black Duck Software or Palamida, Inc.
  3. It's an intriguing example of a database application that was created by an individual using some very powerful open source software.

Of course, you can't build a huge searchable code base just from the raw source code; you've got to have the right tools to do all the formatting, database-building, indexing and so on. What might have been a multi-year, multi-million dollar project if done from scratch apparently turned out to be a hobby for Sembiance, thanks to open source tools. The ones cited on csourcesearch.net include:

  • MySQL for the database engine
  • Apache Lucene, a text search engine library written entirely in Java.
  • CodeWorker, a universal parsing tool & a source code generator. CodeWorker allowed Sembiance to parse out different parts of the source code he was working with and let his code understand them in context.
  • GeSHi - Generic Syntax Highlighter, a tool for highlighting your code based on the appropriate syntax, so C/C++ code looks like it should.
  • Gentoo Linux is cited as a significant contributor to Sembiance's project, probably in large part due to the Portage software management tool. When I say "software management" I mean the task of keeping track of which software packages are installed on the system, at which revision, and whether they've been patched.
  • PHP, the general-purpose scripting language that makes csourcesearch.net a full-fledged member of the LAMP community (O'Reilly's ONLamp.com is as good a place to start as any).
  • Also mentioned in the acknowledgment section of csourcesearch.net is freenode, a service for providing "interactive services to peer-directed project communities." In other words, a good place to get help, implemented on Internet Relay Chat (IRC).
  • Then, there's Flooble.com. I can't really tell what exactly it is, other than a showcase site for Animus Pactum Consulting that also happens to include a pretty decent resource for webmasters looking for design information and free Java/JavaScript code.

You can browse and search by what kind of license is used, by individual packages, and by software categories. All in all, csourcesearch.net provides an intriguing tool for exploring the world of open source C/C++ software for anyone interested in knowing more, whether you're looking to do due diligence on your own code base or just interested in learning more about how to build your own applications.
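The heart of a searchable code base like this is an inverted index: a map from each token to the files that contain it, plus metadata such as license for filtering. The Python sketch below is a toy stand-in for that idea, not Lucene's API or Sembiance's actual design; the file names, license labels, and identifier regex are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import List, Optional

# A C/C++ identifier: a letter or underscore, then letters, digits, or underscores.
TOKEN = re.compile(r"[A-Za-z_]\w*")

class CodeIndex:
    """Toy inverted index: identifier -> files containing it, plus a license label per file."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)   # token -> set of file names
        self.licenses = {}                 # file name -> license label

    def add(self, name: str, source: str, license_id: str) -> None:
        """Tokenize one source file and record every identifier it contains."""
        self.licenses[name] = license_id
        for token in TOKEN.findall(source):
            self.postings[token].add(name)

    def search(self, *tokens: str, license_id: Optional[str] = None) -> List[str]:
        """Files containing every token (AND), optionally limited to one license."""
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        if license_id is not None:
            hits = {f for f in hits if self.licenses[f] == license_id}
        return sorted(hits)

index = CodeIndex()
index.add("list.c", "struct node *next; void list_push(struct node *n);", "GPL-2.0")
index.add("queue.c", "struct node *head; int queue_len(void);", "BSD")
index.add("hash.c", "unsigned hash_str(const char *s);", "GPL-2.0")

print(index.search("struct", "node"))                    # ['list.c', 'queue.c']
print(index.search("struct", "node", license_id="BSD"))  # ['queue.c']
```

A real engine such as Lucene adds relevance ranking, phrase queries, and on-disk index storage; the sketch only shows the lookup structure that makes search-by-identifier and browse-by-license cheap.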

November 30, 2005

The Never-Ending Sony Story

I've already blogged about Sony's DRM woes (What Hath Sony Wrought?), but it just never ends. They're being sued in Texas and maybe in New York, and the Electronic Frontier Foundation (EFF) is bringing a class action suit against them.

Then there's this report from Business Week Online that Sony was warned about the danger of their rootkit to their customers almost a full month before the news hit.

Stay tuned, it's only going to get more interesting.

November 21, 2005

True Crime and Open Source Mapping Solutions

Did you see Shawn's blog entry ("Criminal Predictions")?

It reminded me of ChicagoCrime.org. Using Google Maps for the maps, and a feed of crime data from the Chicago PD's publicly available database of reported crime at the Citizen ICAM site, ChicagoCrime.org lets you browse Chicago to see where, and what, crimes have been reported over the past 90 days.
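The browsing ChicagoCrime.org offers comes down to two filters over a table of reported crimes, a rolling 90-day date window and a location key, with results grouped by crime type. Here is a minimal Python sketch, assuming a hypothetical record layout (report date, crime type, police beat); it is not the site's code and not the Chicago PD's data format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Report:
    reported: date
    crime_type: str
    beat: str      # hypothetical police-beat identifier used as the location key

def recent_by_type(reports, beat: str, today: date, days: int = 90) -> Counter:
    """Count crime types reported in one beat within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return Counter(r.crime_type for r in reports
                   if r.beat == beat and cutoff <= r.reported <= today)

reports = [
    Report(date(2005, 11, 1), "theft", "1834"),
    Report(date(2005, 10, 15), "battery", "1834"),
    Report(date(2005, 7, 4), "theft", "1834"),      # older than 90 days
    Report(date(2005, 11, 10), "theft", "0111"),    # different beat
    Report(date(2005, 11, 12), "theft", "1834"),
]

print(recent_by_type(reports, "1834", today=date(2005, 11, 21)))
# Counter({'theft': 2, 'battery': 1})
```

Adding map coordinates to each location key is what turns a tally like this into a mapping mashup.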

The brainchild of up-and-coming web journalist/technologist Adrian Holovaty, ChicagoCrime.org was a spare-time project that's garnered a lot of attention (read Holovaty's interview in the newspaper trade publication Editor and Publisher).

I wonder how much of the functionality of the SPSS/Information Builders could be encompassed in this type of application built on a shoestring and powered by another of Holovaty's projects, the Django web framework. An important part of Django is the ability to define data models entirely with the Python programming/scripting language. Django offers a dynamic database-access API, as well as the ability to write portions of the application in SQL, if needed.

Web-application hybrids ("mashups") combine two or more, usually unrelated, web services to create new and unique applications. Some of the most interesting are Google Maps mashups marrying powerful mapping with geographical data from wherever you can find it. Most are partly or entirely driven by free and open source software. Check out the Google Maps Mania blog for the latest Google mapping news.