Blog: Pete Loshin

November 23, 2007
But Who Will Use Clippy on Linux?

Bear with me: this one has almost nothing to do with business intelligence, unless you consider cloning a brain-dead piece of proprietary software for Linux an example of "business intelligence, lack of".

I followed this link, Clippy v1.00, because it was described as a free download of the (notorious) Clippy Assistant. I thought, excellent, just what I need: an annoying pseudo-sentient digital "assistant" to "help" me get through the day. Too bad, it's Windows-only. And it's just one of dozens of "prank" programs offered by RJL Software. Far be it from me to tell them how to do their business, but their claim to be:

    ...the providers of unique software solutions for today's changing market. RJL Software offers a wide range of Software and Services for the Windows environment.

falls a little short, since so much of their software is April Fools-type prankery.

Anyway, Googling "clippy open source linux version" came up with this: Clippy for Linux, a Digg.com entry that points to this project: Vigor. As you can see from the screenshots, Vigor is, like many open source programs, both less polished and more entertaining than the proprietary version. But it's still only at version 0.016, so there's room for improvement. I'm optimistic, though, as I was able to install Vigor quite easily on my Ubuntu desktop system with Synaptic Package Manager.

November 13, 2007
Installing MySQL: A Tale of Two Platforms

Before I begin, be warned: one of the links I'm pointing to here is NSFW. That means "NOT SAFE FOR WORK", and in this case it means that the page contains vulgar language and profanity. If you're easily offended or are at work, you may want to go do something else now.

Here's the story: On November 10, this article was posted: Installing MySQL on Mac OS X.
It's a well-written, comprehensive, detailed and in-depth how-to article that anyone who wants to get MySQL going on OS X would be happy to stumble over. The guy who wrote it, Dan Benjamin, seems like a talented and very nice fellow, and he went to a great deal of trouble to put the article together.

I'm guessing that sometime shortly after that article got posted, Mark Pilgrim read it and decided that running MySQL on a Mac seemed like a lot of work--much more than using it on Linux. Then, Mark wrote his own contrasting "answer" how-to, Installing MySQL on Ubuntu (the NSFW way). This article is NSFW. But it's also hilarious as H-E-double-hockey-sticks, and it looks to be just as useful as the first article.

If you're worried about the foul language in the second article, I'll summarize: installing MySQL on OS X sounds like an incredibly complicated and scary adventure; installing it on an Ubuntu Linux box sounds like a walk in the park.

The funny thing is, we just got an iMac. And I just installed MySQL on an Ubuntu Linux system. I don't anticipate installing it on the iMac, so I'm enjoying this on multiple levels.

November 5, 2007
Opening Up the Internet: Craigslist + Yahoo! Pipes = Better Data Searching

We've really come a long way with the web and the Internet over the past dozen years or so. Back then, it was kind of a big deal to run screen-scraping software that could pull data off websites, or to access corporate legacy mainframe systems through a webified front end. Now, we're seeing more and more of the web instantiated in some seriously big data stores, and more and more of the owners of those seriously big data stores making data processing tools and APIs available to anyone who wants them, so we can have some nice little mashup applications combining, for example, maps and data with geographical components.
But here's something sort of new: a way to combine an already popular, useful and generally great website--in this case, Craigslist--with another popular, useful and great website--Yahoo! Pipes. The result is even better than either one alone. Yahoo! Pipes is kind of like a web version of UNIX piping: a way to take the results of one command (output) and "pipe" them into another command as input. What you get is a very handy way to create very specific and powerful searches, and turn the results into useful information.

So, here's the article that got me hooked: How to Actually Search Craigslist. As great as Craigslist is, it has some drawbacks. James Aaron, who wrote the article, is a student at San Jose State's School of Library and Information Science, and is currently looking for a job. He likes Craigslist, but, as he explains, it could be even more helpful if there were ways to search better:

    There is no way to truncate searches, such as "librar*" to include librarian, library, libraries, etc.
    There is no way to perform Boolean AND, OR, NOT searches.
    There is no way to remove frequently occurring irrelevant items.
    There is no way to search two sub-regions at once.

    So, unless I want to perform 20 searches a day and receive MANY completely irrelevant hits, I basically have to browse.

The answer, he tells us, is Yahoo! Pipes, and he explains just how to use Pipes with Craigslist to make Craigslist that much more useful. In other words, more evidence of just how much the entire web is evolving into the world's biggest ever data store, with the most powerful ever set of tools for extracting business intelligence. How could you use this kind of capability to extract actionable knowledge from the web?

November 2, 2007
Mining Valuable Intelligence From Random Numbers

Somewhere in my stack of obsolete 3.5" floppy diskettes I've got a spreadsheet that contains some interesting raw data.
Long ago I was in the habit of buying a bag of M&Ms from a vending machine in the corporate cafeteria every afternoon: before eating any, I would open the bag, sort the colors, count the M&Ms of each color, and record the totals in a spreadsheet. The primary benefit I got from that activity was a nice set of data, from which I could infer some general rules about which were the most and least common M&M colors. I also got something to do during the afternoon lull to keep me from falling asleep.

It was the kind of job where most of my co-workers were very bright, but we often had time on our hands; conversation topics included arguing different strategies for getting rich by inventing something really cool--and strategies for winning the lottery.

Now that we have the Internet, and there's an endless supply of data sets to play with, here's a guy who actually came up with something useful on that whole lottery thing: Pattern Analysis of MegaMillions Lottery Numbers. Can you use the information in this article to increase your odds of winning the big bucks? It's not clear: if the lottery number selection process is truly random, the answer is no. But you could use the numbers, and the techniques described in the article, to discover hidden influences on the selection process that might skew the results.

For me, though, the best part of this article is that it takes the question of whether lottery drawings are truly random and then applies a scientific approach to it. And that all the data is available on the New Jersey lottery website, in both HTML and delimited formats for easier processing.
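
If you want to poke at the randomness question yourself, here is a minimal Python sketch of one standard check, a chi-square goodness-of-fit test against a uniform distribution. This is not necessarily the method the article uses; the function names, the tiny data set, and the pool size below are all made up for illustration. It also treats every ball as an independent draw, which is only approximately true because balls are not replaced within a single drawing.

```python
from collections import Counter
from math import sqrt


def chi_square_uniform(draws, pool_size):
    """Return (statistic, degrees_of_freedom) for a uniformity check.

    draws: iterable of drawings, each an iterable of ball numbers in 1..pool_size.
    """
    counts = Counter(n for draw in draws for n in draw)
    if any(n < 1 or n > pool_size for n in counts):
        raise ValueError("ball number outside 1..pool_size")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no drawings supplied")
    # If every ball is equally likely, each one has the same expected count.
    expected = total / pool_size
    statistic = sum(
        (counts.get(n, 0) - expected) ** 2 / expected
        for n in range(1, pool_size + 1)
    )
    return statistic, pool_size - 1


def z_score(statistic, dof):
    """Rough normal approximation: a chi-square variable has mean dof, variance 2*dof."""
    return (statistic - dof) / sqrt(2 * dof)


# Hypothetical mini data set: four drawings of three balls from a pool of 6.
# Real drawings would be loaded from a delimited file instead.
sample_draws = [[1, 2, 3], [4, 5, 6], [1, 3, 5], [2, 4, 6]]
stat, dof = chi_square_uniform(sample_draws, pool_size=6)
print(f"chi-square = {stat:.3f}, dof = {dof}, z = {z_score(stat, dof):.3f}")
```

A statistic far above the degrees of freedom (a large positive z) would hint that some numbers come up more often than chance alone predicts; a value near the degrees of freedom is what a fair drawing should produce. With only a handful of drawings the approximation is too crude to conclude anything, so treat this as a starting point for playing with the real data.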