Blog: Pete Loshin


Welcome! One way or another, open source software has influenced just about every major information technology development of the past forty years, from multitasking operating systems to personal computing to the Internet itself, and it's already taking on the business information software industry. Whether you agree with me or not, I'm looking forward to sharing news and views here about open source software and how it is shaping the business of business intelligence.

 

 

Recently in Musings Category

Somewhere in my stack of obsolete 3.5" floppy diskettes I've got a spreadsheet that contains some interesting raw data. Long ago I was in the habit of buying a bag of M&Ms from a vending machine in the corporate cafeteria every afternoon: before eating any, I would open the bag, sort the colors, count the M&Ms of each color, and record the totals in a spreadsheet.

The primary benefit I got from that activity was a nice set of data, from which I could infer some general rules about which were the most and least common M&M colors. I also got something to do during the afternoon lull to keep me from falling asleep.

It was the kind of job where most of my co-workers were very bright, but we often had time on our hands; conversation topics included arguing over different strategies for getting rich by inventing something really cool--and strategies for winning the lottery.

Now that we have the Internet, and there's an endless supply of data sets to play with, here's a guy who actually came up with something useful on that whole lottery thing: Pattern Analysis of MegaMillions Lottery Numbers.

Can you use the information in this article to increase your odds of winning the big bucks? It's not clear: if the lottery number selection process is truly random, the answer is no. But you could use the numbers, and the techniques, as described in the article, to discover hidden influences on the selection process that might skew the results.

For me, though, the best part of this article is that it takes the question of whether lottery drawings are truly random and applies a scientific approach to it. It also helps that all the data is available on the New Jersey lottery website, in both HTML and delimited formats for easier processing.
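To make the "is it truly random?" question concrete, here is a minimal sketch of one standard check, a chi-square goodness-of-fit statistic against a uniform distribution. It is not necessarily the method used in the article. The function name, the toy draws and the pool size are all made up for illustration, and the code does not read the lottery site's actual file format. Because numbers within a single drawing are picked without replacement, the statistic is only an approximation.

```python
from collections import Counter


def chi_square_uniform(draws, pool_size):
    """Return (statistic, degrees_of_freedom) comparing how often each
    number 1..pool_size was drawn against a perfectly even spread."""
    counts = Counter(n for draw in draws for n in draw)
    total = sum(counts.values())
    expected = total / pool_size  # expected count per number if uniform
    statistic = sum(
        (counts.get(n, 0) - expected) ** 2 / expected
        for n in range(1, pool_size + 1)
    )
    return statistic, pool_size - 1


# Hypothetical toy data: four drawings of three numbers from a pool of 1..4.
sample_draws = [
    [1, 2, 3],
    [1, 2, 4],
    [1, 3, 4],
    [1, 2, 3],
]

stat, dof = chi_square_uniform(sample_draws, pool_size=4)
print(f"chi-square = {stat:.3f} with {dof} degrees of freedom")
```

A large statistic relative to the chi-square distribution for those degrees of freedom would hint that some numbers come up more often than chance alone explains; a small one is what you would expect from a fair drawing.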


Posted November 2, 2007 6:00 AM

When I first started blogging here, I had a list of categories that I thought would cover most of my entries. Based entirely on my personal and quirky blog reading experiences, I suspect other bloggers have done the same thing: come up with a bunch of topics they want to write about, and add those to their blog's "categories" list. They probably also write about other stuff at first, things that are interesting to them at the time, and add new categories to cover those ad hoc topics.

I'm thinking on line here, so bear with me.

There are a number of different quantifiable variables here that I'd like to consider:

  1. The number of entries in the blog.
  2. The number of categories to which each blog entry is assigned.
  3. The number of categories in the blog. There's more data here:
    • The earliest date on which an entry is linked to each category.
    • The last date on which an entry is linked to each category.
    • The overall number of blog entries linked to each category.

My conjecture here is pretty simple: that the Pareto Principle governs how posts end up distributed across the categories in which they are filed. In other words, in mature blogs, roughly 80% of all entries will be logged under roughly 20% of the categories. Or, in *other* other words, as time goes on, bloggers tend to focus their writing on a small subset of the topics they originally intended to cover.

I further suspect that as bloggers become more adept at writing, they also tend to be better at distilling the essence of their message--and as a result, multiple-topic postings should decline as the length of time the blog is maintained increases.

Now, if only I could figure out a way to extract that kind of data into a usable data set, I'd be on my way to a possibly cool new piece of information.
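Supposing the extraction problem were solved, the two conjectures above could be checked with something like the sketch below. Everything here is an assumption for illustration: the record layout (a list of dicts with a "categories" key), the function names and the sample entries are all hypothetical, not data pulled from any real blog.

```python
import math
from collections import Counter


def top_category_share(entries, top_fraction=0.2):
    """Fraction of entries filed under at least one of the busiest
    `top_fraction` of categories (the "20%" in the 80/20 rule)."""
    counts = Counter(cat for e in entries for cat in e["categories"])
    if not counts:
        return 0.0
    top_n = max(1, math.ceil(len(counts) * top_fraction))
    top_cats = {cat for cat, _ in counts.most_common(top_n)}
    covered = sum(1 for e in entries if top_cats & set(e["categories"]))
    return covered / len(entries)


def multi_topic_rate(entries):
    """Fraction of entries assigned to more than one category."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if len(e["categories"]) > 1) / len(entries)


# Hypothetical blog: ten entries spread over five categories.
sample_entries = (
    [{"categories": ["Musings"]} for _ in range(7)]
    + [{"categories": ["Musings", "Security"]}]
    + [{"categories": ["Linux"]}]
    + [{"categories": ["BI", "Finance"]}]
)

share = top_category_share(sample_entries)
multi = multi_topic_rate(sample_entries)
print(f"Top 20% of categories cover {share:.0%} of entries")
print(f"{multi:.0%} of entries are multi-topic")
```

To test the "multiple-topic postings should decline" idea, the same `multi_topic_rate` could be computed separately for a blog's early and late entries and the two values compared.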


Posted July 1, 2007 10:00 AM

If you work as a bank teller, I'm pretty sure you can't take your cash drawer home to count out your currency. Likewise, I don't think jewelers allow their employees to take precious metals or stones home, and pharmacists probably don't have the option of taking drugs home to fill prescriptions.

Most companies whose employees handle valuable commodities have strict security protocols intended to prevent losses due to carelessness as well as outright theft.

Except the IT industry, apparently.

It seems to be perfectly OK for employees--and contractors, consultants and various other third-party non-employees--to walk out the door with corporate databases loaded onto laptops or portable hard drives, with predictable results when those laptops or hard drives are lost or stolen.

When laptops with sensitive data get lost or stolen, it doesn't matter how conscientiously you've protected your personal information from identity thieves. You are at risk because someone who should have known better acted irresponsibly. Maybe it was a human resources clerk at your current employer--or maybe at a company you haven't worked for since the Reagan administration.

Maybe it was someone at a hospital where you received emergency medical treatment, or the insurance company that paid your claim, or your university. Or someone who works for a government agency.

Whoever did it may never be held accountable. And you may not even hear about it until you get a letter informing you that your data may have been compromised and you can sign up for a free credit monitoring service, sponsored by the company or organization that lost your data in the first place.

To get an idea of the scope of the problem, check out numbrX Security Beat, "an online record of reported personal, private and confidential data breaches which can lead to identity theft and credit fraud."

And remember, the breaches you read about on numbrX are probably only the tip of the iceberg: these are only the breaches that have been reported publicly.


Posted June 29, 2007 8:00 AM

Go read this article by Matthew Haughey: How Ads Really Work: Superfans and Noobs, and then think about how you can turn data into knowledge.

If that doesn't convince you to drop everything and go read the article, here's my quick summary:

In one sentence, what Matt (re-)discovered is the old 80/20 rule, also known as the Pareto Principle, or power law (this one's an article about power laws and blogs).

Matt was using Google Analytics and found that most of his ad revenue came from "noobs" (one-time visitors who are on the search for something), with most of his loyal visitors ("superfans") generating a disproportionately low volume of ad revenue.

So, what can you do with this data? Matt decided it made sense to give his loyal fans an ad-free experience because they didn't click on ads anyway. Win-win: he got a higher click-through rate because all the pages served to his superfans didn't actually have any ads AND he was able to give potential superfans an incentive to opt for premium membership.
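The arithmetic behind that win-win is easy to sketch. The visitor and click figures below are invented purely for illustration and are not Matt's actual Google Analytics numbers; the function name is hypothetical too.

```python
def click_through_rate(clicks, ad_impressions):
    """Clicks divided by the number of page views that showed an ad."""
    return clicks / ad_impressions if ad_impressions else 0.0


# Hypothetical monthly figures (not taken from the article).
noob_views, noob_clicks = 1_000, 20  # one-time visitors arriving via search
fan_views, fan_clicks = 4_000, 4     # loyal readers who rarely click ads

# Ads shown to everyone: superfan page views dilute the rate.
ctr_before = click_through_rate(noob_clicks + fan_clicks, noob_views + fan_views)

# Ads hidden from superfans: only noob page views count as ad impressions.
ctr_after = click_through_rate(noob_clicks, noob_views)

print(f"CTR with ads for everyone:   {ctr_before:.2%}")
print(f"CTR with ad-free superfans:  {ctr_after:.2%}")
print(f"Clicks given up: {fan_clicks} of {noob_clicks + fan_clicks}")
```

With these made-up numbers, dropping ads for superfans gives up only a handful of clicks while the click-through rate on the remaining ad impressions climbs sharply.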

Not really a big deal, just an example of using common sense when you're crunching numbers.


Posted June 1, 2007 6:00 AM

Check out this article, 8 Free Personal Finance Management Programs, at one of my new favorite websites, the Consumerist.

The thing for me about financial software is that you've got to have a lot more discipline, know-how and ambition than I've got (apparently) to get all your data loaded in. So even though I once actually bought a copy of Quicken (years ago), I save a lot more money by not using one of the open source solutions than I do by buying, and not using, a commercial one.

I know this hasn't much to do with enterprise computing or business intelligence, but it does have a lot to do with the reason so many people can't migrate away from Windows: they're locked into using Quicken, MS Money or some other proprietary application that hasn't yet attracted an open source alternative.

When adequate open source alternatives for personal finances and tax preparation become widely usable, the potential for widespread migration to Linux is huge.


Posted April 9, 2007 8:00 AM