Blog: David Loshin

January 23, 2008
Double Your Pleasure, Double Your Fun

There is an oft-quoted statistic about the growth rate of data volumes that I wanted to use in some context, and I started searching for a source. I googled "data volumes" +"double every" to see what I could find and, to my surprise, got lots of hits, but it is difficult to pin down the exact parameters. Lots of folks are using the statistic:

"Data doubles every year"

I am still following links from the first page of results, and so far we are doubling our data anywhere from every 3 to every 18 months.

"Reed's Law states that the volume of data doubles every 12 months."

OK, so there is actually a law about it. Hold on a second: according to Wikipedia, this law is about the utility of (social) networks, so perhaps the law doesn't apply in all jurisdictions.

Anyway, these may all be references to a UC Berkeley study on the growth of data, which said that the amount of information stored on media such as hard disk drives doubled between 2000 and 2003. So let's look at this a little more carefully: we have a scientific study that looks not at the creation of data, but rather at the use of storage media to hold what is out there. And out there is a lot of stuff needing a lot of storage, like images, music, videos, etc.: things that hold information, yet from which it is still a challenge to extract data. Also, consider that for each thing out there, there are likely to be a lot of copies! I am sure that a scan of all the TiVos in the country would demonstrate that lots of people are still catching up on older episodes of 24 and American Idol.

I need to refine my question a little bit, then, but I am afraid it will be difficult to track down defensible sources for it. I am more interested in knowing about the growth rate for data that can be integrated into an actionable information environment.
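As a rough aside, the doubling periods quoted above imply very different yearly growth rates. Here is a minimal Python sketch that puts them on a common scale. It assumes steady exponential growth, which none of the quoted sources actually state, and the helper name annual_growth_factor is made up for illustration.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor over 12 months for a quantity that doubles every
    `doubling_months` months (assumes steady exponential growth)."""
    if doubling_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (12.0 / doubling_months)


if __name__ == "__main__":
    # Doubling periods mentioned in the post: 3, 12, and 18 months, plus the
    # 36 months implied by "doubled between 2000 and 2003".
    for months in (3, 12, 18, 36):
        factor = annual_growth_factor(months)
        growth_pct = (factor - 1.0) * 100.0
        print(f"doubles every {months:>2} months -> "
              f"x{factor:.2f} per year ({growth_pct:.0f}% annual growth)")
```

Under that assumption, doubling every 3 months works out to a 16-fold increase per year, while doubling once over three years is only about 26% per year, so the quoted statistics are describing very different growth rates.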
I may not care about the bits comprising that specific episode of 24 that is sitting on millions of DVRs, but as an advertiser, I might be interested in profiling which households have watched which episodes and at what kind of time shift. Anyone have any ideas?

May 25, 2007
Talking About Search and Google

I had a conversation the other day with one of my former colleagues, and I asked him his opinion about whether approximate matching and semantic techniques would be integrated into search engines. His response surprised me: he told me that he had read that over 90% of Google searches involve a single word, and that in the absence of more information, the engine doesn't have that much to work with. Therefore, was it really worth adding this functionality if, for the most part, it would add computation time but benefit only a small number of searchers?

That, of course, shocked me, but maybe it shouldn't have. I thought I was pretty good at googling, mostly because I am able to get pretty good results by using the feedback I get from each search. For example, you start with a phrase in quotes, and that may be sufficient. If not, you can scan the short results coming back to seek out better phrases to include in (or exclude from) the search. Others are much more comprehensive in their searching, using qualifiers and key tokens to enhance their search (e.g., Johnny Long, who will be a keynote speaker at the upcoming Data Governance conference in San Francisco, at which I will also be speaking, by the way).

But perhaps the general computer user is not so sophisticated, and may need some suggestions. Anyone want to contribute their favorite search strategies?

November 15, 2006
Open Source Conundrum

Even top management at open source BI companies seems to feel that the costs associated with deploying open source projects are roughly the same as going the traditional route.
On the other hand, since the costs for deploying an open source solution are largely on the back end (e.g., paying people to do things) instead of the front end (e.g., software licensing fees), it might be easier to start a BI project using open source tools than to justify the up-front costs just to get to the starting gate.

However, in terms of innovation, open source projects often trail the traditional commercial tool vendors. Open source projects grow by community participation, in which lots of contributors make things happen, or through acquisition, in which components are added to the mix through negotiated deals. So while there are some benefits to starting with open source, I suspect the general process might be to migrate over time to a traditional commercial product.

Here is the challenge: I am interested in experiences using open source Business Intelligence software, good, bad, ugly, or beautiful. Feel free to email me or post directly to the blog. I am looking forward to some responses!

September 26, 2006
Is Organizational Evolution a Zero-Sum Game?

At times, our consulting practice is faced with a conundrum: the evolution of certain technologies and practices for enhanced information exploitation suggests changing business operations in a way that might reduce, or even eliminate, some participants' roles. In other words, implementing technical changes to benefit the organization can simultaneously have a detrimental impact on individuals within the organization. In terms of self-preservation, it is not in the best interests of these individuals to support new technical initiatives that might result in their own termination. Yet in order to do their job the right way, they are obliged to do what is right for the organization, right?

This situation resembles the game theory concept of a zero-sum game, in which any move that benefits one player has an equal negative impact on another player.
The challenge, then, is to determine how to socialize the evolution of the program in a way that demonstrates mitigation for any individual impacts or displacements. For example, when suggesting an action whose side effects include the elimination of a specific person's role, seek ways to evolve that person's responsibilities to support the change process and the long-term maintenance of the technical evolution. Doing so will finesse the "zero-sum" situation and will provide new challenges for both staff training and organizational improvement.

July 12, 2006
Opportunity in Data Standards Management

I have written extensively about the value of developing a data standards program as part of a data governance framework, and so far we have convinced a number of clients as well. In fact, Knowledge Integrity is looking for a motivated individual to join our Data Standards team at one of our client sites. Click here for more information.

May 8, 2006
Playing by the Rules

While I was doing some random web searching, I came across an interesting web page that provides some training on finding MP3s using Google. Not that I am suggesting that search engines be used for unacceptable behavior, but my curiosity is piqued by the more general concept of "getting around the rules," and how that concept relates to the more piquant topic of compliance.

There are two approaches to compliance. The first is doing what you need to do to comply; the second is seeing how much you can do to avoid being compliant. Here is a quick, although probably dated, example: during the 1980s and 1990s, police would set up speed traps employing radar systems to determine how fast cars were traveling. As the goal was to identify (and punish) drivers exceeding the posted speed limit, this reflects a simple model of compliance. Drivers who were inclined to speed could react in one of two ways. The first (for the "compliers") was to drive slower (become compliant).
The second (for the "avoiders") was to purchase some technology (a radar detector) that would notify the driver when radar monitoring was taking place, allowing the driver to slow down during the monitoring phase but then resume the noncompliant behavior when there was limited risk of being caught.

Do organizations opt for one or the other of these approaches? What is the risk/reward model? To look at our example, those who became compliant were penalized to some extent by having to reduce their speed and get to where they were going more slowly. There was some monetary investment on the part of the avoiders (the cost of the radar detector), but otherwise they were rewarded for their noncompliance, since they still got to their destinations more quickly, albeit with some limited risk of getting caught.

Is it better to be a complier or an avoider? How does an organization determine its approach, and then communicate that approach to the individuals within the organization? And lastly, I wonder whether there is some middle ground between these two options. Any comments?

October 19, 2005
Encoded Printing, Privacy, and Business Intelligence

This morning, I read a very interesting story about how apparent interactions between government folks and printer manufacturers resulted in the embedding of encoded information on printed pages. This message, embedded as a series of yellow dots visible only with a magnifying glass and blue light, was determined to be a digital signature used by the US Secret Service to "prevent illegal activity" (probably money counterfeiting).

From a privacy point of view, it is always jarring to hear about ways that activity is being tracked without the target's awareness, but those of us in the Business Intelligence world know that there are many ways that individual activity may be (and probably is) being tracked.
And sometimes, people are even happy to "be tracked" if it results in money savings or better efficiency. I am sure that there are many sideline privacy "activists" who participate in supermarket "clubs" or frequent flyer programs.

The question I want to throw out to the blogspace is: where is the line between beneficial tracking and invasive tracking?

September 19, 2005
Data Intelligence and Performance Metrics

Here is my latest challenge to you readers: I had a heck of a time trying to explain to some colleagues the value of presenting the results of measured metrics tied to business performance, and I need some help in figuring out a good way to do it.

Consider the scenario: an organization has the ability to provide some basic reporting statistics on the technical (and some of the operational) aspects of its applications. But it is not necessarily clear what business value is being provided by these statistics, and whether these metrics are relevant to achieving business objectives.

Perhaps because I eat, live, and breathe data and business intelligence, the definition and use of business performance metrics tied to business data seems obvious to me. In this case, the questions revolve around performance improvement. But how can you improve a process if you can't measure how successful you are at it in the first place?

Here is what I want to be able to convey:

That is what I wanted to say, but I think it kind of came out like this: "... bla bla bla performance indicators bla bla ... process improvement ... bla bla bla ... control limits ... bla bla bla"

Suddenly, I am gripped with fear that I have transformed into a buzz-phrase-spewing robot. (Don't worry, I got over it pretty quickly.) But here is the challenge: how do you effectively communicate the value of systemic thinking and reporting to an audience largely experienced in procedural and operational processing?

August 14, 2005
Open Source Data

We hear a lot about open source software and its potential benefits to the marketplace. How about the concept of open source data? The idea is creating a repository of data that is readily available, can be configured for business benefit, and is collectively supported by a development community. One place to start is with public data, such as what is available from the US Census Bureau.