Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author >

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach, and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in Cutting Edge Category

Apparently, the same issues that plagued competing US intelligence agencies immediately after the 9/11 attacks have not yet been resolved. According to this Time Magazine article, President Obama summarized the failure to prevent terrorism suspect Umar Farouk Abdulmutallab from boarding a Detroit-bound plane this way: "The U.S. government had sufficient information to have uncovered this plot and potentially disrupt the Christmas Day attack, but our intelligence community failed to connect those dots."

Yet again, we see that despite being flooded with data, there was a failure to turn that data into actionable knowledge. According to the article, intelligence agencies knew that the suspected bomber Abdulmutallab had traveled to Yemen, a hotbed of brewing anti-US terrorism plots, and that his father had contacted the US embassy in Nigeria to warn them of his son's activities; yet no one asked whether Abdulmutallab had a US visa, or whether he should have been added to the no-fly list. The fact that he purchased a one-way ticket and checked no luggage might have raised some concern as well.

Any of these events should have triggered some action, but the fact that they didn't raises a different question: how often do we miss events that should trigger a security response? I am sure it happens a lot more frequently than we'd like to believe, and that might raise your level of anxiety.

And that raises yet another question: what is the probability/risk that a missed event is a critical one like the December 25th situation? Of course, a low probability might alleviate some of the anxiety.

However, from a data perspective, the issue is a matter of data sharing and integration - protocols for capturing the key semantic aspects of logged events could be published to a common repository that could be continuously monitored, mined and evaluated to determine when some proactive action should take place. Is MDM the answer? Maybe, or perhaps a master repository published to a cloud environment with layered data services for rapid identity resolution...
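To make that idea a bit more concrete, here is a minimal sketch in Python of what publishing logged events to a common repository and resolving identities against it might look like. Everything in it - the class, the match key, the event types, the names and dates - is hypothetical and purely illustrative; real identity resolution requires far more robust matching than a normalized name plus a date of birth.

from collections import defaultdict

def match_key(name, birth_date):
    # Crude identity key: punctuation-stripped, whitespace-normalized name plus date of birth
    normalized = " ".join(name.lower().replace(".", "").split())
    return normalized + "|" + birth_date

class SharedRepository:
    # A common store that any agency could publish event records to and monitor
    def __init__(self):
        self.events = defaultdict(list)   # match key -> list of event records

    def publish(self, source, name, birth_date, event_type, detail):
        key = match_key(name, birth_date)
        self.events[key].append({"source": source, "type": event_type, "detail": detail})
        return key

    def review(self, key):
        # Resolve all events for one identity and flag combinations of concern
        types = {e["type"] for e in self.events[key]}
        if {"embassy_warning", "one_way_cash_ticket"} <= types and "no_fly_listed" not in types:
            return "ESCALATE: multiple risk indicators, subject not on no-fly list"
        return "no action"

repo = SharedRepository()
key = repo.publish("Embassy", "John Q. Suspect", "1980-01-01",
                   "embassy_warning", "relative reported concerns")
repo.publish("Airline", "JOHN Q SUSPECT", "1980-01-01",
             "one_way_cash_ticket", "no checked luggage")
print(repo.review(key))   # -> ESCALATE: multiple risk indicators, subject not on no-fly list

The point is not the toy matching logic but the architecture: once each agency publishes to the shared store, the "connect the dots" step becomes a query over one identity's consolidated event history rather than a manual hunt across silos.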

Oh, check out this interview to understand a little more about national security.

 


Posted January 6, 2010 6:57 AM
Permalink | 2 Comments |

I came across a few articles last week talking about easing the cost of Sarbanes-Oxley compliance for small and medium companies. This article from the New York Times comments that "the House Financial Services Committee moved to permanently exempt companies worth less than $75 million from the auditing provisions of the Sarbanes-Oxley Act."

Despite its being touted as a measure to ease the financial burden, I have reservations about this for two reasons. First, eliminating the auditing provision essentially removes any capability to assure investors that there are processes in place to verify that the financial data meets specified compliance criteria. In turn, this opens the door for noncompliance and places the burden on the shareholders to force the company to be honest about its finances.

Second, it removes the mandate to institute a key best practice for data quality management - transparent inspection and monitoring of enterprise data. As a data quality practitioner, I am disappointed that the government is stepping away from mandated data quality management and data governance.
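To be concrete about what I mean by transparent inspection and monitoring, here is a trivial, hypothetical sketch in Python of the kind of rule-based check a data quality program would run continuously against financial records and report on; the rules and the data are invented solely for illustration.

records = [
    {"account": "1010", "debit": 500.00, "credit": 0.00},
    {"account": "",     "debit": 0.00,   "credit": 500.00},   # missing account code
    {"account": "2020", "debit": -25.00, "credit": 0.00},     # negative amount
]

# Each rule is a named predicate applied to every record in the batch
rules = {
    "account populated":    lambda r: bool(r["account"]),
    "amounts non-negative": lambda r: r["debit"] >= 0 and r["credit"] >= 0,
}

# Publish the pass rate per rule where auditors (or shareholders) can see it
for name, rule in rules.items():
    passed = sum(1 for r in records if rule(r))
    print(f"{name}: {passed}/{len(records)} records pass "
          f"({100.0 * passed / len(records):.0f}%)")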


Posted November 13, 2009 6:48 AM
Permalink | No Comments |

Before my current career in data management, I had a previous life as a software developer, designing and implementing Fortran and C compilers for massively parallel processing (MPP) computers. While I have been working on data quality and BI for the past 13 years or so, I still have a great interest in the high performance computing space. Recently I have had the opportunity to indulge that interest by learning about the distributed/parallel programming model that Google has championed, called MapReduce, and its relationship to the use of analytical database management systems.

There are some similarities, some differences, and ultimately, the two paradigms are complementary when it comes to supporting end-user business needs. If you are interested in the thought processes, check out this analysis paper, funded by Vertica, which compares and contrasts both high performance approaches.
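For readers who haven't looked at MapReduce before, here is a toy, single-process sketch in Python of the programming model, using the canonical word-count example; the real value, of course, comes when a framework distributes the map and reduce phases across thousands of machines and shuffles the intermediate pairs between them.

from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Emit (key, value) pairs: one (word, 1) per word occurrence
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Group the pairs by key and aggregate the values for each key
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
print(dict(reduce_phase(map_phase(docs))))
# {'brown': 1, 'dog': 2, 'fox': 1, 'lazy': 1, 'quick': 2, 'the': 3}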


Posted October 5, 2009 5:45 AM
Permalink | No Comments |

An interesting article about people leaving Facebook caught my eye because it resonated with some of the same issues I have had with it - the nosiness it inspires, the misrepresentation of the concept of a friend (vs. a connection), the way some people become obsessed and absorbed into it, and other observations.

After I had signed up (prodded by an old friend with whom I had fallen out of touch), I started to see others from my (increasingly hazy view of the) past contact me asking to be connected. I guess I just said yes, and ended up with some connections, which led to other requests, etc.

So Facebook is a little different from my other social network, linkedin.com, which is valuable to me as a business tool. Facebook does not provide that value, although it is interesting to see what people I used to know a long time ago are doing (hmm, a little nosy there, eh?).

The problem is that there are reasons that I stopped being in touch with a lot of former acquaintances, and getting back in touch with people that I no longer have much in common with is interesting at first but benign moving forward. And despite the few situations in which I am connected with someone I regret losing touch with, it makes me have to actively ignore people that I have been able to passively ignore for a good twenty years or so.

On the other hand, there are some folks (like my friend Jeremy Epstein) who are building careers out of exploiting social marketing, and from an information perspective, there seems to be a lot of opportunity (check out Stephen Baker's book Numerati for some good examples as well).

I am interested - what is your experience with Facebook - as a connectivity tool, as a business tool, as an entertainment forum? Post your comments!


Posted September 3, 2009 10:03 AM
Permalink | 4 Comments |

Strolling around the exhibit floor at the TDWI conference in Chicago the past few days provided an interesting look into a rapidly evolving trend in data warehousing appliances. Of the 30 or so vendors exhibiting, I counted at least 7 that would be considered appliance vendors:

DATAllegro
Dataupia
Kognitio
Netezza
ParAccel
Sybase
Teradata

I might throw Oracle, HP, and Sand Technology in there as well, but I think you see my point - there seems to be the perception that there is a market for high performance "plug-in" systems to deploy data warehouses. What is perhaps even more interesting is that half of these vendor offerings are not specifically hardware appliances, but rather software database systems that can be deployed on top of different hardware systems - in other words, they are "software appliances" (!?)

In essence, many of these approaches, along with some from other vendors as well (Vertica was notably absent from this crowd, but showed up at the previous Las Vegas TDWI), focus on structural optimizations (such as column-oriented databases) that are very well-suited for loading into core memory and providing very fast read access, making them especially nice for query/reporting clients. The realization that the database system can be optimized and parallelized in a way that is decoupled from the hardware makes these software-only approaches look very cost-effective, especially when sizing a warehouse to meet current needs while allowing for future growth. Not only that, these systems are finely tuned for performance (see Mark Madsen's comments about ParAccel's TPC-H benchmark scores).
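As a simple illustration of why that column-oriented layout favors read-mostly query workloads, here is a tiny, made-up sketch in Python; the only point is that an aggregate over one attribute touches one contiguous column rather than every field of every row, which is also what makes the column easy to compress and cache.

# Row-oriented: summing amounts means touching every field of every record
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 75.5},
    {"order_id": 3, "region": "East", "amount": 210.0},
]
row_total = sum(r["amount"] for r in rows)

# Column-oriented: the same table stored as one array per column; the sum
# reads only the "amount" array and ignores the other columns entirely
columns = {
    "order_id": [1, 2, 3],
    "region":   ["East", "West", "East"],
    "amount":   [120.0, 75.5, 210.0],
}
col_total = sum(columns["amount"])

assert row_total == col_total == 405.5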

The common theme among the software appliance crowd is lowering the barrier to entry for small and medium businesses seeking to jump on the BI bandwagon. With a variety of operational modes that span full-blown deployments (with hardware purchase and integration) down to a service-based hosted model, this platform enables data warehousing at a fraction of the cost. This concept in its own right is worth some more exploration, and I think I may try to address that in an upcoming column.


Posted May 15, 2008 11:40 AM
Permalink | No Comments |
