Business Intelligence Network business intelligence resources

Blog: William McKnight

November 1, 2007

Where the largest databases will come from

The article "Top 10 Largest Databases in the World" lists exactly what its title promises. The sizes are impressive indeed.

Speaking of large databases, I recently commented in an article that "The amount of information uptake into a corporation when RFID is implemented can be unprecedented. The largest data stores in the world soon will be in manufacturing and will comprise mostly item movement data."

Continue reading "Where the largest databases will come from" »

October 3, 2007

Wednesday What: 2 cents on Executive Sponsorship

To say you should “pick” your data warehouse executive sponsor carefully would be a rather strange statement. Not many DW programs can pick their sponsor. Usually, it’s the other way around. The program staff must deal with the sponsor that has chartered that course of action for the organization. DW programs need to have that top-down driver and support at the executive level. Nonetheless, it is critical to success that the executive sponsor accede to his or her roles, and hopefully those roles are clear or there is an opportunity to make them clear.

I have worked with all manner of executive sponsor, from those who intuitively get it and put forward the effort required for success to those who need some guidance. Most are more than willing to listen and align their actions with successful practices for DW success. I ask my sponsors to invest in understanding DW systems, both in general and within their company; to lead, as necessary, the governance meetings; to provide overall direction for the DW; and to keep the DW out of internal crossfire.

Continue reading "Wednesday What: 2 cents on Executive Sponsorship" »

September 26, 2007

Wednesday What: Data Modeling, the Enigma

End clients generally end up overrating the usability of packaged data models, whether as stand-alone models or those models inherent in their software packages. Sooner or later, some amount of what I call ‘original data modeling’ is going to be necessary in any enterprise. Modeling expertise is a must-have, though companies sometimes go to enormous lengths to avoid it. I don’t necessarily advocate changing packaged models in applications. However, no untouched application models, or combinations thereof, are going to be completely sufficient for a data warehouse in Fortune clients.

Continue reading "Wednesday What: Data Modeling, the Enigma" »

September 20, 2007

Wednesday What: Prove you did it

Time for the Wednesday (or thereabouts) “What” (what I have learned…). OK, I seem to be endlessly prompted in my client work with these learnings so there’s no shortage of them, but sometimes I don’t have an elegant preamble to a blog entry. So, I’ll just say it.

You’ve got to tie that warehouse data back to source or users will cry foul. It doesn’t matter how dirty the source data is. If you want to change the data en route to the warehouse to clean it, fine, change it, but bring the original data as well in a different set of columns in order to prove your tie-out.

Tie-out should make you more comfortable with your ETL as well. It sometimes involves adding pre-extract queries to the source data and post-load queries to the warehouse data. It sometimes involves ‘spot’ query checks, which can get tricky because the method used to pick your spot data can come under scrutiny. It also gets tricky when the ETL is run intra-day or real-time, when ETL cycles are at an absolute premium. However, you still need to do it IMO. These tie-out results go in your operational metadata.
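For readers who want to see the shape of it, here is a minimal sketch of a tie-out, not a definitive implementation. Everything named in it is an assumption made for illustration: the in-memory SQLite database, the `src_orders` and `dw_orders` tables, the `state_clean` / `state_orig` column pair, the toy cleansing rule and the `control_totals` and `tie_out` helpers. It runs a pre-extract query on the source, loads cleaned data while carrying the original value in its own column, runs the matching post-load query on the warehouse, and builds a record that could go into operational metadata.

```python
import sqlite3


def control_totals(conn, table, amount_col):
    """Return (row_count, amount_sum) for a table; run on both sides of the load."""
    # Identifiers are interpolated directly, so pass only trusted table/column names.
    row = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    ).fetchone()
    return row[0], row[1]


def tie_out(source_totals, warehouse_totals):
    """Compare pre-extract and post-load totals; return a metadata-style record."""
    return {
        "source_rows": source_totals[0],
        "warehouse_rows": warehouse_totals[0],
        "source_amount": source_totals[1],
        "warehouse_amount": warehouse_totals[1],
        "tied_out": source_totals == warehouse_totals,
    }


# Hypothetical source and warehouse tables, held in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (order_id INTEGER, state TEXT, amount_cents INTEGER)"
)
conn.execute(
    "CREATE TABLE dw_orders (order_id INTEGER, state_clean TEXT, "
    "state_orig TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, "tx", 1250), (2, "Texas", 800), (3, "TX", 4300)],
)

# Pre-extract query against the source.
pre = control_totals(conn, "src_orders", "amount_cents")

# Toy ETL: standardize the state, but keep the source value in state_orig.
rows = conn.execute("SELECT order_id, state, amount_cents FROM src_orders").fetchall()
for order_id, state, amount in rows:
    clean = "TX" if state.strip().upper() in ("TX", "TEXAS") else state
    conn.execute(
        "INSERT INTO dw_orders VALUES (?, ?, ?, ?)", (order_id, clean, state, amount)
    )

# Post-load query against the warehouse, then the comparison.
post = control_totals(conn, "dw_orders", "amount_cents")
result = tie_out(pre, post)
print(result)
```

Amounts are kept as integer cents so the comparison is exact. Because the original value travels with each row, a join from `dw_orders.state_orig` back to `src_orders.state` can prove the tie-out even where the cleaned column differs from the source.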

Continue reading "Wednesday What: Prove you did it" »

September 5, 2007

Microsoft Excel: Don't Mess

Here’s one more on the theme of what I’ve learned. You’ve heard of Don’t Mess with Texas? Well, how about Don’t Mess with Excel!? Users love the sense of control over the data and the ability to perform their own calculations. (Other) BI tools will not sweep Excel out of any enterprise. However, precisely because of its flexibility, Excel is notorious as a source system for data warehousing and its applications need to get into a DBMS to serve the organization in that capacity.

Continue reading "Microsoft Excel: Don't Mess" »

August 28, 2007

Data warehousing: What I've learned in 15 years

In no particular order, I’m going to be addressing this topic in a series of blog entries, starting with the approach to the build.

While a top-down approach may seem ideal, data warehouses get built bottoms-up. The best data warehouses are built bottoms-up, but the worst data warehouses are built extreme bottoms-up. By extreme, I mean without any sense of where it’s all going, costing, best practices or where the ROI is going to come from. Like a virus growing within the organization, the data warehouse expands to encompass other random and redundant data, becoming important enough to keep around, but with an organization that’s never sure why and with increasing concern about what it doesn’t do. Eventually, it gets redone until enough top-down is inserted into the process to make it usable. So, in other words, injecting some top-down elements into data warehousing is essential, but don’t believe it’s going to be completely top-down.

Continue reading "Data warehousing: What I've learned in 15 years" »

July 25, 2007

When Data Warehouse Projects are Successful

In the Data Warehousing industry, we are continuing to see the maturation of the value proposition and the management of risk. In the early days, the technology was experimental. Data Warehouse projects consumed $millions on nothing more than the promise of “if we build it, I’m sure it will pay for itself. After all, XYZ company found out something that caused their warehouse project to pay for itself in only six months!” Vendors were great at sending the message that “all of your competitors are building these systems in secret, because they consider it to be a competitive advantage. We would share more information, but we are under non-disclosure.”

The promise of striking gold in them thar hills of data was the subject of serious boardroom conversations. And those that failed to achieve the promise, either because the system was never built, or because it was delivered late and way over budget, or because they didn’t find the nuggets of gold they had hoped for, kept quiet. They didn’t want their colleagues or competitors to know.

Now it is generally known that Data Warehouse projects can fail, and have failed, and as a result, fewer of them actually do fail. We understand the risks and how to manage them.

Here are several of the factors that have contributed to our ever-increasing success:

Continue reading "When Data Warehouse Projects are Successful" »

March 16, 2007

Time for a new approach to Information Management

I'm getting concerned about the data warehouse. It has served us well, but can the current profile of data warehouses out there handle the next 10 years or will widespread changes be necessary? Consider that most data warehouses out there do not follow best practices and are therefore dumps of operational data where history collects and from which reports are run. This only solves some of the challenges associated with going it alone with just operational data, which are:

Data access
Reporting capabilities
Concurrency between query and operational needs
Structure for data access
Data quality for data access
Data integration
Storage of history data

Notably, it is the concurrency and history issues that instigate many data warehouse programs. However, integration is largely limited to data sharing a common database instance - which is good, but leaves too much complexity to the data access layer, where the end users find the data access tools too complex already. Building summaries and making sense of the data warehouse structure and data is exasperating, especially without metadata, which most DWs lack in adequate levels, so current users mostly skim the surface of their true needs.

Also, data quality is only addressed in data warehouse programs out there selectively. Many remain afraid to change operational data, even if it is wrong. It needs to be fixed operationally anyway, and that just isn't happening enough.

So, how is data warehousing supposed to fit into this new world of data explosion, real-time requirements and a need for process-orientation?

Continue reading "Time for a new approach to Information Management" »

December 19, 2006

Federal Government Data Warehouse

I have heard a lot about, though not worked on, the federal government's data warehouses. Some things are pretty clear. It is (they are?) large. This article cites 659 million records in the FBI's database. Look at the data sources - FBI records and criminal case files, Treasury, State and Homeland Security departments and the Federal Bureau of Prisons - more than 50 FBI and other government agency sources.

Continue reading "Federal Government Data Warehouse" »

October 10, 2006

The realities of Data Warehousing today

I was just thinking about what the unique realities of data warehousing today are. As I see it, the top realities are:

• Multiple, complex applications serving a variety of users
• Exploding data size that will continue to explode with RFID, POS, CDR, and all manner of transactional data extending back years into history
• Data latency is becoming intolerable as needs demand real-time data
• A varied set of data access tools, serving a variety of purposes, for each data warehouse
• Multiple workloads streaming into the data warehouse from varied corners of the company as well as from outside the company
• A progression towards more frequent, even continuous, loading
• Data types running the gamut beyond traditional alphanumeric types

August 14, 2006

Data warehousing dead?

Tanjian Norman commented on a post from 2005 as follows:

"DW is dead? It seems that EAI/EII or more importantly service enablers like SOA are complementary to a DW environment. Please elaborate on why you think DW is positioned for replacement by EAI/EII technologies."

He was responding to my post in which I said, among other things: "DW will eventually go the way of EAI. The extra data store in the picture is redundant and the market eventually drives out inefficiencies."

I thought I would bring my response into a new post...

No, DW is not dead. It's alive and well and thriving. Almost every midsize and up company has at least a semblance of one. A robust data warehouse is highly desired and sought after everywhere. My point is about the future - when exactly I do not know. I agree that EII is complementary-only to data warehousing today, but its merits should surely be considered.

Data warehousing is evolving. New data warehouses and data warehouse rearchitectures are well advised these days to consider not building a pure batch-loaded data warehouse where all analytic calculations, including master data and all reporting, are done. There are several layers of calculations, functions and even data that are no longer necessarily part of a robust data warehouse reference architecture.

1. Master data calculations - Master data is not ideally calculated in a downstream data warehouse. It is needed in the operational environment as well as the data warehouse. As time goes on, the data warehouse will be a receiving system for the master data.

2. Operational business intelligence - I blogged about this here. There are certainly a lot of calculations that do not, or cannot, interact with the data warehouse in order to be effective. This can go well beyond basic operational reporting from a single system.

3. Yes, EII. EII is able to facilitate multi-system operational reporting and business intelligence. Some clients believe that several of their data sources do not need to be fed into the data warehouse if they can run EII queries and eliminate the redundancy of having the data in multiple places. EII can handle multiple databases in multiple formats, referential integrity, XML and basic transformations. EII still has a long way to go (query tuning, 2 phase commit, business metadata, memory constrained, etc.) and data warehouses are still absolutely vital, but it shows promise and is another factor chipping away at the data warehouse requirement.

4. Modern ERP - There was a time when ERP vendors argued that the data warehouse was not necessary. When this was proven untrue, they provided packaged data warehouses so at least they kept some of that business too. In the background, they've continued to add functionality to the base ERP to keep chipping away at the data warehouse requirements. Today, one of the striking things about ERP systems is they are keeping history data indefinitely. Having a historical data repository used to be one of the main reasons for building a data warehouse, but that is not always a data warehouse requirement any more.

And finally, a "real-time" data warehouse is evolving to look more and more like an ERP system itself, with real-time feeds of operational data, triggers and analytical applications. So, the definition of data warehousing may be changing to keep the term active, but the data warehouses themselves are evolving.

I hope this helps. Feedback welcome, which could be interesting...

April 28, 2006

Data Warehouse Manager pleads guilty

Access has its privileges … and responsibilities

Remember that fellow college student who ran the college computer system? He could see everyone's class schedules, grades, ratings, etc. if he wanted to. (I was that guy, by the way.) Everyone else with that access had titles like Dean, Professor or President. IT staff always had — and still has — special privileges and access.

With privileged access comes responsibility, and sometimes that privilege is abused. Who has the highest privileged access to data for non-business meetings other than the data warehouse team?

Consider the following: A group inside ERCOT, the Electric Reliability Council here in Texas, allegedly created bogus companies that charged more than $2 million for completing fake work. A guilty plea deal was arranged last week with a person in the scheme. His title was – you guessed it – data warehouse manager.

The Sarbanes-Oxley Act requires the CFO to sign off on company numbers or face severe consequences. Is it a stretch to think that responsibility would be shifted intra-company (or even at the legal level) to whoever manages the data, the CIO? And then, perhaps, on to those individuals with privileged access to a wide range of data before anybody else sees it, such as the data warehouse team?

Privileged access requires responsibility and accountability. CIOs need education on laws affecting corporate information and need to stay alert to regulations about historical data, vendor data and all company data. The data warehouse manager is in a position to help, but clearly he or she, too, needs to be controlled.

March 22, 2006

Easter Island and the tale of 2 data warehouse programs

I am fascinated by Easter Island and plan to visit there someday to see the giant moai. I have a picture of one on my wall, overseeing me as I work.

Easter Island is a story of a people who, while the land was productive for them, allowed themselves the luxury of building giant statues of their ancestors, called moai, from the resources. Over time, as the land became less productive, moai-building ceased and there's evidence of a backlash against the statues, since many were toppled and others were never even moved from the inner-island quarry. However, their ecosystem was forever affected by the abundant use of resources for the moai, and eventually the population dwindled and probably suffered immensely.

I do several data warehouse assessments each year where I analyze programs from all perspectives and render analysis. If it's warranted, the assessment is critical. One program I have been involved with for years received a very harsh assessment, with specific remedies. It was a funding sinkhole. You name it, they were doing it wrong. They took the advice and followed it to a 't'. They now have an effective program with hundreds of users accessing clean, timely, documented data in various reasonable manners of delivery. They have documented high ROI and fabulous quotes from the user community. With nice promotions all-around, they're now a candidate for a best practices competition.

I assessed another program about the same time. They weren't nearly as bad off and the assessment reflected that - more complimentary than critical. The program was stalled, however, and there were several areas they needed to work on. Upon receipt, the assessment was immediately stamped 'confidential' and buried in a drawer. A couple of years later, they are in a worse position than they were during the assessment. They continue to throw people at the problem, doing things in the same ways as always and generating the same mediocre results. The staff still goes home at night without a sense of accomplishment. They've actually accustomed themselves to protecting the status quo and not expecting much. Meanwhile, executive pressure grows upon the program as their pole position in their industry is coming under serious challenge.

So why did this program bury the assessment? The answer is there were critical points in it that would have made team members look bad for a small handful of decisions. This program continues to utilize their resources in the same manner as before. They are headed for an eventual dark turn. This team wants to be able to tell upper management they've had an outside assessment and everything is fine. The reflection upon data warehousing itself then is poor and will worsen. I can see the executives saying "if this is good, then let's not do data warehousing at all."

The 'moral to the story' is that it is possible in data warehousing to have the foresight, sometimes with the help of a second set of experienced eyes, to avoid disaster and put your program on an excellent trajectory - but only if the advice is taken.

February 24, 2006

Use the data warehouse, go to jail

If you use it the way Scott Levine of Boca Raton, Florida did. As reported in ZDNet, Mr. Levine "was found guilty of breaking into Acxiom's servers and downloading gigabytes of data in what the US Justice Department calls one of the largest data heists to date. Acxiom, based in Little Rock, says it operates the world's largest repository of consumer data, and counts major banks, credit card companies and the US government among its customers."

It's always interesting to me when the term data warehousing gets into real news. Most of the time, it's negative news like this, but at least it's getting the credit for storing the data (although without proper security in this case).

December 17, 2005

LaMarmotta and the abandoned data warehouse

At a place now called La Marmotta, which is on Lake Bracciano, less than 20 miles northwest of Rome, a lake village has been found from 5700 B.C. (link: fee required). After several months of careful vacuuming in 1994, a 35-foot-long dugout canoe emerged. It was seaworthy, not just lake-worthy. Several years ago, a team built a copy and sailed nearly 500 miles along the Mediterranean coast.

The Marmottans came from far away. They brought pigs and cows, they cultivated flax and made wine. Pots contained grains and bones, the remains of Neolithic stew. They were in touch with other communities in the Mediterranean with many ships coming and going. That coming and going lasted more than 400 years. It was a well-organized village.

The settlement survived for at least 4 centuries before it was abandoned, suddenly and mysteriously, in about 5230 B.C.

Continue reading "LaMarmotta and the abandoned data warehouse" »