Blog: Dan E. Linstedt

May 4, 2008

Operational Data Warehousing on the way...

Before we get to Dynamic Data Warehousing, we need to first reach Operational Data Warehousing. Now I realize that I'm not the first, nor will I be the last to use or even possibly abuse this term. In fact if you search on the term today you'll get tons and tons of hits. I do however believe that Data Warehousing and BI as an industry have gotten slow, and become somewhat of a laggard in terms of keeping up with technology. Just look at the adoption curve of DW2.0... It simply isn't there yet (wish it were). Anyhow, in this blog let's take another look at the ODW as Bill Inmon and I are beginning to discuss it.

Continue reading "Operational Data Warehousing on the way..." »

April 25, 2008

Part 6: Secrets of the Masters

To follow on with our series, we'll dive in now and explore some of the elements needed for a repeatable, consistent, and redundant project. These are components that make the project book completely usable - without these pieces, the project methodology usually sits on a shelf and gathers dust. What we are aiming for is reduced overhead costs, fewer errors, higher productivity, and greater agility for I.T.

Continue reading "Part 6: Secrets of the Masters" »

March 18, 2008

New Skills Required - Interactive BI

Do you still think that knowing Flash and ActionScript is not a newly required skill? Have you seen the latest version of Crystal Reports from BO? It now contains a front-end product that used to be called Xcelsius, which is a Flash-based front-end BI dashboard library. You can drag and drop buttons, charts, pre-built reports, and other things onto the different scenes in the dashboard. No more flipping pages or writing PHP or Java code to exercise BI on the client side. It's a highly protected environment.

Continue reading "New Skills Required - Interactive BI" »

March 16, 2008

Part 5: Secrets of the Masters

As this series progresses, I've received some wonderful comments; thank you to all who are replying. In this entry we'll talk about some of the additional skills that are helpful in managing and developing successful projects. We've touched on a few already, but I'm not convinced we gave enough credit to these items. Many people argue with standards, claiming they are over-burdening their development - claiming they can't get their job done with them, claiming they are too verbose or have too many pieces to work effectively. They revert to RAD, JAD, and extreme methodologies...

Continue reading "Part 5: Secrets of the Masters" »

March 2, 2008

New skills required for BI and Data Presentation going forward

So you've all seen Flash production movies? You've all heard of podcasts? How about web interactivity without "changing pages"? I'm sure you've seen Flash-produced web sites, or played an animated game lately. This post is about the new skill sets needed by BI vendors and Business Intelligence Analysts to survive the upcoming wave. Those of you producing PowerPoints, or sitting in the background coding "BI Reports..." you've got a few things to learn.

Continue reading "New skills required for BI and Data Presentation going forward" »

February 25, 2008

The new evolution of Data Modeling

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is: Operational Data Warehousing. If you can believe it, what we've done is taken the Operational specifics of systems capturing data - and placed it on top of the Data Warehouse as a single integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel that this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

Continue reading "The new evolution of Data Modeling" »

February 15, 2008

Part 4: Secrets of the Masters

In this entry I'm going to go on a small tangent about "contracting" with companies that operate in the consulting realm: what to watch for, what to ask about, and how to negotiate with them. These companies are famous for "squeezing" you as a customer to pin you down on deliverables. This I see as completely fair: in order for them to get paid, they _must_ have a set of clearly defined deliverables and timelines signed off on. However, these companies are also interesting in another light. I'll tell you a true story (without names) about a review of certain companies who pitched to solve a problem for a very large customer. Our team was involved in "reviewing" their bids.

Continue reading "Part 4: Secrets of the Masters" »

January 31, 2008

Part 3: Secrets of the Masters

Every good BI/EDW solution is backed by a good architecture, and DW2.0 is no different. The framework that DW2.0 provides is sound, with all the components necessary. That said, in addition to the framework, architectures need to exist at different levels, as do standards and templates. A solid enterprise data warehouse project usually contains many of the following components that implementers and consultants use to make a project successful.

Continue reading "Part 3: Secrets of the Masters" »

January 28, 2008

Part 2: Secrets of the Masters

With these posts I hope to shed some light on what makes projects work, no matter the scale, no matter the time-line; always with an eye on costs, overhead, and a watch on the number of errors. In part 1 of this series I introduced top-level concepts required by nearly all "good" projects. In this entry I will dive a little deeper into what these concepts bring to the table, and also add some lower-level concepts that are also necessary in successful projects. I'm specifically targeting 2nd generation warehouses, and DW2.0 - in an effort to move forward.

Continue reading "Part 2: Secrets of the Masters" »

January 27, 2008

Secrets of the Masters - Part 1

It's been a while since my last blog entry; my apologies. I've been heads down building companies lately, and they seem to be starting to gain steam. All that aside, however, I've been thinking quite a lot over the years about these notions we hear about: top 10 this, top 10 that... Most of these are about "mistakes" we can make, followed by short sound bites about what or how to look for answers. There are books and books filled with great information about building, maintaining and deploying an Enterprise Data Warehouse, and then there are architectural discussions from John Zachman, Bill Inmon, and others. All good information....

Continue reading "Secrets of the Masters - Part 1" »

December 21, 2007

IT Agility and Business Stove-pipes

There are problems in I.T. today with a lack of agility. There are issues with Business creating their own spread-marts in MS-Access, Excel, and OLAP Cubes. There is a widening gap between the "corporate Enterprise Data Warehouse" and what I.T. can provide, how quickly they can adapt, and how cost effective they can be going forward. There is a rise of something called 2nd generation data warehouses... Why? Because 1st generation warehouses are suffering from "stove-piped solutions" re-created by using the incorrect modeling techniques for your data warehouse. Bill Inmon has been writing lately about data modeling and how to do it properly. In this entry I'll dive in head first into these issues, and what's going on in the industry, and what you can do about it.

Continue reading "IT Agility and Business Stove-pipes" »

November 19, 2007

Fabric Systems, MPP, and SMP Compute Power

I recently attended the Oracle OpenWorld show; it was well put together. The vendor floor was extremely busy. I learned a little bit about some up-and-coming technology called FABRIC SYSTEMS. There are a few vendors out there (it seems) trying to make a go of it. As a result of learning about fabric-based systems, I'm going on a quest to learn more. What I learn, I will share here - because I believe that these types of systems will change the landscape as well.
Fabric-based systems seem to have MPP in mind, but allow the individual SMP units to act together. They appear to be scalable and high-powered.

Continue reading "Fabric Systems, MPP, and SMP Compute Power" »

November 7, 2007

Operational Data Warehousing / Active Data Warehousing

There's a lot of stir in the marketplace these days around a term called Operational Business Intelligence. Over the years there's been a lot of interest in, and usage of, the term Active Data Warehousing, which according to Bill Inmon is really "real-time data warehousing" or "operational data warehousing." What is this thing? Has it been defined before? What kinds of systems are we building these days and what is the next stage going forward? In this entry we'll look at the terms Operational Data Warehousing, Operational Business Intelligence, and Active Data Warehousing...

Continue reading "Operational Data Warehousing / Active Data Warehousing" »

October 16, 2007

Indexing: VLDW and Data Sets

We are nearing the end of the entries I will be making (for now) on the VLDW world. I will discuss indexing going forward in a traditional RDBMS engine point of view. "Appliances" are changing some of this as they move into the field. But for now, indexing of large data sets requires some consideration.

Continue reading "Indexing: VLDW and Data Sets" »

October 11, 2007

System of Entry, System of Record - System of Shifting Sands

I recently attended the Teradata Partners conference, which was a lot of fun. Among the things discussed were governance, data stewardship, and data ownership - and of course Claudia Imhoff, in her masterful presentation on MDM, talked diligently about SoR, SoE, and a few other acronyms. The gist of the statements (across the board) was that System Of Record lines are blurring. Shifting Sands, I might say...

Continue reading "System of Entry, System of Record - System of Shifting Sands" »

October 9, 2007

Context and Perspective

I sat down with my good friend Jeff Jonas yesterday and discussed the nature and notion of contextual processing. Jeff is a phenomenal individual, and much smarter than I ever hope to be, but all that aside, we had a wonderful conversation about the nature of processing streaming data (one piece at a time, or possibly multiple pieces in parallel, but separated) and how to focus the notions of context.

How is this related to B.I.?
It has everything to do with Business Intelligence, and how we "experience" and use our data sets/patterns within to make sense of our business, especially in an Operational B.I. world.

Continue reading "Context and Perspective" »

October 2, 2007

Over-Normalization: VLDW and performance of queries

Just like there is a danger in over-denormalization (overrunning the block sizes, causing chained rows, and a multiplier to reading the data), there is a danger in over-normalizing... Or is there? Lately there has been renewed discussion about column-based-solutions coming in to play (but that's for another blog). In this blog entry I discuss the dangers of over-normalizing data on a traditional row based database system, especially as it relates to VLDW and MPP.
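As a rough illustration of the "multiplier to reading the data" mentioned above, here is a minimal sketch of how a row wider than one block forces extra block reads. The function name and the 8,192-byte default block size are hypothetical, and the model deliberately ignores block headers, free-space settings, and anything engine-specific, so treat it as back-of-the-envelope arithmetic rather than how any particular RDBMS behaves.

```python
def blocks_per_row(row_bytes: int, block_bytes: int = 8192) -> int:
    """Blocks touched to read one row, under a simplified model.

    Assumption (not from the original post): a row that does not fit in
    one block is chained across as many blocks as its width requires,
    with no per-block overhead.
    """
    if row_bytes <= 0 or block_bytes <= 0:
        raise ValueError("row_bytes and block_bytes must be positive")
    # Ceiling division: a partially filled trailing block still costs a read.
    return -(-row_bytes // block_bytes)


if __name__ == "__main__":
    for width in (2_000, 8_192, 20_000):
        print(width, "byte row ->", blocks_per_row(width), "block read(s)")
```

Under these assumptions a 20,000-byte flattened row spans 3 blocks of 8,192 bytes, so every single-row fetch costs three block reads instead of one.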

Continue reading "Over-Normalization: VLDW and performance of queries" »

September 29, 2007

Normalizing / Denormalizing: VLDW & Architecture

I've blogged a little bit about this before. In this entry we'll explore the mathematics side of normalization and denormalization in the data model. Of course, this entry is also specifically targeted at very large data sets. On small systems that can't handle the mathematics, performance degrades to a certain degree. There's a rumor out there, an un-truth, a fallacy that states: if you're having performance problems in your database, simply denormalize (flatten the tables, remove the joins). This works, but only to a point.

Continue reading "Normalizing / Denormalizing: VLDW & Architecture" »

September 25, 2007

ETL/ELT: VLDW and Multi-passing data

In my last entry I got caught up explaining how to reduce the data set and increase parallelism. That should provide a huge bang for the buck for most installations. Again, we must go against the grain of what we learn when we deal with very large volumes of data. What we find is that large data takes a specific architecture to solve these problems. In this entry we'll discuss the ability to multi-pass data sets rather than single pass data sets, and why this makes a difference in performance.

Continue reading "ETL/ELT: VLDW and Multi-passing data" »

September 24, 2007

ETL/ELT: VLDW & Data Integration Details

I've been blogging on this topic now for at least the past week. I've gotten some good feedback from a few of you out there in blog-land (thank-you). I've seen a couple questions on "how" to reduce the data set, and what I mean by "multi-passing" the data, and so on. Keep in mind that it's much easier to teach these subjects in classes, than it is to blog on them and get all the needed details put in place for everyone to read. In this blog I provide additional information on how to make performance work for large data sets and data integration.

Continue reading "ETL/ELT: VLDW & Data Integration Details" »

September 21, 2007

ETL Engines: VLDW & Loading / Transforming

I hope you've enjoyed this series; I've not received any comments either way. I'll be finishing up this series soon. In this entry I'll address "ETL" processing in general, in another entry I'll discuss "ELT" in general, and then I'll begin to discuss BI Queries and Query engines going forward, finally at the end of this series I'll bring in a couple "appliances" or "appliance like" performance enhancers.

In this entry, I'm going to focus on super fast, high speed ETL. This entry is less about the tools, and more about the architectures that work. I hope you enjoy it.

Continue reading "ETL Engines: VLDW & Loading / Transforming" »

September 17, 2007

Applications In General: VLDW and Machine Play

I've got this series going on VLDW, and I'd like to continue it for a little bit longer. In this entry I'll dive into different types of applications in a VLDW environment, and their impact on the machines underneath. There are indeed impacts to large data sets passing through machines with specific types of applications (including database engines) enclosed within. What some people tend to forget is that Volume & Latency change everything. When performance tuning for extremely large data, it is absolutely vital not to lose sight of the applications on the hardware.

Continue reading "Applications In General: VLDW and Machine Play" »

September 12, 2007

Database Specifics: VLDW & Switches, making it work

So just how do you get a particular Database to work for you in a growing data environment? There must be something you can do, right? Yes, there are certain switches, and specific knowledge which you need to focus on in order to get the RDBMS engines to scale. This entry is all about those specific databases, and their switches. While unfortunately (due to legal ramifications) I cannot discuss publicly the performance numbers (it's all in the fine print of the licensing agreements they have), I can discuss the switches that make things fast. Remember that unless you have the proper hardware and the right architecture, these switch settings won't make a difference. In fact, they may even slow your system down.

Continue reading "Database Specifics: VLDW & Switches, making it work" »

September 9, 2007

Databases and VLDW: Petabyte Scalability

If you've been following along, you've noticed that I've been writing about Terabyte (50+) to Petabyte levels. In this entry, it's no different. I'll discuss database engines generically, and their usage as a VLDB component. I'll use another entry to discuss appliances (separately) that contain embedded database engines. Databases all have their quirks and notions, but there are some generalities that simply stick - that you cannot ignore, that most people have ignored over the years... I'll try to dispel these falsehoods, and bring the truth to the table.

Continue reading "Databases and VLDW: Petabyte Scalability" »

September 8, 2007

Disk and VLDW: What you need, can use

In very large data warehousing, or VLDB for that matter, I am constantly asked: What kind of disk should I have? Can I use a SAN? How about a NASD? What about DASD? I have RAID 5, is that good? Now there is RAID 7, 5+, S, 10, and so on. There are differences in the disk you are using that DO make a performance difference, and when you are dealing with very large data sets you MUST have throughput. This is the optimal end-game.

Continue reading "Disk and VLDW: What you need, can use" »

September 6, 2007

Operating Systems: What you need for VLDW

So you read my post on Hardware, right? If not, take a gander at it... I'd like to think it's mostly complete. This entry focuses on Operating Systems, what the machine needs to work under severe volume loads, and how the operating systems react. In the near future I'll have a post on Applications including ETL / ELT, and Database engines, and then I'll move back in to the business side: skill sets, requirements, defining/gathering/estimating, etc. All of these things are items that I teach at TDWI in VLDW. I try to keep it fresh. This, however, is a short posting.

Continue reading "Operating Systems: What you need for VLDW" »

Hardware: What do you need for VLDW?

In this entry we'll explore the requirements of VLDW, what you might need to build one, or to grow it going forward. Don't forget that everyone's definition of VLDW is different depending on the starting point. No matter if you're scaling from 30GB to 1 TB, or from 50 TB to 1 PB, these guidelines and simple pieces should help you move forward. I guess you could treat some of this as a "score-card" for your environment. This is a HARDWARE LOOK at what you need. In the future I'll blog on the software requirements, and then finally, I'll address appliances.

Continue reading "Hardware: What do you need for VLDW?" »

Commoditization of the EDW/BI Market.

Like all markets, the EDW and BI market is moving (if it hasn't already) towards commoditization in implementation. Many would argue (even today) that this is "impossible" - as I've heard it, they still believe that every DW / BI solution must be custom driven, must be custom loaded, must be custom created. Why? Because we have "different requirements/different data/high volumes/real-time" and so on. We've heard it all before; it was the calm before the storm when ERP vendors rose up and said: You CAN have a standard, and the standard is "our solution." It happened with Source Systems, and now it will happen with EDW/BI solutions.

Continue reading "Commoditization of the EDW/BI Market." »

September 5, 2007

How Data Models can Impact Business

How does your I.T. department respond when you (the business) need to make a change? Do they come back to you with a long list of impacts, a long time frame to implement, a high cost, or all of the above? Are you (the business) so frustrated that you build your own solutions in Excel or MS-Cubes, or something else? The focus of this entry examines the problem from a business point of view, with a little bit of technical speak.

In this entry we'll dive a little deeper and try to discover at least one part of the problem that is impeding flexible change and incurring huge costs. We'll also make a suggestion as to how this might be fixed, and by the way - if you are moving towards real-time, or huge volumes of data or both, then this issue begs for a solution.

Continue reading "How Data Models can Impact Business" »

September 1, 2007

Why do business changes impact my EDW so much?

Well, if you're like most of the world, you have an EDW that is built from a Federated Star Schema... This could be part of the problem, but definitely not all of it. A large part of the problem is where the business rules sit in terms of processing data going IN to the EDW. Regardless of architecture, if the Business rules sit between the source systems and the EDW (and actually don't sit in the source systems themselves), then we (I.T.) are doing a great disservice to the business community, and ourselves. If you're like me, and have been around the block more than once with a Data Warehouse, then you know that "Changes to business usually don't come cheap" within the EDW environment - and that usually translates into high dollars, high impact, and eventually high risk (resulting in Super-Nova of Star-Schemas as an EDW).

Continue reading "Why do business changes impact my EDW so much?" »

Federated Star Schemas Going Super-Nova

Are you caught in the explosion of your life? Have you gotten to a point where federated star schemas aren't cutting it for Enterprise Data Warehousing? Is data volume pushing impacts on the star that can't be handled? What does a change to your single enterprise conformed dimension cost? How much time does it take, and how many impacts are there to integrate new systems or more data to your existing federated system?

Star schemas are GREAT ARCHITECTURES for solving subject-oriented answers, but they were not and are not designed to be enterprise data warehouses! They reach a point where they get in the way of being nimble and of creating workable solutions for the business, or they constrain the business (because of cost or time to implement - due to impacts of changes). The business is now caught in a super-nova, an exploding star. Let's take a look to see what happened, and what we can do to "fix" this situation.

Continue reading "Federated Star Schemas Going Super-Nova" »

August 31, 2007

Necessary Shifts in the Industry

For too long the industry has preached: load quality data into your warehouse, cleanse the data, manipulate it, and then load it to the warehouse. The mantra has been "never release bad data to the end users." There are hundreds of articles written about this, and probably quite a few groups carrying this mantra forward. But I have to ask, where did this mantra come from? How on earth did it get "written in stone?" What really, is the true value here? When we implement this kind of paradigm do we RISK getting the warehouse "wrong?" Are we integrating away problems which are causing the business to hemorrhage money?

Continue reading "Necessary Shifts in the Industry" »

May 15, 2007

Where o where is my metadata?

Well, it's been a thousand miles, and a million years since I've seen a good metadata interface GUI - or for that matter, a complete enterprise metadata data warehouse (MDDW). Something that not only reports and integrates the metadata but also allows modifications from a front-end user perspective. Something with security, thin client access, read-write (bi-directional ability), and so on. In this blog I discuss what I'd like to see in the future of BI and metadata management. Right now the market is very disjointed. This is somewhat of a one-sided rant; if the vendors would like to respond - I welcome the new information.

Continue reading "Where o where is my metadata?" »

May 3, 2007

Addressing Convergence, Appliances, and the Market Space

Appliances have a long way to go to mature - this is true. There are still a lot of customers asking for software packages to run on their existing systems. They rightfully want to leverage their existing investment in infrastructure. However, there are companies that are smaller (and some that are larger) that want to become more nimble, lower their maintenance overhead, replace old technologies with new for competitive advantage, and so on. These companies are looking at appliances in the market space. Appliances are growing up - albeit slowly.

What does this have to do with Convergence?
Software vendors are converging and acquiring to compete. They are beginning to see the value of packing incredible power and performance onto pre-configured hardware platforms, and providing web interfaces for configuration. Software vendors are converging, as are hardware vendors, on the notion that integration is king.

Continue reading "Addressing Convergence, Appliances, and the Market Space" »

March 27, 2007

Business Value of Data Vault Data Models

As it turns out, there are a lot of applications of the Data Vault architecture which I've built over the past 15 years. As testimony to the efforts, there are a few companies who've built concepts, and data models on the Data Vault architectures, then proceeded to patent the data models and the processes around the data models in order to provide competitive edge. I think the Data Vault architecture has finally grown up. If investment bankers can see the value of the Data Vault architecture and are willing to fund a patent effort on the models built based on the standards, then there must be a correlation between the standards in the architecture and the understanding of business users.

Continue reading "Business Value of Data Vault Data Models" »

March 25, 2007

I.T. Profitability - a follow on

I've received some good feedback and comments from readers in the field regarding an entry I made recently about I.T. costs and profitability. One comment discussed the notion that I.T. chargeback really isn't profitability, but rather just a shifting of sands, as the business has money which is simply re-allocated. In this entry we'll explore some other notions of profitability for I.T. along with discussing why Chargeback works in certain industries, but not in others.

We'll also discuss the notions of standardization and its correlation to profitability. As always, I'd like to hear from you. What is it you have questions on or disagree with?

Continue reading "I.T. Profitability - a follow on" »

March 21, 2007

Defining Unstructured Data & DW2.0

My last post discussed the notion of unstructured data being as much as 80% of the data that we in IT will / should begin to deal with. One of the readers requested that I expand on what I'm including in Unstructured Data. This entry discusses the types of structured/unstructured and semi-structured data as I see it. As usual, this pertains to business knowledge, and is a huge part of DW2.0. As it turns out, it also is (or will become) a huge part of changing IT from a cost center into a profit center; why? Because if we can integrate unstructured information, and glean the knowledge from it (determine contextual linkages), we can better understand where our business holes are.

Continue reading "Defining Unstructured Data & DW2.0" »

March 7, 2007

Is IT really putting business out of Business?

I just read an interesting article in CIO Decisions that talks about how IT's costs are continually rising, profitability of companies is becoming razor thin, and that IT (if it continues at this rate) will eventually put business out of business. There are several points to this article, and I applaud gentlemen like Greg Hackett for stepping out and stating things that most of us (including me) don't often see. It's like taking time to stop and smell the roses: you know you should do it, but when you have a split-second decision to make, do you really stop?

Continue reading "Is IT really putting business out of Business?" »

February 5, 2007

BI/DW Appliances: I hate to say "I told you so..." but...

Every so often a future suggestion that I've discussed (made by more than just me) actually happens. Now I'm not the kind of guy to normally say "I told you so." However, on this occasion, I feel it's important to announce that the market is changing, dramatically - and that software vendors NEED TO TAKE NOTE!!! EAI is now available in plug-and-play appliance format. In this entry we'll discuss what this means, and how it will affect ETL/ELT, EII, and EIEIO (Old MacDonald had an appliance... E-I-E-I-O).

Continue reading "BI/DW Appliances: I hate to say "I told you so..." but..." »

January 14, 2007

Time Value of Metadata

My friend Bill recently wrote an article on Time-Value of Information, in which he declared that the value decreases exponentially over time. I have no argument there when it comes to data sets that are non-metadata. However, where metadata is involved, I believe that the value of some metadata actually increases in conjunction with utilization. Conversely - metadata that is not utilized drops in value like a stone in water. In this entry we dive into specific attribute-based valuation, and begin exploring a hard method for finding / assigning value.

Continue reading "Time Value of Metadata" »

January 12, 2007

Performance of ETL From an Architecture Perspective

It seems these days that many people have similar problems with performance and tuning of their ETL routines (in another blog entry I'll discuss performance and tuning of ELT). ETL may be the "old horse" in the stables, but it will exist for a very long time to come, as it serves many different purposes (such as sharing or balancing the workload between the Transformation Engine and the Database Engine), particularly where ELT is 100% database engine based and puts some serious strain on the RDBMS (especially in huge volumes). So where does that leave ETL? What are some of the top suggestions for getting ETL to perform?

Continue reading "Performance of ETL From an Architecture Perspective" »

January 8, 2007

My Holiday Wish List for BI of Tomorrow

I've posted and written many different things over the years about what technology (specifically BI tool vendors, and RDBMS vendors, and ELT / ETL vendors, EII, EAI vendors) need to have in the future. This is another look at an updated wish-list, along with market expectations and what I'm seeing as faults in the industry today. Don't get me wrong, the vendors (some) are scrambling to put new technology in place such as temperature based data, high speed interconnectivity, and massive parallelism to handle volumes - it's just they aren't quite there yet. So here's a look at what I'm hoping to see in 2007 and beyond.

Continue reading "My Holiday Wish List for BI of Tomorrow" »

RFID Is Dead! Or Is It?

RFID (Radio Frequency Identification) tags have been halted in terms of production, usage, and implementation mandates from companies like Wal-Mart and others. Of course, you'll still see RFID on store shelves, particularly for larger and more expensive products - but this is a technology that has been described as carrying tons of problems, ranging from ethical questions to simple data gathering questions. If you're a follower of the RFID channel, you might be interested in some of these findings.

Continue reading "RFID Is Dead! Or Is It?" »

December 19, 2006

Necessities of Governance

Governance is an industry buzzword these days; with all the SOA initiatives going on, one would think that Governance would be on the top of the list as well. If you're not governing your enterprise consolidation, you probably are not taking full advantage of the benefits and cost savings that could be coming your way. Sure, governance is an uphill battle in the beginning; sure, everyone fights standards and agreed standards; and yes - absolutely - no one can seem to decide on how to define the common data sets (common data model). But if you're involved in, or working with, SOA it is imperative to engage governance at the enterprise level. However, it's not just governance that makes it work: a formal methodology should be utilized to assist with the governance as the organization organically grows its efforts. Such methodologies include ITIL, SEI/CMMI, and a few others.

Continue reading "Necessities of Governance" »

December 13, 2006

Does Big Data Equal More Business Intelligence?

The question has been argued over the past two decades, is more data better? Do I really need more data? Where on earth is all this data coming from? How do I manage the ever-growing data sets? Does more data mean better business decisions? How can I reconcile these monstrous data sets? and so on... You've heard by now (I'm sure) many different folks in the industry offer their valued opinions. We can stand up on our feet and say: I'm on the fence - because half the time I hear it's the quality of the data that matters, the other half the time I have to defend the auditability and traceability of the data set in my warehouse. We can also stand on the fence because we can now "mine" for bad data patterns (only if we are collecting them), and learn where our mistakes are.

Continue reading "Does Big Data Equal More Business Intelligence?" »

December 8, 2006

Data Integration - Performance Numbers

I've been teaching and consulting on performance and tuning of systems architectures for 10+ years. I've seen the increases in performance across the board from many different vendors - hardware, software, network to disk to RAM to CPU and so on. This entry does not mention particular vendor names, but rather discusses the _nature_ of performance and tuning at the core of Very Large Systems - whether you're doing Business Intelligence / Data Warehousing, or simply data movement / data integration - these numbers (hopefully) will make sense to you. When I say very large environments, I'm talking about 1 billion+ rows _per file_, handled within a single batch run. With 1B rows per year of history loaded, and 5 years of history to load, the second through fifth years of history must manage "update detection" against an increasing set of rows on the target, on the order of billions. So what kind of performance do you want from your systems?

With DW2.0 around the corner, and unstructured data creeping in - and near-line, active, and historical storage coming on-board, it's more important than ever before to get your systems in top shape to handle massive volumes.
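To make the history-load arithmetic above concrete, here is a small sketch of how the target grows as each year is loaded. The 1B rows per year and the 5 years come from the text; the function name is hypothetical, and the assumption that every incoming row lands as a new row (so the target grows by the full 1B each year) is mine, giving an upper bound on what "update detection" has to compare against.

```python
ROWS_PER_YEAR = 1_000_000_000  # 1B rows per year of history (from the text)
YEARS_OF_HISTORY = 5           # 5 years of history to load (from the text)


def history_load_plan(years: int = YEARS_OF_HISTORY,
                      rows_per_year: int = ROWS_PER_YEAR):
    """Return (year, incoming_rows, target_rows_before_load) per yearly load.

    Assumption: no incoming row collapses into an existing target row,
    so the target holds (year - 1) * rows_per_year rows before each load.
    """
    return [(year, rows_per_year, (year - 1) * rows_per_year)
            for year in range(1, years + 1)]


if __name__ == "__main__":
    for year, incoming, target in history_load_plan():
        print(f"year {year}: {incoming:,} incoming rows "
              f"checked against {target:,} target rows")
```

Under that assumption the first year loads into an empty target, while the fifth year's 1B rows must be checked against 4B rows already in place, which is why update detection dominates the later loads.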

Continue reading "Data Integration - Performance Numbers" »