Blog: Dan E. Linstedt

November 28, 2006
High volume and low performance - what to do?

If you're like the rest of the world these days, you've got an ever-growing data set and, at the same time, an ever-shrinking processing window. This is not something you want to treat lightly. In most cases you are also experiencing severe performance problems and either don't know how to deal with them or haven't been able to solve them. Well, there are ways to improve performance. I've been teaching and consulting on the performance of VLDW integration systems for 10+ years, and there are techniques that can improve yours. The catch? You have to be willing to swallow the red pill (from the movie The Matrix). Let's just see how far down the rabbit hole goes...

So you're caught in performance issues, be it ETL, ELT, EII, EAI, or worse: the RDBMS itself. You have big data to move, or maybe lots of data in very short time windows. Maybe you have huge data sets to bounce it against (look up, match, consolidate, quality cleanse, etc.). Maybe you're pulling together 200M rows for one feed to a dimension. Your source is 20 to 40 tables, and you've got a window to worry about. How do you handle this situation and others like it? What do you look for?

There are many different facets to examine in performance and tuning of your systems and your architecture. The major facets that I examine are:

Of course, we also consider where the data is coming from: ERP, CRM, BPM, Financials, and other applications that lie upstream, and whether the systems are delivering data in real time, batch, or both.

If there's one thing I rely on in many of these situations, it's the numbers. The throughput numbers usually tell the story of where the problems originate and then, through orders of scale, what we can do about it.
The numbers tell me how much and how fast, and give me the ability to re-factor the architecture where the pain points exist.

In plain English please... I go through more than 100 points in an architecture to pinpoint the top 25 that cause the performance problems in a system (too numerous to mention here), but I'll give you a taste of what I usually look at on the RDBMS side of the house. By the way, the customers who go through my performance and tuning assessment have seen, on average, performance improvements of anywhere between 400% and 4000% - taking run times from 48 hours down to 8 hours, from 8 hours to 2 hours, from 6 hours to 45 minutes, from 56 hours to just under 12 hours, and so on. But that only happens when customers are willing to take my suggestions to heart and implement them.

In the RDBMS world, here are some things I measure by:

And so on... If you have an interest in a particular area, please post a question or comment, and I'll try to blog on that going forward. In the meantime, please be aware that we offer these assessments with fantastic results.

Thank you,

November 17, 2006
DNA Computing - What should it be used for?

In this entry, I return to Nanohousing(tm), the notion of utilizing nanotechnology for computing and Business Intelligence purposes. Remember that these writings are an attempt to go beyond the horizon, and are futuristic guesses on what specific points of nanotech can be applied to within the DW / BI world. It will take years to get to these points, but rest assured - changes are happening. One of the areas that has really interested me in nanotech is the notion of DNA computing, that is, using the combined form and function of DNA strands to serve specific computational purposes and answer specific questions.

"The hope of this field is that the pattern matching and polymerization processes of DNA chemistry, combined with the enormous numbers of molecules in a pound, will make feasible computations that are now too hard for conventional computers." DNA Computing, http://www.fas.org/irp/agency/dod/jason/dna.pdf

First, I'd like to point out (as I have a few times before) that form and function are recombined at the DNA computing level. In the BI/DW world of today, we have separated form from function, and it is inhibiting our ability to move forward, not to mention that it is a severe drain on flexibility, scalability, and applicability.

Form in our BI / DW world today would consist of models: process models, business models, data models, architecture models, network models, and so forth. Function would be what these models do with the data / information passing through them. For instance, data models today hardly resemble the business processes in which the data sets flow. While there have been some advances, like UML and object-oriented modeling, they are still (for the most part) divorced from the true business functions. We strive to make sense of the data and the architectural modeling paradigms by assigning metadata - descriptive context. We are also now headed back towards convergence of business function and "architecture" with Master Data Models and Master Data sets. Finally we're beginning to get it - but still, the nature of the RDBMS engine in today's world is to apply common functionality to models designed by external means. They are not tightly coupled.

When we examine DNA Computing as a function of nanotechnology, we find it to be a tightly coupled form-and-function process. The "model" in which the data sits, even where the information is encoded within the strand, becomes important. The "function" is built in to the type of DNA strand created - in a bio-chemical sense.

"No arithmetical operations are performed, or have been envisioned, in DNA computing. Instead, the potential power of DNA computing lies in the ability to prepare and sort through an exhaustive library of all possible answers to problems of a certain size. ... A single strand of DNA can be abstracted as a string made up of the letters A, C, G, T. ... Complementary strands of DNA will form a double strand (the famous double helix). Two strings are complementary if the second, read backwards is the same as the first, except that A and T are interchanged, and C and G are interchanged."

Now what happens in the BI / DW space if we were to follow this "wet-technology" model? What would happen if we were to combine form and function like the DNA computation machine? Would we see tremendous leaps in traditional computational power? I hypothesize that this is true: if we were to simulate DNA computation in a newly designed DNA-type database engine, we would see a number of things happen. But remember, I'm not talking about traditional DNA modeling software on a traditional CPU / computing engine - no, I'm talking about a machine that currently only exists in bio-tech labs, in the test tubes.

Ok, so what could we do better today that we haven't done in the past, and do it on conventional computing resources?

Obviously the web-service is part of an extended neural network, which is capable of being taught, learning on its own, and being corrected over time. So we still have some incorporation of traditional practices (due to the ultimate abstraction). This is a fundamental difference between the computational world and the DNA computing world. DNA Computing uses bio-chemistry to solve its problems and learn new things. Security is built in (as a function of what a DNA strand can and cannot "tie" to, bond with, cut and merge to - and how it will execute these things).
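
For readers who want to see the complementarity rule from the quote above in conventional-computing terms, here is a minimal Python sketch. It only illustrates the string rule (read the second strand backwards, interchange A with T and C with G); it is not DNA computing itself, and the function names `reverse_complement` and `are_complementary` are my own illustrative choices, not something taken from the report.

```python
# Base pairing from the quoted rule: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Read the strand backwards and interchange A/T and C/G."""
    strand = strand.upper()
    unknown = set(strand) - set("ACGT")
    if unknown:
        raise ValueError(f"not a DNA string, unexpected letters: {sorted(unknown)}")
    return strand[::-1].translate(COMPLEMENT)


def are_complementary(first: str, second: str) -> bool:
    """True if the second strand, read backwards with the letters
    interchanged, is the same as the first (the rule in the quote)."""
    return reverse_complement(second) == first.upper()


if __name__ == "__main__":
    print(reverse_complement("CGTT"))
    print(are_complementary("AACG", "CGTT"))
    print(are_complementary("AACG", "TTGC"))
```

The interesting part is what the sketch leaves out: on a conventional machine each comparison is a separate step, whereas in the test tube an enormous number of strands attempt this match at once.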
As a matter of interest to DARPA, here is an interesting look at the applications of nanotech in today's world.

How do you see DNA computing affecting the future of BI / DW?

Cheers,

November 7, 2006
Don't get your knickers in a twist!

Metadata is an interesting piece; many corporations and individuals fight over the true meaning of metadata and the context to which it applies. This entry is a thought experiment that explores the question of context: deriving context and resolving contextual fights within an organization as they relate to enterprise metadata. I believe everyone can have a metadata sit-in, and maybe finally work this thing out. Note: this is a tiny bit of light reading...

Why should I even have knickers? What are knickers anyway? And why would they be twisted? Well, if you've never visited England, I suggest you do so. It's a beautiful country. Anyhow, knickers have multiple definitions depending on the time of reference and who's doing the referencing. For most of us who speak or understand English today, the word usually refers to undergarments worn around the waist area. Ok, so what's changed? According to Webster's Dictionary:

knick‧ers
Pronunciation [nik-erz]

Now notice something interesting: at the end of the definition, it doesn't even agree with itself - they've twisted their knickers, and said see the word KNICKERBOCKER. Let's see what KNICKERBOCKER has to say:

Knick‧er‧bock‧er
Pronunciation [nik-er-bok-er]
–noun
1. a descendant of the Dutch settlers of New York.
2. any New Yorker.
[Origin: 1800–10, American; generalized from Diedrich Knickerbocker, fictitious author of Washington Irving's History of New York]

Which, not surprisingly, has NOTHING to do with knickers in the first place. Look at definition #1 in the first quote and definition #1 in the second quote - they DON'T MATCH!!! They are from close to the same time period in origin.
Ok, so we studied the root of the word; this is not so interesting... But it gives rise to a contextual problem (one that we have throughout our enterprises today). We can't decide on how to define our own terms, and furthermore, the metadata (the definitions and contextual understanding) 1) changes over time, and 2) changes based on individual or line of business. Our enterprise metadata (Master Metadata) needs to be set forth, and needs to be built from an enterprise (top-down) view. That's not to say that we can't all have our cake / definitions and eat them too! We can, and we should.

The best way to describe this type of effort is to look at existing Semantic Mapping Technology, or the Semantic Web, or Semantic Integration. Normally these things are done by hand, and if you choose to do so I would highly suggest an investment in a tool that can track, develop, and visualize taxonomies and ontologies of words. In order to make this work you might need:

Yes, I'm suggesting Metadata at CMMI level 4, quantitatively tracked. Quality scores could be included, but they are subjective, depending on the individual scoring the metadata.

Now on to your knicker problem, uhhh I mean - the Knickers Twisting problem... I mean - don't wear tight pants and then exercise if you don't want your knickers in a twist... Ok - I digress (sorry). In all honesty, knickers are _not_ knickerbockers, although the word may have been derived from the original term. Knickers at an enterprise level may be accepted by a pants manufacturing corporation such as Levi Strauss as the definition of PANTS or UNDERPANTS... but which is it? In the real world of metadata this needs to be resolved by the executive team; they need to be the ones to define PRIMARY metadata. Using taxonomy trees, secondary and tertiary metadata can be defined based on LOB (lines of business) and work breakdown structures (roles & responsibilities or uses of the metadata).
This works as long as the metadata is tied to the CURRENT VIEW of the organization and to what the data set represents, so that when data is delivered to the enterprise the metadata goes with it, and the organization can drill up, down, and across the metadata meanings (provided they have the proper security).

Unfortunately I do not know of any single tool that can accomplish this today. There is a set of open-source tools that manage semantic meaning, a set of other tools that manage taxonomies, and another set of tools that manage version control / document management, security, and so on. Metadata tool set vendors are still in their infancy. Hopefully someone will rise to the challenge - and hopefully I have not put your knickers in a twist!

We can help you sort out the metadata mess, and establish a contextual, enterprise-based metadata system that will save you time and money. This is a serious issue and must be solved before the enterprise launches an SOA initiative, or before the enterprise claims to have completed an SOA initiative.

As always, I'd love to hear from you - your thoughts, comments, poetry, haiku, and tall tales are all welcome.

Thanks,

November 3, 2006
MDaaS, Master Data, pros and cons, and definition

In this entry we'll dive a little further into the pros and cons of master data as a service (MDaaS). We'll bring to light the different kinds of master data, and how it will evolve in the marketplace into a service-oriented architecture, housed offsite (generically). MDaaS follows the standard curve of new ideas: individual creation (decentralization), then centralization, and then commodity-based master data. I think the firm which undertakes master data as a commodity will be a hot property in the near future.

First, I'd like to discuss the definition of master data (which I've done in other blogs).
From a 30,000 ft perspective, master data is operational, quality cleansed, singular in nature, and descriptive about a business key - it is in fact an operational data store for the enterprise (with a few rules twisted). By the way, come see me at TDWI in Orlando next week - I'm teaching on Master Data (how to implement it within your enterprise).

Master data should not contain:

Master data structures should contain:

Basic rules:

Now, MDaaS requires that Master Data be housed off-site, on hosting services, in a remote database, connected through metadata and service layers. MDaaS can be specific by client (as SalesForce.com does with the sales data it houses for its client companies).

MDaaS attributes:

MDaaS must NOT:

Some interesting items: there are some general master data sets that can and should be available to paying subscribers as shared data sets. These include:

Any data currently reported to the public and available on the web should be turned into MDaaS - and in some cases already has been.

Types of Master Data Entities might include:

Some of these are protected, encrypted, and relegated to authentication for access; some are not.

At long last, what are the pros and cons of MDaaS?

Cons:

Do you have anything to add to this entry? Please share it. I'd love to hear your thoughts. Again, come see me next Friday at TDWI for Master Data Implementation.

Cheers,