Blog: William McKnight

William McKnight

Hello and welcome to my blog!

I will periodically share my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information in support of company goals, and I'm thrilled to be a part of my clients' growth plans and to connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope these entries can be of some modest benefit toward that goal. Please share your thoughts and input on the topics.

About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

Teradata rolled out Teradata Data Labs (TDL) in Teradata 14.  Though it is not a high-profile enhancement, it is worth understanding not only for Teradata data warehouse customers, but for all data warehouse programs, as a signal of how program architectures now look.  Teradata Data Labs supports how customers actually play with their resource allocations in production environments in an effort to support more agile practices under more control by business users.

TDL is part of Teradata Viewpoint, a portal-based system management solution.  TDL is used to manage "analytic sandboxes" built by non-traditional builders of data systems: business users.  Companies can allocate a percentage of overall disk and other resources to the lab area, and the permissions can be managed with TDL.  By creating "data labs" and assigning them to requesting business users, TDL minimizes the potential dangers of a can of worms that was opened long ago: production create, alter and delete activity - not just select activity - by business users.
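
For a sense of what TDL is automating, here is a minimal sketch of a home-grown lab allocation in plain Teradata DDL; the database names, sizes and user are hypothetical, and TDL manages this kind of thing through the Viewpoint portal rather than hand-written DDL:

  -- Carve a parent area for all labs out of the warehouse's perm space
  CREATE DATABASE labs FROM dw AS PERMANENT = 500e9 BYTES;

  -- Allocate one lab to a requesting business user and let them build in it
  CREATE DATABASE sales_lab FROM labs AS PERMANENT = 50e9 BYTES;
  GRANT CREATE TABLE, DROP TABLE ON sales_lab TO jsmith;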

These sandboxes must be managed since resources are limited.  Queries can be limited, various defaults set and, obviously, disk space is capped for each lab.  Expiration dates can be placed on the labs, not unlike a public library loan.  Timeframes can span from a week to a year.  Users may also send a "promotion" request to the labs manager, asking that the entities within the lab be moved out of labs and into production.

Data labs can be joined to data in the regular data warehouse.  One Teradata customer has 25% of the data warehouse space allocated to TDL.
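
Since a lab is just another database on the same system, the join is ordinary SQL; a hypothetical example with invented table and column names:

  -- Compare a lab-built forecast against actuals in the warehouse
  SELECT s.store_id,
         SUM(s.sales_amt)    AS actual_sales,
         SUM(f.forecast_amt) AS lab_forecast
  FROM dw.daily_sales s
  JOIN sales_lab.forecast_experiment f
    ON s.store_id = f.store_id
   AND s.sales_date = f.sales_date
  GROUP BY s.store_id;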

TDL can support temporary processing needs with production-class resources - not what is usually found in development environments.  I can also see TDL supporting normal IT development.  Look into TDL, or home-grow the idea within your non-Teradata data warehouse environment.  It's an idea whose time has come.

TDL is backward-compatible to Teradata 13.


Posted April 17, 2012 9:38 AM

I have completed teaching the Master Data Management Course in Sydney.  Thank you to my wonderful students.  Some of the memorable learning over the last two days centered on these points:

  • Master data, with MDM, can be left where it is or, more commonly, placed in a separate hub
  • Product MDM tends to be more governance-heavy than customer MDM
  • In a ragged hierarchy, a node can belong to multiple parents
  • Be selective about the fields you apply change management to
  • Customer lifetime value should ideally look forward, not backward, and should use profit instead of spend (see the sketch after this list)
  • Customer analytics can be calculated in MDM or CRM; the debate continues
  • Complex subject areas require input from multiple groups
  • Critical elements in MDM data security include confidentiality, integrity, non-repudiation, authentication and authorization
  • Syndicated data is becoming increasingly important and MDM is the most leverageable place to put that data
  • The web is also a source of syndicated data
  • Data quality is a value proposition
  • Do you have a data problem, a customer data problem or a product data problem?  The answer affects your tool selection
  • Care about what matters to your shop when you evaluate vendors
  • The program methodology should be balanced between rigor and creativity
  • In the design phase, you develop your test strategy, data migration plan, non-functional requirements, functional design, interface specifications, workflow design and logical data model
  • Don't mess up by staffing the team with only technicians
  • The purpose of the data conversion maps is to document the requirements for transforming source data into target data
  • Organizational change management is highly correlated to project success
  • Stakeholder management is not a one-time activity
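
On the customer lifetime value point above, here is a minimal sketch of a forward-looking, profit-based CLV calculation in SQL.  The table, the retention rate column and the 8% discount rate are all hypothetical assumptions for illustration:

  -- CLV: discounted sum of projected future profit, weighted by retention odds
  -- Assumes one row per customer per future year (yr = 1..5)
  SELECT customer_id,
         SUM(projected_margin * (retention_rate ** yr) / ((1 + 0.08) ** yr)) AS clv
  FROM customer_profit_projection
  GROUP BY customer_id;

Spend-based, backward-looking calculations are easier to build, but a forward, profit-based view is what should drive treatment decisions.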
If you're interested in hosting the class in 2012, please contact me.

 


Posted November 10, 2011 2:23 AM

Day 1 of the three-day Master Data Management course is in the books here in beautiful Sydney, Australia.  It's been an outstanding day of learning and sharing about the emerging, important discipline of master data management.

Here are my most vivid recollections from today:

  • MDM is highly misunderstood due to the wide range of benefits provided
  • MDM is part of major changes in how we handle data and respond to information chaos, which will get more complex before it gets less complex
  • MDM can and should support Hadoop data and all manner of data marts
  • Lack of a subject-area orientation in the culture is a challenge for MDM
  • Some MDM is analytical, most is operational
  • MDM subject areas can mix or hybridize across the factors of analytical/operational, physical/virtual and the degree of governance needed
  • Often many systems build components of a master record, but few work on the same attributes
  • MDM returns are in the improved efficacy of projects targeting business objectives
  • To do a return on investment justification, all project benefits must be converted to cash flow
  • MDM should be tightly aligned with successful projects, creating benefits for the MDM program
  • Personal motivators must be understood and are important in building an MDM roadmap
  • Vendor solutions may be subject area-focused or support multiple subject areas
  • Tactical MDM supports an individual project, enterprise MDM supports the organization for the subject area
  • In the MDM program manager role, strong project management discipline can be more important than MDM domain knowledge
  • The data warehouse will remain relevant in organizations, but many of its functions are moving to operational systems such as MDM
  • You can mix persistence within a subject area, with the hub persisting frequently used data elements and pointing to source systems for the rest of the data (see the sketch after this list)
  • Do not count on the data warehouse for what MDM provides
  • Governance workflows provide the ability to escalate if actions are not taken in a timely manner
  • External sources like EPCID are becoming relevant in the product subject area
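
On the mixed persistence point above, here is a minimal sketch of what a hybrid hub table might look like, persisting a few frequently used elements and keeping pointers back to the systems of record for everything else; all names are hypothetical:

  CREATE TABLE customer_hub (
    master_customer_id INTEGER NOT NULL,  -- the golden key
    full_name          VARCHAR(200),      -- frequently used: persisted in the hub
    primary_email      VARCHAR(200),      -- frequently used: persisted in the hub
    source_system      VARCHAR(30),       -- which system of record holds the rest
    source_record_id   VARCHAR(60),       -- the record's key in that source
    PRIMARY KEY (master_customer_id)
  );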

More to come on days 2 and 3.



Posted November 5, 2011 11:57 PM

This week, at the PASS Summit, Microsoft unveiled its inevitable "big data" strategy.  The world of big data is the new uncharted land in information management and the big vendors are jumping on board.  "New economy" giants like eBay, Twitter, Facebook and Google are the early adopters - and many even built the big data tools that everything is based on.

 

It would be too easy to dismiss big data as a Valley-only phenomenon, but you shouldn't.  Microsoft's information management tools serve perhaps the widest-ranging set of clients anywhere.  They've either made their move to "keep up with the Joneses" (Oracle had some big data announcements last week) or there must be some Global 2000 budgets in it.  The industry will not thrive without some of the latter, and that's what I'm betting on.

 

There's vast utility in unstructured and machine-generated data (somehow tweets count in this category) and many reasons, starting with monetary ones, why a company that finds some use for it will choose a big data tool like Hadoop rather than a relational database management system to store the data.  Yes, and even live with the tradeoffs: lack of ACID compliance, lack of transactions, lack of SQL (although this is eroding by the day), lack of schema sharing, the need to user-assemble (also eroding) and node failures as a way of life.  Indeed, the "secret sauce" of Hadoop is the distribution of data and recovery from node failure - RAID-like, but less costly.
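
As one illustration of the "lack of SQL is eroding" point, Hive already layers a SQL dialect over Hadoop.  A minimal sketch, assuming a hypothetical tab-delimited click log already sitting in HDFS:

  -- Project a schema onto raw files; Hive compiles the query to MapReduce jobs
  CREATE EXTERNAL TABLE clicks (user_id STRING, url STRING, ts STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/logs/clicks';

  SELECT url, COUNT(*) AS hits
  FROM clicks
  GROUP BY url
  ORDER BY hits DESC
  LIMIT 10;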

 

It's better to play with this "hippy developed" Hadoop (as one skeptic called it) than ignore it at this point, and playing with it is what Microsoft has done.  Microsoft is working to deploy Hadoop on Windows and cloud-based Azure.  This could really work in Microsoft's big data land grab.  It's a hedge against going too hard-core into the open-source world.  It's comfortable Windows combined with Hadoop.  For the many, many fence-sitters out there, this is good timing.  Many want to trace movements of physical objects, track web clicks and follow other Web 2.0 activity.  They want to do this without sacrificing the enterprise standards they are used to with products like Windows and its management toolset.

 


Posted October 15, 2011 2:12 PM

Potentially Teradata's most significant enhancement in a decade will be on display next week at the Teradata Partners conference: Teradata Columnar.  Few leading database players have altered the fundamental structure of storing all of a table's columns consecutively on disk for each record.  The innovations and practical use cases of "columnar databases" have come from the independent vendor world, where the approach has proven quite effective for an increasingly important class of analytic query.  Here is the first in a series of blog posts where I discussed columnar databases.

Teradata obviously is not a "columnar database" but would now be considered a hybrid, exhibiting columnar features for those columns chosen to participate.  Teradata combines columnar capabilities with a feature-rich, requirements-matching DBMS already deployed by many large clients for their enterprise data warehouses.  Columnar is available on all Teradata platforms - Teradata Active Enterprise Data Warehouse, Teradata Data Warehouse Appliance, Teradata Extreme Data Appliance and Teradata Extreme Performance Appliance.

Teradata's approach allows for the mixing of row structures, column structures and multi-column structures directly in the DBMS in "containers."  The physical structure of each container can be row format (extensive page metadata, including a map to offsets), referred to as "row storage format," or columnar format (the row "number" implied by the value's relative position).  All rows of the table are treated the same way; i.e., there is no column structure/columnar format for the first 1 million rows and row structure for the rest.  However, (row) partition elimination is still very much alive and, when combined with column structures, yields I/O that can retrieve a very focused set of data for the price of a few metadata reads to facilitate the eliminations.

Each column goes in one container, and a container can hold one or multiple columns.  Columns that are frequently accessed together should be put into the same container.  Physically, multiple container structures are possible for columns with a large number of rows.
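
To make the container idea concrete, here is a hedged sketch of what a column-partitioned table combined with row partition elimination might look like in Teradata 14 DDL; the table and columns are invented, so treat this as illustrative rather than definitive:

  CREATE TABLE sales_cp (
    store_id   INTEGER,
    sales_date DATE,
    amount     DECIMAL(10,2)
  )
  NO PRIMARY INDEX
  PARTITION BY (
    COLUMN,  -- each column (or declared column group) gets its own containers
    RANGE_N(sales_date BETWEEN DATE '2011-01-01'
            AND DATE '2011-12-31' EACH INTERVAL '1' MONTH)
  );

A query restricted to one month and two columns then touches only those containers in those row partitions.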

Posted September 30, 2011 3:17 PM
