Blog: William McKnight
http://www.b-eye-network.com/blogs/mcknight/

Hello and welcome to my blog! I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and to connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input on the topics.

Teradata Introduces Intelligent In-Memory

This week, Teradata introduced its in-memory capabilities.  The press release is here.   

In-memory is currently the fastest commercial medium for data storage, upwards of thousands of times faster than HDD. Naturally, it is relatively expensive, or it would be more pervasive by now. However, the price point has come down recently and we are having an in-memory hardware renaissance. As Tony Baer of Ovum said in a tweet response to me: "w/DRAM cheap,takes SmartCaching to nextlevel."

Teradata takes in-memory a step further by considering the relative priority of data. Though all data is backed by disk, Teradata will automatically move data in and out of memory as appropriate, in a feature it calls Intelligent Memory. This is the data "80/20 rule" at work. After 7 days of usage, Teradata says the allocation will be fairly settled. Teradata has found that, across multiple environments, 85% of I/O touches 10% of the cylinders. It's in-memory placement driven by data temperature (and the approach extends to HDD and SSD).
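
To make the temperature idea concrete, here is a minimal sketch of hot/warm/cold placement, assuming nothing more than an access counter per extent. It is purely illustrative of the "hottest data in the fastest tier" principle described above, not Teradata's Intelligent Memory implementation.

```python
from collections import Counter

class TemperatureTiering:
    """Toy model: route the hottest extents to memory, the next hottest to SSD, the rest to HDD."""

    def __init__(self, memory_slots, ssd_slots):
        self.memory_slots = memory_slots
        self.ssd_slots = ssd_slots
        self.access_counts = Counter()   # extent_id -> number of I/Os observed

    def record_io(self, extent_id):
        self.access_counts[extent_id] += 1

    def placement(self):
        """Rank extents by temperature and assign tiers."""
        ranked = [extent for extent, _ in self.access_counts.most_common()]
        tiers = {}
        for i, extent in enumerate(ranked):
            if i < self.memory_slots:
                tiers[extent] = "MEMORY"
            elif i < self.memory_slots + self.ssd_slots:
                tiers[extent] = "SSD"
            else:
                tiers[extent] = "HDD"
        return tiers

# Example: a skewed workload where a few extents absorb most of the I/O.
tiering = TemperatureTiering(memory_slots=2, ssd_slots=3)
for extent, hits in [("A", 900), ("B", 600), ("C", 50), ("D", 20), ("E", 5), ("F", 1)]:
    for _ in range(hits):
        tiering.record_io(extent)
print(tiering.placement())   # A and B land in memory, C/D/E on SSD, F on HDD
```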

You still need to allocate the memory (and SSD) initially, but whatever you allocate will be intelligently utilized. Intelligent Memory will run across the Teradata family: the 670, 1650, 2700 and 6700 models.

It behooves us to seek out the fastest medium for data that can be utilized effectively in the organization and delivers maximum ROI.  Those who understand the possibilities for data today should be sharing and stimulating the possibilities for information in the organization.  All productive efforts at information management go to the bottom line.

http://www.b-eye-network.com/blogs/mcknight/archives/2013/05/teradata_introd.php | Business Intelligence/Data Warehousing | Thu, 09 May 2013 11:05:20 -0700
When to do Data Virtualization
  • High Urgency
  • Limited Budget
  • High Volatility in requirements
  • Tight constraints on data replication
  • An "explore and learn" organizational personality

Dave also mentioned, and I agree, that you buy data virtualization for infrastructure, not for one project.  This is a point I'll make again in my keynote tomorrow, where I regard data virtualization, data integration and governance as "blankets" over the "no-reference" architecture ("no" because how can it be a reference architecture when it keeps changing).
http://www.b-eye-network.com/blogs/mcknight/archives/2013/02/when_to_do_data.php | Business Intelligence/Data Warehousing | Wed, 20 Feb 2013 10:25:18 -0700
Genre-Bending NoSQL: Couchbase Server 2.0 Available

Couchbase announced the general availability of Couchbase Server 2.0 today.  This is the long-anticipated release that carries Couchbase's capabilities over the threshold from a key-value store to a document store.

Key-value stores have little to no understanding of the value part of the key-value pair. It is simply a blob, keyed by a unique identifier that serves a "primary key" function.  This key is used for get, put and delete operations.  You can also search inside the value, although performance may be suboptimal.  The application consuming the record needs to know what to do with the blob/value.
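
As a minimal illustration of that interface - a toy in-process store, not Couchbase's actual client API - the whole contract is put, get and delete by key, with the value treated as an opaque blob the application must interpret:

```python
class ToyKeyValueStore:
    """Opaque-value store: the engine never interprets what it holds."""

    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        self._data[key] = blob          # value is just bytes/JSON/anything

    def get(self, key):
        return self._data.get(key)      # caller must know how to parse it

    def delete(self, key):
        self._data.pop(key, None)

store = ToyKeyValueStore()
store.put("session:42", '{"user": "alice", "cart": ["sku-1", "sku-7"]}')
print(store.get("session:42"))   # the application, not the store, parses this JSON
```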

Key-value stores are prominent today, but with a few additional features they can become a document store or a column store, and Couchbase has made its server a document store. When records are simple and lack a strongly shared structure, a key-value store may be all you need. However, I don't see any of Couchbase's current workloads moving away from Couchbase due to the innovations.  These workloads include keeping track of everything related to the new class of web, console and mobile applications that are run simultaneously by thousands of users, sometimes sharing sessions - as in games.  They also include shopping carts.

Workloads involving relationships among data, complex data, or the need to operate on multiple keys/records at once need some of the functions recently added in 2.0, which include distributed indexing and querying, JSON storage, online compaction, incremental MapReduce and cross-data-center replication.
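
To show the shape of what the document-store additions buy you, here is a hedged sketch of a map function run over JSON documents with a reduce that aggregates the results. It mirrors the general map/reduce view pattern in plain Python; it is not Couchbase's view engine, and the document fields are invented.

```python
import json
from collections import defaultdict

documents = [
    json.loads(s) for s in (
        '{"type": "order", "customer": "alice", "total": 40}',
        '{"type": "order", "customer": "bob",   "total": 25}',
        '{"type": "order", "customer": "alice", "total": 10}',
    )
]

def map_fn(doc):
    # Emit (key, value) pairs; only documents we understand contribute.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def reduce_fn(values):
    return sum(values)

# Group emitted values by key, then reduce each group: the map/reduce "view".
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_fn(doc):
        grouped[key].append(value)

view = {key: reduce_fn(vals) for key, vals in grouped.items()}
print(view)   # {'alice': 50, 'bob': 25}
```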

Document stores tend to be excellent for logging online events of different types when those events can have varying characteristics.  They work well as pseudo content management systems that support website content.  Because they can manipulate data at a granular level, document stores are also good for dynamically changing real-time analytics that the application can take advantage of immediately.

Key-value stores have historically (OK, it's all a short history) performed better than the more functional document stores.  Couchbase is navigating its capabilities forward in a very sensible manner.  There is no inherent reason for performance degradation or lack of functionality, and Couchbase is beginning to show this.  While there is still a release or two to go to reach the functionality standard for document stores, Couchbase is coming at it from a strong performance base.

http://www.b-eye-network.com/blogs/mcknight/archives/2012/12/genre-bending_n.php | BigData NoSQL | Wed, 12 Dec 2012 07:41:35 -0700
Blood in the Water or Just Fishing Where the Fish Are?

The story goes that Willie Sutton robbed banks because "that's where the money is."  While this attribution appears to be an urban legend, it's no myth that Oracle has the lion's share of databases - both transactional and analytic.

IBM got a head start on the land grab for Oracle customer conversions by bringing a high degree of PL/SQL compatibility into the DB2 database.

Now, Teradata has invested resources in facilitating the migration away from Oracle.  With the Teradata Migration Accelerator (TMA), structures and SQL (PL/SQL) code can be converted to Teradata structures and code. This is a different philosophy from IBM's, which requires few code changes for the move but also doesn't immediately optimize that code for DB2.

While data definition language (DDL) changes only slightly from DBMS to DBMS - such as putting quotes around keywords - Teradata's key activity and opportunity in the migration is to change Oracle cursors into Teradata set-based SQL.
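
The cursor-to-set-based rewrite is easiest to see outside any one SQL dialect. The sketch below uses plain Python purely as an illustration of that shift - a row-at-a-time loop (the procedural pattern an Oracle cursor encourages) versus a single set-based operation over the qualifying rows. The table and column names are invented for the example.

```python
orders = [
    {"order_id": 1, "status": "OPEN",    "amount": 120.0},
    {"order_id": 2, "status": "SHIPPED", "amount": 80.0},
    {"order_id": 3, "status": "OPEN",    "amount": 45.0},
]

# Cursor style: fetch one row at a time, inspect it, apply the change row by row.
def apply_discount_cursor(rows):
    for row in rows:                      # FETCH ... LOOP
        if row["status"] == "OPEN":       # per-row IF
            row["amount"] *= 0.9          # per-row UPDATE
    return rows

# Set-based style: one declarative pass over the qualifying set,
# analogous to a single UPDATE ... WHERE status = 'OPEN' statement.
def apply_discount_set_based(rows):
    return [
        {**row, "amount": row["amount"] * 0.9} if row["status"] == "OPEN" else row
        for row in rows
    ]

assert apply_discount_cursor([dict(r) for r in orders]) == apply_discount_set_based(orders)
```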

"Rule sets" - for how to do conversions - can be applied selectively across the structure and code in the migration.  TMA supports selective data movement, if desired, with WHERE clauses on the data.  TMA also supports multiple users working on a coordinated migration effort.

TMA also works for DB2 migrations.

While it will not do the trick on its own, having tools like these - which convince a shop that the move could be less painful than originally thought - will support DBMS migrations.

http://www.b-eye-network.com/blogs/mcknight/archives/2012/04/blood_in_the_wa.php | Business Intelligence/Data Warehousing | Mon, 30 Apr 2012 10:15:41 -0700
Capturing Sentiment and Influence Clues

Teradata Aster demonstrates its graphical "pathing" capabilities very nicely by showing the relationships between tweeters and their tweets at events, like the Teradata Third-Party Influencers Event I attended last week.

The demonstration shows how to derive some sentiment about the event, but more importantly it demonstrates relationships and influence power.  Customer relationships and influence power are becoming part of the set of derived data needed to fully understand a company's customers.  This leads to identifying engagement models and to early identification of the patterns of activity that lead to certain events - desired or otherwise.

One important point noted by Stephanie McReynolds, Director of Product Marketing at Teradata Aster, was that the sphere of relevant influence depends on the situation.  You might retweet hundreds of tweets, many from tweeters you do not even know.  However, when buying a car, those who would influence you number only a handful.

One would need to take more heed of an influencer's opinion - or that of someone with a relationship to the influencer.  It can become quite a layered analysis, and influence power is hard to measure.  Grabbing various digital breadcrumbs is relatively easy, but is it indicative of influence?  Likewise, is a tweetstream indicative of the sentiment of an event?  I'm not sure.  It may not even be indicative of the sentiment of the tweeters.  Digital is only a start.  The worlds of third-party data, real sentiment analysis and possibly sensor data are coming together.

http://www.b-eye-network.com/blogs/mcknight/archives/2012/04/capturing_senti.php | DBMS Selection | Tue, 24 Apr 2012 11:17:22 -0700
Serious Play in the Sandbox with Teradata Data Labs

Teradata rolled out Teradata Data Labs (TDL) in Teradata 14.  Though it is not a high-profile enhancement, it is worth understanding not only for Teradata data warehouse customers, but for all data warehouse programs, as a signal of how program architectures now look. Teradata Data Labs supports how customers are actually playing with their resource allocations in production environments in an effort to support more agile practices under more control by business users.

TDL is part of Teradata Viewpoint, a portal-based system management solution.  TDL is used to manage "analytic sandboxes" for these non-traditional builders of data systems.  Companies can allocate a percentage of overall disk and other resources to the lab area, and the authorities can be managed within TDL.  By creating "data labs" and assigning them to requesting business users, TDL minimizes the potential dangers of the "can of worms" that has long been opened: supporting production create, alter and delete activity - not just select activity - by business users.

These sandboxes must be managed since resources are limited.  Queries can be limited, various defaults set and, obviously, disk space is limited for each lab.  Expiration dates can be placed on the labs, which is not dissimilar to how a public library works.  Timeframes can span a week to a year.  Users may also send a "promotion" request to the labs manager, requesting that the entities within the lab be moved out of labs and into production.
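
As a rough illustration of that management model - a quota, an expiration date and a promotion request - here is a small sketch in Python. It is conceptual only and does not reflect TDL's actual interface; the names and numbers are invented.

```python
from datetime import date, timedelta

class DataLab:
    """Toy analytic sandbox: a disk quota, an expiration date, a promotion path."""

    def __init__(self, owner, quota_gb, lifetime_days):
        self.owner = owner
        self.quota_gb = quota_gb
        self.expires = date.today() + timedelta(days=lifetime_days)
        self.used_gb = 0.0
        self.promotion_requested = False

    def load(self, gb):
        if self.used_gb + gb > self.quota_gb:
            raise RuntimeError(f"{self.owner}'s lab would exceed its {self.quota_gb} GB quota")
        self.used_gb += gb

    def expired(self, today=None):
        return (today or date.today()) > self.expires

    def request_promotion(self):
        # In practice this would notify the labs manager for review.
        self.promotion_requested = True

lab = DataLab(owner="marketing_analyst", quota_gb=500, lifetime_days=90)
lab.load(120.0)
lab.request_promotion()
print(lab.used_gb, lab.expires, lab.promotion_requested)
```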

Data labs can be joined to data in the regular data warehouse.  One Teradata customer has 25% of the data warehouse space allocated to TDL.

TDL can support temporary processing needs with strong resources - not what is usually found in development environments.  I can also see TDL supporting normal IT development.  Look into TDL, or home-grow the idea within your non-Teradata data warehouse environment.  It's an idea whose time has come.

TDL is backward-compatible to Teradata 13.

http://www.b-eye-network.com/blogs/mcknight/archives/2012/04/serious_play_in.php | DBMS Selection | Tue, 17 Apr 2012 09:38:07 -0700
Thoughts from the Master Data Management Course, Days 2 & 3

I have completed teaching the Master Data Management Course in Sydney.  Thank you to my wonderful students.  Some memorable learning over the last two days centered on these points:

  • Master data, with MDM, can be left where it is or, more commonly, placed in a separate hub
  • Product MDM tends to be more governance-heavy than customer MDM
  • In a ragged hierarchy, a node can belong to multiple parents (see the sketch after this list)
  • Be selective about the fields you apply change management to
  • Customer lifetime value should ideally look forward, not backward, and should use profit rather than spend
  • Customer analytics can be calculated in MDM or CRM; the debate continues
  • Complex subject areas require input from multiple groups
  • Critical elements in MDM data security include confidentiality, integrity, non-repudiation, authentication and authorization
  • Syndicated data is becoming increasingly important and MDM is the most leverageable place to put that data
  • The web is also a source of syndicated data
  • Data quality is a value proposition
  • Do you have a data problem, a customer data problem or a product data problem?  The answer affects your tool selection
  • Care about what matters to your shop when you evaluate vendors
  • The program methodology should be balanced between rigor and creativity
  • In the design phase, you develop your test strategy, data migration plan, non-functional requirements, functional design, interface specifications, workflow design and logical data model
  • Don't mess up by staffing the team with only technicians
  • The purpose of the data conversion maps is to document the requirements for transforming source data into target data
  • Organizational change management is highly correlated to project success
  • Stakeholder management is not a one-time activity
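
As promised above, a small sketch of a ragged hierarchy: the point is simply that a node may roll up to more than one parent, so the structure is a directed graph rather than a strict tree. The product names are invented for illustration.

```python
# Child -> list of parents; "Hybrid Widget" rolls up to two product lines at once.
ragged_hierarchy = {
    "All Products":  [],
    "Hardware":      ["All Products"],
    "Software":      ["All Products"],
    "Hybrid Widget": ["Hardware", "Software"],   # the ragged part: two parents
}

def ancestors(node, hierarchy):
    """Walk every parent path upward; a node can appear under multiple rollups."""
    found = set()
    stack = list(hierarchy.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return found

print(ancestors("Hybrid Widget", ragged_hierarchy))  # {'Hardware', 'Software', 'All Products'}
```
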
If you're interested in hosting the class in 2012, please contact me.

 

http://www.b-eye-network.com/blogs/mcknight/archives/2011/11/thoughts_from_t.php | Master Data Management | Thu, 10 Nov 2011 02:23:28 -0700
Lessons from the Master Data Management Course

Day 1 of the 3-day Master Data Management course is in the books here in beautiful Sydney, Australia.  It's been an outstanding day of learning and sharing about the emerging, important discipline of master data management.

Here are my most vivid recollections from today:

  • MDM is highly misunderstood due to the wide range of benefits provided
  • MDM is part of major changes in how we handle data and in how we respond to information chaos, which will get more complex before it gets less complex
  • MDM can and should support Hadoop data and all manner of data marts
  • Lack of a subject-area orientation in the culture is a challenge for MDM
  • Some MDM is analytical, most is operational
  • MDM subject areas can mix or hybridize across the factors of analytical/operational, physical/virtual and the degree of governance needed
  • Often many systems build components of a master record, but few work on the same attributes
  • MDM returns are in the improved efficacy of projects targeting business objectives
  • To do a return on investment justification, all project benefits must be converted to cash flow (see the sketch after this list)
  • MDM should be tightly aligned with successful projects, creating benefits for the MDM program
  • Personal motivators must be understood and are important in building an MDM roadmap
  • Vendor solutions may be subject area-focused or support multiple subject areas
  • Tactical MDM supports an individual project, enterprise MDM supports the organization for the subject area
  • Strong project management discipline can be more important in that role than MDM domain knowledge
  • The data warehouse will remain relevant in organizations, but many of its functions are moving to the operational world, some of them to MDM
  • You can mix approaches within a subject area, with the hub persisting frequently used data elements and pointing to source systems for the rest of the data
  • Do not count on the data warehouse for what MDM provides
  • Governance workflows provide the ability to escalate if actions are not taken in a timely manner
  • External sources like EPCID are becoming relevant in the product subject area
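
As promised above, a minimal sketch of what converting benefits to cash flow looks like in an ROI justification: yearly benefits and costs are netted and discounted back to a present value. The figures and discount rate are invented for illustration.

```python
def npv(cash_flows, discount_rate):
    """Net present value of yearly net cash flows; year 0 carries the initial investment."""
    return sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows))

# Hypothetical MDM program: $500k up front, then yearly benefits minus running costs.
yearly_benefits = [0, 300_000, 400_000, 450_000]
yearly_costs    = [500_000, 100_000, 100_000, 100_000]
net_flows = [b - c for b, c in zip(yearly_benefits, yearly_costs)]

value = npv(net_flows, discount_rate=0.10)
print(f"NPV at 10%: ${value:,.0f}")   # a positive NPV supports the justification
```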

More to come on days 2 and 3.


http://www.b-eye-network.com/blogs/mcknight/archives/2011/11/lessons_from_th.php | Master Data Management | Sat, 05 Nov 2011 23:57:02 -0700
Microsoft and Hippy-Made Hadoop: A Marriage Made with Windows

This week, at the PASS Summit, Microsoft unveiled its inevitable "big data" strategy.  The world of big data is the new uncharted land in information management, and the big vendors are jumping on board.  "New economy" giants like eBay, Twitter, Facebook and Google are the early adopters - and many of them even built the big data tools that everything is based on.

 

It would be too easy to dismiss big data as a Valley-only phenomenon, and you shouldn't.  Microsoft's information management tools serve perhaps the widest ranging set of clients anywhere.  They've either made their move to "keep up with the Joneses" (Oracle had some big data announcements last week) or there must be some Global 2000 budgets in it.  The industry will not thrive without some of the latter and that's what I'm betting on.

 

There's vast utility in unstructured and machine-generated data (somehow tweets count in this category) and many reasons, starting with monetary ones, why, once a company finds some use for it, it will choose a big data tool like Hadoop rather than a relational database management system to store the data.  Yes, and even live with the tradeoffs: lack of ACID compliance, lack of transactions, lack of SQL (although this is eroding by the day), lack of schema sharing, the need to user-assemble (although this is also eroding) and node failures as a way of life.  Indeed, the "secret sauce" of Hadoop is the distribution of data and recovery from node failure - RAID-like, but less costly.
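
A rough sketch of that "secret sauce" follows, assuming nothing about HDFS internals beyond the idea stated above: blocks are replicated across nodes, and when a node dies its blocks are re-replicated from surviving copies.

```python
import random

class ToyDistributedFS:
    """Conceptual only: block replication and re-replication after a node failure."""

    def __init__(self, nodes, replication=3):
        self.replication = replication
        self.block_locations = {}                 # block_id -> set of node names
        self.nodes = set(nodes)

    def write_block(self, block_id):
        targets = random.sample(sorted(self.nodes), self.replication)
        self.block_locations[block_id] = set(targets)

    def fail_node(self, node):
        self.nodes.discard(node)
        for block_id, locations in self.block_locations.items():
            locations.discard(node)
            # Re-replicate from a surviving copy until we are back at full replication.
            while locations and len(locations) < self.replication:
                candidates = self.nodes - locations
                if not candidates:
                    break
                locations.add(random.choice(sorted(candidates)))

fs = ToyDistributedFS(nodes=[f"node{i}" for i in range(1, 6)])
for block in ("block-a", "block-b"):
    fs.write_block(block)
fs.fail_node("node2")
print(fs.block_locations)   # every block still has 3 replicas on surviving nodes
```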

 

It's better to play with this "hippy developed" Hadoop (as one skeptic referred to it) than to ignore it at this point.  That's what Microsoft has done.  Microsoft is working to deploy Hadoop on Windows and on cloud-based Azure.  This could really work in Microsoft's big data land grab.  It's a hedge against going too hard-core into the open-source world.  It's comfortable Windows combined with Hadoop.  For the many, many fence-sitters out there, this is good timing.  Many want to trace movements of physical objects, trace web clicks and other Web 2.0 activity.  They want to do this without sacrificing the enterprise standards they are used to with products like Windows and its management toolset.

 

http://www.b-eye-network.com/blogs/mcknight/archives/2011/10/microsoft_and_h.php | Microsoft | Sat, 15 Oct 2011 14:12:39 -0700
Introducing Teradata Columnar

Potentially Teradata's most significant enhancement in a decade will be on display next week at the Teradata Partners conference: Teradata Columnar.  Few leading database players have altered the fundamental structure of storing all of the columns of a table consecutively on disk for each record.  The innovations and practical use cases of "columnar databases" have come from the independent vendor world, where the approach has proven quite effective for the performance of an increasingly important class of analytic query.  Here is the first in a series of blogs where I discussed columnar databases.

Teradata obviously is not a "columnar database" but would now be considered a hybrid, exhibiting columnar features on those columns that are chosen to participate.  Teradata combines columnar capabilities with a feature-rich, requirements-matching DBMS already deployed by many large clients for their enterprise data warehouses.  Columnar is available on all Teradata platforms - Teradata Active Enterprise Data Warehouse, Teradata Data Warehouse Appliance, Teradata Extreme Data Appliance and Teradata Extreme Performance Appliance.

Teradata's approach allows the mixing of row structures, column structures and multi-column structures directly in the DBMS, in "containers."  The physical format of each container can be row format (extensive page metadata, including a map to offsets), referred to as "row storage format," or columnar format (the row "number" is implied by the value's relative position).  All rows of the table are treated the same way; that is, there is no column structure/columnar format for the first 1 million rows and row structure for the rest.  However, (row) partition elimination is still very much alive and, when combined with column structures, produces I/O that can retrieve a very focused set of data for the price of a few metadata reads to facilitate the eliminations.

Each column goes in one container.  A container can have one or multiple columns.  Columns that are frequently accessed together should be put into the same container.  Physically, multiple container structures are possible for columns with a large number of rows.
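
Here is a conceptual sketch - plain Python, not Teradata's container format - of why grouping columns into containers and storing them column-wise helps an analytic query: only the containers holding the referenced columns are read.

```python
# A "table" stored as containers: each container holds one or more columns, column-wise.
containers = {
    "c1": {"order_id": [1, 2, 3, 4], "order_date": ["2011-09-01"] * 4},
    "c2": {"amount":   [10.0, 25.0, 40.0, 5.0]},
    "c3": {"comments": ["", "rush", "", "gift wrap"]},
}

def scan(columns_needed):
    """Read only the containers that hold the requested columns."""
    touched = []
    data = {}
    for name, cols in containers.items():
        wanted = [c for c in columns_needed if c in cols]
        if wanted:
            touched.append(name)               # this container's pages get read
            for c in wanted:
                data[c] = cols[c]
    return touched, data

# A query over amount alone touches only container c2; c1 and c3 are never read.
touched, data = scan(["amount"])
print(touched, sum(data["amount"]))            # ['c2'] 80.0
```
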
http://www.b-eye-network.com/blogs/mcknight/archives/2011/09/introducing_ter.php | Business Intelligence/Data Warehousing | Fri, 30 Sep 2011 15:17:11 -0700
NoSQL is Yes Key-Value, Document, Column and Graph Stores

NoSQL solutions are solutions that do not accept the SQL language against their data stores.   Ancillary to this is the fact that most do not store data in the structure SQL was built for: tables.  Though the solutions are "no SQL," the idea is that "not only SQL" solutions are needed to solve information needs today.  The Wikipedia article states that "Carlo Strozzi first used the term NoSQL in 1998 as a name for his open source relational database that did not offer a SQL interface".  Some of these NoSQL solutions are already coming perilously close to accepting broad parts of the SQL language.  Soon, NoSQL may be an inappropriate label, but I suppose that's what happens when a label refers to something that it is NOT.


So what is it?  It must be worth being part of.  There are currently at least 122 products claiming the space.  As fine-grained as my information management assessments have had to be in the past year - routing workloads across relational databases, cubes, stream processing, data warehouse appliances, columnar databases, master data management and Hadoop (one of the NoSQL solutions) - there are many more viable categories and products in NoSQL that actually do meet real business needs for data storage and retrieval.

 

Commonalities across NoSQL solutions include high data volumes, which lend themselves to a distributed architecture.  The typical data stored is not the typical alphanumeric data.  Hence the near-synonymous nature of NoSQL and "Big Data".  Lacking full SQL generally corresponds to a decreased need for real-time query.  And many use HDFS for data storage.  Technically, though columnar databases such as Vertica, InfiniDB, ParAccel, InfoBright and the extensions in Teradata 14, Oracle (Exadata), SQL Server (Denali) and Informix Warehouse Accelerator deviate from the "norm" of full-row-together storage, they are not NoSQL by most definitions (since they accept SQL and the data is still stored in tables).

 

They all require specialized skill sets quite dissimilar to those of traditional business intelligence.  This dichotomy between the people who do SQL and NoSQL work within an organization has already led to high walls between the two classes of projects and an influx of software connectors between "traditional" product data and NoSQL data.  At the least, a partnership with Cloudera and a connector to Hadoop seems to be the ticket to claiming Hadoop integration.

http://www.b-eye-network.com/blogs/mcknight/archives/2011/09/nosql_is_yes_ke.php | NoSQL | Wed, 14 Sep 2011 18:22:15 -0700
Self-Service Business Intelligence vs. Outsourced BI

In business intelligence, we all know and espouse the fact that data integration is the most time-consuming part of the build process.  This is undeniably true.  However, looking at the long term (me: not a full-time analyst, but observant of the implementations I've been in for a full lifecycle over the past few years), I believe most long-term costs clearly fall into the data access layer.   This is where the reports, dashboards, alerts, etc. are built.


This is true for a variety of reasons, not the least of which is a short-cutting of the data modeling process, which, when done well, minimizes the gap between design and usage.  This aspect of BI is receiving only modest recognition.  The focus instead is on a new breed of disruptive data access tools that are architecturally doing end-runs around the legacy tools in how they use memory and advanced visualization.  Specifically, these tools are Tableau, QlikTech and Spotfire.  They attack a very important component of the long-term cost of BI - the cost of IT having to continue to do everything post-production.


There are a few areas where these tools are getting recognition:


  1. They perform faster - this allows a user, in the 30 minutes of time he has to do an analysis, to get to a deeper level of root cause analysis
  2. They are seen as more intuitive - this empowers end users so they can do more, versus getting IT involved, which stalls a thought stream and introduces delay that can obliterate the relevancy
  3. They visualize data differently - I won't expound on it here and I don't think it's necessarily due to the tool architecture, but many claim it's better

So why do I bring it up in opposition to outsourced business intelligence?  Because to truly set up business intelligence to work in a self-service capacity, you would overweight the idea of working closely with users in the build process, which is a lever that gets deemphasized in outsourced BI.  You would see business intelligence less as a technical exercise and more as an empowerment exercise.   You would keep the build closer to home, where the support would be.  And you would not gear up an offshore group to handle the laborious process of maintaining the data layer over the years in the way users desire.  You would invest in users - culture, education, information use - instead of in outsourced groups.  And this is just what many are doing now.

http://www.b-eye-network.com/blogs/mcknight/archives/2011/08/self-service_bu_1.php | Business Intelligence/Data Warehousing | Sun, 14 Aug 2011 10:52:32 -0700
Perception Change Follows Product Line Updates at Teradata

I was at Teradata Influencer's Days this week, an annual three-day, invitation-only event where Teradata catches us up on its latest offerings and company strategy.  We were in Las Vegas this year and had a fascinating visit to the Switch data center where eBay houses its Teradata EDW, Hadoop clusters and another large system on which thousands of jobs run daily to keep eBay on top of its game.

Teradata is undoubtedly a long-standing leader in information management.  They have been preparing for the heterogeneous future (or is it the heterogeneous present?) and diversifying their offerings for several years.  Teradata's moves should have everyone reconsidering any notion of Teradata as a high-hurdle company that wants you to put everything online in a single data warehouse.  And it seems to be working.  Teradata released earnings Wednesday showing revenue growth of 24 percent in 2Q11.

Aster Data - A "big data" acquisition for the management of the multi-structured data with patented SQL/MapReduce

Active Data Warehousing - Abilities built into the Teradata 5000 EDW series that support and promote fast, active, intra-day loading of the data warehouse as opposed to a batch-loaded warehouse

Aprimo - Marketing applications that put the information to work and a software-as-a-service model to build some of their future on

Master Data Management - The "system of record" for subject areas that need governance and need to be integrated in real-time, operationally

Hot-Cold Data Placement - Less-used data placed into lower-cost storage, with accompanying degraded performance

Appliance Family - Pre-loaded machines of varying specification according to workload that can get your data access up and running quickly; some are using the appliance for their data warehouse

I noted that something could still be done where many analytics are going - to the operational world.  Something in complex event processing would further the information ecosystem.

We discussed Teradata 14 and it will continue this theme of providing the range of platform options necessary today.

Now that some of these acquisitions are assimilated, we are seeing a reflection in the marketing.  With "Teradata Everywhere" as the imperative, the reference architecture is now the "Analytic Ecosystem," an environment that includes, but is not all-consumed by, the Enterprise Data Warehouse.  Consider the sizes of the markets Teradata is going after, as shared by Teradata: Data Warehousing ($27B), Business Applications ($15B) and Big Data Analytics ($2B).  Teradata is embracing the heterogeneous future as a focused leader in information management.

http://www.b-eye-network.com/blogs/mcknight/archives/2011/08/perception_chan.php | Business Intelligence/Data Warehousing | Sat, 06 Aug 2011 08:45:57 -0700
Self-Service Business Intelligence: Come and Get It

What do you think about when you hear the term "self-service"?  To some, it's a positive term connoting the removal of barriers to a goal.  I can, for example, go through the self-service checkout line at the grocery store and be limited only by my own scanning (and re-scanning) speed in getting out the door.  However, as we've seen with some chains eliminating self-service lines recently, self-service is not always desired by either party.  To some, "self-service" is a negative term, euphemistically meaning "no service" or "you're on your own."

As defined in Claudia Imhoff and Colin White's excellent report, "Self-Service Business Intelligence: Empowering Users to Generate Insights," self-service BI is "the facilities within the BI environment that enable BI users to become more self-reliant and less dependent on the IT organization."

If you put up a poor data warehouse, it is a copy of operational data, only lightly remodeled from source and usually carrying many of the same data quality flaws as the source.  It solves a big problem - making the data available - but after this copy of data, the fun begins, with each new query being a new adventure into data sources, tools, models, etc.  What has inevitably happened in some environments is that users take what they need, as if it were raw data, and do the further processing required for the business department or function.

This post-warehouse processing is frequently very valuable to the rest of the organization, if the organization could only get access to it.  However, data that is generated and calculated post-data warehouse has little hope of reaching any kind of shared state.  This data warehouse is not ready for self-service BI.

According to Imhoff and White, the BI environment needs to achieve four main objectives for self-service BI:

  1. Make BI tools easy to use
  2. Make BI results easy to consume and enhance
  3. Make DW solutions fast to deploy and easy to manage
  4. Make it easy to access source data

To achieve these goals, you need a solid foundation and solid processes.  Take account of your BI environment.  While IT and consultancy practices have coined "self-service business intelligence" to bring some discipline to the idea of user empowerment, some of it is mere re-labeling of "no service" BI and does not attain and maintain a healthy relationship with the user community or healthy exploitation of the data produced in the systems.  We all know that IT budgets are under pressure, but this is not the time to cut the vital support services that maintain multi-million dollar investments.

http://www.b-eye-network.com/blogs/mcknight/archives/2011/07/self-service_bu.php | Business Intelligence/Data Warehousing | Thu, 28 Jul 2011 19:07:19 -0700
Reducing Credit Card Fraud

I was part of one of the pioneering credit card fraud detection projects.  It was at Visa and, together with all the similar projects taking advantage of early-stage data mining going on around the same time throughout the financial industry, it drove credit card fraud down dramatically to all-time lows.  In recent years, as the technology has changed, fraud has increased once again.  The financial industry has the online problem to deal with, in addition to the ramifications of identity theft and the card skimming that was once declining.  Employees are compromising the data they come into contact with as well.

Mass compromises occur routinely since thieves can divide and conquer - some focus on getting the card numbers and others commit the fraud.  There is a robust, efficient black market for card numbers.  Consider the huge breach at Heartland Payment Systems in 2009.  Fraud is committed with the detection systems in mind.  It often occurs in "blitz" mode, to overwhelm the system before it has a chance to react and stop transactions.

A recent study by Ovum surveyed 120 banks and found that counterfeit card fraud is the top issue, with wire fraud second.  Card readers can be purchased much more easily (e.g., for the iPhone) and the number of cards has proliferated, increasing the potential for fraud.  While the UK has adopted "chip and PIN" technology on the card, the US has not.  Adopting it may one day make it more difficult for criminals to cash in on credit card fraud in the US.

Personally, I just count on having to change my credit card numbers at least yearly, either on account of outright fraud, the bank (I'll use "bank," but I'm referring to all financial companies in this article) being compromised, or my making legitimate charges that cause the bank to panic and cancel the card.   All that good fraud detection comes with a price to the card holder.

I've worked on the fraud issue since then.  Other than the fact that they work on the prevention of a negative to the company, these actually are fun, detective-work projects.  For those who have not had the opportunity, today I decided to share some of the architecture behind fraud prevention, using the approach of one of the leading international providers of payment systems, ACI Worldwide (Nasdaq: ACIW), and its product, ACI Proactive Risk Manager™ 8.0 (PRM).

As the last step in the authorization process, PRM shares a score with the bank and, based on the tolerance the bank has set for the customer (balancing potential fraud against false positives), the bank's system decides whether or not to authorize.
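
A highly simplified sketch of that last step - the score-versus-tolerance decision - follows. The thresholds and values are invented; this is the general pattern, not PRM's or any bank's actual logic.

```python
def authorize(risk_score, customer_tolerance):
    """Approve, refer or decline based on how the fraud score compares to the
    tolerance the bank has configured for this customer. A lower tolerance means
    the bank accepts more false positives in exchange for stopping more fraud."""
    if risk_score >= customer_tolerance:
        return "DECLINE"
    if risk_score >= 0.8 * customer_tolerance:
        return "REFER"        # e.g., step-up verification or manual review
    return "APPROVE"

# A cautious tolerance for a frequently defrauded customer vs. a lenient one.
print(authorize(risk_score=720, customer_tolerance=800))    # REFER
print(authorize(risk_score=720, customer_tolerance=1200))   # APPROVE
```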

Although the bank may have a data warehouse, all customer transaction sources feed PRM.  Some customers extend PRM's capabilities to make it their data warehouse.  One year's worth of historical transactions is the recommended starting point - even though most banks are legally required to store seven years of data.

http://www.b-eye-network.com/blogs/mcknight/archives/2011/07/reducing_credit.php | Other | Wed, 13 Jul 2011 12:19:47 -0700