When designing a data warehouse, no matter whether it is a dimensional model, data vault or normalized design, I am in favor of using surrogate keys. They add a level of indirection, which I like, as well as make the table a bit more extensible/flexible. I also like to have an alternate key (or unique business key or unique natural key) defined on my table as well.
Let us define a table using all three options. Before I start, let me agree with you that there are lots of ways to design this. It is not a proper normalized or data vault, but I am trying to present an example.
Normalized
Dimensional
Data Vault
Implementing these, I usually define UNIQUE indexes on both the Surrogate Key and the Alternate Key (which by definition is UNIQUE).
The code would look something like:
Normalized
CREATE TABLE City
(
City_Key INTEGER NOT NULL ,
City_Name VARCHAR(20)
);
CREATE UNIQUE INDEX XPKCity ON City
(
City_Key ASC
)
;
ALTER TABLE City
ADD CONSTRAINT XPKCity PRIMARY KEY (City_Key);
CREATE TABLE Person
(
Person_Key INTEGER NOT NULL ,
Person_ID INTEGER ,
Person_Last_Name VARCHAR(20) ,
Person_First_Name VARCHAR(20) ,
City_Key INTEGER NOT NULL ,
State_Key INTEGER NOT NULL ,
Postal_Code_Key INTEGER NOT NULL
);
CREATE UNIQUE INDEX XPKPerson ON Person
(
Person_Key ASC
)
;
ALTER TABLE Person
ADD CONSTRAINT XPKPerson PRIMARY KEY (Person_Key);
CREATE UNIQUE INDEX XAK1Person ON Person
(
Person_ID ASC
)
;
CREATE INDEX XIF1Person ON Person
(
City_Key ASC
)
;
CREATE INDEX XIF2Person ON Person
(
State_Key ASC
)
;
CREATE INDEX XIF3Person ON Person
(
Postal_Code_Key ASC
)
;
CREATE TABLE Postal_Key
(
Postal_Code_Key INTEGER NOT NULL ,
Postal_Code CHAR(10)
);
CREATE UNIQUE INDEX XPKPostal_Key ON Postal_Key
(
Postal_Code_Key ASC
)
;
ALTER TABLE Postal_Key
ADD CONSTRAINT XPKPostal_Key PRIMARY KEY (Postal_Code_Key);
CREATE TABLE State
(
State_Key INTEGER NOT NULL ,
State_Code CHAR(2) ,
State_Name VARCHAR(20)
);
CREATE UNIQUE INDEX XPKState ON State
(
State_Key ASC
)
;
ALTER TABLE State
ADD CONSTRAINT XPKState PRIMARY KEY (State_Key);
ALTER TABLE Person
ADD CONSTRAINT R_2 FOREIGN KEY (City_Key) REFERENCES City (City_Key);
ALTER TABLE Person
ADD CONSTRAINT R_3 FOREIGN KEY (State_Key) REFERENCES State (State_Key);
ALTER TABLE Person
ADD CONSTRAINT R_4 FOREIGN KEY (Postal_Code_Key) REFERENCES Postal_Key (Postal_Code_Key);
Dimensional
CREATE TABLE Person_Dim
(
Person_Key INTEGER NOT NULL ,
Person_Last_Name VARCHAR(20) ,
Person_First_Name VARCHAR(20) ,
Person_ID INTEGER NOT NULL,
Person_City VARCHAR(20) ,
Person_State_Code CHAR(2) ,
Person_Postal_Code CHAR(10)
);
CREATE UNIQUE INDEX XPKPerson_Dim ON Person_Dim
(
Person_Key ASC
)
;
ALTER TABLE Person_Dim
ADD CONSTRAINT XPKPerson_Dim PRIMARY KEY (Person_Key);
CREATE UNIQUE INDEX XAK1Person_Dim ON Person_Dim
(
Person_ID ASC
)
;
Data Vault
CREATE TABLE Person_Hub
(
Person_Key INTEGER NOT NULL ,
Person_ID INTEGER NOT NULL,
SourceSystemID CHAR(18) ,
EDW_Date_Added DATE
);
CREATE UNIQUE INDEX XPKPerson_Hub ON Person_Hub
(
Person_Key ASC
)
;
CREATE UNIQUE INDEX XAK1Person_Hub ON Person_Hub
(
Person_ID ASC
)
;
ALTER TABLE Person_Hub
ADD CONSTRAINT XPKPerson_Hub PRIMARY KEY (Person_Key);
CREATE TABLE Person_Sat
(
Person_Key INTEGER NOT NULL ,
EDW_Begin_Date INTEGER NOT NULL ,
Person_Last_Name VARCHAR(20) ,
Person_First_Name VARCHAR(20) ,
Person_City VARCHAR(20) ,
Person_State_Code CHAR(2) ,
Person_Postal_Code CHAR(10)
);
CREATE UNIQUE INDEX XPKPerson_Sat ON Person_Sat
(
Person_Key ASC,
EDW_Begin_Date ASC
)
;
ALTER TABLE Person_Sat
ADD CONSTRAINT XPKPerson_Sat PRIMARY KEY (Person_Key,EDW_Begin_Date);
CREATE INDEX XIF1Person_Sat ON Person_Sat
(
Person_Key ASC
)
;
ALTER TABLE Person_Sat
ADD CONSTRAINT R_1 FOREIGN KEY (Person_Key) REFERENCES Person_Hub (Person_Key);
If there are a lot of Person_IDs in your table, you may wish to partition the data. Next time, I will discuss the different partitioning options and some "gotcha’s" that you need to look out for.
SOURCE: The Use of Surrogate Keys
-
Chuck Kelley
Chuck is an internationally known expert in database technology. He has more than 30 years of experience in the design and implementation of operational/production systems, operational data stores
and data warehouses (data marts). Chuck teaches seminars on a variety of database and data warehousing topics. He has co-authored or contributed to four books on data warehousing and has been
published in numerous trade magazines and written columns on database technology, data warehousing, metadata, master data management, data governance and enterprise data strategies. He may be
contacted at chuckkelley@usa.net.
Editor's Note: More articles, news and resources are available in Chuck Kelley's BeyeNETWORK
Expert Channel on Database Design Techniques. Be sure to visit today!
Recent articles by Chuck Kelley
Comments
Want to post a comment? Login or become a member today!
Be the first to comment!