Channel: Database Design Techniques - Chuck Kelley RSS Feed for Database Design Techniques - Chuck Kelley

 

The Use of Surrogate Keys

Originally published October 13, 2010

When designing a data warehouse, no matter whether it is a dimensional model, data vault or normalized design, I am in favor of using surrogate keys. They add a level of indirection, which I like, as well as make the table a bit more extensible/flexible. I also like to have an alternate key (or unique business key or unique natural key) defined on my table as well.

Let us define a table using all three options. Before I start, let me agree with you that there are lots of ways to design this. It is not a proper normalized or data vault, but I am trying to present an example.

Normalized




Dimensional


Data Vault




Implementing these, I usually define UNIQUE indexes on both the Surrogate Key and the Alternate Key (which by definition is UNIQUE).

The code would look something like:

Normalized

CREATE TABLE City
(
    City_Key              INTEGER  NOT NULL ,
    City_Name             VARCHAR(20)
);
CREATE UNIQUE INDEX XPKCity ON City
(
    City_Key              ASC
)
    ;
ALTER TABLE City
    ADD CONSTRAINT XPKCity  PRIMARY KEY (City_Key);

CREATE TABLE Person
(
    Person_Key            INTEGER  NOT NULL ,
    Person_ID             INTEGER ,
    Person_Last_Name      VARCHAR(20) ,
    Person_First_Name     VARCHAR(20) ,
    City_Key              INTEGER  NOT NULL ,
    State_Key             INTEGER  NOT NULL ,
    Postal_Code_Key       INTEGER  NOT NULL
);
CREATE UNIQUE INDEX XPKPerson ON Person
(
    Person_Key            ASC
)
    ;
ALTER TABLE Person
    ADD CONSTRAINT XPKPerson  PRIMARY KEY (Person_Key);
CREATE UNIQUE INDEX XAK1Person ON Person
(
    Person_ID             ASC
)
    ;
CREATE INDEX XIF1Person ON Person
(
    City_Key              ASC
)
    ;
CREATE INDEX XIF2Person ON Person
(
    State_Key             ASC
)
    ;
CREATE INDEX XIF3Person ON Person
(
    Postal_Code_Key       ASC
)
    ;
CREATE TABLE Postal_Key
(
    Postal_Code_Key       INTEGER  NOT NULL ,
    Postal_Code           CHAR(10)
);
CREATE UNIQUE INDEX XPKPostal_Key ON Postal_Key
(
    Postal_Code_Key       ASC
)
    ;
ALTER TABLE Postal_Key
    ADD CONSTRAINT XPKPostal_Key  PRIMARY KEY (Postal_Code_Key);

CREATE TABLE State
(
    State_Key             INTEGER  NOT NULL ,
    State_Code            CHAR(2) ,
    State_Name            VARCHAR(20)
);
CREATE UNIQUE INDEX XPKState ON State
(
    State_Key             ASC
)
    ;
ALTER TABLE State
    ADD CONSTRAINT XPKState  PRIMARY KEY (State_Key);
ALTER TABLE Person
    ADD CONSTRAINT R_2  FOREIGN KEY (City_Key) REFERENCES City (City_Key);
ALTER TABLE Person
    ADD CONSTRAINT R_3  FOREIGN KEY (State_Key) REFERENCES State (State_Key);

ALTER TABLE Person
    ADD CONSTRAINT R_4  FOREIGN KEY (Postal_Code_Key) REFERENCES Postal_Key (Postal_Code_Key);

Dimensional

CREATE TABLE Person_Dim
(
    Person_Key            INTEGER  NOT NULL ,
    Person_Last_Name      VARCHAR(20) ,
    Person_First_Name     VARCHAR(20) ,
    Person_ID             INTEGER NOT NULL,
    Person_City           VARCHAR(20) ,
    Person_State_Code     CHAR(2) ,
    Person_Postal_Code    CHAR(10)
);
CREATE UNIQUE INDEX XPKPerson_Dim ON Person_Dim
(
    Person_Key            ASC
)
    ;
ALTER TABLE Person_Dim
    ADD CONSTRAINT XPKPerson_Dim  PRIMARY KEY (Person_Key);
CREATE UNIQUE INDEX XAK1Person_Dim ON Person_Dim
(
    Person_ID             ASC
)
    ;

Data Vault

CREATE TABLE Person_Hub
(
    Person_Key            INTEGER  NOT NULL ,
    Person_ID             INTEGER NOT NULL,
    SourceSystemID        CHAR(18) ,
    EDW_Date_Added        DATE
);
CREATE UNIQUE INDEX XPKPerson_Hub ON Person_Hub
(
    Person_Key            ASC
)
    ;
CREATE UNIQUE INDEX XAK1Person_Hub ON Person_Hub
(
    Person_ID             ASC
)
    ;
ALTER TABLE Person_Hub
    ADD CONSTRAINT XPKPerson_Hub  PRIMARY KEY (Person_Key);
CREATE TABLE Person_Sat
(
    Person_Key            INTEGER   NOT NULL ,
    EDW_Begin_Date        INTEGER   NOT NULL ,
    Person_Last_Name      VARCHAR(20) ,
    Person_First_Name     VARCHAR(20) ,
    Person_City           VARCHAR(20) ,
    Person_State_Code     CHAR(2) ,
    Person_Postal_Code    CHAR(10)
);
CREATE UNIQUE INDEX XPKPerson_Sat ON Person_Sat
(
    Person_Key            ASC,
    EDW_Begin_Date        ASC
)
    ;
ALTER TABLE Person_Sat
    ADD CONSTRAINT XPKPerson_Sat  PRIMARY KEY (Person_Key,EDW_Begin_Date);
CREATE INDEX XIF1Person_Sat ON Person_Sat
(
    Person_Key            ASC
)
    ;
ALTER TABLE Person_Sat
    ADD CONSTRAINT R_1  FOREIGN KEY (Person_Key) REFERENCES Person_Hub (Person_Key);


If there are a lot of Person_IDs in your table, you may wish to partition the data.  Next time, I will discuss the different partitioning options and some "gotcha’s" that you need to look out for.

SOURCE: The Use of Surrogate Keys

  • Chuck KelleyChuck Kelley
    Chuck is an internationally known expert in database technology. He has more than 30 years of experience in the design and implementation of operational/production systems, operational data stores and data warehouses (data marts). Chuck teaches seminars on a variety of database and data warehousing topics. He has co-authored or contributed to four books on data warehousing and has been published in numerous trade magazines and written columns on database technology, data warehousing, metadata, master data management, data governance and enterprise data strategies. He may be contacted at chuckkelley@usa.net.

    Editor's Note: More articles, news and resources are available in Chuck Kelley's BeyeNETWORK Expert Channel on Database Design Techniques. Be sure to visit today!

Recent articles by Chuck Kelley


Related TechTarget Editorial Content


 

Comments

Want to post a comment? Login or become a member today!

Be the first to comment!