Platform

Data Modeling

Develop data models collaboratively in the cloud and share them with your organization in various modeling styles and formats with no coding or conversion required

Model Governance

Create and manage business metadata using a dedicated project role

Snowflake Schema Monitoring

Track and get notified of schema changes in live database environments

Integration

Strategic advisors

Kent Graziano

The Data Warrior, Strategic Advisor, Data Vault Master, Author, Speaker, and Tae Kwon Do Grandmaster

Gordon Wong

Leading organizations through analytics transformations, preference for social missions, healthcare, energy, education, and civic engagement

For cloud data platforms

SqlDBM offers secure native connectors to leading data platforms like Snowflake, Databricks, and BigQuery so you can reverse engineer and begin modeling in seconds.

Try modeling now

Data Modeling

Develop data models collaboratively in the cloud and share them with your organization in various modeling styles and formats with no coding or conversion required

Model Governance

Create and manage business metadata using a dedicated project role

Snowflake Schema Monitoring

Track and get notified of schema changes in live database environments

The Secret Life of Keys

Most people who have come into contact with a database will be familiar with basic concepts like primary and foreign keys. Yet, many people, including many engineers, lack a formal background in database modeling, let alone set theory, and may be unfamiliar with the many key-related terms used by other team members and in online forums. Technical semantics such as “business key”, “candidate key”, and “hash key” are used to simplify complex ideas into understandable shorthand, making communication more efficient. This article aims to gather and define all known key-related concepts in clear and concise terms. Let’s explore these concepts and their various synonyms to ensure you are well-equipped for any data-related conversation you may encounter. 

 

Keys that databases recognize

In a physical database, keys are defined by constraints, which are rules applied to the data in tables to ensure accuracy, consistency, and integrity. A database constraint is a declarative condition specified on a table or column that restricts the values that can be inserted, updated, or deleted to maintain valid and consistent data. While not universally supported across all databases, common constraints include primary key, foreign key, unique key, not null, check, and default. All keys are constraints, but not all constraints are keys. 

  • Primary key – a primary key (PK) is a set of columns that uniquely identifies each record in a table. It is used to ensure that there are no duplicate records and to locate specific entries efficiently. The term “primary” indicates that only one primary key can exist for any given table. A PK can not be null (e.g., a PK constraint implies that the column is Unique and NOT NULL [the same is true when the PK consists of multiple columns]). For example, if a customer_id uniquely identifies a customer in the customer table, it could be declared a PK. PKs can be declared inline or out-of-line as primary key constraints.
  • Foreign key – When PK columns from one table (referred to as the parent) are included in another table (known as the child), a foreign key (FK) is used. This foreign key establishes a relationship between the two tables, indicating to both the database and its users that the shared columns are not merely coincidental, but rather that they are related through a specific business context. For example, a customer_id, which is a PK in the customer table, would be an FK in the order table. An FK doesn’t just imply that tables are related, but that the values in the related PK/FK columns must also coincide—a concept known as referential integrity (e.g., that customer_id in the order table must contain only values that correspond to those found in the [parent] customer table). Unlike a PK, multiple FKs are permitted in a child table, and their values can be null. FKs are declared out-of-line as foreign key constraints.
    • Relationship – a less formal way to refer to an FK, often used in logical modeling context. 
  • Alternate key – as a table can have only one PK, an alternate key (AK) is any other column or combination of columns that can also uniquely identify a row in the table. Alternate keys follow the same rules as PKs: they can consist of multiple columns and must not contain null values. Additionally, a table can have multiple alternate keys. AKs are declared out-of-line as unique constraints.
    • Synonyms:
      • Unique (key)
      • Alternative key 
      • Secondary key

 

Key classifications that people understand

The following terms don’t refer to different types of keys, but rather, add context to how the key behaves or what it consists of.

  • Composite key – When a key consists of multiple columns, it is called a composite key (e.g., composite PK or composite AK). 
    • Compound key – synonymous with the above
  • Business key – a business key (BK) is any key with intrinsic meaning in the real world or within an organization. Examples include student_id or SKU. 
  • Surrogate key– the opposite of a BK, a surrogate key (SK) is a single-column, meaningless unique identifier assigned by the system (often an auto-incremented sequence) that serves as the primary key of a table. It is independent of the business data and has no intrinsic meaning. A surrogate key is used to avoid performance issues with long or composite natural keys or to simplify/standardize joins between tables (like in Data Vault). Another reason to use a surrogate key is to prevent the exposure of sensitive business information in joins. Note that the difference between SKs and BKs is subjective. While an employee_id may in fact be a system-generated code, it is still considered a BK if the code is the de facto way of identifying an employee within a company.
    • Synonyms:
      • Artificial key
      • Synthetic key 
      • Technical key
  • Hash Key – a type of SK, a hash key is derived by running one or more columns (typically business keys) through a hashing function such as MD5 or SHA. Hash keys give users a deterministic and consistent way to create SKs to improve the performance of joins (by reducing compound keys into SKs) and ensure privacy by obfuscating BKs. By hashing all table columns, a hash key can also be used to easily compare changes across records. 
  • Natural key – a more restrictive type of business key, natural keys are understood within the context of a company and in the real world. Examples of natural keys include VIN, ISBN, and SKU. 

 

Keys from relational or set theory 

Relational databases are grounded in mathematical theory laid out by Edgar F. Codd. Some of the nomenclature from relational theory is also often used in the database context. 

  • Candidate Key – A minimal set of attributes that can uniquely identify a tuple (row) within a relation (table). This includes PKs and AKs. For example, if a customer_id uniquely identifies a customer in the customer relation, it is considered a candidate key.
  • Superkey – A set of attributes that uniquely identifies a tuple (row) within a relation (table) that may be non-minimal (i.e., may be a candidate key and contain additional attributes). For example, if a customer_id uniquely identifies a customer in the customer relation, customer_id and customer_name could be considered a superkey. 

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

Strategic advisors

Kent Graziano

The Data Warrior, Strategic Advisor, Data Vault Master, Author, Speaker, and Tae Kwon Do Grandmaster

Gordon Wong

Leading organizations through analytics transformations, preference for social missions, healthcare, energy, education, and civic engagement

For cloud data platforms

SqlDBM offers secure native connectors to leading data platforms like Snowflake, Databricks, and BigQuery so you can reverse engineer and begin modeling in seconds.

Try modeling now

Platform

Data Modeling

Develop data models collaboratively in the cloud and share them with your organization in various modeling styles and formats with no coding or conversion required

Model Governance

Create and manage business metadata using a dedicated project role

Snowflake Schema Monitoring

Track and get notified of schema changes in live database environments