Most people who have come into contact with a database will be familiar with basic concepts like primary and foreign keys. Yet, many people, including many engineers, lack a formal background in database modeling, let alone set theory, and may be unfamiliar with the many key-related terms used by other team members and in online forums. Technical semantics such as “business key”, “candidate key”, and “hash key” are used to simplify complex ideas into understandable shorthand, making communication more efficient. This article aims to gather and define all known key-related concepts in clear and concise terms. Let’s explore these concepts and their various synonyms to ensure you are well-equipped for any data-related conversation you may encounter.
Keys that databases recognize
In a physical database, keys are defined by constraints, which are rules applied to the data in tables to ensure accuracy, consistency, and integrity. A database constraint is a declarative condition specified on a table or column that restricts the values that can be inserted, updated, or deleted to maintain valid and consistent data. While not universally supported across all databases, common constraints include primary key, foreign key, unique key, not null, check, and default. All keys are constraints, but not all constraints are keys.
- Primary key – a primary key (PK) is a set of columns that uniquely identifies each record in a table. It is used to ensure that there are no duplicate records and to locate specific entries efficiently. The term “primary” indicates that only one primary key can exist for any given table. A PK can not be null (e.g., a PK constraint implies that the column is Unique and NOT NULL [the same is true when the PK consists of multiple columns]). For example, if a customer_id uniquely identifies a customer in the customer table, it could be declared a PK. PKs can be declared inline or out-of-line as primary key constraints.
- Foreign key – When PK columns from one table (referred to as the parent) are included in another table (known as the child), a foreign key (FK) is used. This foreign key establishes a relationship between the two tables, indicating to both the database and its users that the shared columns are not merely coincidental, but rather that they are related through a specific business context. For example, a customer_id, which is a PK in the customer table, would be an FK in the order table. An FK doesn’t just imply that tables are related, but that the values in the related PK/FK columns must also coincide—a concept known as referential integrity (e.g., that customer_id in the order table must contain only values that correspond to those found in the [parent] customer table). Unlike a PK, multiple FKs are permitted in a child table, and their values can be null. FKs are declared out-of-line as foreign key constraints.
- Relationship – a less formal way to refer to an FK, often used in logical modeling context.
- Alternate key – as a table can have only one PK, an alternate key (AK) is any other column or combination of columns that can also uniquely identify a row in the table. Alternate keys follow the same rules as PKs: they can consist of multiple columns and must not contain null values. Additionally, a table can have multiple alternate keys. AKs are declared out-of-line as unique constraints.
- Synonyms:
- Unique (key)
- Alternative key
- Secondary key
- Synonyms:
Key classifications that people understand
The following terms don’t refer to different types of keys, but rather, add context to how the key behaves or what it consists of.
- Composite key – When a key consists of multiple columns, it is called a composite key (e.g., composite PK or composite AK).
- Compound key – synonymous with the above
- Business key – a business key (BK) is any key with intrinsic meaning in the real world or within an organization. Examples include student_id or SKU.
- Surrogate key– the opposite of a BK, a surrogate key (SK) is a single-column, meaningless unique identifier assigned by the system (often an auto-incremented sequence) that serves as the primary key of a table. It is independent of the business data and has no intrinsic meaning. A surrogate key is used to avoid performance issues with long or composite natural keys or to simplify/standardize joins between tables (like in Data Vault). Another reason to use a surrogate key is to prevent the exposure of sensitive business information in joins. Note that the difference between SKs and BKs is subjective. While an employee_id may in fact be a system-generated code, it is still considered a BK if the code is the de facto way of identifying an employee within a company.
- Synonyms:
- Artificial key
- Synthetic key
- Technical key
- Synonyms:
- Hash Key – a type of SK, a hash key is derived by running one or more columns (typically business keys) through a hashing function such as MD5 or SHA. Hash keys give users a deterministic and consistent way to create SKs to improve the performance of joins (by reducing compound keys into SKs) and ensure privacy by obfuscating BKs. By hashing all table columns, a hash key can also be used to easily compare changes across records.
- Natural key – a more restrictive type of business key, natural keys are understood within the context of a company and in the real world. Examples of natural keys include VIN, ISBN, and SKU.
Keys from relational or set theory
Relational databases are grounded in mathematical theory laid out by Edgar F. Codd. Some of the nomenclature from relational theory is also often used in the database context.
- Candidate Key – A minimal set of attributes that can uniquely identify a tuple (row) within a relation (table). This includes PKs and AKs. For example, if a customer_id uniquely identifies a customer in the customer relation, it is considered a candidate key.
- Superkey – A set of attributes that uniquely identifies a tuple (row) within a relation (table) that may be non-minimal (i.e., may be a candidate key and contain additional attributes). For example, if a customer_id uniquely identifies a customer in the customer relation, customer_id and customer_name could be considered a superkey.