These days, powerful cloud data platforms like Snowflake, Databricks, and BigQuery empower business-critical use cases from petabyte-scale analytics to cross-cloud data lakes and machine learning. But how did we get here?
Many young data engineers might be surprised to learn that the journey of databases as we know them today dates back to the 1970s. In fact, much of the groundwork for what we do today, from the way we conceptualize data to the SQL we use for querying it, was laid decades before cloud computing, distributed computing, and even the internet.
This timeline highlights how foundational research in the 1970s laid the groundwork for widespread commercial adoption, subsequent standardization, and ongoing innovation, ultimately transforming relational databases into powerful and versatile cornerstones of modern data management.
For more detailed information about the milestones listed in this infographic, please see below.
1970 – E. F. Codd’s Seminal Paper
- Key event: Edgar F. Codd publishes “A Relational Model of Data for Large Shared Data Banks.”
- Significance: Lays the theoretical groundwork for relational databases, introducing the concepts of relations, tuples, and normalization.
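To put those terms in today's vocabulary, here is a minimal, hypothetical SQL sketch (SQL itself came a few years later): each table is a relation, each row a tuple, and keeping customer attributes in their own table rather than repeating them on every order is a simple example of normalization. All names are invented for illustration.

```sql
-- Hypothetical example: each table is a relation, each row a tuple.
-- Storing customer attributes once, instead of repeating them on every
-- order, is a simple act of normalization.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100),
    city        VARCHAR(100)
);

CREATE TABLE purchase_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer (customer_id),
    order_date  DATE,
    amount      DECIMAL(10, 2)
);
```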
1976 – Chen’s Entity-Relationship (ER) Model
- Key event: Peter Chen’s Publication: “The Entity-Relationship Model—Toward a Unified View of Data.”
- Significance: Chen notation introduces ER diagrams as a way to visually represent entities, relationships, and attributes, becoming a foundational approach for conceptual data modeling.
Late 1970s / Early 1980s – System R, INGRES, and SQL
- Key event: IBM’s System R builds on Codd’s ideas and pioneers SQL (originally called SEQUEL), while UC Berkeley develops INGRES, a precursor to Postgres.
- Significance: These influential research projects solidify SQL as the primary means of communicating with relational databases and pave the way for future systems such as PostgreSQL.
1979 – Oracle’s First Commercial RDBMS
- Key Event: Oracle (then Relational Software, Inc.) releases the first commercially available relational database leveraging SQL.
- Significance: Proves that relational theory can be successfully commercialized and broadly adopted in businesses.
1981 – Barker Notation
- Key event: Richard Barker develops this notation while working on CASE tools for data modeling.
- Significance: Barker notation focuses on entity-relationship modeling, featuring a cleaner and more streamlined diagram style that emphasizes clarity in large-scale database design.
Mid-1980s – IDEF1X Notation
- Key event: Developed under U.S. Air Force projects for data modeling.
- Significance: Standardizes data modeling techniques, specifying entities, relationships, and key constraints in a way particularly suitable for relational schema design and government/enterprise documentation.
1983 – IBM DB2
- Key event: IBM introduces DB2 on mainframe systems.
- Significance: Solidifies the enterprise use of relational databases on a large scale.
1985 – Microsoft Excel
- Key event: Microsoft releases Excel.
- Significance: Excel, a spreadsheet app that organizes data in rows and columns, competes with the market leader, Lotus 1-2-3. Its popularity reinforces the public’s familiarity with tabular data structures similar to those found in databases.
Mid-1980s – SQL Standardization
- Key event: American National Standards Institute (ANSI) SQL (1986) and International Organization for Standardization (ISO) SQL (1987) standards adopted.
- Significance: Solidifies SQL as the standard relational query language, facilitating interoperability across different vendors.
1992 – Bill Inmon and the Data Warehouse Concept
- Key event: Bill Inmon introduces the Data Warehouse Concept
- Significance: Bill Inmon introduced the term “data warehouse” in a series of articles and papers published in the late 1980s and formalized the concept in his influential book, “Building the Data Warehouse,” first published in 1992. In the book, Inmon defined a data warehouse as “a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.” This definition and the architectural principles around it became the cornerstone of enterprise data warehousing strategies for decades.
- Methodology:
- Top-down, enterprise-wide approach (Corporate Information Factory).
- Emphasizes a centralized data warehouse with normalized data, supporting broad historical analysis.
Mid-1990s – Proliferation of Commercial and Open-Source RDBMS
- Key event: SQL Server, MySQL, and PostgreSQL emerge as popular RDBMS solutions for both enterprise and open-source projects.
- Significance: Microsoft SQL Server emerges (building on Sybase code), becoming a major competitor on Windows platforms. In 1995, MySQL is released, offering a lightweight, open-source relational database that fuels the growth of database-backed websites. PostgreSQL evolves from the POSTGRES project at Berkeley, introducing advanced features such as object-relational capabilities.
1996 – Ralph Kimball and Dimensional Modeling
- Key event: Ralph Kimball introduces dimensional modeling
- Significance: Kimball formalizes his bottom-up, dimensional modeling philosophy in publications and training, most notably in his 1996 book, The Data Warehouse Toolkit (later editions co-authored with Margy Ross). His approach contrasts with Inmon’s enterprise-focused one, emphasizing star schemas, data marts, and business-process-centric design (a minimal star-schema sketch follows the methodology list below).
- Methodology:
- Bottom-up approach with a focus on star schemas and data marts (facts and dimensions).
- Iterative development of data warehouses aligned with specific business processes.
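To make the star-schema idea concrete, here is a minimal, hypothetical sketch in SQL: one fact table for a retail-sales business process surrounded by descriptive dimension tables. The table and column names are invented for illustration, not taken from Kimball’s book.

```sql
-- Hypothetical star schema for a retail-sales business process:
-- a central fact table of measurements that references
-- descriptive dimension tables.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR(20),
    year_number   INTEGER
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)
);

CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name VARCHAR(100),
    region     VARCHAR(50)
);

CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date (date_key),
    product_key   INTEGER REFERENCES dim_product (product_key),
    store_key     INTEGER REFERENCES dim_store (store_key),
    quantity_sold INTEGER,
    sales_amount  DECIMAL(12, 2)
);
```

Analytic queries then join the fact table to whichever dimensions a business question needs, for example sales by region and month.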
1997 – Unified Modeling Language (UML)
- Key event: Created by Grady Booch, Ivar Jacobson, and James Rumbaugh, standardized by the OMG (Object Management Group) around 1997.
- Significance: Although primarily used for object-oriented software design, UML class diagrams are often employed to model data structures conceptually, bridging application development and database design. UML never fully lives up to its promise of a single, “unified” modeling standard, however.
2001 – Data Vault
- Key event: Dan Linstedt introduces the Data Vault methodology
- Significance: Dan Linstedt introduced Data Vault in the early 2000s through white papers, conference talks, and consulting engagements. The methodology was expanded in 2015 with Data Vault 2.0, which incorporated concepts such as big data and NoSQL.
- Methodology:
- A modeling methodology for data warehouses designed to handle change, scalability, and historical tracking.
- Uses Hubs (business keys), Links (relationships), and Satellites (context/history) to decouple structure and improve agility.
- Ideal for enterprise data warehouses that require auditability, versioning, and agility under evolving business rules (see the sketch below).
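As a rough illustration of these structures, the hypothetical SQL below sketches a Hub, a Link, and a Satellite for a customer/order example; the hash-key style and all names are assumptions made for this sketch, not a prescribed Data Vault standard.

```sql
-- Hypothetical Data Vault sketch: Hubs hold business keys, Links record
-- relationships between Hubs, and Satellites hold descriptive context
-- and its history.
CREATE TABLE hub_customer (
    customer_hk   CHAR(32) PRIMARY KEY,  -- hash of the business key
    customer_bk   VARCHAR(50),           -- business key from the source system
    load_date     TIMESTAMP,
    record_source VARCHAR(50)
);

CREATE TABLE hub_order (
    order_hk      CHAR(32) PRIMARY KEY,
    order_bk      VARCHAR(50),
    load_date     TIMESTAMP,
    record_source VARCHAR(50)
);

CREATE TABLE link_customer_order (
    customer_order_hk CHAR(32) PRIMARY KEY,
    customer_hk       CHAR(32) REFERENCES hub_customer (customer_hk),
    order_hk          CHAR(32) REFERENCES hub_order (order_hk),
    load_date         TIMESTAMP,
    record_source     VARCHAR(50)
);

CREATE TABLE sat_customer_details (
    customer_hk   CHAR(32) REFERENCES hub_customer (customer_hk),
    load_date     TIMESTAMP,
    name          VARCHAR(100),
    email         VARCHAR(100),
    record_source VARCHAR(50),
    PRIMARY KEY (customer_hk, load_date)  -- each load adds a new history row
);
```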
2006 – Hadoop and Distributed Computing
- Key event: Hadoop and MapReduce popularized
- Significance: Hadoop, inspired by Google’s MapReduce paper, enables large-scale, batch-oriented data processing on commodity hardware. Hadoop challenges the dominance of traditional RDBMS for certain analytic workloads and sparks the broader big data movement.
2006 – Amazon Web Services (AWS) Officially Launched
- Key event: Amazon offers services like S3 (Simple Storage Service) and EC2 (Elastic Compute Cloud)
- Significance: Introduces the world to scalable, on-demand cloud infrastructure, laying the groundwork for cloud-native databases and data warehouses.
Mid-2000s – The NoSQL Movement
- Key event: NoSQL (Not Only SQL) databases released to meet the demands of big data
- Significance: The demand for high availability, horizontal scalability, and flexible schemas in web-scale applications drives innovations such as MongoDB, Cassandra, CouchDB, and Redis. NoSQL databases often drop rigid schemas in favor of document, key-value, or wide-column models.
2010 – Google BigQuery
- Key event: Google releases BigQuery in beta, with general availability in 2011
- Significance: A fully-managed, serverless data warehouse for large-scale analytics, using columnar storage and distributed querying. Showcases how cloud-native, on-demand services can disrupt traditional on-prem data warehousing.
2010 – Data Lake Concept Introduced
- Key event: James Dixon introduces the concept of a data lake.
- Significance: In contrast to a data mart, a data lake is a centralized repository designed to store and process large amounts of structured, semi-structured, and unstructured data in its native format, enabling various types of analytics, including big data processing, real-time analytics, and machine learning.
2012 – Snowflake Founded
- Key event: Snowflake is founded in 2012 and becomes generally available in 2015.
- Significance: A cloud-native data platform designed for near-zero maintenance, Snowflake offers scalable compute and storage, near-infinite scalability for concurrent workloads, and simplified administration.
2013 – Amazon Redshift
- Key event: Redshift is released
- Significance: A cloud data warehousing platform based on PostgreSQL, Redshift uses massively parallel processing and column-oriented storage to handle analytic workloads on large data sets.
2013 – Databricks
- Key event: Databricks founded
- Significance: A unified analytics platform combining data engineering, data science, and warehousing. Databricks later pioneers the lakehouse concept, which merges data lake scalability with data warehouse performance, blurring traditional boundaries.
2018 – dbt Launches Commercial Offering
- Key event: Fishtown Analytics releases dbt as a commercial product
- Significance: dbt (data build tool) Core has been available as open source since 2016. In 2018, the dbt Labs team (then called Fishtown Analytics) released a commercial product on top of dbt Core. The tool lets engineers write transformations as modular, version-controlled SQL models and use templating to decouple transformation logic from environment- and object-specific parameters, enabling dynamic data pipelines.
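As a rough sketch of what this looks like in practice, the hypothetical dbt model below is a single SQL file that dbt compiles and runs; the `{{ config() }}` and `{{ ref() }}` macros are real dbt constructs, while the model and column names (`stg_orders`, `stg_customers`, and so on) are invented for the example.

```sql
-- models/customer_orders.sql (hypothetical dbt model)
-- dbt compiles the Jinja templating into plain SQL for the target warehouse;
-- ref() resolves each upstream model to the correct database and schema,
-- keeping the transformation logic independent of physical object names.
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
)

select
    customers.customer_id,
    count(orders.order_id) as order_count,
    sum(orders.amount)     as lifetime_value
from customers
left join orders
    on orders.customer_id = customers.customer_id
group by customers.customer_id
```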
2019 – Data Mesh
- Key event: Zhamak Dehghani introduces the Data Mesh framework
- Significance: In a blog post titled “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, Dehghani introduces the world to Data Mesh. The framework takes a socio-technical approach to data architecture and promotes decentralized, domain-oriented ownership of data. Data Mesh treats data as a product, with cross-functional teams owning pipelines, quality, and access.
- Methodology: Data Mesh includes four key principles:
- Domain-oriented ownership
- Data as a product
- Self-serve infrastructure
- Federated computational governance
2019 – Data Lakehouse Concept
- Key event: The Lakehouse concept blurs the line between data lake and warehouse
- Significance: Cloud data platforms like Databricks pioneer a hybrid approach that can ingest a variety of raw data formats, similar to a data lake, yet provide ACID transactions and enforce data quality, much like a data warehouse.
2022 – ChatGPT Released by OpenAI
- Key event: ChatGPT is introduced to the public
- Significance: ChatGPT, a generative artificial intelligence chatbot developed by OpenAI, raises the bar for natural language processing and demonstrates that AI is far more capable than many previously thought.
2023 – Generative AI Meets Structured Data
- Key event: Cloud platforms like Databricks and Snowflake introduce AI copilots
- Significance: LLMs are integrated into data platforms for code generation, SQL generation, semantic querying, and autonomous agents for data exploration. The move reframes how users interact with data warehouses, shifting from hand-written structured queries to conversational, AI-assisted analytics.