A3 Databases

For the new IB Diploma Computer Science syllabus to start teaching in August 2025, and for first examinations in May 2027.

Unit and lesson overviews will be gradually published as developed.

Lesson 1: Database fundamentals
Lesson 2: Schemas and data types
Lesson 3: Entity relationship diagrams
Lesson 4: Normalisation
Lesson 5: Designing 3NF databases
Lesson 6: Introducing SQL
Lesson 7: SQL joins
Lesson 8: SQL update & insert
Lesson 9: SQL aggregate functions (HL)
Lesson 10: Views (HL)
Lesson 11: Transactions (HL)
Lesson 12: Alternatives & warehouses (HL)
Lesson 13: Data mining & distributed databases (HL)
Lesson 13: Using SQL with Python (bonus)
Lesson 14: Exam style questions
Lesson 15: Assessment

Lesson 1: Database fundamentals

A3.1.1 Explain the features, benefits and limitations of a relational database.

Features: composite keys, foreign keys, primary keys, relationships, tables
Benefits of databases: community support, concurrency control, data consistency, data integrity, data retrieval, reduced data duplication, reduced redundancy, reliable transaction processing, scalability, security features
Limitations of databases: “big data” scalability issues, design complexity, hierarchical data handling, rigid schema, object-relational impedance mismatch, unstructured data handling

Lesson 2: Schemas and data types

A3.2.1 Describe database schemas.

Conceptual schema, logical schema, physical schema
Abstract definitions of the data structure and organization of the data at different levels

A3.2.3 Outline the different data types used in relational databases.

The importance of data type consistency
The potential effects of choosing the wrong data type

Lesson 3: Entity relationship diagrams

A3.2.2 Construct ERDs.

The significance of entity relationship diagrams (ERDs) in crafting organized, efficient database designs tailored for specific applications
The relationships between different data entities within a database
The roles of cardinality and modality in defining relationships in ERDs

A3.2.4 Construct tables for relational databases.

The relationship between tables using primary keys, foreign keys, composite keys and concatenated keys
The importance of well-defined tables in ensuring data integrity

Lesson 4: Normalisation

A3.2.5 Explain the difference between normal forms.

First normal form (1NF), second normal form (2NF), third normal form (3NF)
The terms atomicity, unique identification, functional dependencies, partial-key dependencies, non- key/transitive dependencies
Normalization issues can encompass data duplication, missing data, and a range of dependency concerns, including data dependencies, composite key dependencies, transitive dependencies, and multi-valued dependencies.

A3.2.7 Evaluate the need for denormalizing databases.

The advantages and disadvantages of normalizing and denormalizing databases
Situations where denormalization can enhance performance, particularly in read-intensive applications
The balance between straightforward query structures and the risk of data redundancy in denormalized schemas

Lesson 5: Designing 3NF databases

A3.2.6 Construct a database normalized to 3NF for a range of real-world scenarios

Examples may include library management, hospital management, e-commerce platforms, school management, employee management, inventory management, police crime reporting

Lesson 6: Introducing SQL

A3.3.1 Outline the differences between data language types within SQL.

Data language types must include data definition language (DDL) and data manipulation language (DML)
SQL statements to define data structures or to manipulate data

Lesson 7: SQL joins

A3.3.2 Construct queries between two tables in SQL.

Queries must include joins, relational operators, filtering, pattern matching, and ordering data
SQL commands: SELECT, DISTINCT, FROM, WHERE, BETWEEN, ORDER BY, GROUP BY, HAVING, ASC, DESC, JOIN, LIKE with % wildcard, AND, OR, NOT (note: Syntax may vary in different database systems)

Lesson 8: SQL update & insert

A3.3.3 Explain how SQL can be used to update data in a database.

Insert new records (INSERT INTO), modify data (UPDATE SET), remove data (DELETE)
The performance implications of updating data in indexed columns, and how indexes might need to be rebuilt or reorganized following significant data modifications

Lesson 9: SQL aggregate functions (HL)

A3.3.4 Construct calculations within a database using SQL’s aggregate functions. (HL only)

Aggregate functions on grouped data to aid reporting and decision-making
Aggregate commands: AVERAGE, COUNT, MAX, MIN, SUM

Lesson 10: Views (HL)

A3.3.5 Describe different database views. (HL only)

Virtual views and materialized (snapshot) views
Hiding data complexity, data consistency, independence, performance, query simplification, read-only data or updatable data, security

Lesson 11: Transactions (HL)

A3.3.6 Describe how transactions maintain data integrity in a database. (HL only)

The role of atomicity, consistency, isolation and durability (ACID) to ensure reliable processing of transactions
Transaction control language (TCL) commands: BEGIN TRANSACTION, COMMIT, ROLLBACK

Lesson 12: Alternatives & warehouses (HL)

A3.4.1 Outline the different types of databases as approaches to storing data.

Databases models: NoSQL, cloud, spatial, in-memory
Examples of the use of the database model in real-world scenarios may include e-commerce platforms, geographic information systems (GIS), managed services, real-time analytics, social media platforms, SaaS.

A3.4.2 Explain the primary objectives of data warehouses in data management and business intelligence.

The roles of append-only data, subject-oriented data, integrated data, time-variant data, non-volatile data and data optimized for query performance, to ensure efficient data storage and analysis

Lesson 13: Data mining & distributed databases (HL)

A3.4.3 Explain the role of online analytical processing (OLAP) and data mining for business intelligence.

Data mining techniques must include classification, clustering, regression, association rule discovery, sequential pattern discovery, anomaly detection (note: This links to “A4 Machine learning”).
The uses of the techniques in extracting meaningful information from large data sets

A3.4.4 Describe the features of distributed databases.

The need to maintain data consistency in a distributed database
The role of ACID to ensure reliable processing of transactions in distributed databases
Features of distributed databases: concurrency control, data consistency, data partitioning, data security, distribution transparency, fault tolerance, global query processing, location transparency, replication, scalability