Data Engineering Internship 2026
Chapter 12
Glossary
| Term | Definition |
|---|---|
| OLTP | Online Transaction Processing. Databases optimised for fast row-level reads and writes. Typical example: your app's production database. |
| OLAP | Online Analytical Processing. Systems optimised for complex aggregations across large datasets. Typical example: a data warehouse. |
| ETL | Extract, Transform, Load. Data is transformed before loading into the target system. |
| ELT | Extract, Load, Transform. Data is loaded raw first, then transformed using SQL (the dbt approach). |
| Star Schema | A data warehouse schema with a central fact table surrounded by denormalised dimension tables. |
| Snowflake Schema | A star schema where dimension tables are further normalised into sub-dimension tables. |
| Fact Table | Stores measurable, quantitative data (ratings, counts, amounts). Contains foreign keys to dimensions. |
| Dimension Table | Stores descriptive context (movie titles, genres, user details, dates). |
| ERD | Entity-Relationship Diagram. A visual map of database tables and their relationships. |
| dbt | Data Build Tool. Runs SQL transformations and manages the dependency graph between models. |
| dbt model | A SQL SELECT statement saved as a .sql file. dbt compiles it and materialises the result in the warehouse, typically as a table or view. |
| ref() | dbt function that references another dbt model. Builds the dependency graph automatically. |
| source() | dbt function that references a raw source table. Enables source freshness checks. |
| DAG | Directed Acyclic Graph: a graph of one-way dependencies that never loops back on itself. In Airflow, a DAG defines the tasks in a pipeline and their dependencies. |
| Operator | An Airflow class that defines what a task does. BashOperator, PythonOperator, etc. |
| DataFrame | A two-dimensional tabular data structure in pandas or PySpark. Like a spreadsheet in code. |
| Schema (dbt) | A dbt YAML file (schema.yml) that defines tests and documentation for models. |
| DAX | Data Analysis Expressions. The formula language used in Power BI for measures and calculated columns. |
| Measure (Power BI) | A dynamic calculation evaluated at query time based on filter context. Prefer measures over calculated columns for aggregations. |
| Star Schema (Power BI) | The recommended data model layout in Power BI: fact table in centre, dimensions surrounding it. |
| Window function | SQL function that computes a value across a set of rows related to the current row, without collapsing those rows into one. Examples: RANK, LAG, LEAD, SUM() OVER. |
| CTE | Common Table Expression. A named subquery defined using WITH. Makes SQL modular and readable. |
| Cardinality | The number of distinct values in a column, or the type of relationship between tables (1:1, 1:N, N:M). |
| Data Lineage | The path data takes from source to final model. Visible in the dbt docs lineage graph. |
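
Several of the terms above (Star Schema, Fact Table, Dimension Table, CTE, Window function) are easier to see together in one query. The following is a minimal sketch using Python's built-in `sqlite3` module. The table names `fact_rating` and `dim_movie`, the movie titles, and the ratings are hypothetical and exist only for illustration. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were introduced.

```python
import sqlite3

# Hypothetical star schema: one fact table, one dimension table.
# All names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_movie (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT,
    genre    TEXT
);
CREATE TABLE fact_rating (
    rating_id INTEGER PRIMARY KEY,
    movie_id  INTEGER REFERENCES dim_movie(movie_id),  -- foreign key to the dimension
    rating    REAL                                     -- the measurable value
);
INSERT INTO dim_movie VALUES
    (1, 'Movie A', 'Drama'),
    (2, 'Movie B', 'Drama'),
    (3, 'Movie C', 'Comedy');
INSERT INTO fact_rating (movie_id, rating) VALUES
    (1, 4.0), (1, 5.0), (2, 3.0), (3, 2.0), (3, 4.0);
""")

query = """
-- CTE: a named subquery that joins the fact to its dimension and aggregates.
WITH movie_avg AS (
    SELECT d.genre, d.title, AVG(f.rating) AS avg_rating
    FROM fact_rating AS f
    JOIN dim_movie AS d ON d.movie_id = f.movie_id
    GROUP BY d.genre, d.title
)
-- Window function: rank movies within each genre without collapsing rows.
SELECT genre, title, avg_rating,
       RANK() OVER (PARTITION BY genre ORDER BY avg_rating DESC) AS genre_rank
FROM movie_avg
ORDER BY genre, genre_rank
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE produces one row per movie, and the window function then adds a rank within each genre while keeping every movie row in the output. In a dbt project, a query like this would typically live in a model file and read its inputs through `ref()` or `source()` rather than hard-coded table names.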