Data Engineering Internship 2026
Chapter 2
Day 0 — Setup Checklist
Complete every item before your first session. If anything fails, post in the help channel with the exact error message.
2.1 Operating System
All tools in this program run in a Linux or Unix-like shell. Choose your path:
| If you use... | What to do |
|---|---|
| Windows | Install WSL2 (Windows Subsystem for Linux). All commands run inside WSL2, not PowerShell. |
| macOS | Use the built-in Terminal. All commands work natively. |
| Linux | You are ready. Skip to Section 2.2. |
WSL2 Installation (Windows only)
    # Run in PowerShell as Administrator:
    wsl --install
    # Restart your machine, then open "Ubuntu" from the Start menu
    # Set a Unix username and password when prompted
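Because every command is meant to run inside WSL2 rather than PowerShell, it can help to confirm which environment a terminal is actually in. The following is a minimal sketch, not part of the official checklist: the function name `classify_environment` is hypothetical, and it assumes that WSL kernels report the word "microsoft" in their release string, which is the common case but not a guarantee.

```python
import platform


def classify_environment(system: str, release: str) -> str:
    """Map platform.system() / platform.release() values to a setup path.

    Assumption: WSL kernels include "microsoft" in their release string
    (for example "5.15.90.1-microsoft-standard-WSL2").
    """
    if system == "Linux":
        return "wsl" if "microsoft" in release.lower() else "linux"
    if system == "Darwin":
        return "macos"
    if system == "Windows":
        # Native Windows Python means you are outside WSL2 (e.g. in PowerShell).
        return "windows-native"
    return "unknown"


if __name__ == "__main__":
    env = classify_environment(platform.system(), platform.release())
    print(f"Detected environment: {env}")
    if env == "windows-native":
        print("Open the Ubuntu app and rerun this inside WSL2.")
```

Keeping the classification in a pure function that takes the two strings as arguments makes it easy to check against sample values without needing each operating system at hand.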
2.2 Required Tools
Docker Desktop
    # macOS: download from https://www.docker.com/products/docker-desktop
    # WSL2: install Docker Desktop for Windows, then enable the WSL2 backend in settings
    docker --version          # confirm installation
    docker compose version
Python 3.11+
    # WSL2 / Ubuntu:
    sudo apt update && sudo apt install python3 python3-pip python3-venv -y
    python3 --version
    # macOS (if not installed):
    brew install python@3.11
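The version of `python3` that a package manager installs depends on the OS release, so it may be older than 3.11. As a hedged sketch (the helper names `python_is_new_enough` and `in_virtualenv` are illustrative, not part of the program's tooling), the interpreter can report on itself, including whether it is running inside a virtual environment created with the `python3-venv` package:

```python
import sys

MIN_PYTHON = (3, 11)  # from the "Python 3.11+" requirement


def python_is_new_enough(version_info=sys.version_info) -> bool:
    """Compare the major.minor part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """Inside a venv, sys.prefix points at the venv, not the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {found}: {'OK' if python_is_new_enough() else 'too old'}")
    print(f"Running inside a virtual environment: {in_virtualenv()}")
```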
PostgreSQL
    # Start a PostgreSQL container (recommended; avoids local install issues):
    docker run --name postgres-dev \
      -e POSTGRES_PASSWORD=postgres \
      -e POSTGRES_USER=postgres \
      -e POSTGRES_DB=bootcamp \
      -p 5432:5432 -d postgres:16
    # Verify connection:
    docker exec -it postgres-dev psql -U postgres -c "SELECT version();"
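The `psql` check above runs inside the container, so it does not confirm that the `-p 5432:5432` mapping is reachable from your own shell. A minimal standard-library sketch of that host-side check follows; the function name `port_is_open` is hypothetical, and the check only tests TCP reachability, not credentials or the database itself.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:5432.")
    else:
        print("Nothing reachable on localhost:5432. Is the postgres-dev container running?")
```

A successful result only shows that some service answers on the port. If another PostgreSQL instance already occupies 5432, this check passes even though the container's port mapping failed.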
dbt Core
    # From dbt 1.8 onward, adapters do not install dbt-core automatically, so install both:
    pip install dbt-core dbt-postgres
    dbt --version
Apache Airflow
    # Use the official Docker Compose setup (easiest for local dev):
    curl -LfO "https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml"
    mkdir -p ./dags ./logs ./plugins
    echo "AIRFLOW_UID=$(id -u)" > .env
    docker compose up airflow-init
    docker compose up -d
    # Open http://localhost:8080 (user: airflow, pass: airflow)
PySpark
    # PySpark needs a Java runtime (JDK) installed; without one the check below fails.
    pip install pyspark
    python3 -c "from pyspark.sql import SparkSession; print(SparkSession.builder.getOrCreate())"
VS Code + Extensions
- ms-python.python — Python IntelliSense (mandatory)
- ms-toolsai.jupyter — Jupyter notebooks
- innoverio.vscode-dbt-power-user — dbt IntelliSense and model runner
- ms-ossdata.vscode-postgresql — PostgreSQL query runner
- mtxr.sqltools — General SQL client
- ms-vscode-remote.remote-wsl — Connect VS Code to WSL2 (Windows only)
2.3 Verify Your Stack
    python3 --version    # 3.11+
    docker --version     # 24+
    dbt --version        # 1.8+
    psql --version       # 16+ (or via docker: docker exec postgres-dev psql --version)
    python3 -c "import pyspark; print(pyspark.__version__)"
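Reading version numbers by eye is easy to get wrong, so the comparison against the minimums listed above can also be scripted. This is a rough sketch under stated assumptions: the names `parse_version`, `meets_minimum`, and `check_tool` are illustrative, and it assumes the first dotted number in each tool's `--version` output is the version of interest, which may not hold for every tool or release.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the checklist comments.
MINIMUMS = {
    "python3": (3, 11),
    "docker": (24,),
    "dbt": (1, 8),
}


def parse_version(text: str):
    """Return the first dotted number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum) -> bool:
    """Compare only as many components as the minimum specifies."""
    return found is not None and found[: len(minimum)] >= minimum


def check_tool(name: str) -> str:
    """Run '<name> --version' and report whether it satisfies MINIMUMS."""
    if shutil.which(name) is None:
        return f"{name}: not found on PATH"
    try:
        result = subprocess.run(
            [name, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{name}: could not run ({exc})"
    # Some tools print their version to stderr, so search both streams.
    found = parse_version(result.stdout + result.stderr)
    status = "OK" if meets_minimum(found, MINIMUMS[name]) else "too old or unreadable"
    return f"{name}: {found} -> {status}"


if __name__ == "__main__":
    for tool in MINIMUMS:
        print(check_tool(tool))
```

Slicing the found version to the length of the minimum means a requirement such as `(24,)` is compared on the major version alone, while `(3, 11)` is compared on major and minor.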
Watch Out
Do not proceed to Week 1 activities with a broken environment.
Every tool needs to work on Day 0. Debugging setup mid-week wastes everyone's time.
If docker compose up fails: check that Docker Desktop is running and WSL2 integration is enabled.
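When a step fails, the help channel needs the exact error text rather than a paraphrase. The sketch below shows one way to capture it; `run_and_capture` is a hypothetical helper, and using `docker info` as the probe is an assumption based on that command exiting with an error when the Docker daemon is not reachable.

```python
import subprocess


def run_and_capture(cmd: list[str], timeout: float = 30.0) -> tuple[int, str]:
    """Run cmd and return (exit code, combined stdout and stderr)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep errors interleaved with normal output
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return 127, f"command not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        return 124, f"timed out after {timeout} seconds: {' '.join(cmd)}"
    return result.returncode, result.stdout


if __name__ == "__main__":
    code, output = run_and_capture(["docker", "info"])
    if code == 0:
        print("Docker daemon is reachable.")
    else:
        print(f"docker info exited with code {code}. Paste this in the help channel:")
        print(output)
```

The exit codes 127 and 124 mirror common shell conventions for a missing command and a timeout; they are a choice made for this sketch, not values produced by Docker.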