Data Engineering Internship 2026
Chapter 8
Troubleshooting
8.1 Docker / Environment
| Symptom | Fix |
|---|---|
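| docker: command not found | Docker Desktop is not installed or not running. On WSL2 the docker command is only available while Docker Desktop is running with WSL integration enabled for your distro. Open it from your Applications or Start menu. |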
| Airflow UI not loading on port 8080 | docker compose up -d then wait 30 seconds for containers to start. |
| Permission denied on WSL2 file | chmod +x yourscript.sh — the file needs execute permission. |
| Container exits immediately | docker compose logs <service_name> to see the error. Usually a missing env variable. |
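
The fixed 30-second wait in the table is a rule of thumb. A minimal sketch of polling the port instead, assuming the Airflow UI is published on localhost:8080 as above; the helper name `wait_for_port` and its default values are hypothetical, not part of Docker or Airflow:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open,
            # not that the web server behind it is fully healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8080)` right after docker compose up -d returns True as soon as the port opens. If it returns False, docker compose logs <service_name> is the next place to look.
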
8.2 PostgreSQL
| Symptom | Fix |
|---|---|
| Connection refused (port 5432) | Is the postgres container running? Check with: docker ps \| grep postgres |
| password authentication failed | Check POSTGRES_PASSWORD in your docker run command or .env file. |
| relation does not exist | The table was not created yet. Run your CREATE TABLE script or dbt run first. |
| duplicate key value violates unique constraint | Use INSERT ... ON CONFLICT DO NOTHING or truncate the table before reloading. |
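
To see what INSERT ... ON CONFLICT DO NOTHING does on a reload, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in so it runs without a Postgres container; SQLite (3.24 or newer) accepts the same clause for this simple case. The `customers` table and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# id 1 appears twice, as it would if the same file were loaded again.
rows = [(1, "Ada"), (2, "Grace"), (1, "Ada again")]

# Rows whose id already exists are skipped instead of raising an error.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
    rows,
)
conn.commit()

loaded = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(loaded)
```

The second row for id 1 is silently dropped, so the first version of the row wins. If the reload should overwrite existing rows, DO NOTHING is not the right tool and truncating before the load is the simpler route.
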
8.3 PySpark
| Symptom | Fix |
|---|---|
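| Java not found / JAVA_HOME error | PySpark needs a Java runtime that matches your Spark version (Java 11 works with Spark 3.x; newer Spark releases may require Java 17). Install: sudo apt install openjdk-11-jdk, then set JAVA_HOME. |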
| py4j.protocol.Py4JJavaError | Read the full stack trace — the actual error is usually buried in it. |
| JDBC driver not found | Download postgresql-42.7.3.jar and pass it to .config("spark.jars", "/path/to/jar"). |
| Out of memory | Increase driver memory: .config("spark.driver.memory", "2g") |
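
The two .config(...) fixes above can be combined in one session builder. This is a sketch, assuming pyspark is installed; `/path/to/jar` is the placeholder from the table and must be replaced with the real jar location, and the names `SPARK_CONF` and `build_session` are hypothetical:

```python
# Settings taken from the table above; adjust the values for your machine.
SPARK_CONF = {
    "spark.jars": "/path/to/jar",   # placeholder path to the PostgreSQL JDBC jar
    "spark.driver.memory": "2g",    # raise this if the driver runs out of memory
}


def build_session(app_name="troubleshooting-demo"):
    """Create a SparkSession with the settings in SPARK_CONF applied."""
    # Imported here so the settings can be inspected without starting Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

These settings need to be in place before the first session is created. getOrCreate() returns an already running session if there is one, so in a notebook you may have to stop the old session or restart the kernel before the new memory setting applies.
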
8.4 dbt
| Symptom | Fix |
|---|---|
| "relation does not exist" on a ref() | The upstream model failed. Run: dbt run --select <upstream_model> first. |
| "Found 0 models" on dbt run | Check that the model-paths setting in dbt_project.yml matches your folder structure. |
| profiles.yml not found | dbt looks in ~/.dbt/profiles.yml by default. Use --profiles-dir ./ to override. |
| Test failed: not_null on X | There are NULL values in that column. Add a WHERE clause to your model to exclude them. |
| Circular ref error | Model A refs B and B refs A. Restructure — one must be the upstream. |
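
To make the circular ref case concrete: dbt treats models as a dependency graph and refuses to run when it contains a loop. The sketch below is not dbt's own code; it is a plain depth-first search over a hand-written dictionary of refs (`find_cycle` and the model names are hypothetical) that shows how such a loop can be spotted:

```python
def find_cycle(refs):
    """Return one cycle as a list of model names, or None if there is none.

    refs maps each model name to the list of models it refs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(model):
        state[model] = GRAY
        path.append(model)
        for upstream in refs.get(model, []):
            seen = state.get(upstream, WHITE)
            if seen == GRAY:
                # Reached a model that is already on the current path.
                return path[path.index(upstream):] + [upstream]
            if seen == WHITE:
                found = visit(upstream)
                if found:
                    return found
        path.pop()
        state[model] = BLACK
        return None

    for model in refs:
        if state.get(model, WHITE) == WHITE:
            found = visit(model)
            if found:
                return found
    return None


print(find_cycle({"model_a": ["model_b"], "model_b": ["model_a"]}))
```

The returned list names the models that form the loop. Breaking any one ref in that list, usually by moving the shared logic into a third upstream model, removes the error.
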
8.5 Airflow
| Symptom | Fix |
|---|---|
| DAG not appearing in UI | Place the .py file in the ./dags folder. Wait at least 30 seconds; depending on how often the scheduler scans the DAG folder, a new file can take a few minutes to appear. |
| Task fails with ModuleNotFoundError | The Python package is not installed in the Airflow container. Add it to requirements.txt and rebuild. |
| Scheduler is not running | docker compose restart airflow-scheduler |
| DAG is paused | Toggle the pause button in the Airflow UI next to the DAG name. |
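
For the ModuleNotFoundError row, it can help to check which packages are importable before rebuilding. A small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the names you pass in are whatever your DAG imports:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for some malformed or partially installed packages.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

The check only means something when it runs inside the Airflow container, for example through docker compose exec on the scheduler service, because the container has its own Python environment. A package installed on your laptop does not count.
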
Common Mistake
The nuclear option when nothing else works:
1. docker compose down
2. docker system prune -f (WARNING: removes all stopped containers, unused networks, dangling images, and unused build cache)
3. docker compose up -d
This fixes ~40% of mysterious environment errors.
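
The three reset steps can also be wrapped in a small script so they always run in the same order and stop at the first failure. A sketch, assuming docker is on the PATH and the script is started from the folder that holds your compose file; `reset_environment` and its `run` parameter are hypothetical names:

```python
import subprocess

# The same three commands as in the numbered steps above.
RESET_COMMANDS = [
    ["docker", "compose", "down"],
    ["docker", "system", "prune", "-f"],  # destructive: see the warning above
    ["docker", "compose", "up", "-d"],
]


def reset_environment(run=subprocess.run):
    """Run the reset commands in order; raise if any of them fails."""
    for cmd in RESET_COMMANDS:
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on failure.
        run(cmd, check=True)
```

The `run` parameter exists so the function can be tried with a stand-in that only records the commands, which is a safe way to see what would happen before pruning anything.
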