Environment Variables¶
FlumenData uses environment variables to configure all services. Variables are defined in the .env file and used by Docker Compose and Makefile templates.
Configuration Management¶
Template System¶
Configuration files are generated from templates:
templates/ # Source templates
├── hive/
│ └── hive-site.xml.tpl
├── spark/
│ ├── spark-defaults.conf.tpl
│ └── spark-env.sh.tpl
├── jupyterlab/
│ └── spark-defaults.conf.tpl
├── minio/
│ └── policy-readonly.json.tpl
├── superset/
│ ├── superset.env.tpl
│ └── superset_config.py.tpl
└── trino/
└── catalog/
config/ # Generated files (DO NOT EDIT)
├── hive/
│ └── hive-site.xml
├── spark/
│ ├── spark-defaults.conf
│ ├── spark-env.sh
│ └── hive-site.xml (copied from hive/)
├── jupyterlab/
│ └── spark-defaults.conf
├── minio/
│ └── policy-readonly.json
├── superset/
│ ├── superset.env
│ └── superset_config.py
└── trino/
└── catalog/
Regenerating Configuration¶
After modifying .env, regenerate configuration:
# Regenerate all configuration files
make config
# Regenerate specific service
make config-hive
make config-spark
make config-minio
make config-jupyterlab
make config-trino
make config-superset
# Restart services to apply changes
make restart
Never Edit Generated Files
Files in the config/ directory are auto-generated. Always edit templates in templates/ and run make config.
Core Variables¶
PostgreSQL¶
# PostgreSQL configuration
POSTGRES_USER=flumen # Database user
POSTGRES_PASSWORD=flumen123 # User password
POSTGRES_DB=flumendata # Database name
POSTGRES_PORT=5432 # External port
Used by:
- PostgreSQL container
- Hive Metastore (JDBC connection)
- All targets that access PostgreSQL
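Inside hive-site.xml.tpl, these values presumably feed the metastore's JDBC settings. A hedged sketch using the standard Hive property names (the `postgres` hostname is an assumption):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://postgres:${POSTGRES_PORT}/${POSTGRES_DB}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>${POSTGRES_USER}</value>
</property>
```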
Examples:
# Connect to PostgreSQL
docker exec -it flumen_postgres psql -U ${POSTGRES_USER} -d ${POSTGRES_DB}
# Change password (requires restart)
POSTGRES_PASSWORD=newsecurepass123
make config
make restart
MinIO¶
# MinIO S3-compatible object storage
MINIO_ROOT_USER=minioadmin # Admin username
MINIO_ROOT_PASSWORD=minioadmin123 # Admin password (min 8 chars)
MINIO_SERVER_URL=http://minio:9000 # Internal S3 endpoint
MINIO_CONSOLE_PORT=9001 # Console UI port
MINIO_BUCKET=lakehouse # Delta Lake warehouse bucket
MINIO_STORAGE_BUCKET=storage # Staging bucket for raw files
Used by:
- MinIO container
- Spark (S3A configuration)
- Hive Metastore (warehouse location)
Security Notes:
- Use strong passwords in production
- Default credentials are for development only
- Console accessible at: http://localhost:9001
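The Spark templates presumably translate these variables into S3A settings along the following lines. The property names are the standard Hadoop S3A keys; the exact template contents may differ:

```properties
spark.hadoop.fs.s3a.endpoint            ${MINIO_SERVER_URL}
spark.hadoop.fs.s3a.access.key          ${MINIO_ROOT_USER}
spark.hadoop.fs.s3a.secret.key          ${MINIO_ROOT_PASSWORD}
spark.hadoop.fs.s3a.path.style.access   true
```

Path-style access is required for MinIO, since it does not serve buckets as DNS subdomains.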
Examples:
# Change credentials (requires restart)
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=SuperSecure123
make config
make restart
# Create additional buckets
docker exec flumen_minio mc mb /data/bronze
docker exec flumen_minio mc mb /data/silver
docker exec flumen_minio mc mb /data/gold
Hive Metastore¶
Used by:
- Spark (catalog connection)
- Any external tool connecting to Hive Metastore
Examples:
# Verify connectivity
docker exec flumen_spark_master nc -zv hive-metastore 9083
# Check metastore logs
make logs-hive
Spark¶
# Spark cluster configuration
SPARK_MASTER_HOST=spark-master # Master hostname
SPARK_MASTER_PORT=7077 # Spark protocol port
SPARK_MASTER_WEBUI_PORT=8080 # Web UI port
SPARK_WORKER_CORES=2 # Cores per worker
SPARK_WORKER_MEMORY=2g # Memory per worker
Used by:
- Spark Master container
- Spark Worker containers
- spark-env.sh template
Examples:
# Increase worker resources
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=4g
make config-spark
make restart
# Access Spark UI
open http://localhost:8080
Delta Lake¶
# Delta Lake and dependency versions
DELTA_VERSION=4.0.0 # Delta Lake version
SCALA_BINARY_VERSION=2.13 # Scala version
POSTGRESQL_JDBC_VERSION=42.7.1 # PostgreSQL JDBC driver
HADOOP_AWS_VERSION=3.3.6 # Hadoop AWS (S3A) version
AWS_SDK_BUNDLE_VERSION=1.12.367 # AWS SDK version
Used by:
- spark-defaults.conf template (JAR dependencies)
- Spark Ivy cache
Changing Versions:
# Upgrade Delta Lake
DELTA_VERSION=4.1.0
make config-spark
# Clear Ivy cache to download new JARs
docker volume rm flumen_spark_ivy
make restart
Version Compatibility
Ensure Delta Lake, Spark, and Scala versions are compatible. Check Delta Lake releases.
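These version variables most likely compose Maven coordinates inside spark-defaults.conf.tpl. A hedged sketch, using the published artifact names for each dependency:

```properties
spark.jars.packages io.delta:delta-spark_${SCALA_BINARY_VERSION}:${DELTA_VERSION},org.apache.hadoop:hadoop-aws:${HADOOP_AWS_VERSION},com.amazonaws:aws-java-sdk-bundle:${AWS_SDK_BUNDLE_VERSION},org.postgresql:postgresql:${POSTGRESQL_JDBC_VERSION}
```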
Trino¶
# Trino coordinator
TRINO_PORT=8082 # Host/UI port mapped to internal 8080
TRINO_VERSION=450 # Trino image tag
TRINO_ENVIRONMENT=lakehouse # Value stored in node.properties
Used by:
- docker-compose.tier3.yml (ports, image tag)
- makefiles/trino.mk targets
- templates/trino/node.properties.tpl
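Since TRINO_ENVIRONMENT feeds node.properties, the rendered file plausibly contains the standard Trino key:

```properties
node.environment=${TRINO_ENVIRONMENT}
```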
Examples:
# Avoid port conflicts
TRINO_PORT=9090
make up-tier3
# Pin a specific Trino release
TRINO_VERSION=448
make up-tier3
Superset¶
# Superset BI UI
SUPERSET_VERSION=5.0.0 # Apache Superset image tag
SUPERSET_PORT=8088 # Host/UI port
SUPERSET_DB_NAME=superset # Metadata database inside PostgreSQL
SUPERSET_SECRET_KEY=flumen_superset_secret
SUPERSET_ADMIN_USERNAME=admin
SUPERSET_ADMIN_PASSWORD=admin123
SUPERSET_ADMIN_EMAIL=admin@flumen.local
SUPERSET_ADMIN_FIRST_NAME=Superset
SUPERSET_ADMIN_LAST_NAME=Admin
Used by:
- templates/superset/superset.env.tpl and superset_config.py.tpl
- makefiles/superset.mk (config + database bootstrap)
- docker-compose.tier3.yml (service definition, ports, command)
- docker/superset.Dockerfile build args
Examples:
# Serve Superset on a different port
SUPERSET_PORT=8090
make up-tier3
# Rotate admin credentials
SUPERSET_ADMIN_USERNAME=bi_admin
SUPERSET_ADMIN_PASSWORD="$(openssl rand -hex 12)"
make config-superset
# Upgrade Superset version (remember to rebuild the image)
SUPERSET_VERSION=5.1.0
make build-superset
make up-tier3
Advanced Configuration¶
S3A Performance Tuning¶
Add to .env (requires template modifications):
# S3A connection pool
S3A_CONNECTION_MAXIMUM=100
S3A_CONNECTION_TIMEOUT=60000
S3A_ESTABLISH_TIMEOUT=30000
# S3A upload settings
S3A_BLOCK_SIZE=134217728          # 128 MB
S3A_MULTIPART_SIZE=134217728      # 128 MB
S3A_MULTIPART_THRESHOLD=268435456 # 256 MB
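For these variables to take effect, spark-defaults.conf.tpl would need matching entries. The standard Hadoop S3A keys they map to are (a sketch, not the actual template contents):

```properties
spark.hadoop.fs.s3a.connection.maximum            ${S3A_CONNECTION_MAXIMUM}
spark.hadoop.fs.s3a.connection.timeout            ${S3A_CONNECTION_TIMEOUT}
spark.hadoop.fs.s3a.connection.establish.timeout  ${S3A_ESTABLISH_TIMEOUT}
spark.hadoop.fs.s3a.block.size                    ${S3A_BLOCK_SIZE}
spark.hadoop.fs.s3a.multipart.size                ${S3A_MULTIPART_SIZE}
spark.hadoop.fs.s3a.multipart.threshold           ${S3A_MULTIPART_THRESHOLD}
```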
Spark Performance¶
# Spark SQL optimization
SPARK_SQL_ADAPTIVE_ENABLED=true
SPARK_SQL_SHUFFLE_PARTITIONS=200
SPARK_SQL_FILES_MAX_PARTITION_BYTES=134217728
# Spark memory
SPARK_DRIVER_MEMORY=2g
SPARK_EXECUTOR_MEMORY=2g
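Like the S3A tuning variables, these only apply if the template references them. The corresponding Spark configuration keys are (sketch):

```properties
spark.sql.adaptive.enabled         ${SPARK_SQL_ADAPTIVE_ENABLED}
spark.sql.shuffle.partitions       ${SPARK_SQL_SHUFFLE_PARTITIONS}
spark.sql.files.maxPartitionBytes  ${SPARK_SQL_FILES_MAX_PARTITION_BYTES}
spark.driver.memory                ${SPARK_DRIVER_MEMORY}
spark.executor.memory              ${SPARK_EXECUTOR_MEMORY}
```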
Logging Levels¶
Control verbosity:
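The variable list for this section is not reproduced here. Note that Spark's log verbosity is ultimately governed by its log4j2.properties (Spark 3.3+ uses Log4j 2), so any such variable would have to be templated into that file rather than spark-defaults.conf. For example:

```properties
# conf/log4j2.properties
rootLogger.level = warn
```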
Environment-Specific Configurations¶
Development¶
# .env.dev
POSTGRES_PASSWORD=dev123
MINIO_ROOT_PASSWORD=dev123
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
Staging¶
# .env.staging
POSTGRES_PASSWORD=staging_secure_pass
MINIO_ROOT_PASSWORD=staging_secure_pass
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=4g
Production¶
# .env.prod
POSTGRES_PASSWORD=$(generate_secure_password)
MINIO_ROOT_PASSWORD=$(generate_secure_password)
SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=16g
# Enable TLS
MINIO_SERVER_URL=https://minio:9000
Validation¶
Check Current Values¶
# Show active (non-comment) entries in .env
grep -v '^#' .env | grep .
# Inspect the environment inside a running container
docker exec flumen_spark_master env | grep SPARK
Verify Configuration Applied¶
# Check Spark configuration
docker exec flumen_spark_master cat /opt/spark/conf/spark-defaults.conf
# Check Hive configuration
docker exec flumen_hive_metastore cat /opt/hive/conf/hive-site.xml
Test Changes¶
# After changing .env
make config # Regenerate configs
make health # Verify services still healthy
make test # Run integration tests
Troubleshooting¶
Changes not taking effect¶
Generated files are rebuilt only when you run make config, and running services pick up changes only after a restart:
# Regenerate configs, then restart services
make config
make restart
Services fail after config change¶
# View logs for errors
make logs
# Revert to previous .env
git checkout .env
# Regenerate and restart
make config
make restart
Template rendering errors¶
# Check Makefile render function
make config 2>&1 | grep ERROR
# Verify all required variables are set
env | grep -E "POSTGRES|MINIO|SPARK|HIVE|DELTA"
Best Practices¶
- Version Control: Commit .env.example, never commit .env
- Secrets: Use secrets management in production (Vault, AWS Secrets Manager)
- Documentation: Comment your .env file with explanations
- Testing: Test configuration changes in development first
- Backups: Back up .env before making changes
- Validation: Run make health after any configuration change
Next Steps¶
- Make Commands Reference - All available commands
- Architecture - Understand component relationships