Job Description
Responsibilities:
- Spark Optimization: Act as the internal SME for Spark internals; manage memory, shuffle tuning, and partitioning for cost-effective performance.
- Cloud-Agnostic Development: Build pipelines using Python and Delta Lake, decoupling code from specific cloud providers and reducing reliance on GUI tools (e.g., ADF).
- Refactoring & Modernization: Migrate complex SQL-based ETL into modular, testable, and maintainable Python libraries.
- Lakehouse Engineering: Manage Medallion Architecture (Bronze/Silver/Gold) using Delta Lake, ensuring storage performance via Z-Ordering and Vacuuming.
- Code-First Orchestration: Support the transition to code-centric patterns (Airflow, Dagster) to prioritize portability.
- Technical Excellence: Lead code reviews, mentor junior engineers, and implement automated testing frameworks (Pytest).
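By way of illustration, the "modular, testable" refactoring described above often means lifting inline T-SQL logic into pure Python functions covered by Pytest. A minimal sketch (all names and thresholds here are hypothetical, not taken from the role):

```python
# Hypothetical example: legacy T-SQL CASE logic refactored into a
# pure, unit-testable Python function. Names/thresholds are illustrative.

def classify_order(amount: float, is_priority: bool) -> str:
    """Python mirror of legacy T-SQL along the lines of:
    CASE WHEN amount >= 1000 OR is_priority = 1 THEN 'expedite'
         WHEN amount >= 100 THEN 'standard'
         ELSE 'review' END
    """
    if amount >= 1000 or is_priority:
        return "expedite"
    if amount >= 100:
        return "standard"
    return "review"


# Pytest-style check (discovered and run with `pytest`):
def test_classify_order():
    assert classify_order(1500, False) == "expedite"
    assert classify_order(50, True) == "expedite"
    assert classify_order(250, False) == "standard"
    assert classify_order(50, False) == "review"
```

Because the rule lives in a plain function rather than an embedded SQL string, it can be imported into a PySpark UDF or reused across pipelines, and regressions are caught by the test suite rather than in production.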
Minimum Requirements:
- Education: Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.
- Spark Mastery: 6+ years of Spark/PySpark experience; expert ability to diagnose bottlenecks via the Spark UI and optimize complex DAGs.
- Advanced Python: Proficiency in production-grade Python, including building reusable libraries and automated testing.
- Azure Ecosystem: Strong experience with Azure Synapse, Dedicated SQL Pools, and Data Factory.
- Modern Data Stack: Hands-on experience with Delta Lake, Parquet, and containerization (Docker).
- Migration Skills: Solid T-SQL skills to interpret and migrate legacy logic into Python-centric environments.
- Security & Governance: Proven ability to implement robust security and compliance controls across data processes.
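As a flavor of the shuffle-tuning work mentioned above: one common rule of thumb is to size `spark.sql.shuffle.partitions` so each shuffle partition lands near ~128 MiB. A back-of-envelope helper (this is a hypothetical sketch, not a Spark API; the result would be applied via `spark.conf.set` on an active SparkSession):

```python
# Hypothetical helper: estimate a shuffle partition count from an
# estimated shuffle size, targeting ~128 MiB per partition.
# (A rule of thumb only; AQE and skew can change the right answer.)

def suggest_shuffle_partitions(shuffle_bytes: int,
                               target_partition_bytes: int = 128 * 1024 * 1024,
                               min_partitions: int = 1) -> int:
    """Return a partition count keeping partitions near the target size."""
    estimated = -(-shuffle_bytes // target_partition_bytes)  # ceiling division
    return max(min_partitions, estimated)


# Usage sketch (assumes an active SparkSession named `spark`):
# spark.conf.set("spark.sql.shuffle.partitions",
#                suggest_shuffle_partitions(50 * 1024**3))  # ~50 GiB shuffle
```

The estimated shuffle size itself would typically come from the Spark UI's stage metrics, which is where the "diagnose bottlenecks" requirement above comes in.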
Benefits:
- Competitive salary, commensurate with experience and skills
If you meet the above requirements and want to make a career-changing move, apply today by emailing your CV to [email protected].