Role Overview
We are seeking a highly skilled Data Engineer with strong experience in Azure and Databricks, who will play a critical role in designing, building, and operationalizing data transformation pipelines within a modern Lakehouse architecture.
The role primarily focuses on transforming data from the Bronze layer into curated, analytics-ready datasets, building automated CI/CD pipelines, and developing high-quality Python- and PySpark-based data solutions. The engineer will also collaborate closely with Data Scientists and Software Engineers and should be open to contributing to data-driven UI/UX initiatives.
Data Engineering & Transformation
- Design, develop, and maintain scalable data transformation pipelines using Python (including PySpark) and SQL in Azure Databricks, with orchestration through tools such as Azure Data Factory (ADF)
- Implement transformation logic to move data from Bronze to Silver/Gold layers following data engineering best practices
- Apply strong data engineering principles to ensure data reliability, quality, performance, and reusability
- Work with structured and semi-structured data at scale
Databricks, Azure & Cloud ETL
- Build and manage Databricks notebooks, jobs, Delta Lake tables, and orchestrated workflows
- Hands-on experience with cloud-based ETL platforms (Azure Databricks, Azure Synapse, and Azure Functions preferred; AWS or Google Cloud equivalents also considered)
- Optimize data pipelines for performance, scalability, and cost efficiency
Python Applications, APIs & Automation
- Design, develop, and maintain Python applications, scripts, and APIs for data processing and automation
- Write production-grade Python code with a strong focus on readability, maintainability, and testing
- Leverage Python for orchestration, validation, and integration with downstream systems
Collaboration with Data Science & Engineering Teams
- Collaborate closely with Data Scientists and Data Analysts to understand data, analytical models, and consumption requirements
- Enable and support advanced analytics and data science workflows by preparing high-quality feature datasets
- Translate analytical needs into scalable data engineering solutions
CI/CD, DevOps & Platform Engineering
- Build and maintain automated CI/CD pipelines for data and Databricks workloads
- Hands-on experience with DevOps tools and practices, including Git-based version control
- Exposure to containerization and container orchestration platforms such as Kubernetes or OpenShift
- Ensure smooth promotion of code and pipelines across environments (Dev/Test/Prod)
Data Modeling & Querying
- Design and implement robust data models optimized for analytics and reporting
- Strong hands-on knowledge of SQL and exposure to KQL (Kusto Query Language) or other query languages
- Apply best practices in data structures, indexing, and performance tuning
UI/UX & Data Applications (Additional Advantage)
- Open to contributing to data-driven UI/UX components, dashboards, or lightweight data applications
- Work with analytics and business teams to improve data usability and customer experience
Required Skills & Qualifications
Must-Have
- Strong hands-on expertise in Python (with frameworks like PySpark)
- Solid foundation in data engineering principles and large-scale data processing
- Experience with Azure Databricks and cloud-based ETL platforms
- Strong knowledge of SQL and data querying techniques
- Experience with CI/CD pipelines and DevOps practices
- Experience in pipeline monitoring and alerting
- Ability to design efficient, scalable solutions to complex data problems
Good-to-Have
- Experience with Azure Synapse and Azure Functions
- Exposure to AWS or Google Cloud data platforms
- Hands-on experience with OpenShift
- Knowledge of data science concepts and workflows
- Familiarity with analytics platforms, dashboards, and UI/UX considerations