Published 11 hours ago
AI/ML & Advanced Analytics
Develop, train, and optimize ML models using Python, PySpark, MLflow, and Databricks Machine Learning.
Conduct exploratory data analysis (EDA) to identify patterns, trends, and insights in large datasets.
Deploy ML models into production using MLflow, Databricks Workflows, or other MLOps pipelines.
Build analytics solutions such as forecasting, anomaly detection, segmentation, or recommendation systems.
Design ML architectures aligned with Databricks Lakehouse on AWS.
Data Engineering & Lakehouse Architecture
Architect and build scalable ETL/ELT pipelines using PySpark, SQL, and Databricks Workflows.
Implement Delta Lake best practices, including OPTIMIZE, ZORDER, partitioning, and schema evolution.
Design lakehouse layers (Bronze/Silver/Gold) with strong separation of compute and serving layers.
Optimize cluster and job performance through Spark tuning, caching, and shuffle minimization.
Work with multi-terabyte, high‑velocity time-series data in a distributed environment.
Ensure robust data availability for downstream ML and analytics workloads.
AWS Cloud Integration
Architect end-to-end data and ML solutions using AWS services, including:
S3 for storage
IAM for identity & access
Glue Catalog for metadata management
Networking for secure, high‑throughput data movement
Integrate Databricks with AWS-native compute, API layers, and low-latency endpoints.
Business Collaboration & Leadership
Translate business problems into scalable analytical or ML architectures.
Communicate complex statistical and architectural concepts to non‑technical stakeholders.
Collaborate with product, engineering, and business leaders to drive data-informed initiatives.
Provide design leadership while remaining hands-on in execution.