Senior Data Engineer, Databricks

Posted 80 days ago

Job Description

We are seeking a Senior Data Engineer to design and optimize data pipelines for a leading Generative AI Platform for Commerce, building scalable data solutions on Databricks with a focus on production-grade data management.

Responsibilities:

  • Design and implement enterprise-scale data pipelines using Databricks on AWS, leveraging both cluster-based and serverless compute paradigms
  • Architect and maintain medallion architecture (Bronze/Silver/Gold) data lakes and lakehouses
  • Develop and optimize Delta Lake tables for ACID transactions and efficient data management
  • Build and maintain real-time and batch data processing workflows
  • Create reusable, modular data transformation logic using DBT to ensure data quality and consistency across the organization
  • Develop complex Python applications for data ingestion, transformation, and orchestration
  • Write optimized SQL queries and implement performance tuning strategies for large-scale datasets
  • Implement comprehensive data quality checks, testing frameworks, and monitoring solutions
  • Design and implement CI/CD pipelines for automated testing, deployment, and rollback of data artifacts
  • Configure and optimize Databricks clusters, job scheduling, and workspace management
  • Implement version control best practices using Git and collaborative development workflows
  • Partner with data analysts, data scientists, and business stakeholders to understand requirements and deliver solutions
  • Mentor junior engineers and promote best practices in data engineering
  • Document technical designs, data lineage, and operational procedures
  • Participate in code reviews and contribute to team knowledge sharing
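The data-quality duties above can be sketched in miniature. This is a hedged, pure-Python illustration (on Databricks such rules would normally run as PySpark transformations or DBT tests); the field names and rules here are hypothetical examples, not part of the role's actual schema:

```python
# Minimal data-quality check: split incoming rows into valid and rejected,
# recording a reason for each rejection. Field names are illustrative only.

def validate_rows(rows, required_fields=("order_id", "amount")):
    """Return (valid, rejected) lists; rejected pairs each row with a reason."""
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            rejected.append((row, f"missing fields: {missing}"))
        elif row["amount"] < 0:
            rejected.append((row, "negative amount"))
        else:
            valid.append(row)
    return valid, rejected

valid, rejected = validate_rows([
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": -5.00},    # fails the non-negative rule
    {"order_id": None, "amount": 3.50},  # fails the required-field rule
])
```

In a medallion layout, the `valid` rows would flow from Bronze to Silver while `rejected` rows land in a quarantine table for monitoring.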

Requirements:

  • 5+ years of experience in data engineering roles
  • Expert-level proficiency in Databricks (Unity Catalog, Delta Live Tables, Workflows, SQL Warehouses)
  • Strong understanding of cluster configuration, optimization, and serverless SQL compute
  • Advanced SQL skills including query optimization, indexing strategies, and performance tuning
  • Production experience with DBT (models, tests, documentation, macros, packages)
  • Proficient in Python for data engineering (PySpark, pandas, data validation libraries)
  • Hands-on experience with Git workflows (branching strategies, pull requests, code reviews)
  • Proven track record implementing CI/CD pipelines (Jenkins, GitLab CI)
  • Working knowledge of Snowflake architecture and migration patterns
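The SQL tuning requirement can be illustrated with a small, self-contained example. SQLite stands in here for a warehouse engine purely so the example runs anywhere; the table and index names are made up for illustration:

```python
import sqlite3

# SQLite as a stand-in for a SQL warehouse; table/index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 100, "click", f"2024-01-{i % 28 + 1:02d}") for i in range(1000)],
)

# Without an index, the planner must scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM events WHERE user_id = 7"
).fetchall()

# With an index on the filter column, the same query becomes an index search.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM events WHERE user_id = 7"
).fetchall()

print(plan_before)  # plan detail mentions a SCAN of events
print(plan_after)   # plan detail mentions a SEARCH using idx_events_user
```

The same reasoning carries over to warehouse engines, where the equivalent levers are clustering, partitioning, and file pruning rather than B-tree indexes.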

Cost Management Responsibilities:

  • Monitoring and analyzing Databricks DBU (Databricks Unit) consumption and cloud infrastructure costs
  • Implementing cost optimization strategies including cluster right-sizing, autoscaling configurations, and spot instance usage
  • Optimizing job scheduling to leverage off-peak pricing and minimize idle cluster time
  • Establishing cost allocation tags and chargeback models for different teams and projects
  • Conducting regular cost reviews and providing recommendations for efficiency improvements
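The monitoring and chargeback duties above reduce to simple arithmetic over usage data. A minimal sketch, assuming usage is already aggregated per compute type; the DBU rates below are illustrative placeholders, not real Databricks pricing:

```python
# Hypothetical DBU rates per compute type (placeholders, not actual pricing).
DBU_RATES = {"jobs": 0.15, "all_purpose": 0.40, "sql_serverless": 0.70}

def estimate_cost(usage_by_type):
    """Multiply DBU consumption by the per-type rate; return breakdown and total."""
    breakdown = {
        compute: round(dbus * DBU_RATES[compute], 2)
        for compute, dbus in usage_by_type.items()
    }
    return breakdown, round(sum(breakdown.values()), 2)

breakdown, total = estimate_cost({"jobs": 1200, "all_purpose": 300})
# jobs: 1200 * 0.15 = 180.0; all_purpose: 300 * 0.40 = 120.0; total = 300.0
```

Tagging each usage record with a team or project key before aggregation yields the chargeback breakdown the posting describes.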