Databricks Developer – AWS

Job Description

We are seeking a Databricks Developer to design and implement ETL pipelines in cloud environments, with a focus on data quality checks and integration with AWS.

Responsibilities:

  • Design and implement scalable ETL/ELT pipelines in Databricks using PySpark/Spark SQL/Scala.
  • Build and manage Delta Lake tables with ACID transactions, schema evolution, time travel, Z-Ordering, and OPTIMIZE/VACUUM routines.
  • Develop batch and near-real-time pipelines (e.g., Structured Streaming, Kafka/MSK, Kinesis).
  • Implement robust data quality checks (e.g., expectations/constraints, anomaly detection) and unit/integration tests.
  • Integrate Databricks with Amazon S3 (bronze/silver/gold zones), AWS Glue Data Catalog or Unity Catalog, and Lake Formation where applicable.
  • Configure IAM roles and instance profiles for secure access to AWS resources.
  • Orchestrate jobs and workflows using Databricks Workflows, AWS Step Functions, and/or Airflow.
  • Implement CI/CD using GitHub; manage repos with Git.
  • Implement observability with CloudWatch, Databricks audit logs, metrics, cost monitoring, and alerting.

Requirements:

  • Bachelor’s in Computer Science, Engineering, or related field (or equivalent experience).
  • 3–7 years in data engineering, including 2+ years hands-on with Databricks on AWS.
  • Strong Apache Spark skills (optimizations, joins, partitioning, caching).
  • Solid experience with Delta Lake, S3, Glue Catalog / Unity Catalog, and Lakehouse design.
  • Proficiency with SQL, performance tuning, and cost optimization on Databricks.
  • Familiarity with AWS services: S3, IAM, Glue, Lambda, CloudWatch, KMS, Step Functions, MSK/Kinesis, VPC.
  • Version control (Git) and CI/CD for Databricks (e.g., Repos, Databricks CLI, Terraform, GitHub Actions/CodePipeline).
  • Experience in Agile/Scrum delivery.

Benefits:

  • Health insurance
  • Flexible work arrangements