Data Engineer – Mid-level

Posted 14 days ago

Job Description

We are looking for a mid-level Data Engineer to design and build scalable ETL/ELT data pipelines in a fully remote role, collaborating with cross-discipline teams to implement data quality and governance frameworks.

Responsibilities:

  • Design and build scalable ETL/ELT data pipelines using dbt, PySpark and other modern transformation tools (a minimal PySpark sketch follows this list).
  • Develop and maintain data ingestion pipelines for GenAI workloads, including document processing, chunking and embedding workflows (see the chunking sketch below).
  • Orchestrate workflows using Airflow, Dagster or cloud-native orchestration tools (see the Airflow sketch below).
  • Plan and execute data migration projects, including source data analysis, schema mapping, validation and rollback strategies.
  • Implement Change Data Capture (CDC) solutions using industry-standard tools (the CDC concept is sketched below).
  • Build and maintain data quality frameworks with automated tests and validations.
  • Ensure data governance, security and compliance, including proper handling of PII (personally identifiable information) and enforcement of RBAC (role-based access control) policies.
  • Collaborate with AI Engineers and Full-Stack Developers to support RAG pipelines and GenAI-based applications.
  • Apply event-driven architecture concepts to design scalable and reliable data processing solutions.
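
To make the pipeline and data-quality bullets concrete, here is a minimal PySpark sketch of an extract-transform-load job with a simple automated quality gate. The bucket paths, column names and schema are hypothetical, not part of this posting.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read raw CSV files (paths and columns are hypothetical).
    raw = spark.read.option("header", True).csv("s3://raw-bucket/orders/")

    # Transform: normalize types, derive a partition column, deduplicate.
    clean = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("created_at"))
           .dropDuplicates(["order_id"])
    )

    # Automated quality gate: fail fast if any primary key is null.
    null_keys = clean.filter(F.col("order_id").isNull()).count()
    if null_keys > 0:
        raise ValueError(f"{null_keys} rows have a null order_id")

    # Load: write partitioned Parquet for downstream consumers.
    clean.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3://curated-bucket/orders/")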
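
The GenAI ingestion bullet can be illustrated with a small chunking sketch. The chunk size, overlap and the embed() stub are assumptions; in practice embed() would call a real embedding model.

    from typing import Iterator

    def chunk_text(text: str, size: int = 500, overlap: int = 50) -> Iterator[str]:
        # Split a document into overlapping character windows.
        step = size - overlap
        for start in range(0, len(text), step):
            yield text[start:start + size]

    def embed(chunk: str) -> list[float]:
        # Stub standing in for an embedding model call; the vector
        # width (768) is an arbitrary placeholder.
        return [0.0] * 768

    def ingest(doc_id: str, text: str) -> list[dict]:
        # Produce records shaped for loading into a vector store.
        return [
            {"doc_id": doc_id, "chunk_ix": i, "text": c, "vector": embed(c)}
            for i, c in enumerate(chunk_text(text))
        ]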
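
For orchestration, here is a minimal sketch of how such steps could be wired together with Airflow's TaskFlow API (Airflow 2.x); the DAG name, schedule and return values are illustrative only.

    from datetime import datetime
    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def orders_pipeline():
        @task
        def extract() -> str:
            return "s3://raw-bucket/orders/"  # hypothetical landing path

        @task
        def transform(path: str) -> str:
            # e.g. submit the PySpark job sketched above
            return "s3://curated-bucket/orders/"

        @task
        def validate(path: str) -> None:
            ...  # automated quality checks would run here

        validate(transform(extract()))

    orders_pipeline()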
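
Production CDC is typically log-based (tools such as Debezium read the database's transaction log), but the underlying idea can be sketched by diffing two primary-key-indexed snapshots of a table:

    def diff_snapshots(old: dict, new: dict) -> list[dict]:
        # Emit insert/update/delete events by comparing two snapshots
        # keyed by primary key; rows are plain dicts.
        events = []
        for key, row in new.items():
            if key not in old:
                events.append({"op": "insert", "key": key, "after": row})
            elif old[key] != row:
                events.append({"op": "update", "key": key,
                               "before": old[key], "after": row})
        for key, row in old.items():
            if key not in new:
                events.append({"op": "delete", "key": key, "before": row})
        return events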

Requirements:

  • Proven experience developing and deploying production-scale data pipelines.
  • Strong proficiency in Python, PySpark and advanced SQL (window functions, CTEs, performance optimization); see the sketch after this list.
  • Hands-on experience with data migration projects.
  • Experience with at least one major cloud platform (AWS, Azure or GCP).
  • Experience with Databricks, AWS data services or Microsoft Fabric for pipeline development.
  • Experience with modern data warehouses such as Snowflake, BigQuery, Redshift or Databricks.
  • Experience with relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, DynamoDB).
  • Experience with data migration tools for on-premises or cloud environments (e.g., SSIS).
  • Practical experience with Apache Spark / PySpark and workflow scheduling (AWS Glue or similar).
  • Familiarity with Infrastructure-as-Code and containerization tools (Terraform, Docker).
  • Experience with CI/CD pipelines (preferably GitHub Actions).
  • Strong knowledge of data modeling (Star Schema, Data Vault, Dimensional Modeling); a star-schema example appears in the sketch after this list.
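
To illustrate the advanced-SQL and data-modeling requirements together, here is a self-contained sketch that builds a tiny star schema (one fact table, one dimension) and queries it with a CTE and a window function. The schema and rows are invented, and it assumes a SQLite build with window-function support (3.25 or later).

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY,
                                   region TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY,
                                 customer_id INTEGER REFERENCES dim_customer,
                                 amount REAL);
        INSERT INTO dim_customer VALUES (1, 'EU'), (2, 'US');
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 40.0);
    """)

    # CTE plus a window function: rank each sale within its region.
    query = """
        WITH sales_by_region AS (
            SELECT d.region, f.amount
            FROM fact_sales AS f
            JOIN dim_customer AS d USING (customer_id)
        )
        SELECT region, amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        FROM sales_by_region
    """
    for row in con.execute(query):
        print(row)  # e.g. ('EU', 25.0, 1)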

Benefits:

  • Remote work