Data Engineer – Mid-level
Posted 14 days ago
Job Description
We are looking for a Data Engineer to design and build scalable ETL/ELT pipelines in a fully remote role, collaborating with cross-discipline teams to implement data quality and governance frameworks.
Responsibilities:
- Design and build scalable ETL/ELT data pipelines using dbt, PySpark and other modern transformation tools.
- Develop and maintain data ingestion pipelines for GenAI workloads, including document processing, chunking and embedding workflows.
- Orchestrate workflows using Airflow, Dagster or cloud-native orchestration tools.
- Plan and execute data migration projects, including source data analysis, schema mapping, validation and rollback strategies.
- Implement Change Data Capture (CDC) solutions using industry-standard tools.
- Build and maintain data quality frameworks with automated tests and validations.
- Ensure data governance, security and compliance, including proper handling of PII (personally identifiable information) and enforcement of RBAC (role-based access control) policies.
- Collaborate with AI Engineers and Full-Stack Developers to support RAG pipelines and GenAI-based applications.
- Apply event-driven architecture concepts to design scalable and reliable data processing solutions.
Requirements:
- Proven experience developing and deploying production-scale data pipelines.
- Strong proficiency in Python, PySpark and advanced SQL (window functions, CTEs, performance optimization).
- Hands-on experience with data migration projects.
- Experience with at least one major cloud platform (AWS, Azure or GCP).
- Experience with Databricks, AWS data services or Microsoft Fabric for pipeline development.
- Experience with modern data warehouses such as Snowflake, BigQuery, Redshift or Databricks.
- Experience with relational databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, DynamoDB).
- Experience with data migration tools for on-premises or cloud environments (e.g., SSIS).
- Practical experience with Apache Spark / PySpark and workflow scheduling (AWS Glue or similar).
- Familiarity with Infrastructure-as-Code and containerization tools (Terraform, Docker).
- Experience with CI/CD pipelines (preferably GitHub Actions).
- Strong knowledge of data modeling (Star Schema, Data Vault, Dimensional Modeling).
Benefits:
- Remote work