Lead Data Engineer

Posted 3 days ago

Job Description

Mercator.ai is hiring a Lead Data Engineer to build scalable data infrastructure and distributed data pipelines, ensuring performance and reliability while mentoring junior engineers.

Responsibilities:

  • Lead the architecture and evolution of scalable, distributed data pipelines, ensuring high availability and performance at scale
  • Design and implement robust data models to support reporting and advanced data applications
  • Build and maintain distributed web scraping systems using tools such as Playwright, Selenium, and BeautifulSoup
  • Develop systems capable of handling anti-scraping measures, proxy rotation, and high-volume data extraction
  • Integrate AI and LLMs into engineering workflows for code generation, automation, and optimization
  • Apply prompt engineering techniques to improve data processing, documentation, and troubleshooting
  • Identify and implement system and process improvements to optimize performance and efficiency
  • Manage and scale cloud-based data infrastructure, including data warehouses, object storage, and search systems
  • Deploy and maintain containerized workloads using Kubernetes
  • Implement data quality monitoring and governance processes to ensure accuracy and reliability
  • Mentor junior engineers through code reviews, documentation, and knowledge sharing
  • Communicate technical concepts clearly and provide business context for engineering decisions
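To give a flavor of the anti-scraping and proxy-rotation work described above, here is a minimal, hypothetical sketch in Python of a round-robin proxy rotator with failure tracking. The class name, proxy URLs, and failure threshold are illustrative assumptions, not part of any actual Mercator.ai system:

```python
from itertools import cycle


class ProxyRotator:
    """Round-robin proxy rotation that retires proxies after repeated failures.

    Hypothetical sketch for illustration only; a production system would
    also handle cooldown periods, health checks, and concurrency.
    """

    def __init__(self, proxies, max_failures=3):
        self._failures = {p: 0 for p in proxies}  # failure count per proxy
        self._pool = cycle(proxies)               # endless round-robin iterator
        self.max_failures = max_failures

    def next_proxy(self):
        # Walk the pool at most once, skipping proxies that have
        # exceeded the failure threshold.
        for _ in range(len(self._failures)):
            proxy = next(self._pool)
            if self._failures[proxy] < self.max_failures:
                return proxy
        raise RuntimeError("all proxies exhausted")

    def report_failure(self, proxy):
        # Called by the scraper when a request through this proxy fails.
        self._failures[proxy] += 1
```

A scraping worker would call `next_proxy()` before each request and `report_failure()` on a block or timeout, so traffic naturally shifts away from burned proxies.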

Requirements:

  • 5+ years of experience in Data Engineering with a track record of scaling systems
  • Expert proficiency in Python and advanced SQL, including performance tuning and optimization
  • Strong experience with workflow orchestration tools such as Airflow or Prefect and transformation tools such as dbt
  • Proven experience building resilient web scraping systems using Playwright, Selenium, and BeautifulSoup
  • Deep understanding of relational and NoSQL databases, including PostgreSQL, MongoDB, and Elasticsearch
  • Experience working with large-scale data systems such as BigQuery
  • Strong proficiency with CI/CD pipelines, Git, and Docker
  • Experience designing and maintaining distributed systems with high availability and fault tolerance
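As a small illustration of the SQL performance-tuning skills listed above, the sketch below uses Python's built-in sqlite3 as a stand-in database (the table and index names are made up) to show how adding an index changes a query plan from a full table scan to an index search:

```python
import sqlite3

# In-memory stand-in database; a real system would be Postgres or BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, ts) VALUES (?, ?)",
    [(i % 50, f"2024-01-{i % 28 + 1:02d}") for i in range(1000)],
)

def plan(sql):
    # Return SQLite's query-plan summary (the 'detail' column) for a statement.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM events WHERE user_id = 7"
before = plan(query)  # without an index: a scan of the whole table

conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(query)   # with the index: an index search instead
```

The same discipline of reading query plans before and after a change carries over directly to tuning warehouse queries at much larger scale.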