Senior Data Engineer
Posted 3hrs ago
Job Description
As a Senior Data Engineer at Ceresti, you will design the end-to-end data architecture that supports our work to improve dementia care outcomes, collaborating with cross-functional teams and ensuring data quality for our healthcare solutions.
Responsibilities:
- Design and own Ceresti’s end-to-end data architecture: a landing zone with secure cloud object storage for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that decouples reporting and AI workloads from production
- Build ingestion pipelines for the data we receive today, including partner data files (CSV/JSON/XML/HL7/X12 as applicable) delivered over SFTP and REST API integrations, with schema validation, quarantine of bad records, and full lineage from raw bytes to curated row
- Stand up and operate the curated layer (data warehouse / lakehouse-lite) so analytics and ML models can consume data without slowing down the transactional system
- Choose, integrate, and operate the smallest set of tools needed, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or similar for transformations, and a single validation library (Great Expectations, Pandera, or Soda)
- Design and enforce data governance for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access, audit logging, retention and minimum-necessary policies, and de-identification where appropriate
- Partner with backend, ML, product, and clinical stakeholders to define data contracts with our health plan and ACO partners and hold the line on data quality
- Build and maintain reliable feature data for ML models, including embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes work
- Instrument the data platform for observability (pipeline SLAs, data freshness, schema drift, and quality metrics) and act on what the data tells you
- Participate fully in our Agile process: backlog grooming, sprint planning, demos, and retrospectives
- Mentor engineers across the team on SQL, schema design, and the craft of building data systems that are boring in the best possible way
Requirements:
- BS/BA degree or higher in Computer Science, Engineering, or a related technical field
- 8+ years of professional data engineering experience, with a track record of shipping production data systems end-to-end
- Mastery of PostgreSQL: schema design, indexing, query tuning, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and operating Postgres at scale
- Strong experience designing and operating data pipelines, including file-based ingestion (SFTP / object storage drops) and API-based ingestion (REST, webhooks)
- Hands-on experience with one or more cloud platforms (AWS preferred) and their data primitives, such as object storage (S3) and managed Postgres
- Experience designing data warehouses and/or data lakes and the judgment to know which one a given problem actually needs
- Strong experience with dbt (or an equivalent SQL-based transformation framework) and modern data modeling patterns (Kimball dimensional, Data Vault, One Big Table), along with an opinion about when each is right
- Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear point of view on which to use when
- Strong Python skills for ingestion, validation, and tooling
- Experience with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or equivalent)
- Experience with change data capture (CDC) from Postgres (logical replication or equivalent)
- Data governance experience in a HIPAA-regulated environment or, at minimum, demonstrated instincts for protecting PHI and PII (encryption, least privilege, audit, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is a strong plus
- Comfortable with infrastructure-as-code and CI/CD for data systems
- Experience supporting ML workloads: building feature tables, managing training data, serving features at inference time; familiarity with embeddings, vector search (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is a plus
- Excellent written and verbal communication skills: you can explain a tricky schema decision to a business stakeholder and a data contract to a partner with equal clarity
- Demonstrated experience working in Agile/Scrum teams
Benefits:
- Competitive salary and benefits package
- Opportunities for professional growth and development
- Collaborative and dynamic work environment
- Flexible work arrangements and remote work options
- Access to cutting-edge technologies and tools