JobHunter AI
Lead Data Engineer
C the Signs
Location
United States
Work Mode
Remote
Type
Full-Time
Sector
Education
First Seen
2026-07-04
Source
himalayas
Job Description
<p>We are seeking a Lead Data Engineer to architect, build, and scale our next-generation healthcare data platform. In this role, you will lead the effort to design robust pipelines, modernize data architecture, and ensure high-quality ingestion and transformation of clinical and operational data. You’ll collaborate closely with product, analytics, clinical informatics, machine learning, and engineering teams to deliver trusted, timely, and compliant insights.</p><p>This is a hands-on leadership role ideal for someone who enjoys setting technical direction while still contributing code and guiding stakeholders through complex healthcare data challenges.</p><h3><strong>Responsibilities</strong></h3><h3>Architecture &amp; Strategy</h3><ul><li>Lead design and evolution of our cloud-native data platform built primarily on Google Cloud Platform, including BigQuery, Cloud Storage, Pub/Sub, Cloud Run, Airflow (Cloud Composer), and Healthcare API.</li><li>Inform strategic decisions around multi-cloud or AWS interoperability when needed.</li><li>Establish data engineering best practices, coding standards, and architectural patterns.</li></ul><h3>Pipeline Development</h3><ul><li>Build scalable ETL/ELT pipelines using dbt for transformations and Airflow for orchestration.</li><li>Develop ingestion pipelines for clinical and administrative data in HL7, FHIR, DICOM, and custom formats.</li><li>Develop ingestion and transformation pipelines to be used for AI/ML development and model training.</li><li>Implement streaming and batch dataflows using Pub/Sub, Dataflow, and serverless compute.</li><li>Support or guide integrations with AWS-based partner systems or AWS-hosted data sources when applicable.</li></ul><h3>Data Modeling &amp; Warehousing</h3><ul><li>Design and maintain BigQuery datasets, semantic layers, and warehouse structures.</li><li>Leverage industry standards such as FHIR resources for canonical healthcare models.</li><li>Provide guidance on data modeling and 
warehouse best practices across both GCP and AWS ecosystems.</li></ul><h3>Data Quality, Observability &amp; Governance</h3><ul><li>Implement data quality frameworks, automated testing, and monitoring.</li><li>Ensure HIPAA compliance and proper handling of PHI/PII across all pipelines and cloud environments.</li><li>Drive lineage, documentation, metadata governance, and dbt docs adoption.</li></ul><h3>Leadership &amp; Collaboration</h3><ul><li>Partner with analytics, product, clinical informatics, and security teams to deliver high-quality, trustworthy data products.</li><li>Provide oversight and technical direction for multi-cloud data integrations with AWS-based systems or partners.</li><li>Assist in the recruitment and development of junior data engineers.</li></ul><h3>Requirements</h3><ul><li>7+ years of data engineering experience; 2–3+ years in a lead or senior technical role.</li></ul><ul><li>Deep, hands-on expertise in GCP, particularly:</li><ul><li>BigQuery</li><li>GCP Healthcare API (FH