Senior Data Engineer
Kiddom
Data Science
San Francisco, CA, USA
USD 150k-220k / year + Equity
Kiddom’s Content & AI Systems team is building the data layer that powers the next generation of AI-assisted curriculum authoring and content delivery. This role sits at the intersection of data engineering and content systems — owning the pipelines, schemas, and validation frameworks that turn raw curriculum content into structured, AI-ready data products.
This is not a traditional data engineering role. Curriculum content is messy, inconsistent, and deeply domain-specific. You will work closely with Instructional Designers, AI engineers, and the Content Agents team to define data requirements, design schemas, and build the infrastructure that makes AI-powered authoring workflows possible.
You will...
Design and own the schema and data models representing Kiddom’s curriculum content (lessons, activities, standards alignments) for downstream use
Build ingestion pipelines that process content from varied, inconsistent source formats — XML, JSON, PDF-derived, and API-delivered
Develop Python-based parsers, transformers, and validation scripts that enforce schema conformance and content quality at scale
Collaborate directly with Instructional Designers and product teams to translate content authoring workflows into data engineering requirements
Build and maintain embedding and vector database pipelines that feed Kiddom’s AI-powered content features as they scale
Work in Git-based workflows — treating data artifacts with the same rigor as software: versioned, reviewed, and documented
What we’re looking for...
4+ years of data engineering experience with strong Python skills — you’ve written parsers, validators, and transformation scripts for real-world messy data
Schema design instincts — you think carefully about how data should be structured for downstream use, not just how to move it
Data quality mindset — you build validation and completeness checks in from the start, not as an afterthought
Cross-functional collaborator — comfortable working with non-engineers to define requirements and translate domain knowledge into data structures
Infrastructure experience: you have provisioned and monitored infrastructure for data systems and are familiar with IaC tools such as Terraform and Terragrunt
Hands-on experience with the services the data system runs on: operating ECS and EKS clusters and provisioning Lambda functions and S3 buckets
Bonus:
Background in education, curriculum design, or ed-tech — understanding how instructional content is authored and structured is a genuine differentiator
Experience with vector databases (Pinecone, Weaviate, pgvector) or embedding pipeline tooling
Familiarity with agentic AI patterns or Model Context Protocol (MCP)