Senior Software Engineer, ML Platform & Ops
World
Location: Munich
Employment Type: Full time
Location Type: On-site
Department: AI & Biometrics
About the Company:
World is a network of real humans, built on privacy-preserving proof-of-human technology, and powered by a globally inclusive financial network that enables the free flow of digital assets for all. It is built to connect, empower, and be owned by everyone.
This opportunity would be with Tools for Humanity.
About the AI & Biometrics Team:
The AI & Biometrics team is building a biometric recognition system that can work reliably with more than a billion users and enables them to claim their free share of WLD. We use cutting-edge machine learning models deployed on custom hardware to enable high-quality image acquisition, identification, and fraud prevention, all while requiring minimal user interaction.
We are building a biometric recognition and fraud detection engine that works at the scale of one billion people, so it must outperform all current recognition technologies. We combine the Orb, our powerful custom-made iris recognition and presentation attack detection device, with the latest research in AI and deep learning.
About the Opportunity:
To reach our next milestone—continuous, trustworthy ML innovation across millions of edge devices—we’re hiring a Senior ML Platform & Ops Engineer to own the ML lifecycle from data to device. You’ll design and operate production-grade pipelines that transform state-of-the-art ML research into deployed models with clear telemetry, rollback, and reproducibility. If you thrive on building self‑service platforms that turn research ideas into reliable, observable production systems, we’d love to meet you.
This role is on-site and based in our Munich office.
What you’ll own & drive
End‑to‑end model lifecycle pipelines that take a model from pull‑request to device in minutes.
Edge‑aware rollout services with staged deployment, A/B experimentation and instant rollback across Orbs, Orb Mini and Mobile Apps.
A model and dataset registry with lineage, reproducibility, and diffing across versions.
Data provenance and governance tooling that lets researchers trace every model back to the exact data slice, feature set and config used.
Developer experience tooling (CLI, SDKs, notebooks, dashboards) that lets ML engineers self-serve training jobs and schedule retraining, evaluations, and labelling campaigns.
Future platform roadmap and culture: define ML platform best practices and standards, mentor engineers, and help grow the ML Platform discipline at TFH.
Day‑to‑day responsibilities
Design, build, and operate reliable, observable infrastructure for training, evaluation, telemetry ingestion, and deployment.
Maintain CI/CD workflows and OTA pipelines for secure rollout of models to cloud and edge devices.
Develop secure APIs and backend services that expose governed datasets and model artefacts at scale.
Implement automated checks, drift detection, and alerting for real-time model monitoring.
Champion best practices in data lineage, reproducibility, privacy‑by‑design, security and secure edge delivery.
Collaborate across ML research, product, and firmware teams to streamline delivery and feedback loops.
About You:
5+ years building ML infrastructure, data platforms, or production ML systems at scale.
Track record of delivering platforms and CI/CD pipelines used daily by ML or data teams.
Hands-on experience running large-scale training on multi-tenant GPU clusters to maximize throughput and reliability.
You’ve built versioned dataset & lineage systems with slice-level provenance and governed access, making every model reproducible to the exact data, features, code, and config used.
Deep understanding of containerisation (Docker) and orchestration (Kubernetes/EKS) plus Infrastructure‑as‑Code (Terraform/CDK/CloudFormation).
Strong backend engineering skills in Python and/or Go; you value clean, maintainable code.
Deep understanding of modern CI/CD, model packaging, and observability practices.
Comfortable operating production systems, defining SLAs, and handling rollout or incident workflows.
Nice to Have:
Knowledge of Rust for high‑performance services.
Understanding of OTA (over-the-air) workflows, hardware/software integration, or embedded systems.
--
By submitting your application, you consent to the processing and internal sharing of your CV within the company, in compliance with the GDPR.
Pay transparency statement (for CA and NY based roles):
The reasonably estimated salary for this role at TFH in our San Francisco office ranges from $193,000 - $240,000 plus a competitive long-term incentive package. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, TFH offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role including healthcare, dental, vision, 401(k) plan and match, life insurance, flexible time off, commuter benefits, professional development stipend and much more!
The reasonably estimated salary for this role at TFH in Munich ranges from €138,000 - €160,000, plus a competitive long-term incentive package, and may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience.