Staff Data Engineer

Sarvam AI

Software Engineering, Data Science

Bengaluru, Karnataka, India

Posted on May 5, 2026

Location: Bengaluru

Employment Type: Full time

Location Type: On-site

Department: Engineering

About Sarvam

Sarvam is building the bedrock of Sovereign AI for India: a full-stack platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions, is backed by Lightspeed, Peak XV, and Khosla Ventures, and partners with India's leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.


The Role

We're hiring a Staff Engineer to design and build Sarvam's Data & Analytics Platform from the ground up. This is a high-ownership, high-leverage role — you will be the technical owner of a system that every Sarvam product writes to and every Sarvam customer reads from.

The platform has three jobs:

1. Ingest every meaningful event from every Sarvam product (outbound calls, agent turns, user turns, knowledge-base lookups, lead creation, deployments, model invocations, and more) through a clean API layer that handles automatic schema evolution and dynamic table creation per event type; a sketch of what that can look like follows this list.

2. Replicate product databases into the analytics store via CDC, so operational data stays queryable alongside event streams without putting load on transactional systems.

3. Expose the data through a tenant-aware query API and a no-code dashboarding layer — powering customer-facing product analytics, internal BI, and the source-of-truth feed for our finance team's billing pipeline.
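
To make job 1 concrete, here is a minimal Go sketch of the schema-evolution step against ClickHouse, using the open-source clickhouse-go driver. The `events_` table prefix, the type mapping, and the `ensureEventTable` helper are illustrative assumptions, not a description of Sarvam's actual design.

```go
package ingest

import (
	"context"
	"database/sql"
	"fmt"
	"regexp"

	_ "github.com/ClickHouse/clickhouse-go/v2" // registers the "clickhouse" driver
)

// identOK rejects any name we would otherwise interpolate into DDL, since
// event and field names arrive from the outside world.
var identOK = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// chType maps a decoded JSON value to a ClickHouse column type. A real
// system needs a richer mapping (timestamps, arrays, nested objects).
func chType(v any) string {
	switch v.(type) {
	case float64:
		return "Float64"
	case bool:
		return "Bool"
	default:
		return "String"
	}
}

// ensureEventTable creates a per-event-type table on first sight and adds a
// column for any field the table has not seen before. Both statements are
// idempotent (IF NOT EXISTS), so concurrent ingest workers can race safely.
func ensureEventTable(ctx context.Context, db *sql.DB, eventType string, payload map[string]any) error {
	if !identOK.MatchString(eventType) {
		return fmt.Errorf("unsafe event type %q", eventType)
	}
	table := "events_" + eventType
	create := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s (
	    tenant_id String,
	    ts        DateTime DEFAULT now()
	) ENGINE = MergeTree ORDER BY (tenant_id, ts)`, table)
	if _, err := db.ExecContext(ctx, create); err != nil {
		return err
	}
	for field, val := range payload {
		if !identOK.MatchString(field) {
			return fmt.Errorf("unsafe field name %q", field)
		}
		ddl := fmt.Sprintf("ALTER TABLE %s ADD COLUMN IF NOT EXISTS %s %s",
			table, field, chType(val))
		if _, err := db.ExecContext(ctx, ddl); err != nil {
			return err
		}
	}
	return nil
}
```

A production version would cache known columns per table and serialize DDL behind a lock or queue, rather than issuing ALTER statements on the hot path of every request.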

You will own architecture, build the core systems hands-on, set the engineering bar, and shape the team that grows around this platform. Databases, Kafka, and internal stores are never exposed directly — every read and write goes through APIs you design.

What you'll build

- A high-throughput event ingestion API in Go that accepts arbitrary event payloads, validates them, and lands them in ClickHouse with automatic table creation, column addition, and safe schema evolution as event shapes change.

- CDC pipelines using Debezium and Kafka to mirror product Postgres/MySQL databases into ClickHouse, with deduplication, ordering guarantees, and replay tooling (a consumer sketch follows this list).

- A multi-tenant query API with strict RBAC, per-tenant isolation, and query budgets, whose single surface powers customer-facing dashboards, internal analytics, and the finance team's billing extracts (the second sketch after this list shows one possible read path).

- A no-code dashboard layer that lets customers build product-analytics views without writing SQL — and lets internal teams ship dashboards in hours, not days.

- The operational backbone: ClickHouse cluster design (sharding, replication, MergeTree families, materialized views), capacity planning, cost controls, observability, on-call playbooks, and SLOs.

- The engineering culture for this team — design reviews, RFC process, testing standards, and the bar that future hires will be measured against.
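
To ground the CDC bullet, here is a sketch of the consumption side, with the assumptions flagged: Debezium's default change-event envelope (no unwrap transform), the segmentio/kafka-go client, and a hypothetical `leads_replica` table defined as ReplacingMergeTree in ClickHouse with `ts_ms` as its version column. Committing the Kafka offset only after the write succeeds gives at-least-once delivery, and the version column makes redelivered or replayed events collapse to a single row at merge time.

```go
package cdc

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// envelope is the subset of a Debezium change event this sketch needs:
// op is c/u/d/r, after holds the new row image, ts_ms orders versions.
type envelope struct {
	Payload struct {
		Op    string         `json:"op"`
		TsMs  int64          `json:"ts_ms"`
		After map[string]any `json:"after"`
	} `json:"payload"`
}

// Run tails one Debezium topic and writes rows into a ClickHouse table
// assumed to be ReplacingMergeTree(version): duplicates and replays collapse
// at merge time, so at-least-once delivery reads as effectively-once.
func Run(ctx context.Context, db *sql.DB) error {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"}, // illustrative address
		GroupID: "analytics-cdc",
		Topic:   "pg.public.leads", // Debezium's prefix.schema.table naming
	})
	defer r.Close()

	for {
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			return err
		}
		var ev envelope
		if err := json.Unmarshal(msg.Value, &ev); err != nil {
			log.Printf("skipping malformed event at offset %d: %v", msg.Offset, err)
		} else if ev.Payload.Op != "d" && ev.Payload.After != nil {
			// Deletes would be handled separately (e.g. a tombstone column).
			_, err = db.ExecContext(ctx,
				`INSERT INTO leads_replica (id, name, version) VALUES (?, ?, ?)`,
				ev.Payload.After["id"], ev.Payload.After["name"], ev.Payload.TsMs)
			if err != nil {
				return err
			}
		}
		// Commit only after the write has succeeded: at-least-once, never lossy.
		if err := r.CommitMessages(ctx, msg); err != nil {
			return err
		}
	}
}
```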
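
And one possible shape for the tenant-aware read path: clients never send SQL; they call typed endpoints, and the tenant scope is injected server-side by the auth layer. The `X-Tenant-ID` header, the endpoint, and the `events_outbound_call` table are hypothetical placeholders.

```go
package queryapi

import (
	"database/sql"
	"encoding/json"
	"net/http"
)

// tenantFrom stands in for the auth middleware, which would resolve the
// tenant from a verified credential (e.g. a JWT), never from raw input.
func tenantFrom(r *http.Request) string { return r.Header.Get("X-Tenant-ID") }

// CallVolume is one example endpoint: the caller never supplies SQL, and
// every query the handler runs is forced through a tenant_id predicate.
func CallVolume(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		tenant := tenantFrom(r)
		if tenant == "" {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		rows, err := db.QueryContext(r.Context(),
			`SELECT toString(toDate(ts)) AS day, count() AS calls
			   FROM events_outbound_call
			  WHERE tenant_id = ?
			  GROUP BY day ORDER BY day`, tenant)
		if err != nil {
			http.Error(w, "query failed", http.StatusInternalServerError)
			return
		}
		defer rows.Close()

		type point struct {
			Day   string `json:"day"`
			Calls uint64 `json:"calls"`
		}
		var out []point
		for rows.Next() {
			var p point
			if err := rows.Scan(&p.Day, &p.Calls); err != nil {
				http.Error(w, "scan failed", http.StatusInternalServerError)
				return
			}
			out = append(out, p)
		}
		if err := rows.Err(); err != nil {
			http.Error(w, "query failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(out)
	}
}
```

Query budgets and RBAC would sit in middleware around handlers like this one, for example by tagging each query with its tenant and rejecting requests once a per-tenant quota is exhausted.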

What we're looking for

Experience

- 5+ years of backend / data-platform engineering, with a meaningful chunk spent designing and operating analytics systems in production.

- Track record of building 0-to-1 platforms that other engineering teams depend on.

Must-haves

- ClickHouse in production at scale — sharding, replication, MergeTree variants, materialized views, projection design, query tuning, and operational know-how (backups, upgrades, capacity planning).

- Golang for high-throughput backend services — strong grasp of concurrency, performance, and API design (REST/gRPC).

- Kafka as a primary streaming backbone — partitioning strategy, consumer groups, exactly-once / idempotency patterns, schema registry.

- Debezium-based CDC — you've built and operated CDC pipelines from transactional databases into analytical stores.

- Strong systems instincts: you reason about throughput, latency, cost, and failure modes before you write code.

Nice-to-haves

- Spark + Airflow for batch enrichment, backfills, and scheduled aggregations.

- Experience building multi-tenant SaaS platforms with row-level security, per-tenant quotas, and tenant-aware RBAC.

- Worked on or integrated with BI / dashboarding tools (Superset, Metabase, Cube, or similar) — bonus if you've built embedded customer-facing analytics.

- Exposure to billing / metering data feeds where correctness is non-negotiable.

You'll thrive here if you

- Want to be the technical owner of a foundational platform, not a feature contributor on someone else's system.

- Like writing code as much as writing design docs — and refuse to do one without the other.

- Care about cost-per-query and dollars-per-event, not just p99 latency.

- Are comfortable making decisions with imperfect information, and revisiting them when the data changes.

- Want to work on-site in Bengaluru with a small team of senior builders, shipping fast.