Senior Performance Engineer, Mobile NPU (Qualcomm + Apple)
Sarvam AI
Bengaluru, Karnataka, India
Location: Bengaluru
Employment Type: Full time
Location Type: On-site
Department: Engineering
About Sarvam
Sarvam is building the bedrock of Sovereign AI for India. The company is developing India’s full-stack sovereign AI platform, spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam is backed by Lightspeed, Peak XV, and Khosla Ventures, and works with leading enterprises and public institutions, including brands such as Tata Capital, SBI Life, CRED, IDFC, and LIC.
About the Role
Own Sarvam's mobile and ARM-NPU surface: Qualcomm (QNN on Snapdragon-based targets) and Apple (CoreML / Neural Engine on macOS and iOS-class hardware). You will land and polish Sarvam’s models within the published memory-footprint and latency targets on every supported device in this family, and you'll own the device-CI pool that keeps them there.
What You’ll Do
Take Sarvam’s edge models from PyTorch to Qualcomm and Apple target hardware while meeting memory and latency SLAs.
Own the ARM CPU fallback paths for when the NPU is unavailable or busy, or when the driver doesn't support a required op.
Drive Android and macOS/iOS device CI, including OS-version and driver-version regression detection.
Partner with the architect on the xPU selector logic; your inputs feed the capability-probing matrix.
Mentor mid-level engineers working on this surface.
What We're Looking For
5+ years in ML deployment, with 2+ years specifically on mobile/NPU.
Production experience with QNN or SNPE - not just 'used the Python wrapper' but actually shipped a quantized model that hit a budget.
Production experience with CoreML, ideally including ANE-targeted op selection and the gotchas around dynamic shapes.
Fluency in INT8 / INT4 quantization with accuracy recovery; comfort with per-channel and per-token schemes for ASR encoders.
On-device profiler fluency: Snapdragon Profiler, Xcode Instruments, Android GPU Inspector.
Bonus Points
ARM NEON / SVE intrinsics.
LiteRT delegate authoring or QNN custom ops.
Why Sarvam?
Sarvam is a fast-moving, high talent-density team building full-stack AI for India, working on problems that push the frontiers of AI with real population-scale impact.
Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar.
High ownership and high impact, from day one.
Everything we do is AI-first, from the way we build and ship to the way we think about problems.
You can work on problems that could change how an entire country learns, works, and communicates.
If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.