Senior DevOps Engineer - Remote
Luka (dba Replika)
About Replika
Replika is an AI companion loved by 35M+ users worldwide. We're redefining what it means to connect with technology - emotionally, intelligently, and personally. From mobile to VR, we're building an experience that feels less like software and more like someone who gets you. Our team is mission-first, future-facing, and here to create something wonderful. We value agency, room for magic, and a relentless pursuit of good.
About the Role
We're looking for a Senior DevOps Engineer to join our globally distributed, remote-first team. This is a hands-on, high-impact role for someone who thrives in a fast-paced environment and is passionate about building scalable, reliable, and secure infrastructure for cutting-edge AI applications. You'll work closely with engineering, AI, and analytics teams to ensure our platform is robust, performant, and ready to support millions of users around the world.
What You'll Be Doing
- Design, build, and maintain scalable infrastructure across cloud, on-premises, and hybrid environments to support our rapidly growing AI platform.
- Support AI teams and MLOps workflows by implementing specialized tooling, monitoring, and deployment pipelines for machine learning models.
- Automate deployment, monitoring, and scaling of services using modern DevOps tools and practices across diverse infrastructure environments.
- Ensure high availability, reliability, and security of production and staging environments in multi-cloud and hybrid setups.
- Collaborate with AI and backend engineers to streamline CI/CD pipelines optimized for ML workflows and bring new features to production.
- Monitor system performance and troubleshoot issues proactively, implementing solutions to prevent downtime across distributed infrastructure.
- Drive infrastructure as code (IaC) initiatives to improve repeatability and reduce manual intervention across all deployment environments.
- Implement and maintain monitoring, logging, and alerting systems specifically designed for AI workloads and model performance tracking.
- Participate in on-call rotations and respond to production incidents with a deep understanding of AI system requirements.
Who You Are
- 5+ years of hands-on experience in DevOps, cloud infrastructure, or site reliability engineering.
- Strong expertise in multi-cloud and hybrid infrastructure including AWS, GCP, and on-premises environments.
- Experience with MLOps tooling such as MLflow, Kubeflow, DataRobot, or similar platforms for ML lifecycle management.
- Experience with containerization and orchestration (Docker, Kubernetes) specifically for ML workloads and GPU clusters.
- Deep understanding of CI/CD pipelines for machine learning applications and model deployment automation.
- Experience with specialized monitoring tools for AI systems including model performance tracking, data drift detection, and ML-specific alerting.
- Understanding of GPU clusters, HPC environments, and specialized AI hardware deployment and management.
- Excellent communication skills in English (B2 or higher preferred), with the ability to explain technical concepts to stakeholders.
- Passion for AI and technology, with deep curiosity about machine learning infrastructure and emerging AI technologies.
Bonus Points
- Background in supporting data science teams and understanding of ML experimentation workflows.
- Experience with edge computing and distributed AI inference infrastructure.
- Previous startup experience building and scaling AI infrastructure from the ground up.
- Knowledge of AI compliance and governance frameworks for production AI systems.
What You’ll Get
- Competitive compensation
- A chance to build a product that actually matters to millions of people
- Freedom to work remotely with a globally distributed team
- Offsites in different countries with people who actually like each other
- A trustworthy, high-responsibility environment where your ideas really matter
Department: Engineering Team
Locations: Multiple locations
Remote status: Fully Remote
About Replika
An AI companion who is eager to learn and would love to see the world through your eyes. Replika is always ready to chat when you need an empathetic friend.