Site Reliability Engineer (Events SRE Team)
RingCentral
Say hello to opportunities.
RingCentral is the industry leader in Unified Communications as a Service, and routinely outperforms UCaaS competitors such as 8x8 and Fuze, as well as legacy competitors such as Cisco and Broadsoft. We are the largest and fastest-growing pure-play cloud communications company in the world, and we’re looking for exceptional talent to help continue our success.
About RingCentral Events
RingCentral Events is a robust, all-in-one platform for creating and managing professional, engaging events for any audience. It provides a complete solution that simplifies the entire event lifecycle - from planning, promotion, and live execution to post-event analytics. The platform's broad set of features supports virtual, hybrid, and in-person events, ensuring a seamless experience whether the audience is online, in person, or a mix of both.
Position Overview
As a Site Reliability Engineer for RingCentral Events, you're not just an infrastructure owner - you're a crucial part of our mission to deliver flawless, high-scale experiences for global audiences. Your role is central to our ability to deliver a reliable and performant platform. You will be a key contributor to our software delivery flow, ensuring that changes move from development to production with speed, safety, and consistency. Additionally, you will proactively eliminate observability gaps and build a self-healing infrastructure to ensure our system performs under pressure.
Responsibilities:
Manage cloud infrastructure on AWS and EKS, leveraging IaC and GitOps to ensure scalability
Participate in service capacity planning, software performance analysis, and system tuning
Design, consult on, re-platform, and refactor the observability of the current cloud infrastructure
Participate in release management, working closely with engineering teams to bring GitOps principles to our release process and manage CI/CD pipelines using GitLab CI
Take part in 24/7 on-call responsibilities (~2 days/month based on rotation schedule) to ensure continuous availability and quick response to issues in production
Conduct blameless post-mortems to learn from incidents and prevent future ones
Develop and test disaster recovery plans and runbooks to ensure business continuity
Implement security best practices and controls within the infrastructure to meet compliance standards and prepare for audits
Requirements:
Experience running mission-critical services at scale without disruption
Hands-on experience with Kubernetes and infrastructure as code (IaC) using Terraform, focusing on infrastructure management and scalability
Experience maintaining pipelines using tools like GitLab CI/CD
Experience with monitoring, APM, logging, and analytics tools
Strong problem-solving skills and an ability to analyze and debug complex distributed systems, troubleshooting from the kernel to the web and tracing requests and data flow through multiple services to pinpoint the root cause of issues
Have a sense for identifying, exploiting and elevating bottlenecks
Prefer taking iterative action over waiting for things to happen or to be perfect
Familiarity with incident, problem and change management procedures and practices
Nice to have:
A reliability-oriented mindset with a focus on designing and building resilient architectures
Previous SRE experience or knowledge, giving you a heightened awareness of what data to collect, how to display it, and how users can benefit from it
Knowledge of scripting languages such as Python or Go
Familiarity with GitOps principles and tools like ArgoCD
Knowledge of caching mechanisms, such as Redis
Experience with messaging queues like MSK Kafka, SQS or RabbitMQ
Familiarity with database management systems like AWS Aurora and PostgreSQL
We offer:
Well-coordinated professional team;
Cutting-edge technologies, interesting and challenging tasks, dynamic project, great opportunities for self-realization, professional and career growth;
Additional Health and Life Insurance Package;
25 vacation days;
200 BGN food vouchers per month;
120 BGN gross working-expenses allowance, paid with the salary.
About RingCentral
RingCentral, Inc. (NYSE: RNG) is a leading provider of business cloud communications and contact center solutions based on its powerful Message Video Phone™ (MVP™) global platform. More flexible and cost-effective than the legacy on-premises PBX and video conferencing systems it replaces, RingCentral® empowers modern mobile and distributed workforces to communicate, collaborate, and connect via any mode, any device, and any location. RingCentral is headquartered in Belmont, California, and has offices around the world.
RingCentral is an equal opportunity employer that truly values diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.