People Matter

Site Reliability Engineer - Architect/Principal

Ayasdi

Software Engineering, IT
India · Bengaluru, Karnataka, India
Posted on Feb 18, 2026

Introduction

SymphonyAI is at the forefront of innovation, leveraging cutting-edge artificial intelligence and
machine learning technologies to transform industries and drive business growth. As a global leader in
AI-powered solutions, we empower organizations to harness the full potential of data-driven insights.
SymphonyAI enterprise applications rapidly deliver transformative business value across retail, CPG,
financial services, manufacturing, media, Enterprise IT (SymphonyAI Summit) and the public sector.
SymphonyAI combines unrivalled AI technology, vertical expertise and industry-specific data and
insights into applications that drive the highest value for customers. We are one of the largest and
fastest-growing AI portfolios. We are on a mission to build a “World Class Engineering Team” with a
high-performance culture.


Job Description

Our solutions, hosted on our Iris Smart Manufacturing platform, combine equipment and process domain expertise in Mining & Metals, Oil & Gas, and Chemicals & Petrochemicals with the state of the art in data science, machine learning, and process optimization. The IRIS platform can run in hybrid mode and is built on a microservices architecture.

We are seeking a highly skilled SRE Architect / Principal to design, implement, and maintain highly available, scalable, and secure systems across cloud and on-premise environments. The ideal candidate will combine deep technical expertise with a strategic mindset to drive reliability, automation, and performance across mission-critical applications. This is a hands-on role with architect-level responsibilities, including mentoring teams, shaping platform reliability practices, and influencing operational strategy.

Roles & Responsibilities:

  • Contribute to the IRIS Platform operations roadmap and execute planned research and development.
  • Lead the design, deployment, and operations of large-scale systems on AWS (EKS) or Azure (AKS), ensuring reliability, scalability, and security.
  • Serve as the principal architect for platform reliability, performance, and disaster recovery strategies.
  • Build and maintain CI/CD pipelines for microservices and containerized applications deployable across EKS/AKS clusters.
  • Implement Infrastructure as Code using Terraform, CloudFormation, or AWS CDK for cloud and Kubernetes environments.
  • Apply SRE best practices, including SLIs, SLOs, error budgets, incident management, and post-mortem analysis.
  • Conduct root cause analysis for production incidents and drive continuous improvement in reliability and operational efficiency.
  • Implement monitoring, logging, and alerting across Kubernetes clusters using Prometheus, Grafana, and the EFK stack.
  • Optimize platform performance, scalability, and cost in cloud and hybrid environments.
  • Design, implement, and optimize monitoring and alerting systems using Prometheus and Grafana.
  • Mentor junior engineers and act as a technical authority on reliability, cloud architecture, and DevOps practices.
  • Ensure compliance with security, governance, and operational standards across all deployments.

Mandatory Skills & Experience

  • 7+ years of experience in Site Reliability Engineering (SRE).

  • 3+ years of hands-on experience working with Linux systems.

  • 4+ years of commercial experience with Kubernetes.

  • 2+ years of experience working with Docker.

  • 4+ years of experience setting up and managing CI/CD pipelines.

  • 4+ years of experience working with automation tools such as Terraform and Ansible.

  • Experience with containerization technologies, including Helm, and CI/CD pipelines.

  • Good knowledge of security best practices and vulnerability management tools such as Acunetix, Snyk, Checkmarx, or Trivy.

  • Experience troubleshooting production issues and performing root cause analysis.

  • Ability to work effectively in an Agile environment.

Desirable:

  • Good working or operational knowledge of databases (Postgres, Elasticsearch, Redis, or similar)
  • Exposure to configuring web servers such as Nginx
  • Working knowledge of monitoring tools such as Grafana and Prometheus
  • Working knowledge of a messaging framework such as Event Hub, Kafka, RabbitMQ or similar

About Us

Why Join SymphonyAI?
• Opportunity to work with cutting-edge AI technology in a dynamic and innovative environment
• Collaborative and inclusive company culture
• Competitive compensation and benefits package
• Professional growth and development opportunities