Kubernetes Systems Engineer, EngProd
Arista Networks
Company Description
Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. What sets us apart is our relentless pursuit of innovation. We leverage the latest advancements in cloud computing, artificial intelligence, and software-defined networking to provide our clients with a competitive edge in an increasingly interconnected world. Our solutions are designed to not only meet the current demands of the digital landscape but to also anticipate and adapt to future challenges.
At Arista we value the diversity of thought and perspectives that each employee brings to the table. We believe that fostering an inclusive environment, where individuals from various backgrounds and experiences feel welcome, is essential for driving creativity and innovation.
Our commitment to excellence has earned us several prestigious awards, such as Best Engineering Team, Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest standards of quality and performance in everything we do.
Job Description
Who You'll Work With
Arista Networks is looking for a skilled professional for our Engineering Productivity team to help maintain and support our rapidly expanding infrastructure and internal user base. The ideal candidate is versatile, can wear many hats, and is enthusiastic about learning new technologies. As a part of the software engineering team, you will work with other team members to design, build and administer secure, scalable and fault-tolerant tools and infrastructure in a hybrid cloud environment.
Working in the Engineering Productivity (EngProd) group, you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we’ve built from the ground up to automate CI/CD, testing, analysis, and visualization.
What You'll Do
- Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, developer experience, etc.)
- Proactively monitor, respond to, and enhance alerts and set up automated alert handling where applicable
- Create and maintain incident response runbooks in collaboration with the service dev teams
- Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform
- Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management.
- Set up guidelines and paved paths for service dev teams to improve developer experience around the k8s platform.
- Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s and provide fixes for those problems.
- Engage with third-party vendor support as part of triage
Qualifications
- BSc in Computer Science or Engineering with at least 3 years’ experience, MS in Computer Science or Engineering with at least 2 years’ experience, Ph.D. in Computer Science, or equivalent work experience.
- Knowledge of one or more of Go, Python, JavaScript. Enough shell scripting experience to implement medium-complexity automation workflows.
- Knowledge of Linux (or UNIX).
- Experience in operating software systems at scale.
- Strong understanding of the fundamentals of storage and networking.
- Comfortable with Ansible and GitOps.
- Strong expertise in managing on-prem/bare-metal Kubernetes clusters.
- Applied understanding of software engineering principles.
- Strong problem solving and software troubleshooting skills.
- Ability to design a solution and implement features independently. Ability to work in small teams.
- Comfortable with security principles and able to study the source code of OSS projects and conduct experiments as necessary to debug issues.
- Proven expertise in debugging complex issues that span the technology stack.
- Experience dealing with network proxies and containerized storage.