Site Reliability Engineer
At-Bay combines world-class technology with industry-leading insurance to help clients meet risk head-on. Partnering with brokers and business owners alike, we provide modern insurance products and active risk monitoring services for companies of every size and in every industry. Our team boasts many backgrounds and skills, from analysts and developers to designers and underwriters, and everything in between — all working together to redefine what it means to be an insurance company.
We’re proud to be a diverse company and to have expertise from multiple industries driving our culture. At-Bay is expanding rapidly, and as we grow, we’re prioritizing inclusive hiring practices and supportive team environments. We’re committed to building a company culture where people of all identities and backgrounds are empowered to thrive, develop their career, and bring their full self to work.
At-Bay is a globally distributed company with hubs in Atlanta, New York City, San Francisco, and Tel Aviv. To date, we have raised $292 million in funding from Acrew Capital, Glilot Capital, Icon Ventures, ION Crossover Partners, Khosla Ventures, Lightspeed Venture Partners, M12, entrepreneur Shlomo Kramer, and Qumra Capital.
What we are looking for
An SRE is ultimately responsible for system reliability, developer productivity, and reducing time to market by striving to reduce the technical debt of the services SRE supports. We are looking for someone who is passionate about system reliability and can influence and drive the strategic SRE mission.
We are looking for an SRE who can successfully establish and build our SRE posture. The successful candidate will have a passion for technology and a commitment to delivering high-quality results. They must be able to work both independently and in a team environment.
- Establish and build SRE operations, including monitoring, alerting, and stability.
- Develop, implement, and maintain SRE processes, standards, and policies.
- Develop, evaluate, and implement tools and systems to maintain service reliability.
- Measure and report performance metrics for SRE.
- Ensure the performance and scalability of the services SRE supports.
- Identify and solve performance and operational issues in collaboration with the engineering team.
- Build the dev-to-production process for changes such as upgrades and new capabilities.
- Work with different teams to standardize dashboards and alerting.
Attributes and Qualifications
- 4+ years’ experience in SRE operations.
- Experience with DevOps and CI/CD processes.
- Knowledge of SRE best practices.
- Experience with monitoring and logging tools (Sumo Logic experience is an advantage).
- Experience working with multiple Kubernetes (k8s) workloads.
- Strong working knowledge of AWS services.
- Strong problem-solving skills.
- Excellent communication and interpersonal skills.
- Ability to provide technical guidance and support to other technical teams.
- Ability to identify and implement automation opportunities for SRE processes.
- Ability to implement and maintain monitoring solutions for SRE services.
What you'll get
- A competitive salary and equity in a super-fast-growing company that is taking over commercial insurance
- A strong emphasis on work-life balance
- Beautiful offices in the heart of Tel Aviv, near the train station and main bus stops
- Passionate, smart, and fun people to work with
- You will never lack a challenge: we are a unique blend of a fast-growing tech startup, an international firm, and an insurance company