Senior Software Engineer, Sustaining

Berkshire Grey

Software Engineering
Bedford, MA, USA
Posted on May 23, 2025

About The Job

At Berkshire Grey, our robots run 24/7 in e-commerce and logistics environments. As a Software Sustaining Engineer, you’ll be a go-to expert for keeping our codebase performant in production, driving improvements in mean time between failures (MTBF), mean time to recovery (MTTR), and availability (uptime). You’ll partner with developers, QA, DevOps, and field service to turn production data into actionable fixes, then develop and shepherd patches from code review through to customer deployment.

This role is ideal for someone who excels at root-cause analysis, works seamlessly across teams, and is driven to strengthen system reliability.

Responsibilities

  • Lead investigation of field and lab failures; own root-cause analysis and drive fixes
  • Instrument code with metrics/logs; develop health checks and self-healing routines
  • Design, build, test, and deploy hotfixes and maintenance releases
  • Identify recurring issues; propose and implement design or process changes to raise MTBF and lower MTTR
  • Work with development teams to bake reliability into new features; train support teams on diagnostics
  • Maintain clear runbooks; track and report on reliability KPIs

Minimum Qualifications

  • Bachelor’s degree in computer science or a related field
  • 3+ years in software development or reliability engineering
  • Strong coding skills in Python
  • Experience in a fast-paced, agile environment
  • Demonstrated ability to:
    • Investigate and triage production issues end-to-end
    • Analyze logs, metrics, and telemetry to pinpoint root causes
    • Develop fixes or workarounds under tight SLAs
    • Ship stable patches and rollouts with minimal disruption
    • Communicate status and technical tradeoffs clearly to stakeholders
  • Comfortable with:
    • Linux (Ubuntu)
    • Version control (Git)
    • Issue tracking (Jira)

Preferred Qualifications

  • Master’s degree in CS, Robotics, or related field
  • 5+ years solving reliability or sustaining challenges in production systems
  • Familiarity with:
    • Monitoring stacks (Elastic/Kibana, Prometheus/Grafana)
    • Distributed in-code tracing frameworks (OpenTelemetry)
    • Container orchestration (Docker, Kubernetes)
    • Automated test frameworks (pytest, unit/system tests)
  • Hands-on experience with robotic applications or other high-uptime systems
  • Data-driven mindset: profiling, statistics, pandas

Why Berkshire Grey?

  • Opportunity to work with cutting-edge AI-powered robotic solutions that are transforming the supply chain and logistics industry.
  • A culture of innovation and collaboration, with a commitment to professional development and growth.
  • Competitive compensation and comprehensive benefits package.

6111-2503MS