ML Infrastructure Engineer

Scaled Cognition

Software Engineering, Other Engineering, Data Science
United States
Posted on Tuesday, June 25, 2024

Scaled Cognition is developing a new generation of rational, controllable AI models deployable as domain experts for grounded, real-world applications.

As an ML Infrastructure Engineer at Scaled Cognition, you will:

  • Design and develop the GPU infrastructure that powers our AI models.
  • Ensure that our infrastructure can scale.
  • Collaborate with research scientists and product engineers to streamline the end-to-end process from model development to production deployment.

Example projects could include:

  • Automated GPU Provisioning: Design an automated provisioning system to dynamically allocate GPUs based on workload demands.
  • Benchmarking Pipeline: Develop a pipeline for running benchmarking experiments, enabling comparative analysis of model performance.
  • Experiment Analysis: Build monitoring and analysis tools to track experiment progress, monitor performance metrics, and visualize results.

You might be the right person for the job if you:

  • Are a continuous learner and are eager to explore new tools and technologies.
  • Thrive in a dynamic environment and are comfortable adapting to changing priorities while maintaining a focus on delivering high-quality solutions.
  • Have successfully navigated projects with significant product and technical ambiguity, and excel at the intersection of complex technical challenges and user-focused solutions.

Preferred Qualifications:

  • Prior experience designing and implementing GPU infrastructure.
  • Experience with MLOps tools such as MLflow, TensorBoard, or Weights & Biases.
  • A strong sense for scalability and for developing secure, highly reliable environments.