Senior Software Engineer, GPU
Velo3D
Job Responsibilities
Evaluate and select the appropriate GPU computing technologies and frameworks (e.g., CUDA, Kokkos, or other modern GPU programming models) based on performance, portability, maintainability, and long-term architectural goals.
Design and implement the GPU computing layer within our desktop software stack, introducing GPU acceleration for computationally intensive workloads.
Integrate GPU development into the existing build and tooling ecosystem, including configuring the build system, dependency management, CI/CD workflows, and developer tooling to support GPU targets.
Port and optimize mesh processing algorithms and other performance-critical components from CPU implementations to GPU-accelerated implementations.
Analyze performance bottlenecks and apply GPU optimization techniques such as memory layout optimization, kernel design, and efficient data transfer between CPU and GPU.
Establish best practices, documentation, and architectural guidelines for GPU development to enable maintainable and scalable use of GPU acceleration across the codebase.
Collaborate with other engineers to identify additional opportunities for GPU acceleration and ensure seamless integration with the broader application architecture.
Requirements
5-8 years of experience.
Strong experience developing GPU-accelerated software using frameworks such as CUDA, Kokkos, OpenCL, or similar technologies.
Solid understanding of GPU architecture and parallel programming concepts, including memory hierarchies, kernel execution models, synchronization, and performance optimization.
Experience evaluating and comparing GPU programming models and frameworks, and making informed technical decisions that weigh trade-offs among performance, portability, and developer productivity.
Experience integrating GPU tooling and compilers into modern build systems and development environments.
Strong C++ programming skills and experience working in performance-sensitive codebases.
Ability to translate CPU algorithms into efficient parallel GPU implementations.
Strong problem-solving and performance-analysis skills, including profiling and debugging GPU code.