[Remote] Senior AI Kernel Engineer
Note: This is a remote position open to candidates in the USA. Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. The company is seeking a Senior AI Kernel Engineer to lead the design and optimization of high-performance kernels for AI inference on GPUs and custom accelerators, collaborating closely with other teams to improve performance.
Responsibilities
- Design, implement, and optimize performance-critical kernels for AI inference workloads (e.g., GEMM, attention, communication, fusion)
- Lead kernel-level optimization efforts across single-GPU, multi-GPU, and heterogeneous hardware environments
- Make informed trade-offs between latency, throughput, memory footprint, and numerical precision
- Drive adoption of new hardware features (e.g., Tensor Cores, asynchronous execution, advanced memory spaces)
- Analyze performance using profilers, hardware counters, and microbenchmarks; translate insights into concrete improvements
- Work closely with compiler and runtime teams to influence code generation, scheduling, and kernel fusion strategies
- Review and mentor other engineers on kernel design, performance tuning, and best practices
- Contribute to technical roadmaps and long-term performance strategy for AI inference
Skills
- 5+ years of experience in performance-critical systems or kernel development (or equivalent depth of expertise)
- Strong proficiency in C/C++ and low-level programming
- Extensive hands-on experience with GPU kernel programming (CUDA, HIP, or equivalent)
- Deep understanding of GPU architecture, including memory hierarchies, synchronization, and execution models
- Proven track record of delivering measurable performance improvements in production systems
- Strong problem-solving skills and ability to work independently on complex, ambiguous performance challenges
- Experience with PTX, assembly-level tuning, or code generation frameworks (e.g., Triton)
- Experience optimizing distributed or multi-GPU inference pipelines
- Familiarity with custom AI accelerators or domain-specific hardware
- Understanding of modern AI models (e.g., transformers, LLMs, diffusion) from a systems and performance perspective
- Contributions to open-source kernel libraries, compilers, or performance tools
- Experience collaborating directly with hardware or compiler teams
Benefits
- Premier insurance plans
- Up to 5% 401k matching
- Flexible paid time off
- Stock options
- Annual target bonus
- Equity
- Team-building events
- Regular team onsites and local meetups in Los Altos, CA, and in other cities
- Travel 2-4 times a year is expected for all roles