[Remote] Software Co-Design AI HPC Systems
Note: This is a remote position open to candidates in the USA. Microsoft is a leading technology company dedicated to empowering individuals and organizations. The Software Co-Design AI HPC Systems role focuses on architecting and optimizing next-generation AI systems, collaborating across hardware and software to improve performance and efficiency.
Responsibilities
- Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks
- Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements
- Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems
- Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps
- Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations
- Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs
- Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams
- Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization
Skills
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Strong background in one or more of the following areas: AI accelerator or GPU architectures; distributed systems and large-scale AI training/inference; high-performance computing (HPC) and collective communications; ML systems, runtimes, or compilers; performance modeling, benchmarking, and systems analysis; hardware–software co-design for AI workloads
- Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development
- Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders
- Experience designing or operating large-scale AI clusters for training or inference
- Deep familiarity with LLMs, multimodal models, or recommendation systems, and their systems-level implications
- Experience with accelerator interconnects and communication stacks (e.g., NCCL, MPI, RDMA, high-speed Ethernet or InfiniBand)
- Background in performance modeling and capacity planning for future hardware generations
- Prior experience contributing to or leading hardware roadmaps, silicon bring-up, or platform architecture reviews
- Publications, patents, or open-source contributions in systems, architecture, or ML systems are a plus
Benefits
- Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay