[Remote] Platform Engineer
Note: This is a remote role open to candidates in the USA. Harrison Clarke is an early-stage company building advanced AI systems and is seeking a senior platform engineer to take ownership of its core platform. The role involves managing multi-region Kubernetes clusters, orchestrating GPU workloads, and securing infrastructure while partnering closely with machine learning engineers.
Responsibilities
- Design and manage multi-region Kubernetes clusters across cloud and GPU-focused providers using infrastructure-as-code
- Own the deployment lifecycle through GitOps practices (Helm, Kustomize, automated releases, continuous delivery)
- Manage GPU infrastructure, including scheduling efficiency, workload placement, and cold-start optimization
- Oversee networking systems such as ingress, gateways, load balancing, and cross-region connectivity
- Build and maintain observability across metrics, logs, traces, and performance profiling
- Ensure infrastructure security across identity, secrets, and encryption
- Maintain CI/CD workflows supporting a monorepo of services and deployment artifacts
- Partner closely with ML engineers to optimize model serving and GPU utilization
Skills
- Strong experience operating Kubernetes in production environments, including troubleshooting, autoscaling, and upgrades
- Proven background with infrastructure-as-code tools (e.g., Terraform, Pulumi)
- Hands-on experience running GPU workloads on Kubernetes and an understanding of resource optimization
- Familiarity with GitOps tooling such as Argo CD or Flux, and with Helm-based deployments
- Experience with in-memory data systems (e.g., Redis) and distributed architectures
- Solid understanding of observability tooling and practices
- Strong networking fundamentals, particularly in low-latency or distributed systems
- Experience working in environments with broad ownership across infrastructure
- Exposure to GPU cloud providers beyond major hyperscalers
- Experience with real-time or streaming infrastructure
- Proficiency in Go or Python
- Familiarity with ML model deployment and optimization
- Experience managing infrastructure cost, particularly for GPU-heavy workloads