GPGPU Performance Tooling Engineer

Initio Capital

New York, NY, USA

Published: 6/14/2022

Technology

Full Time

Job Description

Location: Hybrid – Santa Clara, CA or New York, NY
Type: Full-Time | Salary: $150K–$300K + Competitive Equity
Visa Sponsorship: H-1B, O-1, OPT Available

🚀 About the Opportunity

Initio Capital is hiring a Performance Tooling Engineer on behalf of a stealth-stage systems company building custom RISC-V infrastructure with AI acceleration at its core. The company is led by silicon and systems veterans and backed by tier-1 investors. Its vision: deliver ultra-efficient, secure, and high-performance compute across ML, analytics, and next-gen workloads.

This role focuses on performance visibility at the lowest levels—instrumenting how deep learning workloads actually perform across simulators, FPGAs, and physical hardware.

🧠 About the Role

As a GPGPU Performance Tooling Engineer, you’ll own and extend the company’s profiling infrastructure—building low-overhead instrumentation to track performance bottlenecks and throughput gaps on GPU-like accelerators.

You’ll work hands-on with frameworks like Perfetto, contribute to open-source tooling, and collaborate closely with hardware and compiler teams to align insights with optimization strategies.

🔧 What You’ll Do

Build and extend internal performance tooling, with a focus on Perfetto-based profiling
Develop instrumentation layers for real-time and post-run analysis across simulators, emulators, FPGAs, and silicon
Analyze bottlenecks in memory bandwidth, latency, and compute throughput on custom GPGPU-like architectures
Collaborate with software, compiler, and silicon design teams to prioritize optimizations
Automate collection and visualization of performance signals for hardware bring-up and AI inference workflows
Contribute back to open-source projects where appropriate

✅ What We’re Looking For

2–5+ years of experience in low-level systems profiling or performance tooling
Deep fluency in Perfetto, Protobuf, and systems programming (C or C++)
Strong understanding of computer architecture, memory systems, and runtime behavior
Experience building and interpreting GPGPU performance traces
Ability to work independently and collaboratively across deep technical domains

🟢 Bonus Points

Experience profiling GPGPU execution and optimizing ML workloads
Familiarity with deep learning frameworks like PyTorch or TensorFlow
Knowledge of memory subsystem bottlenecks (e.g., DRAM bandwidth, shared memory stalls)
Working proficiency in Rust or scripting languages used in performance tooling
Contributions to open-source observability, tracing, or instrumentation frameworks

💸 Compensation & Perks

Salary: $150K–$300K
Equity: Competitive early-stage grant
Hybrid in Santa Clara, CA or New York, NY
Visa sponsorship available (H-1B, O-1, OPT)
Join a founding engineering team at the edge of silicon and software
Shape the performance visibility layer that powers next-gen AI acceleration

If you want to build the tools that uncover what truly limits performance in modern compute systems—this is the role.

Apply now to join a deeply technical, mission-driven team.