Job Description
Location: Hybrid – Santa Clara, CA or New York, NY
Type: Full-Time | Salary: $150K–$300K + Competitive Equity
Visa Sponsorship: H-1B, O-1, OPT Available
🚀 About the Opportunity
Initio Capital is hiring a Performance Tooling Engineer on behalf of a stealth-stage systems company building custom RISC-V infrastructure with AI acceleration at its core. The company is led by silicon and systems veterans and backed by tier-1 investors. Their vision: deliver ultra-efficient, secure, and high-performance compute across ML, analytics, and next-gen workloads.
This role focuses on performance visibility at the lowest levels: measuring how deep learning workloads actually perform across simulators, FPGAs, and physical hardware.
🧠 About the Role
As a GPGPU Performance Tooling Engineer, you’ll own and extend the company’s profiling infrastructure—building low-overhead instrumentation to track performance bottlenecks and throughput gaps on GPU-like accelerators.
You’ll work hands-on with frameworks like Perfetto, contribute to open-source tooling, and collaborate closely with hardware and compiler teams to align insights with optimization strategies.
🔧 What You’ll Do
- Build and extend internal performance tooling, with a focus on Perfetto-based profiling
- Develop instrumentation layers for real-time and post-run analysis across simulators, emulation, FPGAs, and silicon
- Analyze bottlenecks in memory bandwidth, latency, and compute throughput on custom GPGPU-like architectures
- Collaborate with software, compiler, and silicon design teams to prioritize optimizations
- Automate collection and visualization of performance signals for hardware bring-up and AI inference workflows
- Contribute back to open-source projects where appropriate
✅ What We’re Looking For
- 2–5+ years of experience in low-level systems profiling or performance tooling
- Deep fluency in Perfetto, Protobuf, and systems programming (C or C++)
- Strong understanding of computer architecture, memory systems, and runtime behavior
- Experience building and interpreting GPGPU performance traces
- Ability to work independently and collaboratively across deep technical domains
🟢 Bonus Points
- Experience profiling GPGPU execution and optimizing ML workloads
- Familiarity with deep learning frameworks like PyTorch or TensorFlow
- Knowledge of memory subsystem bottlenecks (e.g., DRAM bandwidth, shared memory stalls)
- Working proficiency in Rust or scripting languages used in performance tooling
- Contributions to open-source observability, tracing, or instrumentation frameworks
💸 Compensation & Perks
- Salary: $150K – $300K
- Equity: Competitive early-stage grant
- Hybrid in Santa Clara, CA or New York, NY
- Visa sponsorship available (H-1B, O-1, OPT)
- Join a founding engineering team at the edge of silicon and software
- Shape the performance visibility layer that powers next-gen AI acceleration
If you want to build the tools that uncover what truly limits performance in modern compute systems, this is the role.
Apply now to join a deeply technical, mission-driven team.