GPGPU Performance Tooling Engineer

Initio Capital
Location: New York, NY, USA
Published: 6/14/2022
Technology
Full Time

Job Description

Location: Hybrid – Santa Clara, CA or New York, NY
Type: Full-Time | Salary: $150K–$300K + Competitive Equity
Visa Sponsorship: H-1B, O-1, OPT Available

🚀 About the Opportunity

Initio Capital is hiring a Performance Tooling Engineer on behalf of a stealth-stage systems company building custom RISC-V infrastructure with AI acceleration at its core. The company is led by silicon and systems veterans and backed by tier-1 investors. Their vision: deliver ultra-efficient, secure, and high-performance compute across ML, analytics, and next-gen workloads.

This role focuses on performance visibility at the lowest levels—instrumenting how deep learning workloads actually perform across simulators, FPGAs, and physical hardware.

🧠 About the Role

As a GPGPU Performance Tooling Engineer, you’ll own and extend the company’s profiling infrastructure—building low-overhead instrumentation to track performance bottlenecks and throughput gaps on GPU-like accelerators.

You’ll work hands-on with frameworks like Perfetto, contribute to open-source tooling, and collaborate closely with hardware and compiler teams to align insights with optimization strategies.

🔧 What You’ll Do

  • Build and extend internal performance tooling, with a focus on Perfetto-based profiling

  • Develop instrumentation layers for real-time and post-run analysis across simulators, emulation, FPGAs, and silicon

  • Analyze bottlenecks in memory bandwidth, latency, and compute throughput on custom GPGPU-like architectures

  • Collaborate with software, compiler, and silicon design teams to prioritize optimizations

  • Automate collection and visualization of performance signals for hardware bring-up and AI inference workflows

  • Contribute back to open-source projects where appropriate

✅ What We’re Looking For

  • 2–5+ years of experience in low-level systems profiling or performance tooling

  • Deep fluency in Perfetto, Protobuf, and systems programming (C or C++)

  • Strong understanding of computer architecture, memory systems, and runtime behavior

  • Experience building and interpreting GPGPU performance traces

  • Ability to work independently and collaboratively across deep technical domains

🟢 Bonus Points

  • Experience profiling GPGPU execution and optimizing ML workloads

  • Familiarity with deep learning frameworks like PyTorch or TensorFlow

  • Knowledge of memory subsystem bottlenecks (e.g., DRAM bandwidth, shared memory stalls)

  • Working proficiency in Rust or scripting languages used in performance tooling

  • Contributions to open-source observability, tracing, or instrumentation frameworks

💸 Compensation & Perks

  • Salary: $150K – $300K

  • Equity: Competitive early-stage grant

  • Hybrid in Santa Clara, CA or New York, NY

  • Visa sponsorship available (H-1B, O-1, OPT)

  • Join a founding engineering team at the edge of silicon and software

  • Shape the performance visibility layer that powers next-gen AI acceleration

If you want to build the tools that uncover what truly limits performance in modern compute systems, this is the role.

Apply now to join a deeply technical, mission-driven team.
