eGPU: Extending eBPF Programmability and Observability to GPUs


DOI: https://doi.org/10.1145/3723851.3726984

HCDS '25: 4th Workshop on Heterogeneous Composable and Disaggregated Systems, Rotterdam, Netherlands, March 2025

Precise GPU observability and programmability are essential for optimizing performance in AI workloads and other computationally intensive high-performance computing (HPC) applications. In this paper, we introduce eGPU, the first framework and eBPF runtime that dynamically offloads eBPF bytecode onto GPUs via dynamic PTX injection.

We detail the design and implementation of eGPU, which integrates kernel-level and user-space eBPF instrumentation hooks, runtime PTX generation, and shared-memory synchronization, providing a seamless, low-overhead observability platform for modern HPC and AI workloads.

Finally, fleet-level resource attribution dashboards break down GPU hours or FLOPs usage by model, user, or product group, ensuring that optimization efforts target the largest consumers of compute time. Collectively, these workflows and tools from Meta's observability stack demonstrate how systematic performance monitoring, automated data analysis, and a layered telemetry architecture can enable large-scale AI system efficiency, aligning closely with related research on kernels, dynamic instrumentation, and just-in-time optimization strategies in data-intensive computing environments.
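The fleet-level attribution described above amounts to grouping usage records by a chosen dimension and ranking the totals. The following is a minimal sketch of that idea, not the dashboards' actual implementation: the record fields (`model`, `user`, `group`, `gpu_hours`), the sample values, and the `attribute` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical job records; field names and numbers are illustrative only.
JOBS = [
    {"model": "ranker", "user": "alice", "group": "ads", "gpu_hours": 120.0},
    {"model": "ranker", "user": "bob", "group": "ads", "gpu_hours": 30.0},
    {"model": "translator", "user": "carol", "group": "messaging", "gpu_hours": 50.0},
]


def attribute(jobs, key):
    """Sum GPU hours per value of `key` and return rows sorted by usage.

    Each row is (name, gpu_hours, share_of_total), largest consumer first.
    """
    totals = defaultdict(float)
    for job in jobs:
        totals[job[key]] += job["gpu_hours"]
    grand_total = sum(totals.values())
    rows = [
        (name, hours, hours / grand_total if grand_total else 0.0)
        for name, hours in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Print one breakdown per attribution dimension.
    for dimension in ("model", "user", "group"):
        print(f"GPU hours by {dimension}:")
        for name, hours, share in attribute(JOBS, dimension):
            print(f"  {name:<12} {hours:8.1f} h  {share:6.1%}")
```

Sorting by total puts the largest consumers first, which is the property the text relies on for directing optimization effort. The same grouping would apply to FLOPs by swapping the summed field.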
