SearchLondonJobs.co.uk

Senior Networking Solution Test Engineer – AI Cluster Debugging

Company: NVIDIA

Location: Remote, London

Posted: June 03, 2026

Position Details

We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team! You will work on pioneering NVLink, Ethernet and InfiniBand‑based AI clusters. Additionally, you will own complex issues across hardware, system software and AI workloads.


What you’ll be doing:
+ Design and review test and product requirements across the NVLink, Ethernet and InfiniBand / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.
+ Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.
+ Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.
+ Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.
+ Collaborate closely with development teams to debug NCCL, RoCE/RD...