SearchLondonJobs.co.uk

🏛️ London's Premier Job Portal

Lead Inference Platform Support Engineer - AI I

Company: PowerToFly

Location: Toronto, London

Posted: May 25, 2026

Position Details

About the Role

As a Lead Inference Platform Engineer, you will:

  • Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
  • Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially during business hours
  • Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
  • Integrate models into production-grade APIs supporting TR products and enterprise workflows
  • Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
  • Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
  • Build and optimize containerized inference pipelines...