Job ID: r20468
What You'll Do:
At Criteo, the Platform Core group builds the foundational infrastructure powering our global advertising platform. We design and operate large-scale, resilient systems supporting real-time decision-making and data processing across thousands of services.
As we expand our distributed computing and ML infrastructure capabilities, we are building a new team focused on GPU platforms and high-performance model serving technologies.
As a Site Reliability Engineer in the GPU team, you will help design, operate, and scale the infrastructure powering machine learning training and inference workloads.
You will work on technologies such as:
Ray on Kubernetes
Build and operate scalable Ray clusters running on Kubernetes.
Develop reliable self-service distributed computing platforms for ML workloads.
Improve provisioning, observability, reliability, and operational efficiency of Ray-as-a-service environments.
NVIDIA Triton Inference Server
Operate and optimize large-scale inference platforms using Triton.
Improve latency, throughput, scalability, and GPU utilization for deep learning inference workloads.
You will collaborate closely with ML engineers, data scientists, and infrastructure teams to deliver reliable, production-grade ML platforms that accelerate innovation across Criteo.
Who You Are:
5+ years of experience in backend engineering, Site Reliability Engineering, or platform engineering roles focused on distributed systems.
Strong experience with Kubernetes, including workload scheduling, dynamic provisioning, and custom controllers/operators.
Hands-on experience running or optimizing GPU-based workloads in production, ideally for ML training or inference systems.
Strong software engineering skills in C#, Python, Go, or similar languages, with a focus on building reliable distributed systems.
Experience building or operating production-grade infrastructure with strong requirements around performance, scalability, and reliability.
Strong interest in automation, observability, and designing systems that scale efficiently under high load.
Bonus Points
Experience with distributed ML frameworks such as Ray or similar systems.
Familiarity with inference serving stacks such as NVIDIA Triton or TensorRT.
Experience with GPU scheduling, resource management, or multi-tenant GPU platforms.
Exposure to cloud-native GPU orchestration (GKE, EKS, or on-prem Kubernetes GPU clusters).
We acknowledge that many candidates may not meet every single role requirement listed above. If your experience looks a little different from our requirements but you believe that you can still bring value to the role, we’d love to see your application!
Who We Are:
We’re Criteo, the Commerce Intelligence Platform. Criteo helps businesses turn shopper signals into commerce outcomes while delivering more relevant experiences for shoppers. We use proprietary commerce intelligence and AI decisioning to drive relevance for shoppers and performance for businesses.
At Criteo, our culture is as unique as it is diverse. From our offices across the globe or from the comfort of home, our 3,600 Criteos collaborate to build an open, impactful, and forward-thinking environment.
We foster a workplace where everyone is valued, and employment decisions are based solely on skills, qualifications, and business needs—never on non-job-related factors or legally protected characteristics.
What We Offer:
🏢 Ways of working – Our hybrid model blends working from home with in-office experiences, making space for both.
📈 Grow with us – Learning, mentorship & career development programs.
💪 Your wellbeing matters – Health benefits, wellness perks & mental health support.
🤝 A team that cares – Diverse, inclusive, and globally connected.
💸 Fair pay & perks – Attractive salary, with performance-based rewards and family-friendly policies, plus the potential for equity depending on role and level.
Additional benefits may vary depending on the country where you work and the nature of your employment with Criteo.