Senior Machine Learning Engineer

Department: AI/ML

Location: Remote

Type: Full-time

Posted: 2026-06-23

Job Description

Our client is a European industrial technology group moving machine learning from pilots into production across its manufacturing operations. The first production use cases are predictive maintenance on rotating equipment and automated visual inspection on production lines. This is production engineering work, not research. You embed long-term in the client's ML platform team and own models from training pipeline through serving endpoint to drift monitoring. You take direction from the client's ML lead and work alongside their data scientists, platform engineers, and plant-side process specialists.

What you'll be doing

Ship PyTorch models for predictive maintenance and visual inspection from prototype to monitored production service
Build and operate model serving on Kubernetes: containerized inference, autoscaling, GPU scheduling
Own training pipelines in MLflow, covering experiment tracking, model registry, and reproducible retraining runs
Run batch scoring and Kafka-fed streaming inference against live plant data
Instrument data and model drift monitoring, with alerting wired to retraining triggers
Harden the release path: CI/CD for models, canary rollouts, fast rollback
Profile and cut inference latency and GPU cost on vision workloads
Review feature pipelines with the client's data engineers and define the input contracts models depend on

What you'll need

5+ years of ML engineering, or a backend engineering background with 2+ years owning production ML systems
Strong Python engineering: typed, tested, reviewed code built for production
Production experience with PyTorch: training and deploying vision or time-series models
Hands-on Kubernetes: you have deployed, debugged, and scaled inference services on it
Experience with MLflow or comparable tooling for experiment tracking, model registry, and deployment
Experience monitoring models in production: data drift, model drift, and performance degradation
Working knowledge of Kafka or a comparable streaming platform for online inference
Professional working proficiency in English
Based in the EU with working hours overlapping CET

Nice to have

Industrial computer vision experience across camera systems, defect detection, and edge deployment constraints
Triton Inference Server, TorchServe, or KServe in production
CKAD or an equivalent Kubernetes certification
German at B2 or above
Availability for occasional business travel within the EU

Engagement terms

Remote-first. Deliverable-based, no time tracking.
Monthly wellness allowance, scaling with tenure.
Annual learning budget, scaling with tenure.
Home office setup allowance, refreshed every two years.
25 days of annual leave, plus one additional day per year of tenure.
Birthday off.
Family leave and private healthcare coverage.