Senior Machine Learning Engineer
Department: AI/ML
Location: Remote
Type: Full-time
Posted: 2026-06-23
Job Description
Our client is a European industrial technology group moving machine learning from pilots into production across its manufacturing operations. The first production use cases are predictive maintenance on rotating equipment and automated visual inspection on production lines. This is production engineering work, not research. You embed long-term in the client's ML platform team and own models from training pipeline through serving endpoint to drift monitoring. You take direction from the client's ML lead and work alongside their data scientists, platform engineers, and plant-side process specialists.
What you'll be doing
- Ship PyTorch models for predictive maintenance and visual inspection from prototype to monitored production service
- Build and operate model serving on Kubernetes: containerized inference, autoscaling, GPU scheduling
- Own training pipelines in MLflow, covering experiment tracking, model registry, and reproducible retraining runs
- Run batch scoring and Kafka-fed streaming inference against live plant data
- Instrument data and model drift monitoring, with alerting wired to retraining triggers
- Harden the release path: CI/CD for models, canary rollouts, fast rollback
- Profile and cut inference latency and GPU cost on vision workloads
- Review feature pipelines with the client's data engineers and define the input contracts models depend on
What you'll need
- 5+ years in ML engineering, or in backend engineering with at least 2 of those years owning production ML systems
- Strong Python engineering: typed, tested, reviewed code built for production
- Production experience with PyTorch, training and deploying vision or time-series models
- Hands-on Kubernetes: you have deployed, debugged, and scaled inference services on it
- Experience with MLflow or comparable tooling for experiment tracking, model registry, and deployment
- Experience monitoring models in production: data drift, model drift, and performance degradation
- Working knowledge of Kafka or a comparable streaming platform for online inference
- Professional working proficiency in English
- Based in the EU with working hours overlapping CET
Nice to have
- Industrial computer vision experience across camera systems, defect detection, and edge deployment constraints
- Triton Inference Server, TorchServe, or KServe in production
- CKAD or an equivalent Kubernetes certification
- German at B2 or above
- Availability for occasional business travel within the EU
Engagement terms
- Remote-first. Deliverable-based, no time tracking.
- Monthly wellness allowance, scaling with tenure.
- Annual learning budget, scaling with tenure.
- Home office setup allowance, refreshed every two years.
- 25 days annual leave plus one additional day per year of tenure.
- Birthday off.
- Family leave and private healthcare coverage.