Baku, Azerbaijan
12/02/2026 - 26/02/2026
Job description
Infrastructure & Platform Development
- Design and implement scalable ML infrastructure across on-premises and cloud platforms
- Build and maintain ML experimentation and production environments
- Develop and manage container orchestration systems for ML workloads
- Implement GPU resource management and optimization strategies
- Design storage solutions for datasets, models, and artifacts
ML Pipeline & Automation
- Create CI/CD pipelines for ML model training, validation, and deployment
- Implement automated model retraining and versioning systems
- Build orchestration workflows for data processing and model training
- Develop automated testing frameworks for ML models and pipelines
- Design and implement feature stores for feature engineering and reuse
Monitoring & Operations
- Implement model monitoring systems for performance, drift, and data quality
- Set up logging, alerting, and observability for ML systems
- Establish model governance and compliance tracking
- Create dashboards for model performance and infrastructure metrics
- Develop incident response procedures for production ML systems
Collaboration & Best Practices
- Partner with data scientists and AI engineers to productionize ML models
- Establish MLOps best practices and standards across teams
- Provide technical guidance on deployment architectures
- Document processes, systems, and runbooks
- Mentor junior engineers and data scientists on MLOps practices
Requirements
Education
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
- Master's degree preferred, but not required given sufficient practical experience
Experience
- 2+ years of experience as an ML, Software, or DevOps Engineer
- Proven track record of building production ML systems at scale
- Experience supporting data science teams in enterprise environments
Technical Skills
- Strong proficiency in Python and some experience with at least one low-level programming language (C/C++, Go, Rust)
- Deep understanding of containerization (Docker, Kubernetes)
- Hands-on experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, etc.)
- Knowledge of ML frameworks (TensorFlow, PyTorch, scikit-learn)
- Experience with workflow orchestration (Airflow, Kubeflow, Prefect, etc.)
- Hands-on experience with experiment tracking tools (MLflow, ClearML)
Core Competencies
- Solid understanding of ML lifecycle and model development processes
- Strong Linux/Unix systems administration skills
- Experience with version control systems (Git) and branching strategies
- Knowledge of networking, security, and compliance in cloud and on-prem environments
- Understanding of distributed computing and parallel processing
- Knowledge of microservices architecture and API design
Soft Skills
- Strong problem-solving and debugging abilities
- Excellent communication skills with both technical and non-technical stakeholders
- Ability to work independently and manage multiple priorities
- Collaborative mindset with emphasis on enabling others
- Adaptability to rapidly changing technology landscape
- Pragmatic approach to balancing innovation with reliability
Preferred Qualifications
If you have at least three of the skills from the sections below, please apply.
Technical Skills
- Experience with cloud platforms (Azure ML, AWS SageMaker, or GCP Vertex AI)
- Experience with GitOps practices and tools (ArgoCD, Flux, GitLab with GitOps) for declarative infrastructure and ML pipeline management
- Experience with feature stores (Feast, Tecton, Hopsworks, or similar)
- Experience with model monitoring solutions (Evidently, WhyLabs, Fiddler, Arize, Whylogs)
- Experience with ML explainability tools (SHAP, LIME, Captum, Alibi, InterpretML)
- Hands-on experience with hyperparameter optimization tools (Optuna, Ray Tune, Hyperopt, Katib)
- Experience with distributed training frameworks (Ray Train, Horovod, DeepSpeed, PyTorch DDP, Megatron)
- Experience with model serving frameworks (TensorFlow Serving, TorchServe, Triton, MLServer, or similar)
- Experience with data versioning tools (DVC, Pachyderm, LakeFS)
- Experience with GPU optimization (CUDA, TensorRT, ONNX Runtime, flash-attention)
- Knowledge of GPU allocation, sharing, management and profiling
LLM Ops
- Experience with LLM inference frameworks (vLLM, TGI, TensorRT-LLM)
- Familiarity with agent orchestration frameworks (LangChain, LangGraph, LlamaIndex)
- Experience with LLM optimization: quantization, KV cache management, continuous batching
- Experience with prompt engineering and versioning tools (LangSmith, PromptLayer, Weights & Biases Prompts, Helicone)
We offer
- Work schedule: 5/2, 09:00-18:00;
- Meal allowance;
- Annual performance bonuses;
- Corporate health program: VIP voluntary insurance and special gym discounts;
- Access to Digital Learning Platforms.