Caspian Innovation Center

MLOps Engineer

Salary: negotiable
Full-time
Baku, Azerbaijan
12/02/2026 - 26/02/2026

Job description

Infrastructure & Platform Development

  • Design and implement scalable ML infrastructure on on-premises and cloud platforms
  • Build and maintain ML experimentation and production environments
  • Develop and manage container orchestration systems for ML workloads
  • Implement GPU resource management and optimization strategies
  • Design storage solutions for datasets, models, and artifacts

ML Pipeline & Automation

  • Create CI/CD pipelines for ML model training, validation, and deployment
  • Implement automated model retraining and versioning systems
  • Build orchestration workflows for data processing and model training
  • Develop automated testing frameworks for ML models and pipelines
  • Design and implement feature stores for feature engineering and reuse

Monitoring & Operations

  • Implement model monitoring systems for performance, drift, and data quality
  • Set up logging, alerting, and observability for ML systems
  • Establish model governance and compliance tracking
  • Create dashboards for model performance and infrastructure metrics
  • Develop incident response procedures for production ML systems

Collaboration & Best Practices

  • Partner with data scientists and AI engineers to productionize ML models
  • Establish MLOps best practices and standards across teams
  • Provide technical guidance on deployment architectures
  • Document processes, systems, and runbooks
  • Mentor junior engineers and data scientists on MLOps practices

Requirements

Education

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
  • Master's degree preferred, but not required for candidates with sufficient practical experience

Experience

  • 2+ years of experience as an ML, Software, or DevOps Engineer
  • Proven track record of building production ML systems at scale
  • Experience supporting data science teams in enterprise environments

Technical Skills

  • Strong proficiency in Python and some experience with at least one low-level programming language (C/C++, Go, Rust)
  • Deep understanding of containerization (Docker, Kubernetes)
  • Hands-on experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, etc.)
  • Knowledge of ML frameworks (TensorFlow, PyTorch, scikit-learn)
  • Experience with workflow orchestration (Airflow, Kubeflow, Prefect, etc.)
  • Hands-on experience with experiment tracking tools (MLflow, ClearML)

Core Competencies

  • Solid understanding of ML lifecycle and model development processes
  • Strong Linux/Unix systems administration skills
  • Experience with version control systems (Git) and branching strategies
  • Knowledge of networking, security, and compliance in cloud and on-prem environments
  • Understanding of distributed computing and parallel processing
  • Knowledge of microservices architecture and API design

Soft Skills:

  • Strong problem-solving and debugging abilities
  • Excellent communication skills with both technical and non-technical stakeholders
  • Ability to work independently and manage multiple priorities
  • Collaborative mindset with emphasis on enabling others
  • Adaptability to rapidly changing technology landscape
  • Pragmatic approach to balancing innovation with reliability

Preferred Qualifications:

If you have at least 3 of the skills from the sections below, please apply.

Technical skills:

  • Experience with cloud platforms (Azure ML, AWS SageMaker, or GCP Vertex AI)
  • Experience with GitOps practices and tools (ArgoCD, Flux, GitLab with GitOps) for declarative infrastructure and ML pipeline management
  • Experience with feature stores (Feast, Tecton, Hopsworks, or similar)
  • Experience with model monitoring solutions (Evidently, WhyLabs, Fiddler, Arize, whylogs)
  • Experience with ML explainability tools (SHAP, LIME, Captum, Alibi, InterpretML)
  • Hands-on experience with hyperparameter optimization tools (Optuna, Ray Tune, Hyperopt, Katib)
  • Experience with distributed training frameworks (Ray Train, Horovod, DeepSpeed, PyTorch DDP, Megatron)
  • Experience with model serving frameworks (TensorFlow Serving, TorchServe, Triton, MLServer, or similar)
  • Experience with data versioning tools (DVC, Pachyderm, lakeFS)
  • Experience with GPU optimization (CUDA, TensorRT, ONNX Runtime, flash-attention)
  • Knowledge of GPU allocation, sharing, management and profiling

LLM Ops:

  • Experience with LLM inference frameworks (vLLM, TGI, TensorRT-LLM)
  • Familiarity with agent orchestration frameworks (LangChain, LangGraph, LlamaIndex)
  • Experience with LLM optimization: quantization, KV cache management, continuous batching
  • Experience with prompt engineering and versioning tools (LangSmith, PromptLayer, Weights & Biases Prompts, Helicone)

We offer

  • Work schedule: 5/2 (five working days, two days off), 09:00-18:00
  • Meal allowance
  • Annual performance bonuses
  • Corporate health program: VIP voluntary insurance and special discounts for gyms
  • Access to digital learning platforms
