ML Engineer - Infrastructure

WitnessAI

About the Role

WitnessAI is a leader in innovative networking solutions that enhance security, performance, and reliability for businesses of all sizes. We are seeking an ML Infrastructure Engineer to optimize, deploy, and scale machine learning models in production environments. You will play a critical role in scaling GPU resources, building continuous-learning pipelines, and integrating a variety of inference frameworks. Your expertise in model quantization, pruning, and other optimization techniques will ensure our models run efficiently and cost-effectively.

Responsibilities

Develop and Optimize

  • Design and manage scalable GPU infrastructures for model training and inference.
  • Build automated pipelines to accelerate ML workflows.
  • Implement feedback loops for continuous learning.
  • Enhance model efficiency in resource-constrained environments.

Implement Advanced Inference Solutions

  • Evaluate and integrate inference platforms like NVIDIA Triton and vLLM to ensure high availability, scalability, and reliability of deployed models.

Collaborate for Impact

  • Work closely with applied scientists, software engineers, and DevOps professionals to deploy models that drive our company's mission forward.
  • Document best practices to support team knowledge sharing and improve code quality and reproducibility.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 2+ years of experience building and scaling machine learning systems.
  • Proven experience in scaling GPU resources for ML applications.
  • Experience with inference platforms like NVIDIA Triton, vLLM, or similar.
  • Expertise in model quantization, pruning, and optimization (e.g., TensorRT, ONNX).
  • Skilled in automating data collection, preprocessing, model retraining, and deployment.
  • Proficient with cloud platforms (AWS preferred; GCP or Azure also acceptable) for deploying and managing GPU instances.
  • Strong Python skills; familiarity with other scripting languages is a plus.
  • Experience with CUDA and related GPU libraries.
  • Proficiency in PyTorch, TensorFlow, or similar frameworks.
  • Skilled in Docker and Kubernetes.
  • Experience with CI/CD tools such as Jenkins, GitHub Actions, or similar.
  • Experience with Prometheus, Grafana, or similar monitoring solutions.

Soft Skills

  • Strong problem-solving and analytical abilities.
  • Excellent communication and teamwork skills.
  • Ability to work independently and manage multiple tasks effectively.
  • Proactive attitude toward learning and adopting new technologies.

Benefits

  • Hybrid work environment
  • Competitive salary
  • Health, dental, and vision insurance
  • 401(k) plan
  • Opportunities for professional development and growth
  • Generous vacation policy

Location

    San Francisco, US

Job type

  • Full-time

Role

Engineering

Keywords

  • ML Infrastructure
  • GPU Optimization
  • Model Deployment