Solution Engineer - AI/HPC Platforms Infrastructure

DriveNets

The Role

As a Solution Engineer, you will play a pivotal role in designing, deploying, and optimizing DriveNets’ Network Cloud AI Infrastructure solutions. This individual contributor role requires a blend of technical expertise, leadership, and hands-on experience to implement cutting-edge solutions for our customers. You will collaborate with sales engineering teams, customers, and cross-functional teams, including Product Management, Solution Architects, Engineering, and Marketing, to define technical requirements, articulate solution value, and ensure successful on-site deployments.

Key responsibilities include guiding customers through the design and deployment process, aligning technical solutions with business needs, and providing critical feedback to improve DriveNets’ product offerings. This position demands strong technical acumen, exceptional communication skills, and the ability to lead complex, high-impact projects in dynamic environments.

Responsibilities

  • Build robust AI/HPC infrastructure for new and existing customers.
  • Take a hands-on technical role in building and supporting NVIDIA- and AMD-based platforms.
  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
  • Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
  • Design and plan rack layouts and network topologies to support customer requirements.
  • Design and evaluate automation scripts for network operations, including server and switch-fabric configuration.
  • Perform data center upgrades and ensure smooth deployment of DriveNets solutions.
  • Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.
  • Provide feedback to internal teams by filing bugs, documenting workarounds, and suggesting improvements.
  • Engage with sales teams and customers to ensure success with major opportunities and deployments.
  • Introduce new products to the DriveNets sales and support teams and to DriveNets customers.
  • Deliver technical training and transfer-of-information (TOI) sessions for support and sales engineers, partners, and customers.
  • Collaborate on product definition through customer requirement gathering and roadmap planning.

Requirements

  • 5+ years of experience deploying and administering AI/HPC clusters or general-purpose compute systems.
  • 5+ years of hands-on Linux experience (e.g., RHEL, CentOS, Ubuntu) and production infrastructure support (e.g., networking, storage, monitoring, compute, installation, configuration, maintenance, upgrade, retirement).
  • Proficiency in cloud, virtualization, and container technologies.
  • Deep understanding of operating systems, computer networks, and high-performance applications.
  • Hands-on experience with Bash, Python, and configuration management tools (e.g., Ansible).
  • Established record of leading technical initiatives and delivering results.
  • Ability to write extensive technical content (white papers, technical briefs, test reports, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
  • Ability to travel domestically and internationally up to 20% of the time.

Ways to Stand Out from the Crowd

  • Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet (PFC, ECN), accelerated computing, GPUs, and DPUs.
  • Familiarity with GPU resource scheduling managers (Slurm, Kubernetes).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP).
  • Understanding of data center operations fundamentals in networking, cooling, and power.
  • Proven experience with one or more Tier-1 clouds (AWS, Azure, GCP, OCI) or emerging neoclouds, and cloud-native architectures and software.
  • Expertise with parallel filesystems (e.g., Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni-Path, Ethernet).
  • Understanding of AI workload requirements and how they interact with other parts of the system, such as networking, storage, and deep learning frameworks.
  • Knowledge of AI/ML frameworks (e.g., TensorFlow, PyTorch) and associated tooling is an advantage.

More About DriveNets

Enjoy a competitive salary, benefits, and opportunities for career growth.

If your experience is close but doesn’t meet every requirement, please apply anyway. DriveNets is on a mission to build a special company made up of individuals with different backgrounds, perspectives, and experiences.

DriveNets is an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

Based in Israel, with locations in Romania, the US, and Japan, as well as extended teams, DriveNets’ operations span more than 10 countries. Recognized by industry analysts and honored with numerous industry awards, DriveNets is building market momentum and enabling faster service innovation from the network core to the edge.

Location

    Ra’anana, Israel

Job type

  • Full-time

Role

Engineering

Keywords

  • AI-infrastructure