About Sanas
Sanas is revolutionizing communication with the world’s first real-time algorithm designed to modulate accents, eliminate background noise, and enhance speech clarity. Founded in 2020 by seasoned entrepreneurs, Sanas has grown into a 200-person team and secured over $100 million in funding from top investors including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, and Quiet Capital.
Our technology has been adopted by several Fortune 100 companies, and our mission is to redefine the future of speech communication.
Position: Senior Applied Machine Learning Scientist
Key Responsibilities
- Architect, train, and optimize large-scale speech AI models, including speech-to-speech, speech restoration, and speech translation
- Leverage self-supervised learning, contrastive learning, and transformer-based architectures (e.g., wav2vec, Whisper, GPT-style models) to improve model accuracy and adaptability
- Develop efficient model distillation and quantization strategies for low-latency inference
- Innovate on cross-lingual and multilingual speech processing using large-scale pretraining and fine-tuning
- Curate and scale diverse, multilingual, and multimodal datasets for robust training
- Apply active learning, domain adaptation, and synthetic data generation to overcome data limitations
- Lead data quality assessment, augmentation, and curation for large-scale pipelines
- Implement distributed training strategies using cloud and on-prem GPU clusters
- Design scalable model evaluation frameworks (WER, MOS, latency tracking)
- Optimize real-time inference pipelines for high-throughput, low-latency speech processing
- Stay up to date with advances in foundation models, generative AI, and speech modeling
- Collaborate with academic and open-source communities to foster innovation
- Work closely with MLOps, Data Engineering, and Product teams for system deployment
- Integrate foundation models with edge devices, real-time applications, and cloud platforms
- Translate research into production-ready models for real-world communication
Must-Have Qualifications
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Electrical Engineering, or a related field, with a focus on Machine Learning, Deep Learning, or Speech Processing
- 5+ years of industry experience developing and implementing systems such as:
  - Speech-to-text (ASR)
  - Text-to-speech (TTS)
  - Voice conversion and speech enhancement
  - Speech translation and multimodal learning
- Strong proficiency in transformer-based architectures (e.g., wav2vec 2.0, Whisper, GPT, BERT)
- Expertise in deep learning frameworks (PyTorch, TensorFlow) and large-scale training
- Experience with distributed training and optimization on multi-GPU clusters
- Solid understanding of self-supervised learning, contrastive learning, and generative speech modeling
- Hands-on experience with cloud platforms (AWS, GCP, Azure) and model deployment
Preferred Experience
- Developing multimodal AI models integrating speech, text, and vision
- Publishing in top-tier AI/ML conferences
- Optimizing large models for real-time edge inference
- Applying MLOps best practices for production model deployment and monitoring
- Familiarity with open-source ASR/TTS toolkits