Member of Technical Staff (Software Engineer)
Cerebras Systems is a leader in AI technology, known for building the world's largest AI chip. They are seeking a Member of Technical Staff (Software Engineer) to implement infrastructure for high-performance inference services and collaborate with cross-functional teams to enhance the inference pipeline.
Responsibilities
- Implement infrastructure to support a high-performance, low-latency inference service
- Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads
- Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs
- Integrate inference services with containerized environments using Docker and Kubernetes for orchestration
- Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies
- Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks
- Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements
- Triage and resolve defects in the service by analyzing logs, metrics, and distributed traces
- Debug issues related to model deployment, container orchestration, or networking configurations, documenting steps to reproduce and root-cause defects
- Collaborate with cross-functional teams to address performance regressions, scalability issues, or integration failures in the inference pipeline
- Develop automated scripts to detect and mitigate common failure modes, improving system reliability
- Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers
- Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging
- Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability
- Participate in release planning and prioritization discussions to align infrastructure development with customer needs and business objectives
Skills
- Required: Master's degree (or foreign equivalent) in Computer Science or a related field, plus 1 year of experience as a Software Developer, Student/Intern (Software Developer), Member of Technical Staff (Software Engineer), Software Engineer, or in a related occupation
- Docker and Kubernetes
- Java or C++
- ActiveMQ and Kafka
- Python or Groovy
- JavaScript or TypeScript
- Linux
- SQL, OracleDB, and Redis
- Git
Benefits
- Telecommuting permitted
- Job stability with startup vitality
- Simple, non-corporate work culture that respects individual beliefs
- Continuous learning, growth, and support of those around you