GenAI Ops Engineer
Location: Washington
Salary: Negotiable
Contact: Alyssa Hank
Contact email: a.hank@ioassociates.com
Job ref: BBBH146622_1726532982
Consultant: Alyssa Hank
Location: Remote
Contract Duration: 3-month contract (with the possibility of extension)
Job Overview:
We are seeking a skilled GenAI Ops Engineer to join a 3+ month platform support project. The ideal candidate will play a critical role in ensuring the smooth operation of AI/ML models and APIs, providing both user-level and project-level support. You will work in an agile environment alongside a small, collaborative team to maintain and optimize platform operations on AWS SageMaker and Kubernetes.
Key Responsibilities:
- User-Level Support:
- Provide user support, including troubleshooting access issues, responding to user inquiries, and offering education and documentation to ensure effective usage of GenAI tools and platforms.
- Project-Level Support:
- Handle new requests and escalations related to GenAI models and APIs.
- Provide hands-on maintenance of deployed AI/ML models and ensure the platform is functioning optimally.
- Platform Maintenance and Engineering:
- Oversee infrastructure, particularly AWS SageMaker, to ensure model deployments are efficient and reliable.
- Collaborate with platform engineering teams to support SageMaker Inference and Kubernetes services, and to troubleshoot any issues that arise.
- Model and API Management:
- Maintain and optimize the API layer, ensuring fast and reliable access to deployed models.
- Work with TensorRT, TGI, and similar frameworks to manage inference for Large Language Models (LLMs).
Required Skills and Experience:
- AWS SageMaker Inference: Experience deploying and managing AI models on AWS SageMaker or a similar ML platform.
- Kubernetes Service Layer: Hands-on experience with Kubernetes, particularly in managing service layers implemented in Golang.
- TGI or LLM Frameworks: Exposure to TensorRT, TGI, or similar LLM inference frameworks is essential, especially for troubleshooting and optimizing model performance.
- Golang: Experience with Golang is a plus, particularly if you've worked with proxies or backend services in Golang.
Nice to Have:
- Experience working in an agile team environment.
- Experience troubleshooting issues at both the platform and application levels.