Back End Tech Lead
CodiumAI
Software Engineering
Tel Aviv-Yafo, Israel
Posted on Nov 19, 2025
Description
At Qodo, we're building a multi-agent platform that helps developers move fast and confidently across the SDLC. The platform currently provides two main agents: one for code generation inside various IDEs, and one for code review that lives in your Git workflow.
We're seeking an exceptional Backend Tech Lead to partner with our AI Research team in building the infrastructure and capabilities that power our AI-led products. This role is the critical bridge between cutting-edge research and production-grade systems, focusing on MLOps, agent engineering, and scalable ML infrastructure.
Responsibilities:
- Research-to-Production Partnership: Collaborate closely with AI researchers to transform experimental concepts into robust, production-ready systems and infrastructure
- MLOps Infrastructure: Design and build comprehensive ML pipelines including model versioning, experiment tracking, evaluation frameworks, continuous monitoring, and automated deployment systems
- Agent Engineering Infrastructure: Develop scalable frameworks for multi-agent orchestration, workflow management, state persistence, and agent-to-agent communication protocols
- Scalable ML Backend Systems: Architect high-performance infrastructure for serving, vector databases, embeddings pipelines, real-time inference, and distributed reasoning at scale
- Data Pipelines for AI: Design and implement data collection, processing, and storage systems that support training, fine-tuning, and continuous learning workflows
- Observability & Monitoring: Build comprehensive monitoring, logging, and alerting systems specific to AI/ML workloads including latency tracking, token usage, model performance metrics, and failure analysis
- Technical Innovation: Scope and lead technical initiatives that unlock new product capabilities, improve system performance, and reduce operational costs
Our Technical Stack:
- Backend: Python, FastAPI
- AI/ML: LiteLLM, LangChain/LangGraph, LangSmith
- Data & Vector Stores: PostgreSQL, Redis
- Infrastructure: GCP, GKE, Docker, Kubernetes
Requirements
Must Have:
- 6+ years building high-performing, internet-scale SaaS APIs with deep expertise in Python
- 3+ years working with ML/AI systems in production, including experience with LLM-based systems, MLOps, model deployment, and inference infrastructure
- Deep cloud infrastructure knowledge: Expertise in GCP/AWS, particularly ML-focused services (Vertex AI, SageMaker, Bedrock)
- Research collaboration skills: Proven ability to work effectively with researchers, translating novel ideas into engineered systems while maintaining scientific rigor
- RESTful APIs & microservices: Strong background in designing scalable, maintainable service architectures
- Independent & entrepreneurial mindset: Thrives in fast-paced, research-driven environments with evolving requirements
Bonus Points:
- Vector databases & RAG systems: Experience architecting semantic search, embeddings pipelines, and retrieval-augmented generation systems
- Production LLM experience: Practical work integrating, optimizing, and monitoring LLM-based systems at scale
- Hands-on agent systems experience: Building multi-agent frameworks, orchestration systems, or complex agentic workflows
- Experience building developer tools or code intelligence platforms (static analysis, code understanding, IDE integrations)
- Contributions to open-source projects