Back End Tech Lead
CodiumAI
Software Engineering
Tel Aviv-Yafo, Israel
Posted on Nov 19, 2025
Description
At Qodo, we're building a multi-agent platform that helps developers move fast and confidently across the SDLC. The platform currently provides two main agents: one for code generation inside various IDEs, and one for code review that lives in your Git workflow.
We're seeking an exceptional Backend Tech Lead to partner with our AI Research team in building the infrastructure and capabilities that power our AI-led products. This role is the critical bridge between cutting-edge research and production-grade systems, focusing on MLOps, agent engineering, and scalable ML infrastructure.
Responsibilities:
- Research-to-Production Partnership: Collaborate closely with AI researchers to transform experimental concepts into robust, production-ready systems and infrastructure
- MLOps Infrastructure: Design and build comprehensive ML pipelines including model versioning, experiment tracking, evaluation frameworks, continuous monitoring, and automated deployment systems
- Agent Engineering Infrastructure: Develop scalable frameworks for multi-agent orchestration, workflow management, state persistence, and agent-to-agent communication protocols
- Scalable ML Backend Systems: Architect high-performance infrastructure for serving, vector databases, embeddings pipelines, real-time inference, and distributed reasoning at scale
- Data Pipelines for AI: Design and implement data collection, processing, and storage systems that support training, fine-tuning, and continuous learning workflows
- Observability & Monitoring: Build comprehensive monitoring, logging, and alerting systems specific to AI/ML workloads including latency tracking, token usage, model performance metrics, and failure analysis
- Technical Innovation: Scope and lead technical initiatives that unlock new product capabilities, improve system performance, and reduce operational costs
Our Technical Stack:
- Backend: Python, FastAPI
- AI/ML: LiteLLM, LangChain/LangGraph, LangSmith
- Data & Vector Stores: PostgreSQL, Redis
- Infrastructure: GCP, GKE, Docker, Kubernetes
Requirements
Must Have:
- 6+ years building high-performing, internet-scale SaaS APIs with deep expertise in Python
- 3+ years working with ML/AI systems in production, including experience with LLM-based systems, MLOps, model deployment, and inference infrastructure
- Deep cloud infrastructure knowledge: Expertise in GCP/AWS, particularly ML-focused services (Vertex AI, SageMaker, Bedrock)
- Research collaboration skills: Proven ability to work effectively with researchers, translating novel ideas into engineered systems while maintaining scientific rigor
- RESTful APIs & microservices: Strong background in designing scalable, maintainable service architectures
- Independent & entrepreneurial mindset: Thrives in fast-paced, research-driven environments with evolving requirements
Bonus Points:
- Vector databases & RAG systems: Experience architecting semantic search, embeddings pipelines, and retrieval-augmented generation systems
- Production LLM experience: Practical work integrating, optimizing, and monitoring LLM-based systems at scale
- Hands-on agent systems experience: Building multi-agent frameworks, orchestration systems, or complex agentic workflows
- Experience building developer tools or code intelligence platforms (static analysis, code understanding, IDE integrations)
- Contributions to open-source projects