Overview
This organization is a global managed services provider delivering end-to-end IT and business solutions to enterprises. Its offerings include consulting, custom application development, AI, machine learning, data science, and technology operations. The company was formed through the merger of two established entities and focuses on using advanced technologies, especially AI, to drive business outcomes and long-term value for clients.
It serves a diverse range of industries and emphasizes strong partnerships, adaptability, and the delivery of ethical, socially responsible solutions.
Job Description
- 3-5 years of QA experience, with growing expertise in testing AI systems
- Proficiency in Python test frameworks (pytest) and the basics of LLM evaluation
- Hands-on experience with LLM observability and tracing tools (LangFuse, LangSmith, or similar)
- Understanding of core LLM evaluation concepts:
  - Intent classification and semantic similarity testing
  - Output consistency and regression testing for prompt changes
  - Basic hallucination and factual accuracy checks
  - Response quality scoring using automated evaluators
- Familiarity with LLM-as-a-Judge evaluation patterns for assessing generation quality
- API testing skills for validating LLM integrations and multi-agent workflows
- Experience testing conversational AI applications (chatbots, assistants)
- Knowledge of health checks and smoke testing for Kubernetes-based applications
- Experience with GitHub Actions for running tests in CI/CD pipelines
- Strong documentation skills for test cases, evaluation criteria, and defect reporting
- Eagerness to learn advanced AI evaluation methodologies and contribute to evaluation framework development
Additional Details:
Cultural Fit:
- High autonomy and self-direction required across all roles
- Strong written communication in English (async-first team)
- Comfortable with ambiguity and iterative development processes
Skills & Requirements
Tech Stack Overview:
- Frontend: OpenWebUI (Python/Svelte)
- AI: Azure OpenAI, LiteLLM proxy, multi-agent frameworks (LangGraph, Microsoft Agent Framework)
- Infrastructure: Azure (AKS, Key Vault, PostgreSQL), Terraform, Helm
- Integrations: REST APIs, MCP (Model Context Protocol), workflow automation tools
- Evaluation/Observability: LangFuse, LLM-as-a-Judge frameworks, custom evaluation pipelines
Ideal Candidate Profiles: