Mid-Level QA - Automated Testing & AI Validation

Overview

This organization is a global managed services provider delivering end-to-end IT and business solutions to enterprises. Its offerings include consulting, custom application development, AI, machine learning, data science, and technology operations. The company was formed through the merger of two established entities and focuses on leveraging advanced technologies—especially AI—to drive business outcomes and long-term value for clients.

It serves a diverse range of industries and emphasizes strong partnerships, adaptability, and delivering ethical and socially responsible solutions.

Job Description

  • 3-5 years of QA experience with growing expertise in AI systems testing
  • Proficiency in Python test frameworks (pytest) and LLM evaluation basics
  • Hands-on experience with LLM observability and tracing tools (LangFuse, LangSmith, or similar)
  • Understanding of core LLM evaluation concepts:
    • Intent classification and semantic similarity testing
    • Output consistency and regression testing for prompt changes
    • Basic hallucination and factual accuracy checks
    • Response quality scoring using automated evaluators
  • Familiarity with LLM-as-a-Judge evaluation patterns for assessing generation quality
  • API testing skills for validating LLM integrations and multi-agent workflows
  • Experience testing conversational AI applications (chatbots, assistants)
  • Knowledge of Kubernetes-based application health checks and smoke testing
  • Experience with GitHub Actions for CI/CD test execution
  • Strong documentation skills for test cases, evaluation criteria, and defect reporting
  • Eagerness to learn advanced AI evaluation methodologies and contribute to evaluation framework development
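As a sketch of the semantic similarity testing mentioned above: a pytest-style check that compares meaning rather than exact strings. This toy version uses bag-of-words cosine similarity as a stand-in for a real embedding model, and the example strings, function names, and threshold are illustrative only:

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: bag-of-words cosine.

    A production suite would embed both strings with a real model
    and compare the vectors; the assertion pattern is the same.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def test_refund_intent_is_preserved():
    # Hypothetical reference answer and a paraphrased model output.
    expected = "You can request a refund within 30 days of purchase."
    actual = "Refunds can be requested within 30 days of the purchase date."
    # Semantic check: the answer should be close in meaning,
    # not byte-for-byte identical to the reference.
    assert cosine_similarity(expected, actual) > 0.5
```

The key design point is that the assertion tolerates paraphrase: prompt changes that preserve meaning keep the test green, while a genuinely different answer drops the similarity score below the threshold.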


Additional Details:

Cultural Fit:

  • High autonomy and self-direction required across all roles
  • Strong written communication in English (async-first team)
  • Comfortable with ambiguity and iterative development processes

Skills & Requirements

Tech Stack Overview:

  • Frontend: OpenWebUI (Python/Svelte)
  • AI: Azure OpenAI, LiteLLM proxy, multi-agent frameworks (LangGraph, Microsoft Agent Framework)
  • Infrastructure: Azure (AKS, Key Vault, PostgreSQL), Terraform, Helm
  • Integrations: REST APIs, MCP protocol, workflow automation tools
  • Evaluation/Observability: LangFuse, LLM-as-a-Judge frameworks, custom evaluation pipelines


Ideal Candidate Profiles:

  • Production LLM experience with real user traffic, not just experiments and internal projects
  • End-to-end ownership: can take "implement X integration" and deliver without hand-holding
  • Deep understanding of system integration patterns
  • Self-starters who translate business requirements into technical solutions autonomously
  • QA candidates who grasp probabilistic outputs and semantic testing over rigid assertions
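To illustrate the LLM-as-a-Judge pattern named in the requirements, a minimal sketch: a judge model is prompted with a rubric and asked to return a structured score, which the test then asserts against. Here `call_model` stands in for any real prompt-to-text client (e.g. an Azure OpenAI call) and the rubric wording is purely illustrative:

```python
import json
from typing import Callable

# Illustrative rubric prompt; double braces escape the JSON example
# so str.format only fills {reference} and {response}.
JUDGE_PROMPT = """Rate the RESPONSE for factual accuracy against the REFERENCE on a 1-5 scale.
Reply with JSON only: {{"score": <int>, "reason": "<short justification>"}}

REFERENCE: {reference}
RESPONSE: {response}"""


def judge_response(
    call_model: Callable[[str], str], reference: str, response: str
) -> dict:
    """Ask a judge model to grade a response.

    `call_model` is any function mapping a prompt string to the
    model's text reply, so tests can inject a real client or a stub.
    """
    raw = call_model(JUDGE_PROMPT.format(reference=reference, response=response))
    verdict = json.loads(raw)
    # Guard against malformed judge output before using the score.
    assert 1 <= verdict["score"] <= 5, "judge returned an out-of-range score"
    return verdict
```

In a real suite the returned score would feed a threshold assertion (e.g. `verdict["score"] >= 4`) or be logged to an observability tool such as LangFuse for trend tracking across prompt versions.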
