GenAI Engineer

Overview

We are seeking a GenAI Engineer to design, develop, and deploy Generative AI solutions that enhance business workflows and user experiences. The ideal candidate will have strong expertise in LLMs (Large Language Models), prompt engineering, and integration of AI services into scalable applications.

Job Description

Key Responsibilities

Model Integration: Implement/fine-tune LLMs; build APIs/microservices for GenAI features.

Prompt Engineering: Design, optimize, and evaluate prompts for safety and accuracy.

RAG (Retrieval-Augmented Generation): Develop pipelines for document ingestion, vector embeddings, and semantic search.

App Dev: Integrate GenAI into web/mobile apps using FastAPI, Streamlit, or React.

Optimization: Monitor token usage, latency, and inference costs.

Safety: Implement moderation, bias detection, and responsible AI guidelines.

Required Skills

Python (FastAPI, Flask, Django), LLM APIs (OpenAI, Azure), Vector DBs (Pinecone, Weaviate, FAISS).

Cloud (AWS/Azure/GCP), Docker/K8s, ML fundamentals (embeddings, tokenization).

Real-time AI (SSE/WebSockets).

Preferred Skills

LangChain, LlamaIndex, Image models (Stable Diffusion), MLOps, CI/CD.

Technical Deep-Dive: Vector Embeddings

Since the JD specifically asks for knowledge of embeddings and vector databases, your engineers should be prepared to answer the following:

1. Conceptual Understanding

What are they? Embeddings are high-dimensional numerical representations of data (text, images, audio). Unlike keyword search, embeddings capture semantic meaning.

Dimensionality: Be familiar with common sizes (e.g., OpenAI’s text-embedding-3-small is 1536-dimensional).

Distance Metrics: Know when to use Cosine Similarity (directional similarity) vs. Euclidean Distance (magnitude-based) vs. Dot Product. (See the sketch after this list.)

2. Implementation Challenges

Chunking: How to break a 100-page PDF into chunks so each embedding captures context without losing detail.

Normalization: Why we normalize vectors to unit length before storing them: on unit vectors, cosine similarity reduces to a plain dot product, which is much cheaper to compute at query time.

Matryoshka Embeddings: (advanced topic) Be able to explain how to shorten vectors (e.g., from 3072 to 256 dimensions) without losing significant accuracy, to save on storage costs.
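
A minimal sketch of these ideas in numpy, using random vectors as stand-ins for real embeddings (the model names and dimensions in the comments are illustrative):

```python
import numpy as np

# Toy stand-ins for real embeddings (e.g., 3072-dim text-embedding-3-large output).
rng = np.random.default_rng(0)
a = rng.normal(size=3072)
b = rng.normal(size=3072)

# Cosine similarity: direction only, magnitude ignored.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: sensitive to magnitude as well as direction.
euclidean = np.linalg.norm(a - b)

# Normalizing to unit length up front means cosine similarity later
# reduces to a plain dot product -- the reason vector DBs often store
# normalized vectors.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
assert np.isclose(np.dot(a_unit, b_unit), cosine)

# Matryoshka-style shortening: keep the first k dimensions and
# re-normalize. With models trained for this (e.g., OpenAI's
# text-embedding-3 family), most retrieval accuracy is retained.
k = 256
a_short = a[:k] / np.linalg.norm(a[:k])
b_short = b[:k] / np.linalg.norm(b[:k])
print(cosine, np.dot(a_short, b_short))
```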

Suggested Preparation Topics

Pillar 1: The RAG Pipeline

Indexing: The flow from Document -> Chunking -> Embedding -> Vector DB.

Retrieval: Explain Top-K retrieval and how to use re-ranking models (like Cohere Rerank) to improve the quality of the top results.
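
A minimal sketch of this flow, assuming FAISS as the vector store; embed() here is a random-vector placeholder for a real embedding call (e.g., OpenAI's embeddings API), and the chunker is deliberately naive:

```python
import faiss
import numpy as np

DIM = 1536  # e.g., text-embedding-3-small

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder: swap in a real embedding call, e.g.
    # client.embeddings.create(model="text-embedding-3-small", input=texts)
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    vecs = rng.normal(size=(len(texts), DIM)).astype("float32")
    faiss.normalize_L2(vecs)  # unit length -> inner product == cosine
    return vecs

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunking with overlap; real pipelines split on
    # sentence/section boundaries to preserve context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Indexing: Document -> Chunking -> Embedding -> Vector DB
document = "..." * 1000  # stand-in for a parsed 100-page PDF
chunks = chunk(document)
index = faiss.IndexFlatIP(DIM)  # inner product on unit vectors = cosine
index.add(embed(chunks))

# Retrieval: embed the query, fetch the Top-K nearest chunks.
scores, ids = index.search(embed(["What does section 3 cover?"]), k=5)
top_chunks = [chunks[i] for i in ids[0]]
# A re-ranker (e.g., Cohere Rerank) would now re-score top_chunks
# against the query before the chunks are placed in the prompt.
```

Note how IndexFlatIP over unit-length vectors makes inner product equivalent to cosine similarity, tying back to the normalization point above.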

Pillar 2: Engineering (The "Developer" part)

FastAPI: Be ready to code a basic endpoint that takes a user query and returns a streamed response using StreamingResponse (sketched below).

Streaming (SSE): Explain why we use SSE for LLMs: tokens render as they are generated, which reduces "perceived latency" for the user.
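
A minimal sketch of such an endpoint; fake_llm_stream is a stand-in for a real streaming model call:

```python
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_llm_stream(query: str):
    # Stand-in for a real streaming call, e.g. iterating over
    # client.chat.completions.create(..., stream=True).
    for token in f"Echoing your query: {query}".split():
        await asyncio.sleep(0.05)  # simulate generation latency
        # SSE framing: each event is "data: ...\n\n".
        yield f"data: {token}\n\n"

@app.get("/chat")
async def chat(q: str):
    # Tokens reach the client as they are generated, cutting perceived
    # latency versus waiting for the full completion.
    return StreamingResponse(fake_llm_stream(q), media_type="text/event-stream")
```

Run with uvicorn main:app (assuming the file is main.py) and test with curl -N "http://localhost:8000/chat?q=hello"; the -N flag disables curl's buffering so tokens print as they arrive.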

Pillar 3: Evaluation & Operations

LLM-as-a-Judge: Using a stronger model (e.g., GPT-4o) to grade the outputs of a smaller model.
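
A minimal sketch using the OpenAI Python client; the model name and the one-line rubric are illustrative choices, not a prescribed setup:

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """Rate the ANSWER to the QUESTION on a 1-5 scale for
factual accuracy and completeness. Reply with the number only.

QUESTION: {question}
ANSWER: {answer}"""

def judge(question: str, answer: str) -> int:
    # A stronger model grades a weaker model's output; temperature=0
    # keeps the grading as deterministic as possible.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())
```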

Token Management: How to implement a "sliding window" or "summary-based memory" to keep context without hitting token limits or high costs.
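
A minimal sketch of sliding-window memory; the whitespace-based token count is a rough stand-in for a real tokenizer such as tiktoken:

```python
class SlidingWindowMemory:
    """Keep only the most recent turns that fit a token budget."""

    def __init__(self, max_tokens: int = 3000):
        self.max_tokens = max_tokens
        self.turns: list[dict] = []  # [{"role": ..., "content": ...}]

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        # Walk backwards from the newest turn, keeping turns until the
        # budget is spent; older turns fall out of the window.
        kept, used = [], 0
        for turn in reversed(self.turns):
            cost = len(turn["content"].split())  # crude token estimate
            if used + cost > self.max_tokens:
                break
            kept.append(turn)
            used += cost
        return list(reversed(kept))
```

On each request, the messages sent to the model would be the system prompt plus memory.context(); a summary-based variant would compress the turns that fall out of the window into a single synthetic turn instead of discarding them.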

Skills & Requirements

Python, FastAPI, Flask, Django, LLM APIs, OpenAI, Azure OpenAI, Prompt Engineering, Retrieval-Augmented Generation, Vector Embeddings, Vector Databases, Pinecone, Weaviate, FAISS, Semantic Search, Cloud Computing, AWS, Azure, GCP, Docker, Kubernetes, Microservices, API Development, ML Fundamentals, Tokenization, Real-Time AI, Server-Sent Events, WebSockets, LangChain, LlamaIndex, MLOps, CI/CD, GenAI Integration, Application Development, Model Monitoring, Cost Optimization, AI Safety, Bias Detection

Apply Now
