AI-Powered Resume Screening Engine
Enterprise Full-Stack RAG Pipeline for High-Performance Automated Talent Acquisition.
Overview
I built this end-to-end recruitment platform to solve a major bottleneck in talent acquisition: the manual processing and biased screening of high-volume candidate applications. The system automates the entire resume lifecycle—from ingestion via direct Gmail integration or manual upload to intelligent scoring against complex job descriptions. I designed a Retrieval-Augmented Generation (RAG) pipeline that transforms unstructured PDF and DOCX files into searchable semantic vectors, allowing recruiters to find the 'best fit' based on intent rather than just keyword density.
What makes this project technically interesting is the hybrid architecture I implemented. I used local transformer models for vectorization to ensure privacy and efficiency, while leveraging cloud-scale LLMs for the analytical heavy lifting of candidate evaluation. By integrating the Google Gmail API, I created a 'hands-free' workflow where resumes sent to a recruiter's inbox are automatically parsed, indexed, and ranked.
Problem
Recruiters are often overwhelmed by 'volume hiring,' where hundreds of applications arrive in disparate, disorganized formats. This leads to 'recruiter fatigue,' where top-tier talent is frequently missed due to shallow keyword filtering or manual oversight. Core challenges included:
• Unstructured Multi-Format Ingestion: Extracting clean, semantically meaningful text from messy PDF and DOCX layouts without losing context.
• Context-Aware Ranking: Existing Applicant Tracking Systems (ATS) often fail at semantic matching (e.g., missing a 'Software Engineer' because the resume says 'Full-Stack Developer').
• Manual Data Entry Bottlenecks: The friction of manually downloading and uploading resumes from emails into a database.
Constraints
• Data Privacy & Cost: The system needed to minimize external API costs and keep sensitive candidate data secure, which led to the choice of local embedding generation.
• API Quotas: The system had to respect Google OAuth and HuggingFace rate limits while maintaining a responsive, 'instant-search' user experience.
• Transactional Integrity: Balancing the high-speed, non-relational nature of vector search (Pinecone) with the strict consistency required for candidate records (PostgreSQL).
Solution
I implemented a robust RAG architecture using Node.js and LangChain. Instead of relying on expensive cloud embedding APIs, I used @xenova/transformers to run the all-mpnet-base-v2 model locally within the Node runtime, generating 768-dimensional vectors for every resume chunk. For the screening engine, I used the Mistral-7B-Instruct model via HuggingFace's Inference API, which provides strong reasoning at zero infrastructure cost. To keep the system production-ready, I built a custom DocumentProcessor utility that performs recursive character chunking (1,000 characters with 20% overlap).
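The chunking strategy can be sketched as a simple sliding window. This is a minimal illustration of the 1,000-character / 20%-overlap approach, not the actual DocumentProcessor (the function name and structure here are hypothetical; the real utility splits on recursive character boundaries via LangChain rather than raw offsets):

```javascript
// Hypothetical sketch: sliding-window chunker producing 1,000-char chunks
// with 20% (200-char) overlap, so context spanning a boundary appears
// in two consecutive chunks. The real DocumentProcessor also respects
// paragraph/sentence boundaries via LangChain's recursive splitter.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  const step = chunkSize - overlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}
```

Each chunk is then embedded individually, so a skill mentioned near a chunk boundary is still retrievable from at least one complete chunk.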
Architecture
• Frontend: React 19, Tailwind CSS 4, Vite, Axios, React Router.
• Backend: Node.js, Express, LangChain, Multer.
• Database: PostgreSQL (transactional records), Pinecone (vector store).
• Infrastructure: HuggingFace Inference API (Mistral LLM), Google Cloud OAuth (Gmail API).
• Libraries:
- pdf-parse
- mammoth
- better-auth
- papaparse
Trade-offs
Local Embeddings over OpenAI API: Chose local CPU-bound embedding generation with Xenova to eliminate API costs and improve privacy.
Mistral 7B vs. GPT-4: Selected Mistral via HuggingFace for speed and cost-efficiency, sacrificing some edge-case reasoning depth for scalable free-tier operation.
Synchronous Scoring over Queuing: Chose synchronous API calls for the initial MVP to reduce architectural complexity, at the cost of slower UI feedback during large batch uploads.
Metadata Mirroring: Stored core candidate metadata in both PostgreSQL and Pinecone so search results can be filtered and displayed without cross-store joins.
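The metadata-mirroring trade-off can be illustrated with a small helper that derives both store payloads from one parsed candidate. The field names and record shapes below are assumptions for illustration, not the actual schema:

```javascript
// Hypothetical sketch: derive the PostgreSQL row and the Pinecone record
// from a single parsed candidate, so the core fields stay mirrored in
// both stores and vector search results need no cross-store join.
function mirrorCandidate(candidate, embedding) {
  const core = {
    candidateId: candidate.id,
    name: candidate.name,
    email: candidate.email,
    jobId: candidate.jobId,
  };
  return {
    // Full transactional row for PostgreSQL (the source of truth).
    pgRow: {
      ...core,
      resumeText: candidate.resumeText,
      createdAt: new Date().toISOString(),
    },
    // Pinecone record: the vector plus mirrored core metadata for filtering.
    pineconeRecord: { id: candidate.id, values: embedding, metadata: core },
  };
}
```

The cost of this approach is a dual-write: any update to a mirrored field must touch both stores, which is why only stable identifying fields are mirrored.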
Learnings
The 'Overlapping Chunk' Strategy: I learned that 20% chunk overlap is essential to prevent cutting off critical context at a chunk boundary.
OAuth Resilience: Implementing the Gmail integration taught me how to handle token rotation and the security nuances of managing persistent Google Cloud credentials.
Semantic vs. Keyword Hybrid: LLMs occasionally miss obvious matches, so I implemented a keyword-based fallback utility as a 'safety net.'
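A keyword fallback like the one described can be as simple as token-overlap scoring. This sketch is illustrative only; the real utility's tokenization and weighting may differ:

```javascript
// Illustrative keyword-overlap fallback: scores a resume by the fraction
// of job-description keywords it contains. Acts as a safety net when the
// LLM misses an obvious match. Multi-word keywords (e.g. "Node.js") count
// as a hit only if all of their tokens appear in the resume.
function keywordFallbackScore(resumeText, jobKeywords) {
  const resumeTokens = new Set(
    resumeText.toLowerCase().split(/\W+/).filter(Boolean)
  );
  const matched = jobKeywords.filter((kw) =>
    kw.toLowerCase().split(/\W+/).filter(Boolean).every((t) => resumeTokens.has(t))
  );
  return { score: matched.length / jobKeywords.length, matched };
}
```

In a hybrid setup, a high fallback score on a candidate the LLM ranked low is a signal to surface the resume for manual review rather than discard it.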
Local AI in Node.js: Successfully running transformer models on the CPU inside Node.js proved that AI features don't always require expensive GPU-backed infrastructure.
Future Roadmap
• Background Job Execution: Implement BullMQ to offload Gmail fetching and LLM scoring to background workers.
• Adaptive Scoring Templates: Allow recruiters to define custom 'Weighting Rules' to customize the AI's grading rubric per job.
• Human-in-the-Loop Training: Build an 'AI Feedback' loop where recruiters can correct scores.
Summary of Work
RAG Pipeline Development
Architected a multi-stage RAG pipeline using Pinecone and local transformer models.
Gmail Automation Engine
Integrated Google Cloud OAuth2 to monitor and parse recruiter inboxes in real time.
Semantic Scoring Logic
Developed custom LLM prompts for scoring candidates across technical and cultural dimensions.
Local AI Optimization
Configured @xenova/transformers for CPU-optimized inference within a Node.js environment.
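The semantic scoring logic summarized above rests on a rubric-style prompt sent to the Mistral model. The sketch below is a hypothetical reconstruction of that idea; the actual prompt wording, scoring dimensions, and output schema are not shown in this document:

```javascript
// Hypothetical sketch of a rubric-style scoring prompt for the LLM.
// The dimensions, instructions, and JSON shape are illustrative, not
// the production prompt.
function buildScoringPrompt(jobDescription, resumeChunks) {
  const dimensions = ['Technical Skills', 'Experience Relevance', 'Cultural Fit'];
  return [
    'You are a technical recruiter. Score the candidate from 0-100 on each dimension.',
    `Dimensions: ${dimensions.join(', ')}.`,
    'Respond as JSON: {"scores": {...}, "summary": "..."}.',
    '',
    `JOB DESCRIPTION:\n${jobDescription}`,
    '',
    `CANDIDATE RESUME (top matching chunks):\n${resumeChunks.join('\n---\n')}`,
  ].join('\n');
}
```

Passing only the top-k retrieved chunks (rather than the whole resume) keeps the prompt inside the model's context window and focuses the evaluation on the most relevant evidence.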