JurisRAG
A retrieval-augmented generation system built over Philippine Supreme Court decisions and signed resolutions. Ask questions in natural language — answers are grounded exclusively in actual case excerpts scraped from the SC E-Library.
What's indexed
233,224 vector embeddings over 31,818 Supreme Court decisions, fully indexed and ready to query.
The RAG pipeline
Scrape → Chunk → Embed → Store
31,818 Supreme Court decisions were scraped, split into 1,024-token chunks, embedded with BAAI/bge-large-en-v1.5, and stored in Qdrant Cloud: 233,224 vectors of 1,024 dimensions each.
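The chunking step above can be sketched as a sliding token window. This is a minimal illustration using whitespace tokens in place of the embedding model's tokenizer, and the overlap value is an assumption, not something the project states:

```python
def chunk_decision(text: str, chunk_size: int = 1024, overlap: int = 128) -> list[str]:
    """Split a decision's text into fixed-size token windows.

    Whitespace tokens stand in for the BGE tokenizer here; the 128-token
    overlap is a common default, assumed for illustration.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk is then embedded and upserted into a Qdrant collection configured for 1,024-dimensional vectors with cosine distance.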
Query embedding + vector search
Your question is embedded with a BGE instruction prefix and searched against Qdrant using cosine similarity. The top 20 semantically closest chunks are returned as candidates.
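The two pieces of that step, the BGE query prefix and the cosine similarity Qdrant computes, can be shown in isolation. The prefix below is the one documented for the bge-*-en-v1.5 family (passages are embedded without it); the actual search would go through the Qdrant client with `limit=20` rather than this pure-Python cosine:

```python
import math

# Documented query instruction for BAAI/bge-*-en-v1.5 models;
# passage text is embedded without any prefix.
BGE_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def prefix_query(question: str) -> str:
    """Prepend the BGE retrieval instruction before embedding a user question."""
    return BGE_QUERY_PREFIX + question

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """The similarity Qdrant computes when a collection uses cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```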
Cross-encoder reranking
A cross-encoder (BAAI/bge-reranker-large) scores each chunk against your question for precise relevance. The top 5 are kept and passed to the LLM.
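The selection logic of the reranking step looks like the sketch below. In practice the scores would come from something like `CrossEncoder("BAAI/bge-reranker-large").predict([(question, c) for c in candidates])` in sentence-transformers; here they are passed in so the top-k logic stands alone:

```python
def rerank(candidates: list[str], scores: list[float], top_k: int = 5) -> list[str]:
    """Keep the top_k candidate chunks by cross-encoder relevance score.

    `scores[i]` is the cross-encoder's relevance score for `candidates[i]`
    against the user's question (higher is more relevant).
    """
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```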
Grounded answer via Llama 3.1
The retrieved excerpts are injected into a prompt that instructs the model to answer only from the provided case text — no hallucination beyond what the cases say.
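A grounding prompt of that kind can be assembled roughly as follows. The wording is illustrative, not the project's exact template:

```python
def build_grounded_prompt(question: str, excerpts: list[str]) -> str:
    """Inject retrieved case excerpts into a grounding prompt.

    Instructs the model to answer strictly from the provided text
    (illustrative phrasing, assumed for this sketch).
    """
    context = "\n\n".join(
        f"[Excerpt {i + 1}]\n{text}" for i, text in enumerate(excerpts)
    )
    return (
        "Answer the question using ONLY the Supreme Court excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting string is what gets sent to Llama 3.1 via Groq's chat completion API.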
RAGAS faithfulness scoring
Evaluated with the RAGAS faithfulness metric across 5 representative legal questions. Baseline faithfulness score: 0.509 — meaning roughly half of the generated claims trace directly to retrieved excerpts.
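RAGAS faithfulness decomposes each answer into individual claims, uses an LLM judge to check which claims are entailed by the retrieved excerpts, and scores the supported fraction, averaged over questions. The arithmetic (not the LLM judging, which `ragas.metrics.faithfulness` handles) reduces to:

```python
def mean_faithfulness(per_question: list[tuple[int, int]]) -> float:
    """Average faithfulness over questions.

    Each tuple is (supported_claims, total_claims) for one question;
    a question's score is the fraction of its claims entailed by the
    retrieved context.
    """
    scores = [supported / total for supported, total in per_question if total > 0]
    return sum(scores) / len(scores)
```

A run averaging near 0.509 means that, across the evaluation set, about half of the answer claims were judged traceable to the retrieved case text.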
Built with
Python
RAG pipeline
FastAPI
Backend API
Qdrant Cloud
Vector database
BAAI/bge-large-en-v1.5
Embeddings
Groq · Llama 3.1
LLM inference
Next.js 15
Frontend
JurisRAG is a research and portfolio project. It is not a substitute for legal advice. Answers are generated from indexed case excerpts and may be incomplete or imprecise. Always consult the official SC E-Library for authoritative sources.