About

JurisRAG

A retrieval-augmented generation system built over Philippine Supreme Court decisions and signed resolutions. Ask questions in natural language — answers are grounded exclusively in actual case excerpts scraped from the SC E-Library.


How it works

The RAG pipeline

01 · Ingestion

Scrape → Chunk → Embed → Store

31,818 Supreme Court decisions were scraped, split into 1,024-token chunks, embedded with BAAI/bge-large-en-v1.5, and stored in Qdrant Cloud, yielding 233,224 vectors of 1,024 dimensions each.
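The chunking step can be sketched in plain Python. This is illustrative only: the real pipeline would count tokens with the bge-large-en-v1.5 tokenizer, whereas here `tokens` is any sequence (a list of whitespace-split words works as a stand-in). Only the 1,024-token window size comes from the description above.

```python
def chunk_tokens(tokens, size=1024):
    """Split a token sequence into consecutive windows of at most `size` tokens.

    Stand-in for the real step: the pipeline tokenizes with the BGE tokenizer,
    then embeds each window and upserts it into Qdrant.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# A 2,500-token decision becomes three chunks of 1024 + 1024 + 452 tokens.
chunks = chunk_tokens(list(range(2500)))
```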

02 · Retrieval

Query embedding + vector search

Your question is embedded with a BGE instruction prefix and searched against Qdrant using cosine similarity. The top 20 semantically closest chunks are returned as candidates.
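A minimal sketch of this step, using stdlib math in place of Qdrant and toy 3-dimensional vectors in place of 1,024-dimensional BGE embeddings. The query prefix shown is the instruction BGE v1.5 models recommend for retrieval queries; the index layout (`id`/`vector` dicts) is a hypothetical stand-in for Qdrant points.

```python
import math

# Instruction prefix recommended for BGE v1.5 query embeddings.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=20):
    """Return the top_k index entries closest to query_vec, as Qdrant would."""
    return sorted(index, key=lambda e: cosine(query_vec, e["vector"]), reverse=True)[:top_k]

# Toy index: three 3-d "chunks" standing in for 233,224 real vectors.
index = [
    {"id": "chunk-a", "vector": [1.0, 0.0, 0.0]},
    {"id": "chunk-b", "vector": [0.9, 0.1, 0.0]},
    {"id": "chunk-c", "vector": [0.0, 1.0, 0.0]},
]
hits = search([1.0, 0.0, 0.0], index, top_k=2)
```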

03 · Reranking

Cross-encoder reranking

A cross-encoder (BAAI/bge-reranker-large) scores each chunk against your question for precise relevance. The top 5 are kept and passed to the LLM.
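The reranking stage has a simple shape: score every candidate against the question, sort, keep the best five. The sketch below uses a crude lexical-overlap scorer as a stand-in for bge-reranker-large (which jointly encodes the question and chunk); only the sort-and-truncate structure reflects the real step.

```python
def rerank(question, chunks, score_fn, keep=5):
    """Score each chunk against the question and keep the `keep` best."""
    return sorted(chunks, key=lambda c: score_fn(question, c), reverse=True)[:keep]

def overlap_score(question, chunk):
    """Crude lexical stand-in for the cross-encoder's relevance score."""
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

candidates = [
    "the court ruled on just compensation for expropriated land",
    "a procedural motion was denied without prejudice",
    "just compensation must reflect fair market value of the land",
]
top = rerank("what is just compensation for land", candidates, overlap_score, keep=2)
```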

04 · Generation

Grounded answer via Llama 3.1

The retrieved excerpts are injected into a prompt that instructs the model to answer only from the provided case text, keeping answers grounded in what the cases actually say rather than in the model's own recall.
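A sketch of the prompt assembly. The exact wording of the system instruction is illustrative, not the production prompt; only the pattern (numbered excerpts, a strict answer-only-from-context instruction, then the question) reflects the step described above.

```python
def build_prompt(question, excerpts):
    """Assemble a grounded prompt: numbered excerpts, then a strict instruction."""
    context = "\n\n".join(
        f"[Excerpt {i}]\n{text}" for i, text in enumerate(excerpts, start=1)
    )
    return (
        "You are a legal research assistant. Answer the question using ONLY "
        "the Supreme Court case excerpts below. If the excerpts do not contain "
        "the answer, say that the indexed cases do not cover it.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the standard for grave abuse of discretion?",
    ["Grave abuse of discretion implies a capricious exercise of judgment..."],
)
```

The assembled string is what gets sent to Llama 3.1; because every excerpt is labeled, the model can be asked to cite which excerpt supports each part of its answer.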

05 · Evaluation

RAGAS faithfulness scoring

Evaluated with the RAGAS faithfulness metric across 5 representative legal questions. Baseline faithfulness score: 0.509, meaning roughly half of the claims in generated answers trace directly to retrieved excerpts.
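The shape of the faithfulness metric can be illustrated in a few lines. RAGAS uses an LLM judge to decompose an answer into claims and verify each against the retrieved context; the lexical check below is only a crude stand-in for that judge, included to show how the score (supported claims / total claims) is computed.

```python
def supported(claim, excerpts):
    """Crude lexical stand-in for RAGAS's LLM judge: a claim counts as
    supported if all of its content words appear in some excerpt."""
    words = {w for w in claim.lower().split() if len(w) > 3}
    return any(words <= set(e.lower().split()) for e in excerpts)

def faithfulness(claims, excerpts):
    """Fraction of answer claims supported by the retrieved excerpts."""
    flags = [supported(c, excerpts) for c in claims]
    return sum(flags) / len(flags)

excerpts = ["the petition was dismissed for lack of merit by the court"]
claims = [
    "the petition was dismissed for lack of merit",   # supported
    "the dismissal was later reversed on appeal",     # not in the excerpts
]
score = faithfulness(claims, excerpts)  # 0.5: one of two claims is supported
```

Under this reading, the 0.509 baseline says about one claim in two survives verification against the retrieved text.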

Stack

Built with

Python

RAG pipeline

FastAPI

Backend API

Qdrant Cloud

Vector database

BAAI/bge-large-en-v1.5

Embeddings

Groq · Llama 3.1

LLM inference

Next.js 15

Frontend

JurisRAG is a research and portfolio project. It is not a substitute for legal advice. Answers are generated from indexed case excerpts and may be incomplete or imprecise. Always consult the official SC E-Library for authoritative sources.