About

JurisRAG

A retrieval-augmented generation system built over Philippine Supreme Court decisions and signed resolutions. Ask questions in natural language — answers are grounded exclusively in actual case excerpts scraped from the SC E-Library.


How it works

The RAG pipeline

01 · Ingestion

Scrape → Chunk → Embed → Store

31,818 Supreme Court decisions were scraped, split into 1,024-token chunks, embedded with BAAI/bge-large-en-v1.5, and stored in Qdrant Cloud, yielding 233,224 vectors of 1,024 dimensions each.
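The chunking step can be sketched in plain Python. This is illustrative only: the real pipeline would count tokens with the bge-large-en-v1.5 tokenizer, whereas here `tokens` is any sequence (a list of whitespace-split words works as a stand-in). Only the 1,024-token window size comes from the description above.

```python
def chunk_tokens(tokens, size=1024):
    """Split a token sequence into consecutive windows of at most `size` tokens.

    Stand-in for the real step: the pipeline tokenizes with the BGE tokenizer,
    then embeds each window and upserts it into Qdrant.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# A 2,500-token decision becomes three chunks of 1024 + 1024 + 452 tokens.
chunks = chunk_tokens(list(range(2500)))
```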

02 · Retrieval

Query embedding + vector search

Your question is embedded with a BGE instruction prefix and searched against Qdrant using cosine similarity. The top 20 semantically closest chunks are returned as candidates.
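A minimal sketch of this step, using stdlib math in place of Qdrant and toy 3-dimensional vectors in place of 1,024-dimensional BGE embeddings. The query prefix shown is the instruction BGE v1.5 models recommend for retrieval queries; the index layout (`id`/`vector` dicts) is a hypothetical stand-in for Qdrant points.

```python
import math

# Instruction prefix recommended for BGE v1.5 query embeddings.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=20):
    """Return the top_k index entries closest to query_vec, as Qdrant would."""
    return sorted(index, key=lambda e: cosine(query_vec, e["vector"]), reverse=True)[:top_k]

# Toy index: three 3-d "chunks" standing in for 233,224 real vectors.
index = [
    {"id": "chunk-a", "vector": [1.0, 0.0, 0.0]},
    {"id": "chunk-b", "vector": [0.9, 0.1, 0.0]},
    {"id": "chunk-c", "vector": [0.0, 1.0, 0.0]},
]
hits = search([1.0, 0.0, 0.0], index, top_k=2)
```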

03 · Reranking

Cross-encoder reranking

A cross-encoder (BAAI/bge-reranker-large) scores each chunk against your question for precise relevance. The top 5 are kept and passed to the LLM.
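The reranking stage has a simple shape: score every candidate against the question, sort, keep the best five. The sketch below uses a crude lexical-overlap scorer as a stand-in for bge-reranker-large (which jointly encodes the question and chunk); only the sort-and-truncate structure reflects the real step.

```python
def rerank(question, chunks, score_fn, keep=5):
    """Score each chunk against the question and keep the `keep` best."""
    return sorted(chunks, key=lambda c: score_fn(question, c), reverse=True)[:keep]

def overlap_score(question, chunk):
    """Crude lexical stand-in for the cross-encoder's relevance score."""
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

candidates = [
    "the court ruled on just compensation for expropriated land",
    "a procedural motion was denied without prejudice",
    "just compensation must reflect fair market value of the land",
]
top = rerank("what is just compensation for land", candidates, overlap_score, keep=2)
```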

04 · Generation

Grounded answer via Llama 3.1

The retrieved excerpts are injected into a prompt that instructs the model to answer only from the provided case text, keeping answers grounded in what the cases actually say rather than in the model's own recall.
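A sketch of the prompt assembly. The exact wording of the system instruction is illustrative, not the production prompt; only the pattern (numbered excerpts, a strict answer-only-from-context instruction, then the question) reflects the step described above.

```python
def build_prompt(question, excerpts):
    """Assemble a grounded prompt: numbered excerpts, then a strict instruction."""
    context = "\n\n".join(
        f"[Excerpt {i}]\n{text}" for i, text in enumerate(excerpts, start=1)
    )
    return (
        "You are a legal research assistant. Answer the question using ONLY "
        "the Supreme Court case excerpts below. If the excerpts do not contain "
        "the answer, say that the indexed cases do not cover it.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the standard for grave abuse of discretion?",
    ["Grave abuse of discretion implies a capricious exercise of judgment..."],
)
```

The assembled string is what gets sent to Llama 3.1; because every excerpt is labeled, the model can be asked to cite which excerpt supports each part of its answer.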

05 · Evaluation

RAGAS faithfulness scoring

Evaluated with the RAGAS faithfulness metric across 5 representative legal questions. Baseline faithfulness score: 0.509, meaning roughly half of the claims in generated answers trace directly to retrieved excerpts.
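The shape of the faithfulness metric can be illustrated in a few lines. RAGAS uses an LLM judge to decompose an answer into claims and verify each against the retrieved context; the lexical check below is only a crude stand-in for that judge, included to show how the score (supported claims / total claims) is computed.

```python
def supported(claim, excerpts):
    """Crude lexical stand-in for RAGAS's LLM judge: a claim counts as
    supported if all of its content words appear in some excerpt."""
    words = {w for w in claim.lower().split() if len(w) > 3}
    return any(words <= set(e.lower().split()) for e in excerpts)

def faithfulness(claims, excerpts):
    """Fraction of answer claims supported by the retrieved excerpts."""
    flags = [supported(c, excerpts) for c in claims]
    return sum(flags) / len(flags)

excerpts = ["the petition was dismissed for lack of merit by the court"]
claims = [
    "the petition was dismissed for lack of merit",   # supported
    "the dismissal was later reversed on appeal",     # not in the excerpts
]
score = faithfulness(claims, excerpts)  # 0.5: one of two claims is supported
```

Under this reading, the 0.509 baseline says about one claim in two survives verification against the retrieved text.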

Stack

Built with

Python

RAG pipeline

FastAPI

Backend API

Qdrant Cloud

Vector database

BAAI/bge-large-en-v1.5

Embeddings

Groq · Llama 3.1

LLM inference

Next.js 15

Frontend

JurisRAG is a research and portfolio project. It is not a substitute for legal advice. Answers are generated from indexed case excerpts and may be incomplete or imprecise. Always consult the official SC E-Library for authoritative sources.