MultiModal RAG with LlamaIndex + Qdrant + Local Vision-LLM & Embedding models

Overview

This project is built on LlamaIndex and uses several custom components to provide a fully local multimodal document-QA system that does not rely on any external APIs or remote resources.

Below are the main tools/models used:

PDF parser: SciPDF Parser
RAG Framework: LlamaIndex
Vector Database: Qdrant
Vision-LLM: A GGUF-quantized 7B version of LLaVA
LLM Inference Framework: llama.cpp & llama-cpp-python
Embedding Models:
- BGE models for text embedding and reranking
- Efficient SPLADE models (doc, query) for sparse retrieval
- CLIP for query-to-image retrieval
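The list above pairs a dense text model (BGE) with a sparse one (SPLADE), so a query yields two ranked lists that have to be merged. How this project merges them is not stated here. One common option is reciprocal rank fusion, sketched below with hypothetical chunk ids; treat it as an illustration of the idea rather than the repository's actual method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results, best hit first.
dense_hits = ["chunk_3", "chunk_1", "chunk_7"]   # e.g. from the BGE index
sparse_hits = ["chunk_1", "chunk_9", "chunk_3"]  # e.g. from the SPLADE index

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Items that appear in both lists rise to the top, which is why rank fusion is a popular way to combine retrievers whose raw scores are not comparable.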

Environment

WSL2 on Windows 10, Ubuntu 22.04

CUDA Toolkit Version 12.3

Library Installation

SciPDF Parser

Follow the steps from https://github.com/titipata/scipdf_parser/blob/master/README.md.

Install this branch to avoid hitting the request timeout when parsing large PDF files.

pip install git+https://github.com/Virgil-L/scipdf_parser.git

To save memory and compute, this project uses a lightweight GROBID image. Run serve_grobid_light.sh to set it up.
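Before parsing, it can help to confirm that the GROBID service is reachable. The sketch below assumes GROBID listens on its default address, http://localhost:8070, and uses its `/api/isalive` health endpoint. If serve_grobid_light.sh maps a different port, adjust `base_url` accordingly. The function name `grobid_is_alive` is made up for this example.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if the GROBID health-check endpoint answers 'true'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            body = resp.read().strip().lower()
            return resp.status == 200 and body == b"true"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("GROBID reachable:", grobid_is_alive())
```

If this prints `False`, start the container first, since the PDF parsing step sends its requests to that service.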

LlamaIndex

pip install -q llama-index llama-index-embeddings-huggingface

Qdrant

pip install qdrant-client

llama-cpp-python

# Linux and Mac shell syntax (CUDA build; newer llama-cpp-python releases rename this flag to -DGGML_CUDA=on)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
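After the installs above, a quick check that every package can be found may save a failed run later. This is a minimal sketch. The import names in `REQUIRED_MODULES` are assumptions based on each package's usual module name, and `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# pip package name -> import name (assumed; adjust if your versions differ).
REQUIRED_MODULES = {
    "scipdf_parser": "scipdf",
    "llama-index": "llama_index",
    "qdrant-client": "qdrant_client",
    "llama-cpp-python": "llama_cpp",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be located."""
    missing = []
    for package, module in modules.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only confirms that the modules are importable. It does not verify that llama-cpp-python was compiled with GPU support.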

Examples

Download the sample PDF data

wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/paper_PDF/llama2.pdf"
wget --user-agent "Mozilla" "https://arxiv.org/pdf/2310.03744.pdf" -O "data/paper_PDF/llava-vl.pdf"

wget --user-agent "Mozilla" "https://arxiv.org/pdf/1706.03762.pdf" -O "data/paper_PDF/attention.pdf"
wget --user-agent "Mozilla" "https://arxiv.org/pdf/1810.04805.pdf" -O "data/paper_PDF/bert.pdf"

Parse PDF files to extract text and image elements

python parse_pdf.py \
    --pdf_folder "./data/paper_PDF" \
    --image_resolution 300 \
    --max_timeout 120

Create Qdrant collections, then build and ingest text and image nodes

python build_qdrant_collections.py \
    --pdf_folder "./data/paper_PDF" \
    --storage_path "./qdrant_db" \
    --text_embedding_model "./embedding_models/bge-small-en-v1.5" \
    --sparse_text_embedding_model "./embedding_models/efficient-splade-VI-BT-large-doc" \
    --image_embedding_model "./embedding_models/clip-vit-base-patch32" \
    --chunk_size 384 \
    --chunk_overlap 32
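The `--chunk_size` and `--chunk_overlap` values control how the parsed text is cut into nodes before embedding. The sketch below shows the sliding-window idea on a plain token list. It assumes both values are counted in tokens and ignores sentence boundaries, which a real text splitter would normally respect, so it illustrates the two parameters rather than reproducing the script's behavior. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens, chunk_size=384, chunk_overlap=32):
    """Split a token list into windows of chunk_size sharing chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# A stand-in document of 1000 tokens.
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # → [384, 384, 296]
```

With these settings each window advances by 352 tokens, so the last 32 tokens of one chunk reappear at the start of the next. That overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.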
