| Issue | ITM Web Conf., Volume 80, 2025: 2025 2nd International Conference on Advanced Computer Applications and Artificial Intelligence (ACAAI 2025) | |
|---|---|---|
| Article Number | 01034 | |
| Number of page(s) | 5 | |
| Section | Machine Learning & Deep Learning Algorithms | |
| DOI | https://doi.org/10.1051/itmconf/20258001034 | |
| Published online | 16 December 2025 | |
An Investigation on RAG Question-Answering System Implementation
College of Liberal Arts & Sciences, University of Illinois Urbana-Champaign, Champaign, the United States
This paper presents the design and implementation of a Retrieval-Augmented Generation (RAG) question-answering system built as a Streamlit web application. Users upload one or more PDF files and ask questions in natural language. The system indexes the PDF content and retrieves passages with FAISS for semantic vector search and BM25 for keyword-based retrieval, combining both methods to improve accuracy. A cross-encoder then reranks the retrieved results so that the most relevant passages are used to generate the final answer. An OpenAI language model produces short, clear responses grounded in the retrieved context. The system also supports note generation, English and Chinese interfaces, and conversational memory for multi-turn dialogue. Experiments on academic PDFs show that hybrid retrieval with reranking outperforms single retrieval methods, giving more precise and meaningful answers. The Streamlit interface and a Cloudflare tunnel make the system easy to deploy and share. The project demonstrates how RAG can extract information from complex documents and serve as a practical tool for education, research, and enterprise applications.
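The hybrid retrieval and reranking pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several parts are assumptions made for illustration. A term-count cosine similarity stands in for the FAISS embedding search. Reciprocal rank fusion is one common way to combine two rankings, but the abstract does not say which fusion method the system uses. The `score_fn` argument is a hypothetical placeholder for the cross-encoder. All function names are invented here.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize punctuation too.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (keyword retrieval)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Assumed stand-in for dense search: cosine similarity of term counts.
    A real system would embed the text and query a FAISS index instead."""
    def cos(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = Counter(tokenize(query))
    return [cos(q, Counter(tokenize(d))) for d in docs]


def ranking(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda i: -fused[i])


def hybrid_search(query, docs, top_k=3):
    """Fuse the keyword ranking and the (toy) dense ranking."""
    sparse = ranking(bm25_scores(query, docs))
    dense = ranking(cosine_scores(query, docs))
    return rrf_fuse([sparse, dense])[:top_k]


def rerank(query, docs, candidates, score_fn):
    """Reorder candidates with a pairwise scorer (cross-encoder placeholder)."""
    return sorted(candidates, key=lambda i: -score_fn(query, docs[i]))
```

In use, `hybrid_search(question, passages)` returns candidate passage indices, and `rerank` reorders them before the top passages are placed in the language model prompt. Fusing ranks rather than raw scores avoids having to put BM25 and cosine scores on a common scale.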
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

