Smart PetroQuery (An NLP based query system over a structured database)
Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2024
Supervisor
Usman Ali, Lecturer, Department of Computer Science, School of Mathematics and Computer Science (SMCS), Institute of Business Administration (IBA), Karachi
Keywords
Natural Language Processing, Query System, AI Chatbot, Petrochemical Data, Retrieval-Augmented Generation
Abstract
The project "Smart PetroQuery: An NLP-Based Query System Over a Structured Database" develops a question-answering system for analyzing petrochemical data, using OpenAI's GPT-3.5-turbo for answer generation and FAISS for efficient document retrieval. The source data is stored in .csv format on a local system and is transferred through a cloud bucket into a Google BigQuery table. The system converts the data held in BigQuery into text and PDF documents, computes text embeddings for them, and indexes the embeddings in a FAISS vector store for optimized retrieval. The LLM (OpenAI's GPT-3.5-turbo) then generates contextually relevant responses to user queries based on the retrieved documents. A Streamlit-based frontend lets users interact with the AI assistant through a simple and intuitive interface. Additionally, integration with Google Drive and local file systems provides storage of, and access to, the processed operational data and outputs. This solution offers a scalable framework for automated document analysis, with potential applications across various industries, including petrochemicals and research, enabling efficient data-driven decision-making.
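The retrieval flow the abstract describes (structured rows converted to text, embedded, indexed, then retrieved as context for the LLM) can be illustrated with a minimal sketch. This is not the project's code: the full text is restricted, so the sample rows, column names, and every function name below are hypothetical. To keep the sketch self-contained, a toy bag-of-words vector with cosine similarity stands in for the OpenAI text embeddings and the FAISS index, and the final call to GPT-3.5-turbo is left out; only the prompt that would be sent is built.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample rows; the project's real petrochemical data is not public.
SAMPLE_CSV = """unit,date,product,output_tonnes
Cracker-1,2024-01-05,ethylene,1200
Cracker-1,2024-01-06,propylene,640
Reformer-2,2024-01-05,benzene,310
"""


def rows_to_documents(csv_text):
    """Turn each CSV row into one short text document ('column: value; ...')."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a learned embedding)."""
    tokens = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Pair every document with its vector (assumption: stand-in for a vector store)."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=2):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, context_docs):
    """Assemble the prompt that would be passed to the LLM."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "How much benzene did Reformer-2 produce?"
docs = rows_to_documents(SAMPLE_CSV)
index = build_index(docs)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a setup like the one the abstract outlines, embed and build_index would presumably be replaced by an embedding model and a FAISS index, and the prompt would be sent to the chat model; the retrieve-then-generate structure stays the same.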
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Faizan, F. (2024). Smart PetroQuery (An NLP based query system over a structured database) (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/41
The full text of this document is only accessible to authorized users.