Smart PetroQuery (An NLP based query system over a structured database)

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty / School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2024

Supervisor

Usman Ali, Lecturer, Department of Computer Science, School of Mathematics and Computer Science (SMCS), Institute of Business Administration (IBA), Karachi

Keywords

Natural Language Processing, Query System, AI Chatbot, Petrochemical Data, Retrieval-Augmented Generation

Abstract

The project "Smart PetroQuery: An NLP-Based Query System Over a Structured Database" develops a question-answering system for analyzing petrochemical data, built on OpenAI's GPT-3.5-turbo and FAISS for efficient document retrieval. The source data is stored in .csv format on a local system and transferred through a cloud bucket into a Google BigQuery table. The system converts the data held in BigQuery into text and PDF formats, generates text embeddings, and indexes them in a FAISS vector store for optimized retrieval. The LLM, OpenAI's GPT-3.5-turbo, then generates contextually relevant responses to user queries based on the retrieved documents. A Streamlit-based frontend provides a simple and intuitive interface for interacting with the AI assistant. Integration with Google Drive and local file systems supports storage of, and access to, the processed operational data and outputs. The solution offers a scalable framework for automated document analysis, with potential applications in industries such as petrochemicals and in research, enabling efficient data-driven decision-making.
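
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the sample rows, column names, and function names are hypothetical, a toy token-count embedding stands in for the OpenAI text embeddings, and a brute-force cosine-similarity search stands in for the FAISS index. The final call to the LLM is omitted; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical rows, standing in for records exported from the BigQuery table.
ROWS = [
    {"plant": "Unit A", "product": "ethylene", "month": "2024-01", "output_tonnes": 1200},
    {"plant": "Unit B", "product": "propylene", "month": "2024-01", "output_tonnes": 800},
    {"plant": "Unit A", "product": "ethylene", "month": "2024-02", "output_tonnes": 1350},
]


def row_to_text(row):
    """Flatten one structured row into a text chunk."""
    return ", ".join(f"{key}: {value}" for key, value in row.items())


def embed(text):
    """Toy embedding: lowercase token counts (assumption: a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, index, k=2):
    """Brute-force nearest-neighbour search (assumption: a stand-in for a FAISS index lookup)."""
    query_vec = embed(query)
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(query_vec, index[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(query, context_chunks):
    """Assemble the prompt that would be passed to the LLM along with the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


chunks = [row_to_text(row) for row in ROWS]   # structured rows -> text
index = [embed(chunk) for chunk in chunks]    # text -> vectors (the "vector store")
question = "What was the propylene output of Unit B?"
top = retrieve(question, chunks, index, k=1)  # most similar chunk(s)
prompt = build_prompt(question, top)
print(prompt)
```

In a full system of the kind the abstract describes, the prompt would be sent to the language model, and its reply shown to the user through the frontend.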

Document Type

Restricted Access

Submission Type

Research Project

The full text of this document is only accessible to authorized users.
