Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Dr. Faisal Iradat, Assistant Professor, School of Mathematics & Computer Science (SMCS)

Co-Advisor

Mr. Arbaz Khan, Folio3

Keywords

Yield Prediction, Satellite Imagery, Deep Learning, Convolutional Neural Networks

Abstract

This project presents a satellite-based crop yield prediction system tailored to Pakistan’s key crops. It combines multi-source satellite imagery from Landsat and Sentinel-2 with historical provincial yield data from AMIS, and predicts crop yields using a custom convolutional neural network (CNN). The pipeline incorporates feature engineering, data preprocessing, and hyperparameter tuning to improve accuracy. Experimental results show promising predictive performance, with an error rate as low as 10.35%. This work addresses agricultural productivity challenges by enabling data-driven decision-making for policymakers and stakeholders.

Tools and Technologies Used

  • Programming Languages: Python, JavaScript (React/Next.js)

  • Machine Learning Framework: PyTorch

  • Web Frameworks: FastAPI (backend), Next.js (frontend)

  • Databases: PostgreSQL

  • Satellite Data Processing: Google Earth Engine API

  • Libraries and Packages: NumPy, Pillow, SQLAlchemy, Psycopg2

  • Testing Frameworks: Pytest (backend), Jest and React Testing Library (frontend)

  • Data Handling: OpenCV (image preprocessing)

  • Version Control: GitHub (repository hosting)

  • Others: Mapbox GL (mapping), Material UI / Tailwind CSS (UI styling)

Methodology

This project addresses the challenge of limited and fragmented crop yield data in Pakistan by using historical agricultural statistics for rice and maize from AMIS.pk. The data cover the provincial and district levels from 2007 to 2022 across Punjab, Sindh, Khyber Pakhtunkhwa, and Balochistan. The dataset, which records cultivated area and production output, was cleaned and then enriched using proxy-based imputation: missing values were filled from geographically and climatically similar districts to maintain agronomic validity. Yield was calculated in kilograms per hectare from the raw data to enable consistent spatial and temporal comparisons.

For imagery, the project used the Harmonized Landsat Sentinel-2 (HLS) dataset, which harmonizes observations from the Landsat 8/9 and Sentinel-2 satellites. HLS provides 30-meter spatial resolution, frequent revisits (roughly every two to three days when both sensor families are combined), and eleven spectral bands spanning the visible to thermal infrared regions. Additional quality assessment and angle bands supported atmospheric correction and data harmonization. Preprocessing and feature extraction were carried out through the Google Earth Engine API, which enabled cloud masking, vegetation index calculations, and large-scale geospatial data processing.

The enriched, structured dataset included temporal features capturing year-on-year yield changes and spatial identifiers for province and district. It formed the input for training a convolutional neural network designed to learn spatial-temporal patterns for crop yield forecasting across Pakistan’s diverse agricultural regions.
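The tabular steps described above (yield in kilograms per hectare, proxy-based imputation, and year-on-year change) can be illustrated with a minimal sketch. It is not the project's actual code. It assumes production is recorded in tonnes and area in hectares, which the record does not state, and the district names, the proxy mapping, and all function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

KG_PER_TONNE = 1000.0  # assumption: raw production is reported in tonnes

Key = Tuple[str, int]  # (district, year)


def yield_kg_per_ha(production_tonnes: Optional[float],
                    area_ha: Optional[float]) -> Optional[float]:
    """Return yield in kg/ha, or None when an input is missing or invalid."""
    if production_tonnes is None or area_ha is None or area_ha <= 0:
        return None
    return production_tonnes * KG_PER_TONNE / area_ha


def impute_from_proxy(yields: Dict[Key, Optional[float]],
                      proxies: Dict[str, str]) -> Dict[Key, Optional[float]]:
    """Fill a missing yield with the same-year yield of a proxy district.

    `proxies` maps a district to a similar district; how similarity is
    decided (geography, climate) is outside this sketch.
    """
    filled = dict(yields)
    for (district, year), value in yields.items():
        if value is None and district in proxies:
            filled[(district, year)] = yields.get((proxies[district], year))
    return filled


def year_on_year_change(yields: Dict[Key, Optional[float]]) -> Dict[Key, float]:
    """Return yield minus previous-year yield wherever both years are known."""
    changes: Dict[Key, float] = {}
    for (district, year), value in yields.items():
        previous = yields.get((district, year - 1))
        if value is not None and previous is not None:
            changes[(district, year)] = value - previous
    return changes


# Hypothetical records: (district, year) -> (production in tonnes, area in ha)
raw = {
    ("District A", 2020): (200.0, 100.0),
    ("District A", 2021): (250.0, 100.0),
    ("District B", 2020): (90.0, 50.0),
    ("District B", 2021): (None, 50.0),  # production missing for this year
}

yields = {key: yield_kg_per_ha(prod, area) for key, (prod, area) in raw.items()}
filled = impute_from_proxy(yields, proxies={"District B": "District A"})
changes = year_on_year_change(filled)
```

Here the missing 2021 value for the second district is taken from its proxy for the same year, after which the year-on-year feature can be computed for both districts. A real pipeline would likely prefer a weighted or model-based proxy over a direct copy.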

Document Type

Restricted Access

Submission Type

BSCS Final Year Project
