Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Dr. Imran Rauf, Assistant Professor & Program Coordinator BS(CS) and PhD (CS) Programs

Co-Advisor

Saad Mughal - CTO AlphaVenture

Keywords

Vision Language Models, Design Automation, Visual Quality Assurance

Abstract

AI For Visual QA is an AI-powered design quality assurance platform that automates the detection of visual discrepancies between Figma design mockups and live web implementations. Design drift, the gradual accumulation of undetected layout, style, and content deviations between intended designs and deployed interfaces, erodes brand fidelity and creates a slow, error-prone feedback cycle for development teams. The platform addresses this with an automated QA workflow: a custom Figma plugin exports design frame data to the system, which captures screenshots of the live website, performs a multi-layer comparative analysis using vision language models, and produces a structured report with issues classified by severity, category, and confidence score. Beyond detection, an autonomous code-fixing agent locates the source of each discrepancy, applies targeted changes, and submits a GitHub pull request, enabling resolution without manual developer involvement. Visual QA delivers a scalable, production-ready framework that reduces the cost of maintaining design fidelity across iterative software development.

Tools and Technologies Used

Frontend: React 19, Vite, Laravel Echo, Pusher.js

Backend: Laravel 10, PHP 8.2

Database: MySQL

Real-time & WebSocket: Laravel Reverb

Browser Automation: Playwright, Chromium

AI & Vision Models: Claude Sonnet 4.6, Claude Sonnet 4.5

API & Integration: OpenRouter API, GitHub REST API

Figma Plugin: TypeScript, esbuild

Methodology

The system is structured around a four-phase sequential job pipeline. In the first phase, a large language model processes the exported Figma JSON to identify and extract named design sections. The second phase uses Playwright to launch a headless Chromium browser and capture screenshots of the target website, segmented to match each section. The third phase runs a two-layer VLM analysis on each pair of design section and screenshot: Layer A evaluates structural and content fidelity using the screenshots alongside extracted HTML and Figma JSON, while Layer B assesses visual style, including typography, colors, and spacing, using computed CSS and Figma JSON. A deduplication call then removes findings duplicated across the two layers before the report is compiled. The optional fourth phase runs an autonomous agent loop that iterates over the flagged issues, using file system and code search tools to locate and fix the responsible code, with GitHub integration to commit the changes and open a pull request. Each phase broadcasts real-time progress to the frontend via WebSocket.
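
The phase ordering described above can be illustrated with a minimal Python sketch. It is not the project's implementation, which is built on Laravel and PHP. Every name here (`Finding`, `dedupe`, `run_pipeline`, the phase labels, and the callback parameters) is hypothetical. The model calls, the Playwright capture, and the WebSocket broadcast are replaced by plain callables, and the deduplication step, which the project performs with a model call, is replaced by a simple key-based merge that keeps the highest-confidence finding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    """One reported issue (hypothetical shape, not the project's schema)."""
    section: str
    layer: str          # "A" = structure/content, "B" = visual style
    category: str
    severity: str
    confidence: float
    description: str


def dedupe(findings: List[Finding]) -> List[Finding]:
    """Stand-in for the deduplication call: keep the highest-confidence
    finding per (section, category, normalized description)."""
    best: Dict[Tuple[str, str, str], Finding] = {}
    for f in findings:
        key = (f.section, f.category, f.description.strip().lower())
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return list(best.values())


def run_pipeline(
    extract_sections: Callable[[], List[str]],
    capture: Callable[[str], str],
    layer_a: Callable[[str, str], List[Finding]],
    layer_b: Callable[[str, str], List[Finding]],
    on_progress: Callable[[str, str], None],
    fix: Optional[Callable[[List[Finding]], None]] = None,
) -> List[Finding]:
    """Run the phases in order, reporting progress for each one."""
    on_progress("extract", "started")       # phase 1: sections from Figma JSON
    sections = extract_sections()
    on_progress("extract", "done")

    on_progress("capture", "started")       # phase 2: one screenshot per section
    shots = {s: capture(s) for s in sections}
    on_progress("capture", "done")

    on_progress("analyze", "started")       # phase 3: two layers, then dedupe
    findings: List[Finding] = []
    for s in sections:
        findings += layer_a(s, shots[s])
        findings += layer_b(s, shots[s])
    report = dedupe(findings)
    on_progress("analyze", "done")

    if fix is not None:                     # phase 4 is optional
        on_progress("fix", "started")
        fix(report)
        on_progress("fix", "done")
    return report


# Usage with stub callables in place of the real model and browser calls.
events: List[Tuple[str, str]] = []
report = run_pipeline(
    extract_sections=lambda: ["hero", "footer"],
    capture=lambda s: f"{s}.png",
    layer_a=lambda s, shot: [
        Finding(s, "A", "typography", "minor", 0.6, "Heading font differs")
    ],
    layer_b=lambda s, shot: [
        Finding(s, "B", "typography", "minor", 0.9, "heading font differs")
    ],
    on_progress=lambda phase, status: events.append((phase, status)),
)
```

In this stubbed run both layers report the same typography issue for each section, so the merge leaves one finding per section. A key-based merge only catches findings whose wording matches after normalization; a model-based deduplication call can also merge findings that are phrased differently.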

Document Type

Restricted Access

Submission Type

BSCS Final Year Project

Creative Commons License

Creative Commons Attribution 4.0 International License
