Student Name

Ali Zia Uddin

Date of Submission

Spring 2024

Supervisor

Dr. Ali Raza, Assistant Professor, Department of Computer Science

Co-Supervisor

Dr. Saurav Sthapit

Committee Member 1

Dr. Farrukh Hasan, Examiner – I, FAST National University

Committee Member 2

Dr. Salman Zafar, Examiner – II, Institute of Business Administration (IBA), Karachi

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Reinforcement Learning, Reward Shaping, Neural Network, Policy Learning, Adversarial Attacks, Data Poisoning

Abstract

This research enhances the robustness of reward-shaping-based reinforcement learning agents against adversarial attacks by investigating a critical vulnerability in the reward function generation process and by deploying a targeted defense mechanism to mitigate it. Reinforcement learning agents are increasingly deployed in critical real-world scenarios where data integrity cannot be guaranteed, making robustness against adversarial attacks essential for reliable performance. In reward shaping, a common and efficient approach is to learn a reward function from user feedback on sample data. However, this process is vulnerable to adversarial attacks, since ensuring the integrity of the feedback is challenging: malicious actors can intentionally provide incorrect feedback to corrupt the learned policy. Existing research lacks a comprehensive understanding of the impact of such attacks, and current methods for reward function design are not robust against data poisoning.

In this work, we first explore how an effective attack can be mounted by injecting noisy data into the user feedback provided to the reinforcement learning agent. Second, we develop a defense mechanism based on the K-Nearest Neighbors (KNN) algorithm that protects the reward function learning process from noisy data. Our experiments used an oracle agent that always provides correct feedback, simulating a perfect user; we then systematically corrupted the oracle's feedback to simulate an attack. The experiments covered scenarios both with and without the learned reward function and included varying levels of noise in the training data, with the Mountain Car domain as the testbed.
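To make this attack-and-defense loop concrete, the Python sketch below is a minimal illustration, not the thesis implementation: the oracle's labeling rule, the 20% label-flip rate, and the choice of k = 15 for scikit-learn's KNeighborsClassifier are all assumptions made for the example.

# Minimal sketch (illustrative, not the thesis code): a label-flipping attack
# on oracle feedback for Mountain Car states, filtered with a KNN consistency check.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def oracle_feedback(state):
    # Hypothetical "perfect user": label a (position, velocity) state as
    # good (1) when the car is moving toward the right hill, else bad (0).
    position, velocity = state
    return int(velocity > 0.0)

# Sample states uniformly from Mountain Car's observation ranges:
# position in [-1.2, 0.6], velocity in [-0.07, 0.07].
states = np.column_stack([
    rng.uniform(-1.2, 0.6, size=1000),
    rng.uniform(-0.07, 0.07, size=1000),
])
labels = np.array([oracle_feedback(s) for s in states])

# Attack: flip a fraction of the oracle's labels to poison the feedback.
noise_level = 0.2  # assumed noise level for this example
flip = rng.random(len(labels)) < noise_level
poisoned = np.where(flip, 1 - labels, labels)

# Defense: flag points whose label disagrees with their k nearest
# neighbors' majority vote; treat those points as noisy and drop them.
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(states, poisoned)
suspect = knn.predict(states) != poisoned

print(f"flipped: {flip.sum()}  flagged: {suspect.sum()}  "
      f"correctly caught: {(suspect & flip).sum()}")
clean_states, clean_labels = states[~suspect], poisoned[~suspect]

Points whose stored label disagrees with the majority vote of their nearest neighbors are flagged as suspect and removed before the reward function is fit, mirroring the consistency-based filtering idea described above.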

The results demonstrated that the learned reward function significantly improved the agent's performance. However, as the noise level in the training data increased, the agent's performance degraded, highlighting the importance of data quality for efficient policy learning. Furthermore, the KNN-based defense mechanism detected noisy data with high accuracy across different noise levels, as indicated by a consistently low number of noisy data points misclassified as clean. Our findings underscore the importance of analyzing potential vulnerabilities in the reinforcement learning pipeline, and show that a straightforward technique like KNN can effectively detect and mitigate noisy data, further improving the system's robustness.

Document Type

Restricted Access

Submission Type

Thesis
