Student Name

Omar Khan

Date of Submission

Spring 2024

Supervisor

Dr. Syed Ali Raza, Assistant Professor, Department of Computer Science

Committee Member 1

Dr. Tariq Mahmood, Examiner – I, Institute of Business Administration (IBA), Karachi

Committee Member 2

Dr. Sajjad Haider, Examiner – II, Institute of Business Administration (IBA), Karachi

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Deep Reinforcement Learning, Neural Networks, Stock Trading, Automated Stock Trading, Finance

Abstract

The equity market is characterized by its inherent volatility and unpredictability, yet it is governed by underlying patterns and structures. The most successful traders tend to be those who accumulate many small wins over time rather than those who happen to hold a single stock that rises tremendously. In today’s fast-moving economy, however, it is difficult to commit to a 20+ year investment horizon, and investors benefit more from capturing multiple wins within that period.

Through this study, we aim to leverage Deep Reinforcement Learning algorithms to capture the dynamics of stock movement, tuning different training parameters to improve performance and to understand how sensitive each model is to those parameters. In particular, we aim to train our models to learn a medium-term strategy rather than one spanning multiple decades.

To achieve this, we trained our models on stock data from the Dow Jones 30 Index, using each stock’s closing price and a handful of technical indicators. This data was used to train an agent with several algorithms: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), with training curtailed to a finite period for each episode. Our objective is to identify the episode length that yields the best results.
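
For illustration only, the following is a minimal sketch (not the thesis’s actual code) of how such an experiment could be set up using the stable-baselines3 implementations of these five algorithms. Here `StockTradingEnv` and `dow30_df` are hypothetical placeholders for a Gymnasium-compatible environment built from the Dow 30 closing prices and technical indicators, and Gymnasium’s `TimeLimit` wrapper stands in for the episode-length cap being varied.

```python
# Sketch: train A2C, PPO, DDPG, SAC, and TD3 on the same trading environment,
# clipping each training episode to a fixed number of steps.
# `StockTradingEnv` is a hypothetical placeholder environment.
import gymnasium as gym
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3

ALGOS = {"A2C": A2C, "PPO": PPO, "DDPG": DDPG, "SAC": SAC, "TD3": TD3}

def train_all(make_env, episode_length: int, total_timesteps: int = 100_000):
    """Train each algorithm with episodes capped at `episode_length` steps."""
    trained = {}
    for name, algo in ALGOS.items():
        # TimeLimit enforces the finite per-episode horizon under study.
        env = gym.wrappers.TimeLimit(make_env(), max_episode_steps=episode_length)
        model = algo("MlpPolicy", env, verbose=0)
        model.learn(total_timesteps=total_timesteps)
        trained[name] = model
    return trained

# Usage (hypothetical env factory; episode_length is the parameter swept):
# models = train_all(lambda: StockTradingEnv(dow30_df), episode_length=63)
```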

Through our experiments, we charted the performance of the trained models against the different parameters used to train them. We observed that clipping the investment horizon to a shorter period produced weaker, more erratic results, whereas longer training episodes led to far more stable and positive results.

Document Type

Restricted Access

Submission Type

Thesis
