DarkrAI: a Pareto ε-greedy policy
General information
This project explores the use of bio-inspired techniques to enhance reinforcement learning (RL) performance and accelerate the learning process. Specifically, we focus on training RL agents for Pokémon battles, a complex domain with numerous possible outcomes in each turn.
Approach
We’ve devised a novel approach that combines various elements:
- Damage Calculator: To accurately assess the impact of Pokémon moves, we’ve set up a Node.js server interfacing with a damage calculator API server. This infrastructure provides precise data on move effects.
- NSGA-II Optimization: We employ the NSGA-II genetic algorithm to solve a multi-objective optimization problem: identifying the Pareto-optimal moves a Pokémon can make in a given turn, which enhances decision-making (see the Pareto ε-greedy sketch after this list).
- Artificial Neural Network (ANN): We’ve designed an ANN whose weights are learned through RL. This network aids in optimizing Pokémon actions.
- Training Environment: We use Pokémon Showdown, an online battle simulator, as our training environment. The poke-env Python library facilitates communication with the simulator and enables the development of custom trainable agents (see the agent sketch after this list).
- Data from Pikalytics: To address uncertainties about the opponent’s Pokémon, we incorporate data from Pikalytics, offering competitive analysis and team-building insights.
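
To make the Pareto ε-greedy idea concrete, the sketch below filters a handful of candidate moves down to their Pareto front and then explores among those moves with probability ε, exploiting a value estimate otherwise. Everything here is an illustrative assumption rather than the project's implementation: the objective tuples, the brute-force `pareto_front` filter (the project uses NSGA-II instead), and the stubbed value table standing in for the ANN.

```python
import random
from typing import Dict, List, Tuple

# One tuple of objective scores per move; higher is better for every entry,
# so "damage taken" is stored negated. The objectives are made up for this sketch.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b if it is >= on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Objectives]) -> List[str]:
    """Keep only the moves that no other move dominates."""
    return [
        move for move, obj in scores.items()
        if not any(dominates(other, obj) for name, other in scores.items() if name != move)
    ]


def pareto_epsilon_greedy(scores: Dict[str, Objectives],
                          value: Dict[str, float],
                          epsilon: float = 0.1) -> str:
    """With probability epsilon, explore uniformly among Pareto-optimal moves;
    otherwise exploit the move with the highest value estimate."""
    if random.random() < epsilon:
        return random.choice(pareto_front(scores))
    return max(value, key=value.get)


if __name__ == "__main__":
    candidate_moves = {
        "thunderbolt": (90.0, -40.0, 1.00),   # (damage dealt, -damage taken, accuracy)
        "volt_switch": (70.0, -10.0, 1.00),
        "thunder":     (110.0, -40.0, 0.70),
        "tackle":      (40.0, -40.0, 1.00),   # dominated by thunderbolt
    }
    # Stubbed value estimates standing in for the ANN's output.
    values = {move: random.random() for move in candidate_moves}
    print(pareto_front(candidate_moves))                  # tackle is filtered out
    print(pareto_epsilon_greedy(candidate_moves, values))
```

In the actual agent, NSGA-II performs the Pareto selection over a much larger search space and the RL-trained ANN supplies the exploitation values; this snippet only illustrates the dominance relation and the exploration scheme the name suggests.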
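
For context on how agents plug into Pokémon Showdown via poke-env, the snippet below shows a minimal custom player in the style of poke-env's documented examples. It is a simple max-base-power heuristic, not our ParetoPlayer; the class name is ours.

```python
from poke_env.player import Player


class MaxBasePowerPlayer(Player):
    """Toy agent: always picks the available move with the highest base power."""

    def choose_move(self, battle):
        if battle.available_moves:
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            return self.create_order(best_move)
        # No attacking option this turn (e.g. a forced switch): pick a random legal order.
        return self.choose_random_move(battle)
```

An agent like ParetoPlayer would hook in at the same choose_move point, delegating the decision to the damage calculator, NSGA-II, and the ANN described above.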
Results
Our experiments demonstrate promising results. The ParetoPlayer, an agent trained with our approach, shows the potential to improve training by providing higher rewards. However, in situations with a small search space and a single win condition, Player outperforms ParetoPlayer.
We acknowledge that further optimization, exploration of hyperparameters, and network topology adjustments could enhance results. Challenges include the time-consuming nature of NSGA-II, which currently relies on CPU computation, and the handling of forced switches, which is treated as a separate network problem.
For comprehensive details, methodology, implementation choices, and results, refer to our report and presentation slides.
Our statistical tests include:
- normality.R: Tests the normality of reward differences between two RL runs.
- regression.R (obsolete): Produces a regression line from average episode rewards.
- 10_runs_agg.R: Generates a regression line and performs a Kolmogorov-Smirnov test on average episode rewards.
- significance_if_not_normal.R: Conducts a Wilcoxon rank-sum test on episode reward data.
- significance_if_normal.R: Runs a Student’s t-test on episode reward data.
For more information on our statistical tests, consult the analysis README.
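
As a rough illustration of what these scripts check (the R scripts above remain the canonical versions), the Python sketch below runs a normality test on reward differences and then picks between a Student's t-test and a Wilcoxon rank-sum test accordingly. The input file names, the Shapiro-Wilk choice, and the 0.05 threshold are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical input: equal-length average-episode-reward traces from two RL runs,
# one value per line (the file names are placeholders for this example).
rewards_a = np.loadtxt("run_a_rewards.csv")
rewards_b = np.loadtxt("run_b_rewards.csv")

# Shapiro-Wilk normality test on the reward differences (in the spirit of normality.R).
_, p_norm = stats.shapiro(rewards_a - rewards_b)

if p_norm > 0.05:
    # Differences look normal: Student's t-test (cf. significance_if_normal.R).
    _, p_value = stats.ttest_ind(rewards_a, rewards_b)
else:
    # Otherwise use the non-parametric Wilcoxon rank-sum test
    # (cf. significance_if_not_normal.R).
    _, p_value = stats.ranksums(rewards_a, rewards_b)

print(f"normality p={p_norm:.4f}, significance p={p_value:.4f}")
```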
Contributors
- Samuele Bortolotti
- Simone Alghisi
- Massimo Rizzoli
- Erich Robbi