DarkrAI: a Pareto ε-greedy policy
General information
This project explores the use of bio-inspired techniques to enhance reinforcement learning (RL) performance and accelerate the learning process. Specifically, we focus on training RL agents for Pokémon battles, a complex domain with numerous possible outcomes in each turn.
Approach
We’ve devised a novel approach that combines various elements:
- Damage Calculator: To accurately assess the impact of Pokémon moves, we’ve set up a Node.js server interfacing with a damage calculator API server. This infrastructure provides precise data on move effects.
- NSGA-II Optimization: We employ the NSGA-II genetic algorithm to solve a multi-objective optimization problem: identifying the Pareto-optimal moves a Pokémon can make in a given turn, which enhances decision-making (see the Pareto ε-greedy sketch after this list).
- Artificial Neural Network (ANN): We’ve designed an ANN whose weights are learned through RL. This network aids in optimizing Pokémon actions.
- Training Environment: We use Pokémon Showdown, an online battle simulator, as our training environment. The poke-env Python library facilitates communication with the simulator and enables the development of custom trainable agents (see the agent sketch after this list).
- Data from Pikalytics: To address uncertainties about the opponent’s Pokémon, we incorporate data from Pikalytics, offering competitive analysis and team-building insights.
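
To make the Pareto ε-greedy idea concrete, the sketch below filters a handful of candidate moves down to their Pareto front and then explores among those moves with probability ε, exploiting a value estimate otherwise. Everything here is an illustrative assumption rather than the project's implementation: the objective tuples, the brute-force `pareto_front` filter (the project uses NSGA-II instead), and the stubbed value table standing in for the ANN.

```python
import random
from typing import Dict, List, Tuple

# One tuple of objective scores per move; higher is better for every entry,
# so "damage taken" is stored negated. The objectives are made up for this sketch.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b if it is >= on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Objectives]) -> List[str]:
    """Keep only the moves that no other move dominates."""
    return [
        move for move, obj in scores.items()
        if not any(dominates(other, obj) for name, other in scores.items() if name != move)
    ]


def pareto_epsilon_greedy(scores: Dict[str, Objectives],
                          value: Dict[str, float],
                          epsilon: float = 0.1) -> str:
    """With probability epsilon, explore uniformly among Pareto-optimal moves;
    otherwise exploit the move with the highest value estimate."""
    if random.random() < epsilon:
        return random.choice(pareto_front(scores))
    return max(value, key=value.get)


if __name__ == "__main__":
    candidate_moves = {
        "thunderbolt": (90.0, -40.0, 1.00),   # (damage dealt, -damage taken, accuracy)
        "volt_switch": (70.0, -10.0, 1.00),
        "thunder":     (110.0, -40.0, 0.70),
        "tackle":      (40.0, -40.0, 1.00),   # dominated by thunderbolt
    }
    # Stubbed value estimates standing in for the ANN's output.
    values = {move: random.random() for move in candidate_moves}
    print(pareto_front(candidate_moves))                  # tackle is filtered out
    print(pareto_epsilon_greedy(candidate_moves, values))
```

In the actual agent, NSGA-II performs the Pareto selection over a much larger search space and the RL-trained ANN supplies the exploitation values; this snippet only illustrates the dominance relation and the exploration scheme the name suggests.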
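
For context on how agents plug into Pokémon Showdown via poke-env, the snippet below shows a minimal custom player in the style of poke-env's documented examples. It is a simple max-base-power heuristic, not our ParetoPlayer; the class name is ours.

```python
from poke_env.player import Player


class MaxBasePowerPlayer(Player):
    """Toy agent: always picks the available move with the highest base power."""

    def choose_move(self, battle):
        if battle.available_moves:
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            return self.create_order(best_move)
        # No attacking option this turn (e.g. a forced switch): pick a random legal order.
        return self.choose_random_move(battle)
```

An agent like ParetoPlayer would hook in at the same choose_move point, delegating the decision to the damage calculator, NSGA-II, and the ANN described above.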
Results
Our experiments demonstrate promising results. The ParetoPlayer, an agent trained with our approach, shows the potential to improve training by providing higher rewards. However, in situations with a small search space and a single win condition, Player outperforms ParetoPlayer.
We acknowledge that further optimization, exploration of hyperparameters, and network topology adjustments could enhance results. Challenges include the time-consuming nature of NSGA-II, which currently relies on CPU computation, and the handling of forced switches, which is treated as a separate network problem.
For comprehensive details, methodology, implementation choices, and results, refer to our report and presentation slides.
Our statistical tests include:
- normality.R: Tests the normality of reward differences between two RL runs.
- regression.R (obsolete): Produces a regression line from average episode rewards.
- 10_runs_agg.R: Generates a regression line and performs a Kolmogorov-Smirnov test on average episode rewards.
- significance_if_not_normal.R: Conducts a Wilcoxon rank-sum test on episode reward data.
- significance_if_normal.R: Runs a Student’s t-test on episode reward data.
For more information on our statistical tests, consult the analysis README.
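
As a rough illustration of what these scripts check (the R scripts above remain the canonical versions), the Python sketch below runs a normality test on reward differences and then picks between a Student's t-test and a Wilcoxon rank-sum test accordingly. The input file names, the Shapiro-Wilk choice, and the 0.05 threshold are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical input: equal-length average-episode-reward traces from two RL runs,
# one value per line (the file names are placeholders for this example).
rewards_a = np.loadtxt("run_a_rewards.csv")
rewards_b = np.loadtxt("run_b_rewards.csv")

# Shapiro-Wilk normality test on the reward differences (in the spirit of normality.R).
_, p_norm = stats.shapiro(rewards_a - rewards_b)

if p_norm > 0.05:
    # Differences look normal: Student's t-test (cf. significance_if_normal.R).
    _, p_value = stats.ttest_ind(rewards_a, rewards_b)
else:
    # Otherwise use the non-parametric Wilcoxon rank-sum test
    # (cf. significance_if_not_normal.R).
    _, p_value = stats.ranksums(rewards_a, rewards_b)

print(f"normality p={p_norm:.4f}, significance p={p_value:.4f}")
```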
Contributors
- Samuele Bortolotti
- Simone Alghisi
- Massimo Rizzoli
- Erich Robbi