Monster Data
Data Pattern: Broken’s Angle
Principal Feature Focus Technique for Monstrous Data with a Peculiar Pattern
Introduction
Dungeons and Dragons (D&D) is one of the most popular tabletop roleplaying games ever made. It would be an understatement to say that this game is well loved; however, the game system does have a few issues. One of the most common criticisms of D&D is its complexity at higher levels of play. Follow me on a short adventure to tease apart some of this complexity and see what we can learn from it.
All D&D content in this study was used with permission under the Open Game License (OGL) and the DM’s Guild. Special thanks go to Wizards of the Coast for making this possible!
In the D&D Dungeon Master’s Guide, WotC suggests that when the armor class (AC) of a custom monster is set high, its attack bonus (ATT) should be set low for any given challenge rating (CR), while both AC and ATT should trend upward as CR increases. AC, ATT, and CR form a triad. This report will show how a simple technique can take advantage of this relationship for machine learning.
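To make the triad concrete, here is a minimal sketch of the trade-off. The slope and intercept are made up for illustration; this is not the actual Dungeon Master’s Guide table, and `stat_budget` is a hypothetical helper.

```python
# Toy model of the AC/ATT/CR triad: AC and ATT trade off against a
# shared budget, while the budget itself rises with CR.
# All constants are illustrative assumptions, not DMG values.

def stat_budget(cr: int) -> int:
    """Hypothetical combined AC + ATT budget that rises with CR."""
    return 17 + cr

def attack_bonus(cr: int, armor_class: int) -> int:
    """Set AC high for a given CR and ATT comes out low, and vice versa."""
    return stat_budget(cr) - armor_class

for cr in (1, 5, 10):
    for armor_class in (12 + cr // 2, 16 + cr // 2):  # a low and a high AC pick
        print(f"CR {cr:>2}: AC {armor_class:>2} -> ATT {attack_bonus(cr, armor_class):>2}")
```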
Raw Monster Data
The Monstrous Dataset was created with MonsterFactory.py, which is built on Fortuna: Generative Modeling Toolkit. Both libraries were developed by Robert Sharp, the author of this report. Conceptually, these monsters are custom variations on the monsters published by WotC in the 5th Edition Monster Manual.
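MonsterFactory.py and Fortuna are not reproduced here. The sketch below uses only Python’s standard library to fake a dataset with the same counter-variant shape; `make_monster`, `monstrous_dataset`, and every constant are assumptions for illustration, not the real generator.

```python
import random

def make_monster(cr: int) -> dict:
    """Stand-in for MonsterFactory.py: one synthetic monster per call.
    Within a CR group, a high AC forces a low ATT and vice versa,
    because both are carved out of a shared, CR-driven budget."""
    budget = 17 + cr + random.randint(-1, 1)        # noisy shared budget
    armor_class = random.randint(10, 14) + cr // 2  # AC trends up with CR
    return {"cr": cr, "ac": armor_class, "att": budget - armor_class}

# 50 custom monsters at each challenge rating from 1 to 20.
monstrous_dataset = [make_monster(cr) for cr in range(1, 21) for _ in range(50)]
```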
In the graphs above, notice that any monster with a high AC will always have a low ATT for its group, and vice versa. This relationship is the focus of this report, and it is prevalent across the entire dataset. Understanding this relationship is a key insight into how the data was produced, and it hints at how we can harness simple component reduction to produce more accurate machine learning models.
Broken’s Angle
The graph below shows how this relationship typically manifests. The overall trend is up and to the right, but the individual feature trends are down and to the right.
Hypothesis: Is it possible to produce a tighter distribution of data by combining two or more counter-variant features with simple math?
Processed Monster Data
Summary: After combining features with simple addition, the data becomes more focused and better suited for predictive models. CR prediction within ±1 CR is now possible. Notice how tight the data is after applying this simple component reduction, with no normalization or scaling at all.
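As a sketch of what ±1 CR accuracy could look like in practice, the snippet below reuses the synthetic `monstrous_dataset` from earlier and fits a plain linear regression on the combined AC + ATT feature. This is an illustration under those assumptions, not the report’s actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Combine the two counter-variant features with simple addition.
X = np.array([[m["ac"] + m["att"]] for m in monstrous_dataset])
y = np.array([m["cr"] for m in monstrous_dataset])

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

# Fraction of monsters whose predicted CR lands within 1 of the truth.
within_one = np.mean(np.abs(predictions - y) <= 1)
print(f"Predictions within +/-1 CR: {within_one:.1%}")
```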
Final Thoughts
While this technique can improve simple regression models, more advanced models, like RandomForestRegressor, see little benefit: 1-2% for this dataset in the best case. Admittedly, PCA would perform much the same combination of features, but it would not be as precise. The variances of the two features are mirror images, yet the feature values have different offsets. A traditional pipeline would typically scale the values, and for this dataset that would be incorrect: it is the variances that need to be on the same scale, not the values. A bonus of +1 AC is by definition exactly equivalent to +1 ATT, even though AC starts counting at 8 and ATT starts at 0. This information is unavailable to the model without a human to help. Know your data!
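To see why scaling the values would be wrong here, consider this hypothetical contrast (again on the synthetic `monstrous_dataset`, not the author’s code): a standard scale-then-PCA pipeline equalizes the value ranges, discarding the fact that +1 AC and +1 ATT are the same unit, while plain addition keeps that unit intact and cancels the mirrored variance exactly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

raw = np.array([[m["ac"], m["att"]] for m in monstrous_dataset])
cr = np.array([m["cr"] for m in monstrous_dataset])

# Simple addition: the units already match, so no scaling is needed.
combined = raw.sum(axis=1)

# Traditional pipeline: scale the values, then reduce to one component.
scaled = StandardScaler().fit_transform(raw)
pipeline_feature = PCA(n_components=1).fit_transform(scaled).ravel()

# Compare how well each single feature tracks CR.
print("corr(AC + ATT, CR):        ", round(abs(np.corrcoef(combined, cr)[0, 1]), 3))
print("corr(scaled PCA comp., CR):", round(abs(np.corrcoef(pipeline_feature, cr)[0, 1]), 3))
```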