Aggregate Performance Metric, Pt. I - Overview
Posted by: source
(Friday, November 6, 2020)
Introduction
In anticipation of Wolf PRO being released sometime in the near future, I developed a new framework for ranking players called Aggregate Performance Metric (APM). It works by calculating two primary factors, player impact and team success, which are then combined and adjusted for quality of competition. Each player contributes a certain percentage of their rank points to two pools: impact points and success points. The ratio of those contributions is determined by the number of players in the match. In essence, each player makes two bets: 1) on themselves to meet or exceed expectations, and 2) on their team to meet or exceed expectations. Expectations for impact and success are determined by relative rank within the match. Impact points are distributed based on each player's performance in a number of weighted statistical categories, and success points are distributed based on the outcome of the match. A player's rank after a match is adjusted by a percentage of the difference between the number of points they held prior to the match and the number attributed to them afterward.
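To make that flow concrete, here is a rough sketch, with a placeholder stake percentage, a placeholder split between the two pools, and no expectation or quality-of-competition adjustment; the actual formulas are laid out in Pt. II.

# Rough sketch of the APM point flow described above. The stake percentage,
# pool split, and payout rules here are placeholders, not the real APM math
# (see Pt. II); expectation and quality-of-competition adjustments are omitted.

def contribution_split(player_count):
    # Assumed: larger matches shift the stake from team success toward impact.
    impact_share = min(0.9, player_count / 16.0)
    return impact_share, 1.0 - impact_share

def settle_match(players, stake_pct=0.10):
    # players: list of dicts with 'rank_pts', 'impact_score' (weighted stats,
    # already normalized) and 'won' (bool). Returns the points attributed to
    # each player after the match; the total paid out equals the total staked,
    # so the match stays a closed system.
    impact_share, success_share = contribution_split(len(players))
    stakes = [p['rank_pts'] * stake_pct for p in players]
    impact_pool = sum(stakes) * impact_share
    success_pool = sum(stakes) * success_share

    total_impact = sum(p['impact_score'] for p in players) or 1.0
    winners = sum(1 for p in players if p['won']) or 1

    attributed = []
    for p in players:
        payout = impact_pool * (p['impact_score'] / total_impact)
        if p['won']:
            payout += success_pool / winners
        attributed.append(payout)
    return attributed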
Problem
The current system (Elo) is unsuitable for use with multiplayer games because:
1) It doesn’t account for the disparity between players within a match;
2) It doesn’t account for the individual impact that a player has on the game, particularly in the event of a loss.
Requirements
I set the following requirements for a replacement system to be well-rounded:
> It must correctly rank a player in each of 27 (3³) possible scenarios, where the factors are player impact, team success, and quality of competition, given that each of these may take on a positive, neutral, or negative value (the sketch after this list enumerates them).
> It must scale correctly to player counts from 2 to 64.
> Each match must be a closed system where the number of points is fixed and the sum of the redistribution of points is 0.
> Player ranks are adjusted based on expectations. Those that exceed expectations increase their rank, those that meet expectations stay in the same position, and those that fail to meet expectations decrease their rank.
> It must work at minimum for objective format games (Objective, Stopwatch).
> It must account for players who played incomplete rounds.
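To spell out the first requirement, the scenario space is just every combination of the three factors taking each of the three values; a quick enumeration (no ranking math involved) looks like this:

from itertools import product

# The 27 (3^3) scenarios from the first requirement: each of player impact,
# team success, and quality of competition may be positive, neutral, or negative.
VALUES = ('positive', 'neutral', 'negative')

scenarios = list(product(VALUES, repeat=3))
assert len(scenarios) == 27

for impact, success, competition in scenarios:
    print(f"impact={impact:<8}  success={success:<8}  competition={competition}")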
Assumptions
I made the following assumptions when constructing APM:
> The fewer people in a match, the more important it is to win; the more people in a match, the more important it is to have an individual impact.
> The fewer people in a match, the more important it is for teams to be balanced. A team's expectation to win is not linear, in the sense that a significant gap between team ranks becomes increasingly difficult to overcome.
> The more people in a match, the more difficult it is to win in spite of any particular rank gap between teams: as variance decreases, confidence increases (see the toy illustration after this list).
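The toy illustration below shows one way the second and third assumptions could be modeled: a logistic curve over the rank gap whose steepness grows with team size. The constants and the curve shape are placeholders, not APM's actual model (see Pt. II).

import math

# Toy illustration of assumptions 2 and 3: the rank gap between teams is
# pushed through a logistic curve (a large gap becomes increasingly hard to
# overcome), and the curve steepens as teams grow (variance decreases, so
# confidence in the favourite increases). All constants are placeholders.

def expected_win(team_rank, opp_rank, players_per_side, scale=400.0):
    gap = team_rank - opp_rank
    steepness = math.sqrt(players_per_side)
    return 1.0 / (1.0 + 10.0 ** (-gap * steepness / scale))

for n in (1, 3, 6, 16, 32):
    print(f"{n}v{n}, +200 rank gap -> expected win {expected_win(1700, 1500, n):.2f}")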
For a mathematical breakdown of how it works, see Aggregate Performance Metric, Pt. II - Breakdown.
For examples of it working in 6v6 matches, see Aggregate Performance Metric, Pt. III - Examples.
Posted on Friday, November 6, 2020 at 10:35:12 PM by -doNka-
For reference:
Current elo system is here https://github.com/donkz/RTCW-stats-py-sci/blob/master/tests/elo.py
Lines specific to rtcw are 82-97
In short, people are ranked by the number of kills they have in a given match.
Kills by special weapons are worth half a point.
They are ranked 1-x.
Players are awarded or subtracted 1.5 points for a win or a loss, and then they are matched against all of their opponents based on their existing Elo scores.
People with high Elo are expected to be at the top of the ranks, and people with low Elo are expected to be at the bottom.
Any difference results in Elo being added to or subtracted from a player.
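For reference, the expectation-vs-outcome step is just textbook Elo; the formulas below are the generic version, not a copy of the rtcw-specific code in elo.py lines 82-97.

# Textbook Elo expectation and update, shown for reference only.

K = 32  # update factor (placeholder value)

def expected_score(rating_a, rating_b):
    # Probability-like expectation that A finishes ahead of B.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_finished_ahead):
    # Add or subtract rating based on how the pairing actually went.
    exp_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_finished_ahead else 0.0
    delta = K * (actual_a - exp_a)
    return rating_a + delta, rating_b - delta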
The rankings can only be calculated from metrics that can be derived from RTCW logs, and should therefore be completely objective.
Any player can have a good or a bad game. That's okay because Elo is calculated over hundreds of games; right now it's about a thousand games for each player. You can see the Elo rankings on the front page in the season closing posts.
You can see for yourself if it works for you.
The Elo system has been used for a decade of Quake Live games and in Xonotic, and is derived from the mathematical theory for ranking chess players.
Posted on Saturday, November 7, 2020 at 05:52:38 AM by source
I know that you are using a slight modification right now, but since I don't know how many people reading this will even care about the math, I wanted to stick to classic Elo for comparisons. The gather bot just goes off standard Elo at the moment, but integrating this into a new bot is a whole other thing. There are a number of different systems out there which are either modified versions of Elo or different altogether, like Microsoft's TrueSkill system, and this is more or less a different flavour tailored for use with team- and class-based objective games.
Elo assumes that 1) wins alone are indicative of greater skill between opponents, and 2) relative skill is spread across a standard normal distribution. That's great for 2-player games like chess (or Quake duels) but is pretty bad for larger teams. The system that I created has a lot of the same elements as Elo, including expectation vs. outcome and scaling the difference, but with a more holistic approach. Practically speaking, you could think of this as a way of combining the objectives of both the Elo and rankPts systems that we use right now. I'm also not sure that a standard normal distribution is even relevant with such a small / inconsistent community.