bjshnog

⟪WN9⟫ Development

Ok. I get most of that, but sounds like you are advocating each tank having a pdf for each stat; that's a lot of PDFs.

Do you R? What we need in this thread are a few more people who use R for analysis. I can get those people access to data, and off we go.

 

No, I don't have that kind of software. If there is a way to get it without paying an obscene amount of money I could take a look. 

 

I will simplify it down to one dimension to start. Take Damage/Game and WR. Each account you take will have a WR in a tank, and a damage/game for that tank. Take a large number of accounts (and weight them based on battles) and you will be able to interpolate the points at any given damage level into a PDF (probability density function). You might see that people who average 1200 DPG have an average win rate of 50% with a deviation of 0.8%, and that the distribution is shaped like a gamma or a Gaussian (neither is hard to deal with, really).

 

Now, add a second stat, call it frags. Leaving frags out essentially widens the win rate distribution at each damage level, because frags and damage are not completely correlated. If you take frags into account you end up removing this influence. You will find something like 0.8 frags/game gives you the same PDF as before, 50% with a 0.8% deviation. However, increase this to 1.2 frags/game at the same damage and you now get 52% with a 1.1% deviation. Your PDF is basically a blurry surface with the degree of blur indicating the spread at any combination of the two variables.

 

Extend this into as many dimensions as you can get stats for. 
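
A rough sketch of the idea in Python, for anyone who wants to see it spelled out. The bin widths, the tuple layout and the function name `conditional_wr` are assumptions made for illustration, not part of any existing WN8/WN9 script. It only computes a battle-weighted mean and spread of win rate per (damage, frags) bin; the interpolation into a smooth surface is left out.

```python
from collections import defaultdict
from math import sqrt

def conditional_wr(accounts, dmg_step=100, frag_step=0.2):
    """Battle-weighted mean and spread of win rate per (damage, frags) bin.

    accounts: iterable of (battles, avg_damage, avg_frags, win_rate) rows
              for ONE tank (hypothetical layout).
    Returns {(damage_bin_index, frag_bin_index): (mean_wr, sd_wr, battles)}.
    """
    bins = defaultdict(list)
    for battles, dmg, frags, wr in accounts:
        key = (int(dmg // dmg_step), int(frags // frag_step))
        bins[key].append((battles, wr))

    result = {}
    for key, rows in bins.items():
        total = sum(b for b, _ in rows)
        mean = sum(b * wr for b, wr in rows) / total
        var = sum(b * (wr - mean) ** 2 for b, wr in rows) / total
        result[key] = (mean, sqrt(var), total)
    return result

# Made-up accounts: two in the 1200-1299 damage / 0.8-0.99 frags bin,
# one in the same damage bin with more frags.
sample = [
    (100, 1210, 0.85, 0.50),
    (300, 1250, 0.90, 0.54),
    (200, 1220, 1.25, 0.52),
]
surface = conditional_wr(sample)
```

Adding a third stat just means adding a third element to the bin key, though the number of accounts per bin drops quickly as dimensions are added.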

Link to post
Share on other sites


R is free. You just have to learn the language.

Link to post
Share on other sites

That's what I'm gonna try to evaluate, Otis! Thx for running that tonight, Gryphon.

 

For all the expected values in v17, here is a PDF file showing each value per tier as a dot plot, then the means, then the standard deviations.

 

The first 5 graphs have 358 dots on them, so they are 'busy', but I thought they were useful. Let me know what you think.


I will produce some distributions of what you asked for so you can see what they look like. As to how to model those using a gamma or a gaussian, I'll do what I can.

 

Get R here with a nice GUI to use it (R Studio). All of you need this. R rocks! 

We also need a dropbox to store stuff we can work on, and we need a chatroom on wotlabs TS for us to talk rather than post all the time.

Link to post
Share on other sites


Here is a first step - simple distributions of average damage per tank - this is for all tier tens.

 

First set is weighted by number of accounts, second is by number of battles. Battle weighting doesn't seem to affect the distribution much; if anything, it just creates a few outliers we could do without.

 

 

average_damage_tier10.jpeg

Weighted by Battles:

average_damage_tier10_weighted.jpeg

 

Link to post
Share on other sites

Can you try and fit those to gaussian and gamma distributions?

 

What test of fitness do you want to apply? They look Gaussian, but I don't know how to demonstrate that statistically.
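
One simple way to compare the two candidates is to fit both to the same sample and see which gives the higher log-likelihood. The Python sketch below is only an illustration under stated assumptions: the gamma parameters come from the method of moments rather than a proper maximum-likelihood fit, all values must be positive, and the function names are made up. In R, `shapiro.test` (stats) and `fitdistr` (MASS) would be the more usual tools for a formal normality test and a maximum-likelihood fit.

```python
from math import lgamma, log, pi

def _mean_var(xs):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def normal_loglik(xs):
    """Log-likelihood of xs under a Gaussian with the sample mean/variance."""
    mu, var = _mean_var(xs)
    return sum(-0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var) for x in xs)

def gamma_loglik(xs):
    """Log-likelihood of xs under a gamma fitted by the method of moments.

    shape k = mean^2 / var, scale theta = var / mean; requires all x > 0.
    """
    mu, var = _mean_var(xs)
    k, theta = mu * mu / var, var / mu
    return sum((k - 1) * log(x) - x / theta - lgamma(k) - k * log(theta)
               for x in xs)

def better_fit(xs):
    """Name of the candidate with the higher log-likelihood."""
    return "gaussian" if normal_loglik(xs) >= gamma_loglik(xs) else "gamma"
```

Both candidates have two parameters, so comparing raw log-likelihoods is a fair first look; it says which shape fits better, not whether either fits well.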

 

Here are the same tanks showing the plots of winrate vs average damage - one dot per account. They all seem to have a definite linear trend sitting there underneath all the noise.

 

BTW, I will be on wotlabs TS for next few hours if anyone wants to discuss WN9, and even platoon if you want me to tank your WN8, lol

 

aDMG_vs_aWIN_10.jpeg

Link to post
Share on other sites

How do you plan to make sure they are all equally meaningful? Are you just thinking of using the raw quantile function or CDF and not account for skill bias?

Link to post
Share on other sites

I'm making a little interactive program that will show how my WN9 concept will work, step by step. The coefficients and tanks are just made up at this point, but it should be enough to show what kind of tank composition will cause what kinds of distortion.

Link to post
Share on other sites

For all the damage vs winrate plots I posted, I'd just use a linear regression to find the relationship between damage and winning for them all. That's what we have always done. If Maxl or anyone else has a better idea, then please explain.
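
For reference, the regression described here boils down to a few lines. This is a minimal Python sketch, not the script actually used for the plots; the optional battle weighting and the name `fit_line` are assumptions.

```python
def fit_line(points):
    """Battle-weighted least-squares fit of win_rate = intercept + slope * damage.

    points: iterable of (battles, avg_damage, win_rate), one row per account
            (hypothetical layout). Use battles = 1 everywhere for an
            unweighted fit.
    Returns (intercept, slope).
    """
    points = list(points)
    w = sum(b for b, _, _ in points)
    mx = sum(b * x for b, x, _ in points) / w
    my = sum(b * y for b, _, y in points) / w
    sxx = sum(b * (x - mx) ** 2 for b, x, _ in points)
    sxy = sum(b * (x - mx) * (y - my) for b, x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up accounts lying exactly on win_rate = 0.40 + 0.0001 * damage.
intercept, slope = fit_line([
    (100, 1000, 0.50),
    (50, 1500, 0.55),
    (200, 2000, 0.60),
])
```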

Link to post
Share on other sites

I am going to try and figure out R and see what I can get.

 

I want to interpolate those graphs into a continuous PDF of damage vs win rate for each tank. Then we can see how the mean and distribution of win rate changes with damage. We want to see how accounts at any particular level are distributed to try and filter out platooning or obvious padding. In fact, on some of those graphs as it is I can see a second set of points which appear to be platoon-padded values. 

Link to post
Share on other sites

Good. If you have R installed I will help you. See my PM. Once that is taken care of, I'll send you scripts to play with. You will need to figure out how to install packages into R Studio, else some of the functions won't work. I use ggplot2 for all the graphs, and dplyr and rjson are other very useful ones that don't come with base R. MASS, utils, stats and datasets normally ship with R itself, so those should already be there.

 

I recommend this book; you'll be productive in less than a week. It's very good.

 

I'll start an R thread in here so questions about R can be dealt with outside this thread

Link to post
Share on other sites

t_34_350_360_zpsmrndrjyf.png

This is a WR distribution for all EU server T-34 drivers with between 350 and 360 average damage and more than 50 games. It is roughly normal. 

 

t_34_450_460_zpsiup4qn8j.png

 

Same for 450-460 avg damage - a bit more distorted. 

 

t_34_550_560_zpsfzd5s6iw.png

550-560 avg damage. Here sample size is becoming a bit of an issue.

 

My idea is to take these binned normal plots and recalculate one starting at every damage point. Then, we can plot their means and variances and goodness of fit to see what happens. Sadly I am still red@R so it might be slow going to start. 
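
A sketch of that sliding-bin idea in Python, in case it helps while getting started with R. The window width, the minimum sample size and the name `sliding_wr_stats` are assumptions; goodness of fit is left out and would need a separate test per window.

```python
def sliding_wr_stats(accounts, width=10, step=1, min_n=30):
    """Mean and variance of win rate in a damage window slid along the axis.

    accounts: list of (avg_damage, win_rate) for one tank, already filtered
              (e.g. more than 50 games). Hypothetical layout.
    Returns a list of (window_start, n, mean_wr, var_wr); windows holding
    fewer than min_n accounts are skipped (min_n must be at least 2).
    """
    accounts = sorted(accounts)
    if not accounts:
        return []
    lo, hi = int(accounts[0][0]), int(accounts[-1][0])
    out = []
    for start in range(lo, hi + 1, step):
        wrs = [wr for dmg, wr in accounts if start <= dmg < start + width]
        if len(wrs) < min_n:
            continue  # sample too small to say anything useful
        mean = sum(wrs) / len(wrs)
        var = sum((w - mean) ** 2 for w in wrs) / (len(wrs) - 1)
        out.append((start, len(wrs), mean, var))
    return out

# Tiny made-up example: only the 350-359 window has enough accounts.
stats = sliding_wr_stats(
    [(350, 0.48), (352, 0.50), (355, 0.52), (460, 0.55)],
    width=10, step=10, min_n=3,
)
```

Plotting `mean_wr` and `var_wr` against `window_start` then shows how the centre and spread of win rate move with damage, and where the sample gets too thin to trust.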

Link to post
Share on other sites

Ok, well hang in there. One thing I've noticed is that stats like 'damage to death ratio' and 'kill to death ratio' correlate way better with winrate for a single tank (the T-62A is what I'm using) than average frags or damage by themselves.

 

Average frags, or damage, correlate with T-62A average winrate at an r squared of about 0.5. Cap, spots and defs have close to zero correlation. However, using damage to death it's 0.58 and using kill to death it's 0.56, which are pretty good at the single tank level.
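
For anyone wanting to reproduce that kind of comparison, here is a small Python sketch of the calculation. It assumes 'damage to death ratio' means total damage divided by the number of battles not survived, which is a guess at the definition, and the function names are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

def damage_per_death(total_damage, battles, survived):
    """Total damage divided by deaths; None if the account never died."""
    deaths = battles - survived
    return total_damage / deaths if deaths > 0 else None
```

Usage would be to build one list of ratios and one list of win rates per account (dropping the `None` cases), then compare `r_squared(ratios, win_rates)` with the same figure for plain average damage.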

Link to post
Share on other sites

Those just represent an interaction between survival rate and those factors. The final model will certainly involve multiple non-linear terms representing combinations of everything. 

 

A 3D scatter plot between damage/battles, frags/battles and victories/battles is quite tightly grouped. This does indicate that damage and frags are correlated but it shows how multiple factors affect the response. 

Link to post
Share on other sites

About my algorithm, I just realised that because of the way different stats scale between different tanks, and not just tiers, it would be impossible to split the totals based on tier only. It would have to be tank-based, which means an expected value/scale table 2.5x as big as the current one for WN8, when individual tank stats aren't available. I'll figure out how they are calculated.

 

Question: If you take all the stats for a particular tank which correspond to a win rate of 50% (using some linear or non-linear model), and then model win rate based on all those stats, should you expect a 50% win rate, or will it be slightly different?

 

@Max: How were you planning on making overall stats usable? I think the small amount of precision you would gain from doing all this analysis will disappear when we are forced to use the aggregate stats. It might be useful in dossier format like on vBAddict, but I'm doubting how valuable it will be in contexts like WoTLabs.

 

 

I'm still working on my little application, for anyone interested. So far, I've finished:

  • Tank stats framework (as in data structure; it would have been easier in something other than GML).
  • Expected "average" (technically made up) stats per tier which correspond to a 50% win rate (though not really).
  • Stat coefficients for per-tier rating formulas (which are also made up). The variables represent the ratio of the tank average stat to the tier average stat. When all of the variables are equal to 1, they represent a 50% win rate at their tier (1000 points).
  • Tier vs skill normalization.
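
To make the "all variables equal to 1 gives 1000 points" property concrete, here is a minimal Python sketch. The linear form, the stat names and the coefficients are all made up for illustration (just as the coefficients in the application are); it is not the actual formula.

```python
def tier_rating(tank_stats, tier_expected, coefficients):
    """Rating from ratios of tank average stats to tier average stats.

    Each variable is tank_stats[name] / tier_expected[name]. The weighted
    sum is normalised by the coefficient total, so a tank exactly matching
    the tier averages (every ratio equal to 1) scores 1000.
    """
    total = sum(coefficients.values())
    score = sum(c * tank_stats[name] / tier_expected[name]
                for name, c in coefficients.items())
    return 1000 * score / total

# Hypothetical tier averages and coefficients.
expected = {"damage": 500.0, "frags": 0.8}
coeffs = {"damage": 0.7, "frags": 0.3}
baseline = tier_rating(expected, expected, coeffs)
above = tier_rating({"damage": 750.0, "frags": 0.8}, expected, coeffs)
```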

What I'll be working on:

  • Splitting overall stats into per-tank stats.
  • Tank vs tier normalization.
  • Finishing interface elements ("add tank" buttons, setting tank stats, the very important "Calculate All" button).

One of the features will be that it will calculate both ratings for the actual stats and for the projected stats based on account totals, so any methods of padding will probably become immediately visible.

Link to post
Share on other sites

You use the overalls to develop the model and then just use anything else you get as validation data. If mismatches start becoming significant we re-parameterize. 

Link to post
Share on other sites

Guys: remember that introduction of a new rating system would require a whole lot of patient explaining and arguing with the community. If you forgot that, look at the size of the WN8 Development thread, or even at the size of the WN8 Expected Values Update thread.

 

Been talking to Crab and we see WN9 as an evolution of WN8; just make a limited number of changes that can each be easily justified and seem intuitively correct. So before we all run off down rabbit holes, please think: what needs to change, and why?

Link to post
Share on other sites

Richard Nixon made a nice suggestion regarding WN8b or 9 that would be an evolution, not the revolution you guys seem to be working on here. Why was that abandoned, btw?

 

 


Here's a relatively simple way of fixing skill scaling:
 
1. Generate expected values as usual, centred on 1565.
2. Go back through your player database, calculating per-tank WN8 values with the new expected values.
3. Throw out tanks below 50 games and then the bottom 50% of tanks for each player, as usual.
4. Calculate a "recent WN8" based on the remaining tanks.
5. Throw out any players below 2500 recent WN8.
6. Average the tanks of the remainder, and the WN8 of the players of that tank.
7. Normalize the ratio between the average tank WN8 and average player WN8 to give you a scale factor per point of WN8 from 1565: scalefactor = ((tankWN8 - 1565) / (playerWN8 - 1565))
 
Final results should be a bit like this:
 
https://docs.google....dit?usp=sharing
 
Mine's a bit distorted because I'm mixing Gryphon's expected values with my own database, but you get the idea. Fast tanks have scale factors above 1.0, while slow tanks have scale factors below 1.0. Low tiers are mostly garbage results due to lack of data, but you can guess or substitute 1.0 as appropriate.
 
Once you've got that, you add a final step to the WN8 calculation:
 
1. Sum battles*scalefactor over the player's tanks. Divide by total battles to give scaleAvg.
2. Adjust with the following formula:
 
scaledWN8 = 1565 + ((wn8 - 1565) / scaleAvg)
 
So for example, a 3000 WN8 player who's only played the Maus will get 3057 scaledWN8, and a 3000 WN8 player who's only played the T62A will get 2884 scaledWN8.
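
The two formulas in the quoted proposal translate directly into code. In this Python sketch the function names are made up, and the scale factors 0.962 (Maus) and 1.088 (T-62A) are back-calculated from the two worked examples above rather than taken from the linked spreadsheet.

```python
ANCHOR = 1565  # WN8 value the expected values are centred on

def scale_factor(avg_tank_wn8, avg_player_wn8, anchor=ANCHOR):
    """Step 7: per-tank scale factor per point of WN8 away from the anchor."""
    return (avg_tank_wn8 - anchor) / (avg_player_wn8 - anchor)

def scaled_wn8(wn8, tanks, anchor=ANCHOR):
    """Final step: battle-weighted average scale factor applied to account WN8.

    tanks: list of (battles, scale_factor) for the player's tanks.
    """
    total_battles = sum(b for b, _ in tanks)
    scale_avg = sum(b * s for b, s in tanks) / total_battles
    return anchor + (wn8 - anchor) / scale_avg

maus_only = scaled_wn8(3000, [(500, 0.962)])   # about 3057
t62a_only = scaled_wn8(3000, [(500, 1.088)])   # about 2884
```

A player splitting battles evenly between the two tanks would get a `scale_avg` of about 1.025, so the correction mostly washes out for mixed garages.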

 

Wouldn't it be possible to add a 3rd point, say 1000 WN8, besides 1565 and 2500?

 

Link to post
Share on other sites

The final result likely wouldn't be that much different from WN8; the issue is that right now there isn't a lot of statistical justification for what WN8 does. It uses the same formula for every tank at every tier, expresses performance as a ratio to a top 10% player, and has obvious errors as a result.

 

In short, you are linearizing one too many times and in the wrong place - the 1565-~54% range is not a fair approximation for the 49% range or the 60% range. 

 

As a first step I would like to see the regression which developed the WN8 equation repeated for every tank at tier 10 to see what differences you end up seeing. 

 

If a T-62A formula is A(x) + B(x*y) + C(z) then a Maus can't be approximated as D(x) + E(x*y) + F(z) which is what expected values do. It might be something like A(x^2) + B(y) + C(x*y*z). We need to actually develop a rigorous model of performance, not use fudge factors. 

Link to post
Share on other sites

Richard Nixon made a nice suggestion regarding WN8b or 9 that would be an evolution, not the revolution you guys seem to be working on here. Why was that abandoned, btw?

He did, and it has merit. I was somewhat disengaged at the time, but as of now the problem I see with his method is in these steps:

4. Calculate a "recent WN8" based on the remaining tanks.

5. Throw out any players below 2500 recent WN8.

6. Average the tanks of the remainder, and the WN8 of the players of that tank.

7. Normalize the ratio between the average tank WN8 and average player WN8 to give you a scale factor per point of WN8 from 1565: scalefactor = ((tankWN8 - 1565) / (playerWN8 - 1565))

The 'recent' is a problem, because we don't have data to do recent. I do, however, have plots for every tank and every rSTAT, generated every time I run the expected values, and those plots clearly show the slope of each rSTAT. What I can do is use the current WN8 methodology, generate a per tank plot of WN8 vs user_WN8, and use the slope of that as the scaling factor.
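
A possible shape for that slope calculation, sketched in Python rather than R. Forcing the fitted line through the 1565 anchor is an assumption made here so that the slope lines up with the scalefactor definition in step 7; an ordinary regression with a free intercept would be the other obvious choice. The function name is hypothetical.

```python
def slope_through_anchor(pairs, anchor=1565):
    """Least-squares slope of tank WN8 against user WN8, forced through
    the point (anchor, anchor).

    pairs: iterable of (user_wn8, tank_wn8), one per account with that tank.
    """
    pairs = list(pairs)
    num = sum((u - anchor) * (t - anchor) for u, t in pairs)
    den = sum((u - anchor) ** 2 for u, _ in pairs)
    return num / den

# Made-up accounts lying exactly on tank = 1565 + 1.2 * (user - 1565).
slope = slope_through_anchor([(1000, 887), (2000, 2087), (3000, 3287)])
```

Because the fit uses accounts on both sides of 1565, the same factor applies downwards as well as upwards.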

RN's approach was to apply the scaling from 1565 up, but as you said, it should also apply downwards: the main impact of this would be on arty players, as the scaling is most significant for arty, so players who play arty badly will be penalized, but those who play well will get a plus up.

If you like the sound of all that, I will write some code in R that will implement it and let us see the results. At this point the only complication for the websites using it to calculate WN would be the need to know battles per tank for every player. I hope that isn't a problem for server resources...

Link to post
Share on other sites

So what is this trying to accomplish? 

 

You can't force every tank's WN8 to the account average - it ignores the fact that players may actually perform better or worse in certain vehicles. 

 

It would make more sense to change the WN8 formula so it is not full of ridiculous outliers like artillery and the T-62A. 

Link to post
Share on other sites


Per vehicle modelling is what bjshnog calls 'Vehicle Based Efficiency' (VBE). He started a VBE thread, I think. If you want to go down that road, and are willing to use R to write some code that might implement VBE, then by all means do that but the discussion might be best done in the VBE thread. 

Link to post
Share on other sites

@Gryphon:

 

Do you still have old data sets just for playing with the numbers? If yes, use them for now, just to develop "the model" (3 points?). If that works out and also produces reasonable numbers for LTs, arty and the T-62As of this world, even Garbad would be "happy".

 

After that Orrie can maybe adapt his script, so that we can have a look at known accounts.

 

Gathering recent data won't be a problem, I think. We can contact Phalynx, or I can see if I can reach Mr Noobmeter; I've seen him online a few times recently.

 

If this works, it will also have the benefit that the population is already used to WN8, so adding these two new points won't be something totally new. Much less explaining and arguing to do.

Link to post
Share on other sites


First, here is a 'ridiculous outlier' called T-62a, when you plot tank WN8 vs user WN8:

 

T-62A.jpeg

 

The problem is that the average 3000 WN8 player is achieving about 3300 WN8 in it. All the scaling fix would do is correct that at the account level by applying a factor to a player's overall WN8 depending on his WN8 and the percent of battles he'd played in a T-62A. It only affects those who play most of their battles in one tank; to most, it's noise.

 

@Folt: I don't think I need recent data. I think I can use the slope of that graph (and the other 357 of them) in a modification to the WN8 script to implement a scaling fix that should be sufficient.

Link to post
Share on other sites

That graph is nowhere near an accurate model. It shows that T-62A performance is not linear with respect to WN8, which indicates that the model used to determine T-62A WN8 is not adequate. 

Link to post
Share on other sites
