## Recommended Posts

VBE's ded.

⟪VBE - Vehicle Based Efficiency⟫

A tank-normalized account rating system designed to predict relative winning ability compared to other players, using statistics other than win rate.

Data used (API)

• # of battles played in each tank
• damage
• defense
• spots
• frags

Data used (DB)

• For each tank:
  • 3 expected damage points
  • 3 expected defense points
  • 3 expected spots points
  • 3 expected frags points
  • tier (1~10)
    • scouts (light tanks with scout MM) may be simply treated as tanks 1 tier higher
  • 3 expected general performance points based on the five values above (using the TBD formula mentioned below)

Definitions

• The 1st expected value point represents a player who does nothing beneficial for the team (equivalent to 0 WN8). Can be calculated similarly to the new WN8 expected values, where rSTAT = the WN8 baseline coefficient (or rSTATc = 0).
• The 2nd point represents a player around 1500 WN8. Calculated the same way, using the average rSTAT for a 1500 WN8 player as an account benchmark.
• The 3rd point represents a player around 3000 WN8. Calculated by taking the account overall expected stats for the 2nd point, adding the difference between the 2nd and 1st (so that rSTATc ends up around 2), then adjusting the same way again.
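The relationship between the three points can be sketched as below (function and variable names are mine, not from the proposal; the example numbers are hypothetical):

```python
def seed_third_point(exp1: float, exp2: float) -> float:
    """Initial estimate of the 3rd expected-value point: the 2nd point
    plus the gap between the 2nd and 1st points (so that rSTATc lands
    near 2), before the final adjustment pass described above."""
    return exp2 + (exp2 - exp1)

# Hypothetical expected damage: zero-point 400, 1500-WN8 point 1100
print(seed_third_point(400.0, 1100.0))  # 1800.0
```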

Procedure

• For the 2nd expected value points, total the expected values across all tanks played.
• Predict per-tier stats:
  • Calculate a simple score for each stat:
    • If the player's actual stat is greater than the totaled expected value for the 2nd point, calculate the 3rd point too. Then set statPoint (where "stat" is replaced by whichever stat is being calculated) to [1 + (actual-exp2) / (exp3-exp2)].
    • If it is below the 2nd point, calculate the 1st point instead. If the actual is between the 1st and 2nd points, set statPoint to [(actual-exp1) / (exp2-exp1)].
    • If it is below the 1st point, set statPoint to 0.
  • Split up tanks played into all relevant tiers:
    • If a player did 10 battles in tier 10, 5 battles in tier 6 lights, and 5 battles in tier 7 heavies, you would set tier-10 battles to 10 and tier-7 battles to 10 (the tier 6 scouts being treated as tier 7).
  • Using all tanks played within each tier, use the inverse of the hybrid function above, with the scores it generated, to calculate per-tier predicted stats.
• Use a formula (yet to be defined through analysis) to calculate general performance at each tier played, based on these predicted stats.
• Calculate similar scores to the above, but for general performance, and per-tier instead of per-stat.
• Calculate an average score, weighted by the number of battles played at each tier.
• Multiply by a coefficient so that the average is around 900, similar to WN8 or WN7.
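A minimal sketch of the per-stat scoring step and its inverse (naming is mine; exp1-exp3 stand for the three expected-value points of one stat):

```python
def stat_point(actual: float, exp1: float, exp2: float, exp3: float) -> float:
    """Piecewise-linear score described above: 0 at or below the
    zero-point, 1.0 at the 2nd point, 2.0 at the 3rd point, and
    extrapolating linearly above it."""
    if actual > exp2:
        return 1.0 + (actual - exp2) / (exp3 - exp2)
    if actual > exp1:
        return (actual - exp1) / (exp2 - exp1)
    return 0.0

def stat_from_point(score: float, exp1: float, exp2: float, exp3: float) -> float:
    """Inverse of the hybrid function, used to turn per-tier scores
    back into predicted raw stats."""
    if score > 1.0:
        return exp2 + (score - 1.0) * (exp3 - exp2)
    return exp1 + score * (exp2 - exp1)

print(stat_point(1450.0, 400.0, 1100.0, 1800.0))    # 1.5
print(stat_from_point(1.5, 400.0, 1100.0, 1800.0))  # 1450.0
```

The final rating would then be a battle-weighted average of the per-tier general-performance scores, multiplied by a coefficient so the population average lands near 900.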

##### Share on other sites

⟪VR - Victory Rating⟫

A normalized version of global Win Rate, which essentially predicts the player's win rate in an average tier 10 tank, using data for all tiers/tanks.

[to be expanded]

[reserved]

##### Share on other sites

New concept... again. This time, I hope it's easier to understand.


Will post diagrams later.

In this method, I assume the expected values will change from time to time?  Will the coefficient change as well or be static?

##### Share on other sites

A few questions if I may:

Is this a whole account concept (like the 'overall' WN8) or recent performance only (like the 60 day)? - This drives your choice of data used to develop the rating and derive the expected values

What will this rating relate to? Will it be a predictor of winning (good correlation to recent or overall winrate), predictor of anything else, or will it just be an arbitrary metric? - Knowing this will avoid all the disagreement we had over how we know when the rating is accurate and the updates are good.

What process will be used to derive the original expected values, and the updates? - We know how much fun that is...

Suggestion: the more you can put into a script, the easier it all gets to review, reproduce results, and sustain the rating

##### Share on other sites

> A few questions if I may:
>
> Is this a whole account concept (like the 'overall' WN8) or recent performance only (like the 60 day)? - This drives your choice of data used to develop the rating and derive the expected values
>
> What will this rating relate to? Will it be a predictor of winning (good correlation to recent or overall winrate), predictor of anything else, or will it just be an arbitrary metric? - Knowing this will avoid all the disagreement we had over how we know when the rating is accurate and the updates are good.
>
> What process will be used to derive the original expected values, and the updates? - We know how much fun that is...
>
> Suggestion: the more you can put into a script, the easier it all gets to review, reproduce results, and sustain the rating

1. For XVM, the overall rating should be more accurate, but a player's current skill is better represented by recent stats. It may be necessary to generate two sets of expected values: one for overall and one for recent.
2. It will be a predictor of winning, relative to the winning ability of the tank being played. If the player is playing a crap tank, for example, they won't on average win more than another player with the same rating who is driving a good tank. Pretty much, it's a skill rating.
3. The 2nd point may well just be the same points used in WN8 (rSTATc = 1). The 1st point (the zero-point) will be calculated using the same process that was used to calculate the expected values for WN8, but centered around rSTATc = 0 (which we may have to redefine somehow). The 3rd point will be calculated first by simply being set to the 2nd point + the difference between the 2nd and 1st points, then adjusted in the same way (essentially rSTATc = ~2).

> In this method, I assume the expected values will change from time to time? Will the coefficient change as well or be static?

Yes, the expected values will change. The coefficient will probably stay the same.

##### Share on other sites

I have been reading a bunch on this lately and have a few thoughts, some of which could probably be sorted out easily using the "eureka" software.

I would be interested to see how damage received correlates to win rate and damage dealt. Seems as if there should be a good relationship there that could be fairly easy to add in. To use the WN8 formula as an example: rDamage = (avgDam - avgRecDam) / (expDam - expRecDam). Just basing this off casual observation and cruising through the stats, the better players generally have a better delta here.
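Read with the implied parentheses, the suggested term could be sketched as follows (variable names mine):

```python
def r_damage_delta(avg_dam: float, avg_rec_dam: float,
                   exp_dam: float, exp_rec_dam: float) -> float:
    """WN8-style normalization of the dealt-minus-received damage delta:
    (avgDam - avgRecDam) / (expDam - expRecDam)."""
    return (avg_dam - avg_rec_dam) / (exp_dam - exp_rec_dam)

print(r_damage_delta(1400.0, 900.0, 1200.0, 1000.0))  # 2.5
```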

The other issue I have is using a base value that is intended to be a median. It seems the best method would be to calculate where the "purple" line should be, say the 95-99% area, and calculate down from there. Since overall performance is rarely a linear scale when measuring skill, this would allow you to curve the data based on the actual achievement point. It should also allow you to curb seal-clubbing data somewhat by simply adjusting the expected values of a tier by a modifier, where tier X could be 1 (no modification) up to tier 1 at 1.3 (a 0.3 increase).

If this has been discussed before, sorry - I did read most of this thread, in addition to more than I want to admit, over the last few days.

##### Share on other sites

> I would be interested to see how damage received correlates to win rate and damage dealt. Seems as if there should be a good relationship there that could be fairly easy to add in. To use the WN8 formula as an example: rDamage = (avgDam - avgRecDam) / (expDam - expRecDam). Just basing this off casual observation and cruising through the stats, the better players generally have a better delta here.

The correlation with damage taken is actually the other way around after accounting for HP loss caused by losing games. Here's what you get if you subtract (1 - winrate) * 0.9 * avgHP from average damage taken. It's not an illogical result: when you play to win, you spend your HP as a resource. Playing too safely is just wasting a resource.
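The adjustment described above can be sketched like this (assuming avgHP is the tank's average hit points and 0.9 is the poster's loss-HP fraction):

```python
def adjusted_damage_taken(avg_taken: float, winrate: float, avg_hp: float) -> float:
    """Subtract the HP a player statistically loses just by being on
    the losing team -- (1 - winrate) * 0.9 * avgHP -- from average
    damage taken, per the comment above."""
    return avg_taken - (1.0 - winrate) * 0.9 * avg_hp

# Hypothetical: 1000 avg damage taken, 50% winrate, 1500 avg HP
print(adjusted_damage_taken(1000.0, 0.5, 1500.0))  # 325.0
```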

##### Share on other sites

Not quite what I meant. I guess the best way you could plot it in a two-axis system would be damage minus damage received over win rate. I think you would find the delta between the two to correlate with the better win rate. Better players trade their HP more effectively and deal more damage over the course of the game. So, on average, the higher-producing players should have a larger delta between the two.

##### Share on other sites

> Definitions
>
> • The 1st expected value point represents a player who does nothing beneficial for the team (equivalent to 0 WN8). Can be calculated similarly to the new WN8 expected values, where rSTAT = the WN8 baseline coefficient (or rSTATc = 0).
> • The 2nd point represents a player around 1500 WN8. Calculated the same way, using the average rSTAT for a 1500 WN8 player as an account benchmark.
> • The 3rd point represents a player around 3000 WN8. Calculated by taking the account overall expected stats for the 2nd point, adding the difference between the 2nd and 1st (so that rSTATc ends up around 2), then adjusting the same way again.
>
> Procedure
>
> • For the 2nd expected value points, total expected values.
> • Predict per-tier stats:
> • Calculate a simple score for each stat:
> • If the player's actual stat is greater than the totaled expected value for the 2nd point, calculate the 3rd point too. Then, set statPoint (where "stat" is replaced by whichever stat is being calculated) to [1 + (actual-exp2) / (exp3-exp2)].
> • If it is below the 2nd point, calculate the 1st point instead. If the actual is between the 1st and 2nd points, set statPoint to [(actual-exp1) / (exp2-exp1)].
> • If it is below the 1st point, set statPoint to 0.

I'm slightly puzzled - you're building a three point system, but instead of using these to form a curve, you're generating two slopes that meet in the middle?

Is this simply a matter of computational expediency?

##### Share on other sites

> I'm slightly puzzled - you're building a three point system, but instead of using these to form a curve, you're generating two slopes that meet in the middle?
>
> Is this simply a matter of computational expediency?

Yeah. Even as I've planned it, it's still several times more intensive than WN8, and I imagine that if a curve was involved, more complex calculations would take place and drive the computational cost up even further.

EDIT: Now that I think about it, the current idea may be about as computationally intensive as collecting all tank data from the API (not certain). Technically, that means predictions won't be required, and there would be fewer calculations involved. (Not really practical.)

##### Share on other sites

The website and server owners were very kind in accepting WN8 as is - it creates a significant load on their systems - because WN8 calcs are complex. So, one of the main suggestions I'd make is that computational complexity needs to be reduced, not increased, to improve website and xvm performance.

If any follow-on system is developed with the working assumption that it needs more expected values or more normalization or limiting of statistics, then it would be important to demonstrate what you get for that additional complexity in terms of measurable improvement. If it can't be demonstrated, don't do it.

I'm also a bit puzzled as to why we would want to characterize data by sampling at multiple points, rather than characterizing how the population is distributed and then just using the widely accepted parameters for that distribution - for example, mean and variance for a normally distributed population. Some other distributions (gamma) would need more than 2 parameters, and if we wanted to go there it almost certainly wouldn't be for all stats, maybe one.
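For illustration only, parameterizing one stat by fitted distribution parameters rather than three stored points might look like this (the sample data and the 0.5-per-sigma scaling are my own hypothetical choices):

```python
import statistics

def normal_score(actual: float, population_sample: list) -> float:
    """Score a stat from two fitted parameters (mean, stdev) instead of
    three stored expected-value points: 1.0 at the population mean,
    rising or falling by 0.5 per standard deviation."""
    mu = statistics.mean(population_sample)
    sigma = statistics.stdev(population_sample)
    return 1.0 + 0.5 * (actual - mu) / sigma

sample = [700.0, 900.0, 1100.0, 1300.0, 1500.0]  # made-up avg-damage sample
print(normal_score(1100.0, sample))  # 1.0 (exactly at the mean)
```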

Once we get this new forum section open I will open up a thread for us to do some 'data science' on WOT stats. Let's spend a few weeks sampling various data sets and plotting the distributions, then move on to see how we can accurately model them with least computational effort.

##### Share on other sites

> I'm also a bit puzzled as to why we would want to characterize data by sampling at multiple points, rather than characterizing how the population is distributed and then just using the widely accepted parameters for that distribution - for example, mean and variance for a normally distributed population. Some other distributions (gamma) would need more than 2 parameters, and if we wanted to go there it almost certainly wouldn't be for all stats, maybe one.
>
> Once we get this new forum section open I will open up a thread for us to do some 'data science' on WOT stats. Let's spend a few weeks sampling various data sets and plotting the distributions, then move on to see how we can accurately model them with least computational effort.

The reason I chose to use multiple points is so that there is effectively a rudimentary "curve", allowing equivalent amounts of damage, kills, etc. to be set between tanks. That makes it easier to predict average stats per tank based on average stats over the sample, which can then be used to calculate the general performance scores, and so on.

If there is a less costly method using population distributions, then I'd like to know about it. I just want the end result to be accurate and fair from the low end to the high end, between various tank types.

##### Share on other sites

The calculations are not complicated (from a computing standpoint).  The 'biggest' component is just the per-tank, and I don't think anyone wants a rating that just uses the overall.

It's a few additions, multiplications, min/max.  That isn't anything from a computing standpoint.  Adding more points (2/3) and some additional add/multiply will still not be a problem.  If you started trying to calculate log() or something that would matter from an overall standpoint.

##### Share on other sites

> The calculations are not complicated (from a computing standpoint). The 'biggest' component is just the per-tank, and I don't think anyone wants a rating that just uses the overall.
>
> It's a few additions, multiplications, min/max. That isn't anything from a computing standpoint. Adding more points (2/3) and some additional add/multiply will still not be a problem. If you started trying to calculate log() or something that would matter from an overall standpoint.

Crunching the numbers once you have them in memory: sure, no problem. I think the issue is looking up 5 expected values (now) for all tanks a player has ever played, THEN crunching the numbers. If you raise that to 3 x 5, i.e. 15 numbers per tank ever played, to be dug out of a database, then things won't be all that fast. It's the database queries that take time.

##### Share on other sites

> Crunching the numbers once you have them in memory: sure, no problem. I think the issue is looking up 5 expected values (now) for all tanks a player has ever played, THEN crunching the numbers. If you raise that to 3 x 5, i.e. 15 numbers per tank ever played, to be dug out of a database, then things won't be all that fast. It's the database queries that take time.

Holding all the expected values in memory is trivial: 500 tanks x 5 values x 3 points x 1 byte/value = 7.5K.
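As a quick sanity check on that arithmetic (1 byte per value is optimistic for floats, so a 4-byte figure is shown for comparison):

```python
tanks, stats, points = 500, 5, 3
values = tanks * stats * points  # total expected values to hold in memory
print(values)                    # 7500
print(values * 4 / 1024)         # ~29.3 KiB if stored as 4-byte floats
```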

##### Share on other sites

> Holding all the expected values in memory is trivial: 500 tanks x 5 values x 3 points x 1 byte/value = 7.5K.

I believe the problem is that WN8 was already a significant workload on servers, and my proposal would only increase that several times over. It had to be calculated for numerous tanks, for all players being tracked, for various time intervals, every few hours or so, depending on the particular service.

##### Share on other sites

If a two point system would fit reasonably, then we could do that. It would take a bit of server load off while still being decently accurate.

In fact, a logarithmic or exponential curve could be used. I'll elaborate later.

##### Share on other sites

> I believe the problem is that WN8 was already a significant workload on servers, and my proposal would only increase that several times over. It had to be calculated for numerous tanks, for all players being tracked, for various time intervals, every few hours or so, depending on the particular service.

My point is that the workload is due to the number of players and number of tanks, which isn't changing.  The load of actually calculating WN8 for a single player with one value vs three values is a trivial difference.

##### Share on other sites

I like the idea of creating a victory rating. A simple scale from 1-9 (or just a percentage) that estimates a player's ability to individually impact the odds of winning a battle in a certain tier/tank/type. Require players to upload something like their past 1000 battles; this way we can take into account the platoon effect.

##### Share on other sites

> I like the idea of creating a victory rating. A simple scale from 1-9 (or just a percentage) that estimates a player's ability to individually impact the odds of winning a battle in a certain tier/tank/type. Require players to upload something like their past 1000 battles; this way we can take into account the platoon effect.

People can choose not to upload battle results as they wish, or they may forget at times. It wouldn't work well. Besides, I was thinking more for use in XVM, to replace win rate (and it may very well just be a percentage value, normalized by tier or tank type; perhaps a prediction of overall WR, had the player only played at tier 10).

Edited the OP and moved VR into the 2nd post.

##### Share on other sites

> People can choose not to upload battle results as they wish, or they may forget at times. It wouldn't work well. Besides, I was thinking more for use in XVM, to replace win rate (and it may very well just be a percentage value, normalized by tier or tank type; perhaps a prediction of overall WR, had the player only played at tier 10).
>
> Edited the OP and moved VR into the 2nd post.

My point is that if people wanted a detailed evaluation, they would have to go through a more comprehensive process (for bragging rights). Part of this process would involve uploading battle replay files; this wouldn't be an XVM thing.

##### Share on other sites

> My point is that if people wanted a detailed evaluation, they would have to go through a more comprehensive process (for bragging rights). Part of this process would involve uploading battle replay files; this wouldn't be an XVM thing.

I guess if they want to do a solopub challenge using the ADU or WoT Statistics or whatever, or track personal performance, it could be useful. (Though I'm claiming "Victory Rating" for something else.)

##### Share on other sites

I feel like we might need to fiddle with the way this handles light tanks, bj -- simply bumping them a tier higher doesn't seem quite right.
