LanterneRouge

Is 100 games a large enough sample size?


100 games is definitely not enough to get a good representation of your performance over all the maps and spawns in the game, of which there are close to 100 (~50x2) including standard/encounter/assault. Couple that with the fact that you tend to get a limited map rotation in any given session, and you're bound to get zero samples on a lot of maps. This can make a massive difference: e.g. if you're playing an E-100 and the server has open maps on repeat, then you're obviously going to do worse.

 

If you save all your replays, you can use a replay analyzer and see how you perform on different maps in your tank of choice; I'm pretty sure most people will see maps where they perform significantly worse/better.

 

100 games is probably ok for a rough ballpark estimate but I wouldn't take it seriously unless it was an unusually good result.


Handy table here:  http://forum.worldoftanks.com/index.php?/topic/183629-curious-if-your-51-win-rate-is-due-to-your-skill/#topmost

 

100 games is only about 68% likely to be 95% accurate, at least according to my understanding of the math. Really bad, actually. It really calls into question the accuracy of any single 100-game challenge, mine included.

 

I'll take that as a veiled retraction of this comment: 

 

:thumbup:


regardless of everything, it still takes more than one person to win a game in WoT (this isn't chess). Personally, WR in a tank is too dependent on your team, RNG, and let's not forget the MM gods. Nobody can win alone, period.

are you a retard?


regardless of everything, it still takes more than one person to win a game in WoT (this isn't chess). Personally, WR in a tank is too dependent on your team, RNG, and let's not forget the MM gods. Nobody can win alone, period.

 

Hey, I'm going to be direct here in saying the absolute truth.

 

If a player wins 64% of their solo battles over a large sample and the average is 49%, then they alone are winning 15% of those battles.


Hey, I'm going to be direct here in saying the absolute truth.

 

If a player wins 64% of their solo battles over a large sample and the average is 49%, then they alone are winning 15% of those battles.

Absotively. 

My original question was about the small sample size of most solo challenges. I think they may be somewhat informative but statistically pretty meaningless. 


Absotively. 

My original question was about the small sample size of most solo challenges. I think they may be somewhat informative but statistically pretty meaningless. 

 

Yeah. There is a reason that recent stats use a 1000-battle sample.


So, since this is the MATH forum and no one wanted to do the math...

It looks like it is way way way way too few. 

 

http://en.wikipedia.org/wiki/Checking_whether_a_coin_is_fair

 

Hi,

I tried to do the maths once:

http://forum.worldoftanks.eu/index.php?/topic/357838-winrate-for-mathematicians-a-quantitative-analysis/

That topic is closed now, so I copied it to a blog here and kept updating it:

http://scrontch.wordpress.com/2014/02/28/win-rate-in-world-of-tanks-a-quantitative-analysis/

Basically this is the analysis on the same coin-toss assumption as what you have posted.

In short: 100 is far too few, 1000 is starting to get significant, but you need 10K if you really want to be affirmative.

(orders of magnitude)


My opinion is no...

 

100 matches is too easily skewed to be an accurate assessment. Being really bad at math, I'd guess that 1000 matches would give a more accurate assessment.

 

If a gambler won 60 of 100 hands, would we consider them a skilled gambler or just on a hot streak? I'm thinking hot streak.

 

JM.02c
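The hot-streak question can be checked with a quick Monte Carlo sketch: simulate a gambler whose true chance per hand is exactly 50% (an assumption; real battles are not perfectly independent coin flips) playing 100 hands many times, and count how often luck alone produces 60 or more wins. The function name, seed and trial count are arbitrary choices for illustration.

```python
import random

def streak_frequency(threshold: int = 60, hands: int = 100,
                     trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated sessions in which a player with a true
    50% win chance per hand wins at least `threshold` of `hands` hands."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each of the `hands` random bits is one fair coin flip (1 = a win),
        # so counting the 1 bits gives the number of wins in the session.
        wins = bin(rng.getrandbits(hands)).count("1")
        if wins >= threshold:
            hits += 1
    return hits / trials

print(f"fraction of 50% players hitting 60+/100: {streak_frequency():.3f}")
```

For comparison, the exact binomial probability of 60 or more wins out of 100 at a true 50% rate is about 2.8%, so a pure coin-flipper hits 60/100 roughly once in 35 attempts.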


Wow, there is some serious retardation here.  Only a couple of people are even making a pretense of math and all but one got it wrong.

 

Read this and responses:

Ok, I'm not a mathemagician but I'm not convinced you are reading those numbers right. 

In your last example:

http://www.wolframalpha.com/input/?i=margin+of+error+for+binomial+parameter%2C+sample+size+300%2C+confidence+level+0.999

Isn't that a margin of error of +/- 10%?

That seems huge. 

 

Here's the traditional 100-game challenge:

http://www.wolframalpha.com/input/?i=margin+of+error+for+binomial+parameter%2C+sample+size+100%2C+confidence+level+0.95

 

+/- almost 10% just seems so large that for the purpose of a challenge where you hit 60% you'd be in the range of just average because of the margin of error. 
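Both Wolfram|Alpha numbers can be reproduced with the usual normal-approximation margin of error, E = z * sqrt(p(1 - p)/n), taking the worst case p = 0.5. The sketch below uses Python's standard-library `statistics.NormalDist` to get z for a two-sided confidence level; I'm assuming Wolfram|Alpha uses this same worst-case formula, which the numbers are consistent with. The function name is made up for illustration.

```python
from statistics import NormalDist

def binomial_margin(n: int, confidence: float, p: float = 0.5) -> float:
    """Worst-case (p = 0.5) margin of error for a proportion from n trials."""
    # Two-sided z value: about 1.96 for 95%, about 3.29 for 99.9%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (p * (1 - p) / n) ** 0.5

print(f"n=300, 99.9%: +/- {100 * binomial_margin(300, 0.999):.1f} points")
print(f"n=100, 95%:   +/- {100 * binomial_margin(100, 0.95):.1f} points")
```

That gives roughly +/- 9.5 points for the first query and +/- 9.8 points for the second, i.e. both are "almost 10%".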


+/- almost 10% just seems so large that for the purpose of a challenge where you hit 60% you'd be in the range of just average because of the margin of error. 

You aren't understanding what margin of error means. Margin of error isn't related to what your true win rate is; it's related to the range of values that can come up due to chance.

 

As we all know, 100 games is small enough that you might get shit maps/teams etc. Streakiness/luck can still be a major factor. Nevertheless, at 60% you have a 95% chance that it's not due to luck. At 55%, you have a ~80% chance that it's not due to luck. At 51%, the chance that it's due to luck is like 55%. So for example the KV-5 challenge proved it wasn't due to luck. Had I done more poorly, say 58%, one could plausibly argue that I was just lucky. The further the result from the mean, the less likely it is to be chance.
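Here is a sketch of where figures like these come from, using the exact binomial distribution and assuming a baseline player who wins each battle independently with probability 0.5. It computes the chance that such a player scores at least k wins in 100 battles purely by luck; one minus that is roughly what the post calls the chance "that it's not due to luck" (strictly speaking it is a one-sided p-value, not the probability that skill is involved). The function name is illustrative.

```python
from math import comb

def chance_of_luck(wins: int, battles: int = 100, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(battles, p): how often a player whose
    true win rate is p would do at least this well by chance alone."""
    return sum(comb(battles, k) * p**k * (1 - p) ** (battles - k)
               for k in range(wins, battles + 1))

for wins in (51, 55, 60, 65):
    print(f"{wins}/100: {100 * chance_of_luck(wins):.1f}% chance from luck alone")
```

Under these assumptions 60/100 comes out around 3% and 55/100 around 18%, in line with the rough 95% and ~80% figures above, while 51/100 is close to a coin flip. The probability drops off quickly as the result moves away from 50%, which is the "further from the mean" effect.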

 

A separate question is whether that 65% shows my true skill (or perhaps if I am a 70%er who had a bad run or maybe a 60%er on a hot run).  If we assume a set skill value will produce a normal set of results, then we can say that my 65% result means my actual skill has a 95% chance of being between 55% and 75%.  More samples narrows that range down, but doesn't change the mean.
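The 55% to 75% range can be reproduced with the simple normal-approximation (Wald) interval around the observed win rate. This is a sketch assuming 65 wins out of 100 independent battles; the Wald interval is only one of several options (Wilson or exact Clopper-Pearson intervals behave better near 0% or 100%), and the function name is made up.

```python
import math
from typing import Tuple

def wald_interval(wins: int, battles: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for a true win rate (Wald method)."""
    p_hat = wins / battles
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / battles)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(65, 100)
print(f"65/100 -> roughly {100 * low:.1f}% to {100 * high:.1f}%")
```

That prints roughly 55.7% to 74.3%. Quadrupling the number of battles halves the width, but the interval is always centred on whatever win rate was actually observed.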

 

In practical use, this means proving win rate is not just luck is pretty easy.  Within the limitations discussed at the end of that thread, I think I did that.  A much more complex question is what my true solopub win rate is or if I am better than kewei, who ran similar trials.  The answer to both of those questions is basically unknown.  We can get a pretty huge range for my win rate (but still pretty high -- whether I solopub 58 or 65, I am still a unicum).  Comparing me to kewei we don't have even close to enough trials to make that call.  I just call the win because 61 > 60, knowing full well there is like a 48% chance I just got lucky.
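For the head-to-head comparison, a rough sketch is the normal approximation to the difference between two win rates: treat the observed gap as noisy around the true gap and ask how likely the true gap is to be zero or negative. The battle counts below are hypothetical (100 each, then 500 each), since the exact numbers for both runs aren't given here, and independent battles are assumed.

```python
import math
from statistics import NormalDist

def chance_lead_is_luck(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Rough probability that player A's higher observed win rate does not
    reflect a truly higher win rate (normal approximation)."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return NormalDist().cdf(-(p_a - p_b) / se)

# Hypothetical runs: 61% vs 60% over 100 battles each, then over 500 each.
print(f"{100 * chance_lead_is_luck(61, 100, 60, 100):.0f}%")
print(f"{100 * chance_lead_is_luck(305, 500, 300, 500):.0f}%")
```

With 100 battles each this lands in the mid-40s, close to the "like 48%" estimate, and even with 500 each it only drops to around 37%, so a 1-point gap settles very little.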

 

Make sense?


Wow, there is some serious retardation here.  Only a couple of people are even making a pretense of math and all but one got it wrong.

 

Read this and responses:

 

Nah, you were right the first time, except that your conclusion is very dubious. Moving onto the old red herring:

 

"It appears I'm mistaken... or at least that I'm too rusty to remember what the right answer is. The catch is "The probability of "success" p is the same for each outcome." We all believe/know that some games are more winnable than others."

 

The trick is that the binomial method works to a degree as long as matchmaking is random. When you enter a battle, there are a ton of factors that are outside your control: team balance, tank balance, map choice, whatever. However, as long as all the significant factors are independent of your previous results, it's essentially equivalent to tossing a single loaded coin.

 

Consider that you toss 99 coins, and if 50 or more come up heads then you mark it as a win. The results are of course exactly the same as tossing a single coin.
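The 99-coin point can be checked directly: if every battle is built the same way from independent "coins" (teammates, map, matchmaking), the outcome is a single win/loss trial with some fixed win probability q. The sketch below assumes 99 identical, independent coins with heads probability p and computes q exactly; p and the function name are made up for illustration.

```python
from math import comb

def majority_win_probability(p: float, coins: int = 99) -> float:
    """Probability that at least (coins + 1) // 2 of `coins` independent
    tosses land heads, i.e. the win probability q of the combined 'battle'."""
    needed = (coins + 1) // 2  # 50 heads out of 99
    return sum(comb(coins, k) * p**k * (1 - p) ** (coins - k)
               for k in range(needed, coins + 1))

for p in (0.50, 0.51, 0.55):
    print(f"p = {p:.2f} per coin -> q = {majority_win_probability(p):.3f} per battle")
```

With fair coins q comes out at 0.500, and a small per-coin edge becomes a larger per-battle edge, but either way each battle is still one loaded coin with probability q, which is all the binomial method needs.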

 

What trips people up is that they tend to think of games as two separate stages: The setup and the game itself. For this question, the distinction has little value.

 

 

If the matchmaking is not independent of your previous results, then everything changes. Apparently MM rigging theories are unpopular on this forum.


Apparently MM rigging theories are unpopular on this forum.

Technically, the MM does run differently depending on a few conditions (recently bought tank, in queue for more than 2 minutes).  But we just ignore that.  In any event, I think you are right.


You aren't understanding what margin of error means. Margin of error isn't related to what your true win rate is; it's related to the range of values that can come up due to chance.

 

As we all know, 100 games is small enough that you might get shit maps/teams etc. Streakiness/luck can still be a major factor. Nevertheless, at 60% you have a 95% chance that it's not due to luck. At 55%, you have a ~80% chance that it's not due to luck. At 51%, the chance that it's due to luck is like 55%. So for example the KV-5 challenge proved it wasn't due to luck. Had I done more poorly, say 58%, one could plausibly argue that I was just lucky. The further the result from the mean, the less likely it is to be chance.

 

A separate question is whether that 65% shows my true skill (or perhaps if I am a 70%er who had a bad run or maybe a 60%er on a hot run).  If we assume a set skill value will produce a normal set of results, then we can say that my 65% result means my actual skill has a 95% chance of being between 55% and 75%.  More samples narrows that range down, but doesn't change the mean.

 

In practical use, this means proving win rate is not just luck is pretty easy.  Within the limitations discussed at the end of that thread, I think I did that.  A much more complex question is what my true solopub win rate is or if I am better than kewei, who ran similar trials.  The answer to both of those questions is basically unknown.  We can get a pretty huge range for my win rate (but still pretty high -- whether I solopub 58 or 65, I am still a unicum).  Comparing me to kewei we don't have even close to enough trials to make that call.  I just call the win because 61 > 60, knowing full well there is like a 48% chance I just got lucky.

 

Make sense?

Yeah, I understand that. I just don't think it's valuable for such a small sample size when your actual winrate could be 51% if you get 60% for a challenge. Yeah it could also be 69% but that's just so huge. What's the point?

 

 

 The further the result from the mean, the less likely it is to be chance.

 

 

Maybe that's the part I'm missing. Why is that?


100 games is a decent indicator.

A good and significant sample size would be 500. A true sample would be 1000. But with the amount of time it takes to do that... well, let's say 100 games is the best format, all things considered.


In short: 100 is far too few, 1000 is starting to get significant, but you need 10K if you really want to be affirmative.

(orders of magnitude)

 

Sorry, I didn't notice the question was specifically about those solo challenges with over 60% win rate.

So disregard the conclusion above (it applies to resolving 1-percentage-point differences in true win rate between players).

For Garbad's claim, which is that he (and not luck) influenced the outcome of those battles positively, 100 battles are indeed sufficient, and the (simplified) coin formula says nothing different: for Z = 2 and E = 0.1, n = 100.

But had his win rate been only, say, 52%, then Z = 2 and E = 0.02 give n = 2500.
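The simplified coin formula here is n = Z^2 / (4E^2): the number of battles needed so that a Z-sigma margin around a 50% rate has half-width E. A direct transcription, reproducing both cases above plus the 1-point case (E = 0.01); the function name is made up for illustration.

```python
import math

def battles_needed(z: float, margin: float) -> int:
    """Sample size from the coin-fairness formula n = Z^2 / (4 E^2),
    rounded up to a whole number of battles."""
    raw = z**2 / (4 * margin**2)
    # Small tolerance absorbs float error (e.g. 0.1**2 is not exactly 0.01).
    return math.ceil(raw - 1e-9)

for margin in (0.10, 0.02, 0.01):
    print(f"Z=2, E={margin:.2f} -> n = {battles_needed(2, margin)}")
```

That gives 100, 2500 and 10000 battles respectively: halving the margin quadruples the sample, which is why resolving 1-point differences between players takes on the order of 10K battles.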


 

Yeah, I understand that. I just don't think it's valuable for such a small sample size when your actual winrate could be 51% if you get 60% for a challenge. Yeah it could also be 69% but that's just so huge. What's the point?

 

 

Maybe that's the part I'm missing. Why is that?

 

Because when I started, I wasn't trying to prove what my actual solopub win rate is.  My goal in challenges was initially to prove stat deniers wrong -- to prove win rate was not luck/gold/platoons/etc.  Later, people asked me to do challenges to review a tank and provide replays, and still later we started using it as a way to measure epeens.  But originally it was about win rate = luck retards.  And that argument is over, the kv5 challenge broke it.


Because when I started, I wasn't trying to prove what my actual solopub win rate is.  My goal in challenges was initially to prove stat deniers wrong -- to prove win rate was not luck/gold/platoons/etc.  Later, people asked me to do challenges to review a tank and provide replays, and still later we started using it as a way to measure epeens.  But originally it was about win rate = luck retards.  And that argument is over, the kv5 challenge broke it.

Ok, you were answering a different question. 

My question was more about the validity of 100 game challenges for epeen measurement:)

For that purpose I don't think they are very valid. 


Ok, you were answering a different question. 

My question was more about the validity of 100 game challenges for epeen measurement:)

For that purpose I don't think they are very valid. 

Unless there is an extremely large spread, they aren't conclusive proof.  They need more samples.

 

For example, me vs kewei (always within 3%, even after 500 games) STILL doesn't prove much. Me vs. EJ might have (he quit early, but I was so far ahead it's likely he was outside the margin of error). But even then, it's a bit iffy.

 

And remember, 95% confidence is essentially arbitrary. If I am ~75% confident I am better than player X, is that enough for epeen? For most of us, sure. What about 55%? Good enough for me, I still slam my epeen in kewei's face regularly. What about beating EJ, where it's probably even lower? I don't give a fuck, I can still legitimately boast poast about beating the entire EU server at once (USA USA USA).

 

So yeah.  Proof?  Yes.  Perfect proof?  No.


So yeah. Proof? Yes. Perfect proof? No.

Uncertainty is an inherent property of statistics. You'll never prove something with 100% certainty using statistics. Please don't say "perfect proof"; this feeds apologist theist B.S. when they talk about science like evolution being uncertain and thus "just a theory". Even experiments like those conducted at the LHC have an uncertainty (I think I've read 8+ sigma on one of the CERN papers).


 

Maybe that's the part I'm missing. Why is that?

 

Because it's an "odd" result and thus less likely to be the result of chance. Imagine flipping a coin. If after 100 flips you had 51 heads, there is a very high chance it's due to luck. Thus, you have no reason to believe anything other than random chance is affecting the outcome. But if you tossed 100 times and got 65 heads, this result is extremely unlikely to be due to chance, and thus you have reason to believe something else is affecting the outcome (a loaded coin). With tanks, my 65% was very improbable due to luck, and thus I was winning by skill (as I eliminated other possibilities, such as gold spam).

 
