Kalith

Verified Tanker [EU]
  • Content count

    29

Everything posted by Kalith

  1. I'm Kalith from wottactic.com, and I'm posting to let people know I'm starting an effort to create a new tank-specific data set for the wn8 rating. I'll base it on the practices explained here: http://wiki.wnefficiency.net/pages/WN8#Data_Sourcing. I'm still looking into it, but when I'm done I'll go into detail on the exact process. I'll also release all code. I am doing this because I disagree with both the v29 and v30/v31 average values: v29 because of the ridiculous expected values manually assigned to certain tanks. I'm not opposed to a little tweaking, but it has to be well founded. v30/v31 abandons tank-specific ratings altogether. This brings us back to the days before wn8, when people were boosting their stats by playing the most op tank they could find. Yes, wg balancing tanks and shifting their tiers makes the rating less reliable. Yes, the values for reward tanks and certain rare tanks are skewed and need to be tweaked manually, which is statistically iffy. Still, wn8 is the best tool we have and its fundamentals are sound, but it needs up-to-date expected values, so that's what I'm offering. If you want to help or offer any advice, feel free to contact me.
  2. Ok that makes sense, let me know when/where/how you're going to make the expected value available.
  3. I'm ok with switching over if the values are tank specific and make sense. I agree that we don't want several different wn8 ratings. Which stats do you use as a source? Anyway, keep me informed on when you release your expected values.
  4. Sorry for the long hiatus. Took a vacation and then got swamped with work. Anyway, pleased to announce the new expected values. This time it's based on the stats of all players on all WG servers (> 1000 battles, tanks with > 50 battles), calculated with Gryphon's method/script, slightly modified to deal with larger data files. The following substitutions were performed:
     15617,Object 907,16897,Object 140
     15905,M60,14113,M48A1 Patton
     55841,T95E6,14113,M48A1 Patton
     58641,VK7201,9489,E 100
     63537,121B,4145,121
     58881,IS-5,5377,IS-3
     11809,T23E3,1569,T20
     54273,SU-76I,2369,FCM 36 Pak 40
     59425,T34 B,2849,T34
     12577,M4 Improved,52257,M4A2E4 Sherman
     13905,FV4005 Stage II,9297,FV215b (183)
     53793,T95E2,5921,M26 Pershing
     59665,Grosstraktor - Krupp,2385,Vickers Medium Mk. III
     61457,Pz.Kpfw. III Ausf. K,6417,Pz.Kpfw. III/IV
     54033,Pz.Kpfw. V/IV Alpha,51473,Pz.Kpfw. V/IV
     49409,IS-6 B,9217,IS-6
     64273,Panzer 58 Mutz,49937,Schwarzpanzer 58
     59681,M4A3E8 Thunderbolt VII,1313,M4A3E8
     58913,T26E5,59169,T26E5 Patriot
     http://wottactic.com/expected_v33.json
     http://wottactic.com/expected_v33.csv
     You can test out the calculator here: http://wottactic.com/wn8_standalone.html. Wottactic.com will switch over in the next week, but it takes a few days to recalculate the history. Code and methodology are still here: https://github.com/karellodewijk/wn8_expected.
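A minimal sketch of how a substitution list like the one above could be applied to a table of expected values. It assumes each line reads target id, target name, donor id, donor name, so the first tank receives the values of the second; that reading is an assumption, as are the function names and the made-up expected values.

```python
import csv
import io

# Two lines copied from the substitution list, in the assumed order:
# target_id,target_name,donor_id,donor_name
SUBSTITUTIONS_CSV = """\
15617,Object 907,16897,Object 140
58913,T26E5,59169,T26E5 Patriot
"""

def parse_substitutions(text):
    """Return {target_tank_id: donor_tank_id} from 4-column CSV lines."""
    mapping = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        mapping[int(row[0])] = int(row[2])
    return mapping

def apply_substitutions(expected, mapping):
    """Copy the donor tank's expected values onto each target tank."""
    result = dict(expected)
    for target_id, donor_id in mapping.items():
        if donor_id in expected:
            result[target_id] = expected[donor_id]
    return result

# Made-up expected values, for illustration only.
expected = {
    16897: (1.0, 2000.0, 1.2, 0.7, 49.5),
    59169: (1.1, 1500.0, 0.9, 0.8, 51.0),
}
patched = apply_substitutions(expected, parse_substitutions(SUBSTITUTIONS_CSV))
```

Targets whose donor is missing from the table are left untouched rather than guessed.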
  5. I gathered the player list. This list should contain all players across all servers who played 1 or more RANDOM battles in World of Tanks, along with the number of random battles played (~47.32 mil players): http://forum.wottactic.com/other/players_jul_12_2017.7z I shuffled that list and am now downloading the stats for all players with > 1000 battles to generate the wn8 values. I probably will not end up downloading them all, just a sufficiently large sample.
  6. @bounceplink I remember reading it somewhere, anyway Gryphon_ will know. Anyway I've started on an update. Like before I'm starting by gathering a player list first but this time I'm building one with players across all servers. It might take some time to gather but I'll use that player list again in the future and publish it for other people to use.
  7. The expected values have actually always been based on only the EU server, I've just used a much larger sample of players. I've noticed on average wn8 with my expected values seems a bit higher overall. I can't really say for certain why as I don't know how/which data was gathered to calculate the expected values before. Anyway I'll be starting an update probably next weekend. I do plan this time to gather data across servers instead.
  8. Basically the script will pick any tank that a player has > 50 battles on. The problem is that it takes some time for enough people to reach 50 battles on a tank for the data to be statistically useful. Mostly no, there is no real way to tell them apart, except maybe by taking a snapshot now and using the difference in stats rather than absolute stats. It's not ideal, I'll admit. I believe Android25 is collecting a stats dump of all players atm. If we have that, we can use it as a base and calculate expected values from the difference between that dump and some point in the future. Hell, we could then include spotting damage/tracking damage/armor use in the rating as well, provided wg does not change how they are calculated in between.
  9. @User Can you give me your wg id, so I can take a look? I've moved the stats for wottactic.com to these new expected values. I don't want to fragment wn8 unnecessarily, but for the reasons mentioned earlier I also won't implement the v30 expected values. Everyone is free to adopt/use/(re)publish/improve or modify the tank-specific wn8 expected values I've published, and I'll update them occasionally for the foreseeable future. I think tank-specific values are important for a more accurate and less paddable rating, and I hope people will adopt them or come up with another set of tank-specific expected values to use.
  10. @User I checked for the ISU-130 and it is listed: 58625,1.01,1369.06,0.69,0.98,51.12 On what account did it mess up the stats? @bounceplink I make no assumptions based on tier. The expected values for a tank are based on the performance of players on that tank compared to their performance on other tanks. This is plain wn8, I didn't change anything. It will take quite a long time for the expected values to reflect recent changes. I've not modified the way the expected values are calculated except for the tanks explicitly mentioned, I've even used the same code. I've simply applied it to a larger dataset. Also try to compare against v29, as it's the last version that had per-tank values. It is possible a tank is not picked up if it wasn't mentioned in v30 and nobody has played > 50 battles on it. Fluctuations are normal. There isn't even a guarantee that it remains the same on average, the entire scale could have shifted up a bit. I sort of expected as much, as previous expected values were calculated from a player list taken from a rating website, which is slightly biased towards better players. As I mentioned, the expected values are calculated on all EU accounts with > 1000 random battles (random columns on API), taken about a week ago now. 2464513 players in total. I purposely didn't rely on rating sites because they are always biased towards people who look up their stats. I built the player list by just asking for the random battle count of all account ids 500000000-550000000. It's really not so bad, as you can query 100 accounts in a single request. Of course many accounts come back empty because they don't exist. @Android25 It took me about 1 day to create the list of players and 5 days to download the data. I basically sent a request every 150ms, or about 7 a second, to stay below the WG recommended maximum of 10 requests a second. Requests that bounced for whatever reason are automatically retried.
      Yes, it takes some time to gather the data. If WG is reading this: if you release a database dump every few months or so, that would be very handy and we wouldn't have to harass your api servers quite as much :).
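The id scan described above (100 account ids per request, one request every 150ms) can be sketched as follows. This is only an illustration: the `/wot/account/info/` endpoint and the `demo` application id are assumptions here, no request is actually sent, and all function names are made up.

```python
from urllib.parse import urlencode

def id_batches(start, stop, batch_size=100):
    """Yield lists of consecutive account ids covering [start, stop)."""
    for first in range(start, stop, batch_size):
        yield list(range(first, min(first + batch_size, stop)))

def build_url(account_ids, application_id="demo"):
    """Build one request URL for a batch of ids (endpoint is an assumption)."""
    query = urlencode(
        {
            "application_id": application_id,
            "account_id": ",".join(str(i) for i in account_ids),
        },
        safe=",",  # keep the id list readable instead of %2C-escaped
    )
    return "https://api.worldoftanks.eu/wot/account/info/?" + query

def scan_hours(start, stop, batch_size=100, interval_s=0.15):
    """Hours needed to scan the id range at one request per interval."""
    requests_needed = -(-(stop - start) // batch_size)  # ceiling division
    return requests_needed * interval_s / 3600

first_batch = next(id_batches(500_000_000, 550_000_000))
print(build_url(first_batch[:3]))
print(round(scan_hours(500_000_000, 550_000_000), 1))
```

With these numbers the scan needs 500,000 requests, which at 150ms each comes to a bit under 21 hours, in line with the "about 1 day" mentioned for building the player list.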
  11. It works fine for me, note it takes wg id's like: 505943778, not names or anything.
  12. I've now plugged in the values for the remaining tanks, I've updated the post accordingly. You can check out the stats of a player with the new rating here: http://wottactic.com/wn8_standalone.html It's a little basic, but it gives you some idea.
  13. Ok I'll use that for the remaining tanks. For the tanks in the 'new' pluglist already, it's definitely a better approximation, also still can't open attachments on this forum
  14. I'll add the rating to wottactic as wn8 experimental or something. I had another look at the tank list and looked them up. The Schwarzpanzer 58 is the Mutz black edition and the IS-6 B is the IS-6 black edition, so those are obvious. The Pz.Kpfw. V/IV is apparently the Pz.Kpfw. V/IV Alpha, which is identical to the regular Pz.Kpfw. V/IV. The Chieftain T95 will be a cw reward tank. The BT-SV and pz2J are just very rare premium tanks. The M6A2E1 was apparently available as a pre-order package for wot a long time ago. The M4A3E8 Thunderbolt VII, Strv 103B, Mäuschen, T-100 LT, wz-132A and FV4005 Stage II have been released but are just new. Not sure how the T26E5 made it onto this list. That's the Patriot, which I thought was pretty popular, but I only have 14 people on record playing more than 50 battles in it. Am I missing something? Anyway, I can use the Easy 8 value for the Thunderbolt, those are pretty similar. But I don't know how comfortable I am guessing values for the Strv 103B, Mäuschen, T-100 LT, wz-132A and FV4005. They are not sufficiently similar to other tanks imo, so maybe I should use the values from the weighted version.
  15. Well, I finished gathering the data and creating new expected value files. First of all, here is a useful database for anyone embarking on something similar: http://forum.wottactic.com/other/input.7z. It contains a large csv file with lines such as:
      "userid","compDescr","title","type","tier","countryid","battles","victories","damage_dealt","frags","spotted","defence_points"
      for every tank of every player with > 1000 battles on the eu server. As for processing the data, I've taken two approaches. The first is identical to Gryphon's method. For CW reward tanks, I've used the closest related tank; when in doubt I picked one that's probably a little bit better.
      15617,Object 907,16897,Object 140
      15905,M60,14113,M48A1 Patton
      55841,T95E6,14113,M48A1 Patton
      58641,VK7201,9489,E 100
      63537,121B,4145,121
      58881,IS-5,5377,IS-3
      11809,T23E3,1569,T20
      Then there were some exceedingly rare or new tanks. These are partly based on the community pluglist, partly just common sense:
      54273,SU-76I,2369,FCM 36 Pak 40
      59425,T34 B,2849,T34
      12577,M4 Improved,52257,M4A2E4 Sherman
      13905,FV4005 Stage II,9297,FV215b (183)
      53793,T95E2,5921,M26 Pershing
      58369,Object 260 mod. 1945,7169,IS-7
      59665,Grosstraktor - Krupp,2385,Vickers Medium Mk. III
      61457,Pz.Kpfw. III Ausf. K,6417,Pz.Kpfw. III/IV
      54033,Pz.Kpfw. V/IV Alpha,51473,Pz.Kpfw. V/IV
      49409,IS-6 B,9217,IS-6
      64273,Panzer 58 Mutz,49937,Schwarzpanzer 58
      59681,M4A3E8 Thunderbolt VII,1313,M4A3E8
      Then there are a few very new or very rare tanks that I didn't have enough data for. I've gone with the average values from v30:
      5681,0.68,1100.00,2.15,0.63,51.27
      4737,1.12,2153.40,0.71,0.64,49.22
      13905,1.12,2153.40,0.71,0.64,49.22
      15441,0.88,1162.12,1.31,0.91,51.62
      18705,0.90,1579.22,0.98,0.68,51.03
      19473,0.95,1918.96,1.08,0.64,49.34
      49921,1.00,1106.68,1.02,0.92,54.23
      52513,0.92,1283.61,1.07,0.81,52.12
      52225,1.21,290.59,1.71,1.32,57.68
      19201,0.68,1400,2.15,0.63,51.27
      except for the pz2j, because there is nothing average about this tank. I just went on the limited data I have:
      51729,1.82,353.81,1.56,1.78,65.19
      Then I've tried an alternative approach that removes the 50 minimum battle requirement, but weights the entries by the number of battles when doing the linear regression. For popular tanks it yields very similar results, but for rarer tanks there is a lot more data to work with. I have again replaced the CW reward tanks with their closest counterparts:
      15617,Object 907,16897,Object 140
      15905,M60,14113,M48A1 Patton
      55841,T95E6,14113,M48A1 Patton
      58641,VK7201,9489,E 100
      63537,121B,4145,121
      58881,IS-5,5377,IS-3
      11809,T23E3,1569,T20
      There were no relevant entries in the pluglist, and only these 9 tanks had insufficient data:
      15441, 17217, 18705, 19457, 51473, 52993, 19201, 62785, 63233
      JITT, Chieftain/T95, KV-4 Kreslavskiy, Mäuschen, AMX M4 mle. 49, Pz.Kpfw. V/IV, A-32, T-100 LT
      Resulting expected values:
      regular: http://forum.wottactic.com/other/expected_v32.csv, http://forum.wottactic.com/other/expected_v32.json
      weighted: http://forum.wottactic.com/other/expected_v31w.csv, http://forum.wottactic.com/other/expected_v31w.json
      Code and detailed procedures at: https://github.com/karellodewijk/wn8_expected
      So what do you think?
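To illustrate the idea of weighting the linear regression by the number of battles, here is a small weighted least-squares sketch in plain Python. It is not Gryphon's script or the code from the repository; the choice of x and y (for example a player's overall ratio against their result on one tank) and the function name are assumptions made for the example.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares fit of y = intercept + slope * x.

    Each point counts in proportion to its weight, e.g. the number of
    battles a player has on the tank (an assumption for this sketch).
    """
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y)
        for w, x, y in zip(weights, xs, ys)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: the third point is an outlier from a player with 1 battle.
xs = [0.0, 1.0, 1.0]
ys = [0.0, 1.0, 3.0]
plain = weighted_linear_fit(xs, ys, [1, 1, 1])
by_battles = weighted_linear_fit(xs, ys, [1, 100, 1])
print(plain, by_battles)
```

With equal weights the outlier pulls the slope up; weighting by battles keeps the fit close to the player with many battles, which is why no hard minimum of 50 battles is needed in this variant.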
  16. The WG api has no concept of recent battles. It has only totals. The way stat websites calculate recent performance is, they take a snapshot of your tank stats, then a little later they take another snapshot and they calculate the difference.
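The snapshot approach can be sketched like this. The dictionary layout (tank id mapped to cumulative totals) and the field names are assumptions for the example, not the format any particular stats site uses.

```python
def recent_stats(earlier, later):
    """Per-tank difference between two snapshots of cumulative totals."""
    recent = {}
    for tank_id, new in later.items():
        old = earlier.get(tank_id, {})  # tank may not exist in the old snapshot
        delta = {key: value - old.get(key, 0) for key, value in new.items()}
        if delta.get("battles", 0) > 0:  # keep only tanks played in between
            recent[tank_id] = delta
    return recent

# Made-up snapshots of one account, keyed by tank id.
snapshot_old = {
    9217: {"battles": 200, "damage_dealt": 260000},
    1313: {"battles": 80, "damage_dealt": 50000},
}
snapshot_new = {
    9217: {"battles": 230, "damage_dealt": 305000},
    1313: {"battles": 80, "damage_dealt": 50000},
    16897: {"battles": 10, "damage_dealt": 21000},
}
print(recent_stats(snapshot_old, snapshot_new))
```

Tanks with no new battles drop out, and a tank that first appears in the later snapshot is counted in full.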
  17. Ok that makes sense, I'm just going to download all the accounts. It will take a few days but I can live with that, I hope WG can too. And it will make my life easier in the long run. I can't seem to download your attachment though: "The page you are trying to access is not available for your account." Maybe just mail it to me [email protected] I did just have to restart: I noticed I was missing some pretty popular tanks, and it looks like it was skipping over tanks it couldn't find the name/tier/etc. for. That, together with "/wot/encyclopedia/vehicles" being very incomplete, meant I missed a bunch of tanks. I switched to getting the tank info from "/wot/encyclopedia/tanks/", which is deprecated but at least more complete, and just to be sure, if it encounters a tank that is missing it will just fill in some dummy values. It's the id that counts anyway. Such fun. Anyway, plots with less than 100, got it, use the community-created pluglist file. Thanks again
  18. @Gryphon_, I'm mostly not finished yet, but I tried running the data I had gathered so far to see what I got. Basically I generated a data set that your R script seems to be ok with: http://forum.wottactic.com/other/input.zip That gives me a set of expected values; I added the tank name to it to make it easier to talk about: http://forum.wottactic.com/other/expected 2017-05-14 .csv And I've also used your plot script to generate your scatter plots: http://forum.wottactic.com/other/plots.zip Am I doing ok so far? Now what would be the next step? By investigating the scatter plots I can see some tanks are quite rare and people who played 50 games in them are even rarer. It will get a little bit better when the stats of more people are gathered, but there probably will be tanks where data is so sparse that the linear regression is very unreliable. So I guess that for those rare tanks it's best to fix their expected values to those of the most similar tank I can find, right? Then there is the issue of tanks that were part of some kind of skill-based reward. I'm thinking cw campaign reward tanks, maybe the t-22, others? They are not necessarily very rare, but their raw calculated stats are super biased because only somewhat skilled people own them. I guess for those I do the same and fix their stats to those of a very similar tank? Then there are tanks that have received significant buffs/nerfs/tier changes. I'm somewhat inclined to ignore this, because it's very subjective. Am I on the right track or is there something else I need to do?
  19. Two things. Here is a list of all eu players with >1000 battles and their battle count: https://karellodewijk.github.io/other/players_eu.zip (2464513 in total). It should be useful when calculating any rating, so you know which accounts exist. I've also uploaded my code and explanation so far to github: https://github.com/karellodewijk/wn8_expected. I've shuffled the players and started downloading their stats. It will take the better part of a week to download them all, but I don't think I need all 2.5 million. A random sample of a few 100k accounts should be plenty.
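Shuffling the player list first means that whatever has been downloaded at any point is already a random sample. A minimal sketch, with a hypothetical function name and a tiny made-up id list:

```python
import random

def shuffled_sample(account_ids, sample_size, seed=None):
    """Shuffle a copy of the id list and return its first sample_size ids."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    shuffled = list(account_ids)
    rng.shuffle(shuffled)
    return shuffled[:sample_size]

# Made-up ids; the real list would hold a few million entries.
all_ids = list(range(500_000_000, 500_000_020))
sample = shuffled_sample(all_ids, 5, seed=42)
print(sample)
```

Stopping the download early then simply corresponds to a smaller `sample_size` on the same shuffled order.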
  20. Well, I'm a realist: wn8 is used today and is very widespread. I also think it's fundamentally sound, with per-tank expected values. And from my limited testing it holds up well even against wn9. I do feel it should use the "random" stats published by WG and not the "all" stats as a base. The "all" stats by wg are random stats + tank companies + clan wars before a certain date. wn9 does use the "random" stats. But I'm not going to implement that myself without some kind of consensus. Anyway, I fully intend to release my expected values with all code and the process I've followed to get there. As for wn9, I'll release my code to scrape stat data, and if you want I can create a raw dataset in any format with any filters you want. When browsing around I saw you use a bash script with wget. That works obviously, but by slightly modifying the code I've already written to keep wottactic stats up to date, I can do it an order of magnitude faster. If you or someone else with experience with wn9 could then do the post-processing, that would be great.
  21. Ok, the script that is making the account list is over halfway done. It found 1.7mil accounts with >1000 battles and 1.3 mil accounts with over 2500 battles, so I'm going to assume around 3mil eu accounts when it's all done. Tbh that's not so bad: according to my calculations I should be able to fetch those in +- 5 days without getting banned from the wg api :). Anyway, I'll make sure to do a random shuffle on the wg ids before I start downloading the stats, so anything it does download can be used as a sample. I got your script to work fine on a very small sample of stats I downloaded. It does spit out some expected values that make sense at first sight. A few NA's in there, but I presume that my small sample of 5 accounts doesn't cover every tank. Anyway, here's a sample csv file I generated. Your script seems to accept it, but just to make sure, can you have a look to see if I haven't made any mistakes? For example, I don't know exactly what title is supposed to be. Here is the link: https://karellodewijk.github.io/other/input.csv. As for inspecting the output, can you post the code to create the scatter plot with the linear regression line? I can probably figure it out, but it's been a while since I used R. But tbh manually eliminating outliers as you seem to be doing is time consuming and even a little iffy. I'd rather give it more data so the outliers matter less, or at least automate the process using some fixed rules. I also think these outliers are a result of accounts with a low number of battles (barely >50) on a certain tank. Wouldn't it help a lot already if you weighted your linear regression by the number of battles, or are you already doing this? Anyway, thanks for the help Gryphon, I appreciate it.
  22. Sure, I realize it's a prank. But it's been taken over by most rating websites and used over the last 4 months or so. I think it has run its course. Anyway, I'm really not trying to re-invent the wheel here. I would be fine with somewhat accurate and updated expected values for wn8. I'm not creating a new rating, I am just trying to update the expected values for wn8.
  23. I'd be happy to use your R script if it still works, that would be a great help. Can you give me a link? And how does it like its data? I have quite a bit of experience scraping the statistics data from WG. For wottactic stats I basically make a list of all active accounts and re-fetch the data for all accounts that have played battles that week, ~200K players on average. I can't use that data set of course, because it's created from all players that have ever been looked up on wottactic or are in a clan that has been looked up. About wn9, I did a correlation test between wn8/winrate and wn9/winrate on my data set and I got: corr wn9-wr: 0.7867520119707153, corr wn8-wr: 0.8046118979857809. This could be because of outdated data, but wn8 did seem the better rating. As for maintaining it, we'll see when we get there. I have to admit I don't have a lot of free time, so I'll have to see to what extent I can streamline the process. Anyway, I'm focusing my efforts on the eu server for now. To create an unbiased list of players with more than 1k battles, I'm just asking for the battle counts of all players between 500000000 and 540000000 (100 at a time). I've never seen an eu wg id much over 530000000, so that should catch them all. This alone will take approx 16h, as I can't send >10 requests a second without them getting dropped. But at least then I'll know how many players we're dealing with and if I can just fetch all their stats or if I need to work with a random sample. Anyway, I don't necessarily want to do this alone, and I would be happy with any advice or help I can get. I do have a solid base in statistics and a more solid base in programming. As a side note, I think it would be better to start with the "random" stats instead of the "all" stats. Random stats seem to be basically "all" - tank company stats. I know wn9 uses them and it just makes more sense. Wouldn't it be better to move wn8 to the random stats?
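A correlation test like the one mentioned above can be done with a plain Pearson coefficient. The post does not say which correlation measure was used, so Pearson is an assumption here, and the ratings and win rates below are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up per-player values: one rating and one win rate per account.
ratings = [650.0, 980.0, 1420.0, 1900.0, 2600.0]
win_rates = [46.5, 48.9, 51.2, 53.0, 58.4]
print(pearson(ratings, win_rates))
```

Running the same function once with wn8 and once with wn9 against the same win rates gives two coefficients that can be compared directly.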
  24. Missing tanks is probably where your difference is coming from. I think the only sane way is to disregard them. Remove them from the data before calculating wn8. Anyway here is an example of a standalone page that calculates wn8/wn9: http://wottactic.com/wn8_standalone.html I think it's a very concise example of what is required, and it does calculate the same values as wotlabs and other rating sites, maybe you'll find it useful.
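A rough sketch of an account-level wn8 calculation that leaves out tanks without expected values, as suggested above. The constants follow the published WN8 formula as I understand it; the row keys are borrowed from the csv header used elsewhere in this topic, and the expected-value order (frags, damage, spots, defence, win rate in percent) is an assumption. The standalone page linked above remains the reference.

```python
def account_wn8(tank_rows, expected):
    """WN8 over all tanks that have expected values; others are skipped."""
    actual = [0.0] * 5  # frags, damage, spotted, defence, wins
    expect = [0.0] * 5
    for row in tank_rows:
        exp = expected.get(row["compDescr"])
        if exp is None:
            continue  # tank without expected values: leave it out entirely
        battles = row["battles"]
        stats = (row["frags"], row["damage_dealt"], row["spotted"],
                 row["defence_points"], row["victories"])
        # Expected win rate is a percentage, so scale it to a fraction.
        scaled = (exp[0], exp[1], exp[2], exp[3], exp[4] / 100.0)
        for i in range(5):
            actual[i] += stats[i]
            expect[i] += scaled[i] * battles
    if min(expect) <= 0:
        return 0.0  # nothing usable to compare against
    r_frag, r_dmg, r_spot, r_def, r_win = (a / e for a, e in zip(actual, expect))
    c_win = max(0.0, (r_win - 0.71) / (1 - 0.71))
    c_dmg = max(0.0, (r_dmg - 0.22) / (1 - 0.22))
    c_frag = max(0.0, min(c_dmg + 0.2, (r_frag - 0.12) / (1 - 0.12)))
    c_spot = max(0.0, min(c_dmg + 0.1, (r_spot - 0.38) / (1 - 0.38)))
    c_def = max(0.0, min(c_dmg + 0.1, (r_def - 0.10) / (1 - 0.10)))
    return (980 * c_dmg + 210 * c_dmg * c_frag + 155 * c_frag * c_spot
            + 75 * c_def * c_frag + 145 * min(1.8, c_win))

# Made-up data: tank 1 performs exactly as expected, tank 2 has no entry.
expected = {1: (1.0, 1000.0, 1.0, 1.0, 50.0)}
rows = [
    {"compDescr": 1, "battles": 100, "frags": 100, "damage_dealt": 100000,
     "spotted": 100, "defence_points": 100, "victories": 50},
    {"compDescr": 2, "battles": 500, "frags": 900, "damage_dealt": 900000,
     "spotted": 900, "defence_points": 900, "victories": 400},
]
print(account_wn8(rows, expected))
```

Because the unknown tank is dropped from both the actual and the expected totals, it cannot inflate or deflate the rating.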
  25. For anyone stumbling upon this topic in the future, the trick is to get a player's tank list from /wot/account/tanks (this one appears accurate), e.g.: https://api.worldoftanks.eu/wot/account/tanks/?application_id=demo&account_id=523984764&fields=tank_id . Then remove the vehicle stats of tanks not in this list.
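The filtering step can be sketched as below. The shapes of the two inputs (lists of dictionaries with a `tank_id` key, as already extracted from the API responses) are assumptions for the example, and no request is made.

```python
def filter_to_owned(vehicle_stats, tank_list):
    """Keep only the vehicle stats whose tank_id appears in the tank list."""
    owned = {entry["tank_id"] for entry in tank_list}
    return [stats for stats in vehicle_stats if stats["tank_id"] in owned]

# Made-up data: the tank list from /wot/account/tanks and the per-vehicle
# stats from another endpoint, which contains one stray tank (99999).
tank_list = [{"tank_id": 9217}, {"tank_id": 1313}]
vehicle_stats = [
    {"tank_id": 9217, "battles": 230},
    {"tank_id": 99999, "battles": 12},
    {"tank_id": 1313, "battles": 80},
]
print(filter_to_owned(vehicle_stats, tank_list))
```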