Hi everyone, especially those interested in math and statistics. I need your help and feedback.
I've recently scraped and recorded all users' detailed statistics. With all this data, I'm working towards continuing where nerbas' HoF left off.
When I'm ready I'll be releasing a page with a DAILY updated HoF and various other misc. statistics.
In the long run, I'd like to enable users (you) to see your HoF placement for all the various stats.
I've been working on generating some mathematically usable presentations of the statistical data instead of the usual average, which on its own doesn't give much to work from.
However, due to the LARGE number of inactive profiles among all created Torn profiles, many of which have no values in many of the stats, a high percentage of users drag each and every statistical calculation down.
Essentially, for each of the 120 (or so) detailed stats available, the median and mode values are ~0.
Therefore I've made some experiments using various limits on the user base the average is calculated on, e.g. requiring that they have at least a "1" in the value being calculated, and/or only including people active within the last xx days.
This gives much more relevant and usable numbers. However, the numbers don't reflect the TRUE statistical data for all profiles, only statistics on active profiles.
Here is a data table I have generated on "Xanax Taken", "Drugs used", "Attacks Won", "Criminal offences", with a few different combinations of limits on the userbase.
What I'm here for is to hear your opinions on statistical data such as this in general, and which of these limited userbase calculations, and which math, make most sense to use in my later statistical work.
Explanation of table headers (math expressions):
Count: Number of users the calculation is based on, out of the total number of Torn users (including/excluding fedded)
Mean: Also known as the average; the sum of values divided by Count
Max: The highest value recorded
SD: Standard Deviation, a value indicating how far the values spread from the Mean
Q(xx%): Quartile, the value below which xx% of users fall. Note that 50% is the same as the Median
P(xx%): Percentile, same as above, here at the 95% value
Explanation of description:
The first part is self-explanatory: the Key.
_Incl/_Excl = Including fedded people or excluding fedded people
_0 / _1 = 0 is all people, 1 only includes people having at least 1 in the stat.
1d/30d: If absent, all people are included; if present, only people with last activity within the last 1 day or 30 days are included.
12h/24h: Ignore profiles with less than 12 hours/24 hours of Played Time.
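A minimal sketch of how the table columns and userbase limits described above could be computed. The record layout, the field names and the sample values are all made up for illustration; they are not the real data set, and this is not necessarily how the original table was produced.

```python
import statistics

# Hypothetical records; field names and values are assumptions for illustration only.
users = [
    {"xanax": 0,   "fedded": False, "days_inactive": 400, "hours_played": 1},
    {"xanax": 0,   "fedded": False, "days_inactive": 200, "hours_played": 3},
    {"xanax": 0,   "fedded": True,  "days_inactive": 5,   "hours_played": 50},
    {"xanax": 5,   "fedded": False, "days_inactive": 2,   "hours_played": 40},
    {"xanax": 20,  "fedded": False, "days_inactive": 1,   "hours_played": 300},
    {"xanax": 100, "fedded": False, "days_inactive": 10,  "hours_played": 900},
    {"xanax": 7,   "fedded": False, "days_inactive": 45,  "hours_played": 30},
]

def summarize(users, stat, min_value=0, exclude_fedded=True,
              max_days_inactive=None, min_hours_played=0):
    """Apply the userbase limits, then compute Count, Mean, Max, SD, Q and P."""
    values = sorted(
        u[stat] for u in users
        if u[stat] >= min_value                      # _0 / _1
        and not (exclude_fedded and u["fedded"])     # _Incl / _Excl
        and (max_days_inactive is None               # 1d / 30d
             or u["days_inactive"] <= max_days_inactive)
        and u["hours_played"] >= min_hours_played    # 12h / 24h
    )
    if len(values) < 2:
        raise ValueError("need at least two users after filtering")
    # 99 cut points; cuts[k - 1] is the k-th percentile of the filtered values.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "max": values[-1],
        "sd": statistics.pstdev(values),
        "q25": cuts[24],
        "q50": cuts[49],   # same as the median
        "q75": cuts[74],
        "p95": cuts[94],
    }

everyone = summarize(users, "xanax", exclude_fedded=False)
active = summarize(users, "xanax", min_value=1,
                   max_days_inactive=30, min_hours_played=12)
print(everyone["count"], everyone["q50"])
print(active["count"], active["q50"])
```

With these made-up rows the unfiltered median is dragged down by the zero-value profiles, while the filtered version only describes the active users, which is exactly the trade-off under discussion.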
I love stuff like this but you need to dumb it down to the bits of useful information. As you were suggesting, the first step would be to remove the inactive players. As an indication of how many inactive or useless profiles there are:
Jan 1 2013: There are currently 282,112 players in Torn City.
Feb 2 2013: There are currently 288,983 players in Torn City.
We all know 6800 players have not become active within the last month.
I think you need to sort out all players with last played 30 days and all people with time played less than one day.
Thanks for your efforts. This could be interesting info.
Detailed stats data being made available to the wider population is to be welcomed but I would be concerned as to whether this exercise in data gathering on such a scale is contributing to the lag. It may be wiser to put this on hold until the terrible lag problems being experienced are rare rather than common.
By Bigdearmond [1103884]
Wonder how many of the new 6800 this last month are multi's.
6599 were multies and 200 were restarted players that got busted for multies. Hope this clears up any confusion.
I'm really happy someone decided to pick up where nerbas left off, as I really enjoyed looking at the HOF myself, but was there not a technical reason the HOF could not be continued?
Hiring for my company 50k int=600k. Druggies are welcome if active. Paying 350k for stalemates, I will unload weapons so anyone can do it.
By Penicillin [1517799]
I love stuff like this but you need to dumb it down to the bits of useful information. As you were suggesting, the first step would be to remove the inactive players. As an indication of how many inactive or useless profiles there are:
Jan 1 2013: There are currently 282,112 players in Torn City.
Feb 2 2013: There are currently 288,983 players in Torn City.
We all know 6800 players have not become active within the last month.
I think you need to sort out all players with last played 30 days and all people with time played less than one day.
Thanks for your efforts. This could be interesting info.
Yeah, the information will definitely be dumbed down. This was more an attempt at getting feedback on what data is usable, which is exactly what you provided.
I agree that inactive players should be excluded. I didn't think to add the play time as a factor; that is a good idea.
Using only play time as a limiting factor, without a limit on last activity, might in fact give more realistic statistics.
The problem with excluding, let's say, people whose last activity was more than 30 days ago is that the statistics can fluctuate a lot depending on which of the "larger" players suddenly reappear or disappear for a month.
I'm thinking that when I create the webpage for this data, it might be possible to see "different" versions of the data, with one of them as the default.
I will also only include a few of those calculated numbers; the problem is really which of these numbers make most sense to present.
I don't think the Mean/Average says much, and statistically speaking it's a very lousy number by itself.
I really like the concept of Standard Deviation, and maybe the quartiles.
By Evil-Duck [1182047]
Well this is a load of tat...
Peace
Evil-Duck, no one ever asked your opinion, I don't even know if people listen to you at all.
A comment from the guy with the all time lowest forum score. Tsk tsk.
I still see DP's low priced in item market. Aren't we in february month yet?
By SourRon [852964]
Detailed stats data being made available to the wider population is to be welcomed but I would be concerned as to whether this exercise in data gathering on such a scale is contributing to the lag. It may be wiser to put this on hold until the terrible lag problems being experienced are rare rather than common.
Indeed a valid concern, but I'm doing everything I can to limit my effect on the Torn servers while still providing a constantly up-to-date data set.
As I have now scraped all current users, my software will monitor new users and put them on the list. This will happen once daily.
Profiles will be rated and ranked according to their last activity and when they were last updated, so I will only update those that need it on a daily basis. Those ranked lower will be updated occasionally by this ranking system.
This means I will update between 25k-35k profiles daily. Spread over a period of 24 hours, that's roughly only 1 profile update per 3 seconds.
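The arithmetic behind that estimate, as a small sketch. The function name and the `window_days` parameter are only illustrative; the 25k and 35k figures are the ones quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def seconds_between_requests(profiles, window_days=1):
    """Average gap between profile requests when spread evenly over the window."""
    return window_days * SECONDS_PER_DAY / profiles

for profiles in (25_000, 35_000):
    print(profiles, round(seconds_between_requests(profiles), 2))
# prints:
# 25000 3.46
# 35000 2.47
```

So the daily update works out to one request every 2.5 to 3.5 seconds, and spreading the same work over 3 days (`window_days=3`) stretches the gap to roughly 7 to 10 seconds.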
Notice that when I scrape data, I only request the raw HTML source. When a browser requests a page, it will also request several other resources, such as JavaScript, CSS, and images. Some of it will be cached, some of it will not.
But generally speaking, my daily scrape will add less data traffic than the 25,000 daily logins. Think of it as if just 1/3 of the daily players made just 1 more page view before they logged off; that would amount to the same. Out of the daily work and data already served by Torn, this work is minimal (compared to the data it can provide).
If I need to, I can spread the needed scrape over a period of 2-3 days, getting data a tiny bit delayed but still fairly up to date, instead of just scraping data once per month.
The current data I have is between 3-8 days old. I have not scraped any data in the past 3 days, so the lag currently seen is 100% not affected by my scraping.
So as long as the Torn servers are working and doing their job, my scraping will be of little concern.
The problems with lag are deeper underlying problems in Torn and the server setup. Servers are showing signs of lag with just 10 people online.
But yeah, I'll see what happens. I'll definitely not be taking part in pressuring the Torn servers more than necessary. I still have a lot of development left, and in the next week or two I have very little spare time to do this in, so it'll be a few weeks before I'm ready. Hopefully the Torn servers will be better by then.
As I said, I already scraped a few days ago, with no problems. 8-10 page scrapes per second were no problem at the time.
By Cadillac [929733]
I look forward to seeing this in action
I'm glad to hear
By CompanyHiring [266775]
I'm really happy someone decided to pick up where nerbas left off, as I really enjoyed looking at the HOF myself, but was there not a technical reason the HOF could not be continued?
I'm not aware of what the technical reasons for nerbas' stop were.
It might easily have been the major lag and problems with the Torn servers at the time; that would definitely put a stop to this as well.
By Apples [1626204]
Don't be such a tease, show us some frequency distributions.
It will be ready in due time. Have faith and patience young padawan
Anyway, on a more helpful basis, i'd use a dataset limited to non-fedded accounts with more than 10 or so crimes. Crimes are an activity that I'm pretty sure all users will do, so you wont be losing any 'true' 0s, that you would do if you based stats on 1 xanax taken or anything else like that.
Thank you. A valid idea, but the threshold might need to be a tad higher. 10 crimes can be done within the first day of playing, so it doesn't really say anything about whether the profile is active or not, but nonetheless thanks.
i would have been the first one to respond saying that "this will be awesome, but better layout/visual would be perfect." i just didn't know if you would want another post at the top as a place holder.
By Apples [1626204]
Yep, just set the actual figure at whatever comes out of the data.
True
By Nahaz [1511431]
i would have been the first one to respond saying that "this will be awesome, but better layout/visual would be perfect." i just didn't know if you would want another post at the top as a place holder.
but AWESOME!
Thank you very much, and for being considerate. Appreciated.
"What I'm here for is to hear your opinions on statistical data such as this in general, and which of these limited userbase calculations, and which math, make most sense to use in my later statistical work."
This first one isn't about layout; I'm really asking for some help in deciding which parts of this data are usable and significant. After that has been decided, I will do my "later work", which will include a forum post with better presentation, a webpage with all these stats available, as well as HoF lists, frequency distributions (graphs?), etc.
I will also, at some point, incorporate this into my script, where users will be able to see presentations of the HoF with their own position (range), just like the current HoFs available in Torn.
I will also be able to present the data grouped by factions, to generate a faction HoF (generated from their members' stats), maybe calculated based on some sort of average, median or standard deviation model, as above.
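A rough sketch of what grouping by faction could look like, comparing a mean-based and a median-based score. The faction names and values are invented, and which measure the faction HoF would actually use is still undecided.

```python
from collections import defaultdict
from statistics import fmean, median

# Hypothetical (faction, stat value) pairs; not real data.
members = [
    ("A", 0), ("A", 10), ("A", 5000),
    ("B", 100), ("B", 120),
]

def faction_scores(members):
    """Group member values by faction and summarize each group."""
    by_faction = defaultdict(list)
    for faction, value in members:
        by_faction[faction].append(value)
    return {
        faction: {
            "members": len(values),
            "mean": fmean(values),
            "median": median(values),
        }
        for faction, values in by_faction.items()
    }

scores = faction_scores(members)
by_mean = sorted(scores, key=lambda f: scores[f]["mean"], reverse=True)
by_median = sorted(scores, key=lambda f: scores[f]["median"], reverse=True)
print(by_mean, by_median)
```

In this toy data one very strong member puts faction A on top by mean, while faction B leads by median, so the choice of measure changes the ranking.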
By cyberdude_dk [1613175]
I'm not aware of what the technical reasons for nerbas' stop were.
It might easily have been the major lag and problems with the Torn servers at the time; that would definitely put a stop to this as well.
update 1st of September
- a new rate-limiting has been introduced to the torn servers, this might have been the last accurate HoF post of mine, I'm still checking out the details
Of course I haven't the first clue what this means.
Last Edited: Sat Feb 02, 2013 23:46:56
Hiring for my company 50k int=600k. Druggies are welcome if active. Paying 350k for stalemates, I will unload weapons so anyone can do it.
Updated OP with some of the suggestions provided. It now always excludes fedded profiles, excludes profiles that have been inactive for the past 30 days, and excludes profiles that have less than 12 hours/24 hours of play time.
The question now becomes: is this better? Is 12 hours of play time sufficient? Excluding up to 24 hours of play time excludes most new players, and new players somehow still need to be a part of the statistics, or you'll quickly end up with statistics only for highly active people, which skews the data way too much.
Also added Total Respect Gained, just for fun
By -Clansdancer [65306]
Why would anyone vote this down?
Awesome bit of data you have there
Thanks Clansdancer! I'm glad I could impress you. I do however have a minor request, which I'll send you in a private message when I'm ready, that'll essentially make or break this.
By Apples [1626204]
I'd prefer to just have the raw data to play with, i'm not fussed with how you present it .
I will probably not provide the raw data.
1)
The raw data can easily be used/abused to create an automated target system for all profiles and the wars in your warbase. I'd currently consider this borderline cheating; this data is very valuable. I know others could do the same, but I'm not here to help cheaters, and I'm not here to get into trouble.
I'm aware that I have this data and could easily use it myself, but I've proved several times that I try my best not to cross the grey line of cheating. I'm doing this because I enjoy being a geek, not because I enjoy or want to cheat at the game.
All I want is statistical data and HoF.
2)
The data is hosted in a database on a private server; I can't let each and everyone have access to this.
I might later (if this has been discussed and agreed with staff/Ched) consider creating a safe interface that would allow you to extract the data by dynamically building SQL based on inputs. But I'd definitely not offer direct SQL read access.
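A minimal sketch of what such a safe interface could look like, assuming a whitelist of stat columns and bound parameters for everything else. The table name `user_stats`, the column names and the rows are hypothetical, and SQLite is used only because it needs no server; nothing here reflects the real database.

```python
import sqlite3

# Hypothetical whitelist: only these column names may ever reach the SQL text.
ALLOWED_STATS = {"xanax_taken", "attacks_won"}
MAX_ROWS = 100

def top_players(conn, stat, limit=10):
    """Return the top `limit` (name, value) rows for one whitelisted stat."""
    if stat not in ALLOWED_STATS:
        raise ValueError(f"unknown stat: {stat!r}")
    limit = max(1, min(int(limit), MAX_ROWS))
    # The column name comes from the whitelist; the row limit is a bound parameter.
    sql = f"SELECT name, {stat} FROM user_stats ORDER BY {stat} DESC LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()

# Tiny in-memory table with made-up rows, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_stats (name TEXT, xanax_taken INTEGER, attacks_won INTEGER)"
)
conn.executemany(
    "INSERT INTO user_stats VALUES (?, ?, ?)",
    [("alice", 12, 300), ("bob", 0, 950), ("carol", 40, 10)],
)
print(top_players(conn, "attacks_won", limit=2))
# prints: [('bob', 950), ('alice', 300)]
```

The point of the design is that user input never becomes SQL text directly: it either selects one of a fixed set of columns or is passed as a parameter, and the row cap keeps anyone from pulling the whole table.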
By CompanyHiring [266775]
update 1st of September
- a new rate-limiting has been introduced to the torn servers, this might have been the last accurate HoF post of mine, I'm still checking out the details
Of course I haven't the first clue what this means.
Ahh, that was true at some point when the servers were having problems. I experienced it myself at the time, I was doing something similar but with another agenda at the time, and stopped that also.
It just meant the Torn server(s) stopped accepting requests when hammered with too many of them. However, even if this limit comes along again, with my current setup I should still be able to provide constantly updated stats and HoF. It might just be updated only every 2nd or 3rd day instead of every day. Time will tell.
At one time, there was an anti-scripting measure added that only allowed 60 pageviews per minute, or something to that effect. I'm not sure if it still exists or if it was changed to be less strict. It might also be that you get the data from a different place than he did...
Buying bulk Boxes of Grenades 1mil each - Just start trade for fast sale
I wasn't aware that the limit was imposed to deter scripts; I thought it was more because of the server problems. However, it doesn't appear to be there anymore.
I don't think I'm doing much different on this matter than nerbas; the data is only available on the profiles' detailed stats page, so nothing different can really be done.