How to calculate percentile rank for point totals over different time spans?

By : Toph
Source: Stackoverflow.com
Question!

On a PHP & CodeIgniter-based web site, users can earn reputation for various actions, not unlike Stack Overflow. Every time reputation is awarded, a new entry is created in a MySQL table with the user_id, action being rewarded, and value of that bunch of points (e.g. 10 reputation). At the same time, a field in a users table, reputation_total, is updated.

Since all this is sort of meaningless without a frame of reference, I want to show users their percentile rank among all users. For total reputation, that seems easy enough. Let's say my user_id is 1138. Just count the number of users in the users table with a reputation_total less than mine, count the total number of users, and divide to find the percentage of users with a lower reputation than mine. That'll be user 1138's percentile rank, right? Easy!

But I'm also displaying reputation totals over different time spans--e.g., earned in the past seven days, which involves querying the reputation table and summing all my points earned since a given date. I'd also like to show percentile rank for the different time spans--e.g., I may be 11th percentile overall, but 50th percentile this month and 97th percentile today.

It seems I would have to go through and find the reputation totals of all users for the given time span, and then see where I fall within that group, no? Is that not awfully cumbersome? What's the best way to do this?

Many thanks.

By : Toph


Answers

I don't see why that would be too overly complex. Generally all you would need is to add to your WHERE clause a query that limits results like:

WHERE DatePosted between @StartOfRange and @EndOfRange
By : TheTXI


I can think of a few options off the top of my head here:

  1. As you mentioned, total up the reputation points earned during the time range and calculate the percentile ranks based on that.

  2. Track updates to reputation_total on a daily basis - so you have a table with user_id, date, reputation_total.

  3. Add some new columns to the user table (reputation_total, reputation_total_today, reputation_total_last30days, etc) for each time range. You could also normalize this into a separate table (reputation_totals) to prevent you from having to add a new column for each time span you want to track.

Option #1 is the easiest, but it's probably going to get slow if you have lots of rows in your reputation transaction table - it won't scale very well, especially if you need to calculate these in real time.

Option #2 is going to require more storage over time (one row per user per day) but would probably be significantly faster than querying the transaction table directly.

Option #3 is less flexible, but would likely be the fastest option.

Both options 2



This video can help you solving your question :)
By: admin