With MLB on the cusp of trade season and the rumors that come with it, Pirates Prospects thought it might be a good time to re-evaluate the Trade Surplus Values that we use in our Trade Target articles. The original values were developed by Victor Wang back in 2008, as explained in this Hardball Times article and shown in this Beyond the Box Score post. Wang used the Baseball America Top 100 Prospects lists from 1990-1999 to track how each prospect performed in his first 6 years of team-controlled time.
Now seemed like the right time to update the data, for a variety of reasons. First, the original article is 4 years old, and it is always a good idea to refresh the concept. Second, I feel “prospecting” has been refined greatly over the past few years: hitting and pitching environments are factored into rankings more, more attention is paid to strikeout and walk rates for both hitters and pitchers, and hype has been toned down to some extent when ranking players.
The next section will focus on the methodology used to calculate these new surplus values for the different tiers of prospects. It will be a little number-crunchy so if you don’t want to see how the sausage is made, feel free to skip down to the Conclusions section. If this site were set up more like Grantland, with all of their side-margin footnotes, it would probably be a much cleaner way to present the info. I’ll get on Tim to make that change at the next e-meeting.
As stated above, we wanted to re-examine how prospect values may have changed in the 4 years since Wang’s original piece. To do so, our data set was the 1994-2003 Baseball America Top 100 Prospects lists (shifting the 10-year period four years forward). With the potential for 1,000 data points, I decided to enlist the help of a friend to split the number-crunching duties. Many thanks go out to Steve DiMiceli for his assistance on this project. 2003 was chosen as the endpoint because it allowed prospects time to come up by 2006 and then serve out their 6 years of team control through 2011.
Using a shared Google spreadsheet, we sifted the players from these 10 years of lists into the same tiers as Wang did: Hitters #1-10, Hitters #11-25, Hitters #26-50, Hitters #51-75, Hitters #76-100, Pitchers #1-10, Pitchers #11-25, Pitchers #26-50, Pitchers #51-75, Pitchers #76-100.
For each year, each tier was filled appropriately. We then went through and found which tier each player was in during his last appearance on any list. We debated whether to use a player’s peak ranking or his last appearance, and ultimately went with the latter. The reasoning is that a player may have initially been over-hyped before it was determined that he: a) couldn’t hit off-speed stuff, b) couldn’t develop a third pitch, c) had his ceiling reduced by injuries, or any number of other reasons. We felt his last appearance was more representative of how Baseball America felt about him after multiple viewings.
Once each player’s final tier was determined, each of the overall tier lists was filled for hitters and pitchers. Each player’s value during his 6 years of team control was measured using Fangraphs’ WAR. Wang’s original study used the more esoteric Win Shares Above Bench (WSAB); WAR is more mainstream and accessible to most people, so that metric was used for this revised study. Best estimates were made for service time in cases where a player only had a partial season in a given year; not every player was calculated using strictly the first 6 years of his career on Fangraphs.
When a team trades a major league player and receives a package of prospects back, they want to know the Present Worth Value of each player in 2012. If a player can project to give you 15 WAR over the next 6 years, that’s not the same as what the major league player could give you in 2012. That prospect’s value must be brought back to present day using a discount value. Think of it this way: If I offered you $100 right now or $150 spread over the next 6 years, you would most likely choose the $100 now. The same concept holds true with the prospects. Teams have a time-money component to worry about as well; if they don’t invest money in a salary in 2012, they can use that money in other areas to improve the team — hence the discount value. For the purposes of this study, we stuck with the same 8% discount value per year used by Wang.
Discount value works this way:
For year 1 — use full value
For year 2 — use 92%
For year 3 — use (92%)^2
For year 4 — use (92%)^3
For year 5 — use (92%)^4
For year 6 — use (92%)^5
For the purposes of this study, these 6 factors were totaled and divided by 6 to get a Discount Value Factor of 0.82.
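The averaging above can be sketched in a few lines of Python (a sketch of the arithmetic only, not the authors' actual spreadsheet):

```python
# Discount factors for the 6 years of team control at an 8% annual discount rate.
# Year 1 gets full value, so the exponent starts at 0.
discount_rate = 0.08
factors = [(1 - discount_rate) ** year for year in range(6)]

# Average the six factors into a single Discount Value Factor.
discount_value_factor = sum(factors) / 6
print(round(discount_value_factor, 2))  # → 0.82
```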
Each player’s total 6 Year WAR was multiplied by the 0.82 Discount Value Factor to get a Present Worth Estimate of their WAR. That Present Worth Estimate of WAR was then multiplied by $5M/WAR to get their Gross Value Amount.
During the review of all the prospects’ WAR values, we observed that the vast majority had back-loaded WAR (they were giving their team more value during their arbitration years), so we assigned 2/3 of each player’s Present Worth Estimate of WAR to the arb years.
The typical model for calculating arbitration values is 40% of a player’s presumed free agent worth in Year 1 of arbitration, 60% in Year 2, and 80% in Year 3, which averages out to 60% over the 3 arb years. The 2/3 Present Worth Estimate of WAR was multiplied by 60% and then by $5M/WAR, giving an estimate of how much the team could expect to pay the prospect based on his production during those arbitration years. An additional $1.5M was then added to that arbitration cost to account for 3 years of minimum salary, using $500,000 as the minimum salary for model purposes.
Combining the arbitration value estimated cost with the minimum scale cost gives an estimate on Prospect Cost During Team Control. The surplus value of a prospect was finally determined to be Gross Value Amount minus Prospect Cost During Team Control.
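Putting the steps above together, the whole calculation can be sketched as follows. The function name and constant names are my own labels for the figures described in the text, not anything from the study's spreadsheet:

```python
DISCOUNT_FACTOR = 0.82      # averaged Discount Value Factor from above
DOLLARS_PER_WAR = 5.0       # $5M per WAR, expressed in $M
ARB_SHARE = 2 / 3           # share of present-worth WAR assumed to fall in arb years
ARB_PRICE = 0.60            # average of the 40/60/80% arbitration model
MIN_SALARY_TOTAL = 1.5      # 3 pre-arb years at $500,000, in $M

def surplus_value(six_year_war: float) -> float:
    """Estimate a prospect's surplus value in $M from his 6-year team-control WAR."""
    present_worth_war = six_year_war * DISCOUNT_FACTOR
    gross_value = present_worth_war * DOLLARS_PER_WAR
    arb_cost = present_worth_war * ARB_SHARE * ARB_PRICE * DOLLARS_PER_WAR
    cost_during_control = arb_cost + MIN_SALARY_TOTAL
    return gross_value - cost_during_control

# Example: a tier averaging 17.76 WAR over 6 years of team control.
print(round(surplus_value(17.76), 1))  # → 42.2
```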
The results from our data sifting are as follows:
Hitters #1-10: 42 unique data points, average WAR per player was 17.76 for their 6 years of team control (high – Andruw Jones 38.8, low – Ruben Mateo -1.3), Present Worth WAR 14.56, Surplus Value $42.2M
Hitters #11-25: 30 unique data points, average WAR per player was 14.16 (high – Todd Helton 35.7, low – Jose Guillen -1.2), Present Worth WAR 11.61, Surplus Value $33.36M
Hitters #26-50: 74 unique data points, average WAR per player was 7.98 (high – Albert Pujols 48.9, perhaps ranked a touch too low by BA, low – Dee Brown -4.3), Present Worth WAR 6.54, Surplus Value $18.12M
Hitters #51-75: 81 unique data points, average WAR per player was 4.75 (high – David Wright 37.5, low – Peter Bergeron -3.4), Present Worth WAR 3.9, Surplus Value $10.2M
Hitters #76-100: 80 unique data points, average WAR per player was 4.84 (high – Chase Utley 41.3, low – Karim Garcia -3.5), Present Worth WAR 3.97, Surplus Value $10.43M
Pitchers #1-10: 17 unique data points, average WAR per player was 11.45 (high – Ben Sheets 24.5, low – Jesse Foppert 0.1), Present Worth WAR 9.39, Surplus Value $26.7M
Pitchers #11-25: 41 unique data points, average WAR per player was 8.28 (high – Roy Oswalt 31.9, low – Scott Ruffcorn -0.9), Present Worth WAR 6.79, Surplus Value $18.89M
Pitchers #26-50: 61 unique data points, average WAR per player was 6.58 (high – Cliff Lee 30.1, low – Franklyn German -1.2), Present Worth WAR 5.4, Surplus Value $14.7M
Pitchers #51-75: 83 unique data points, average WAR per player was 3.66 (high – Zack Greinke 26.8, low – Andy Larkin -0.9), Present Worth WAR 3, Surplus Value $7.5M
Pitchers #76-100: 90 unique data points, average WAR per player was 3.83 (high – Carlos Zambrano 21.6, low – John Frascatore -0.8), Present Worth WAR 3.14, Surplus Value $7.93M
As you can see above, there is very little variation between Hitters and Pitchers ranked #51-75 and #76-100; in both cases, the #76-100 tier was actually worth slightly more. As a result, moving forward we at Pirates Prospects will categorize both Hitters and Pitchers simply as #51-100 and use the slightly higher #76-100 values. Perhaps in future models there will be a distinct separation between the tiers again, but for now it is not discernible.
How do these numbers compare to Wang’s 2008 numbers?
| Tier | Pirates Prospects 2012 | Wang 2008 |
| --- | --- | --- |
| Hitters 1-10 | $42.20 M | $36.5 M |
| Hitters 11-25 | $33.36 M | $25.1 M |
| Hitters 26-50 | $18.12 M | $23.4 M |
| Hitters 51-75 | $10.20 M | $14.2 M |
| Hitters 76-100 | $10.43 M | $12.5 M |
| Pitchers 1-10 | $26.70 M | $15.2 M |
| Pitchers 11-25 | $18.89 M | $15.9 M |
| Pitchers 26-50 | $14.70 M | $15.9 M |
| Pitchers 51-75 | $7.50 M | $12.1 M |
| Pitchers 76-100 | $7.93 M | $9.8 M |
In looking at these numbers, you can see more striations between the tiers in the Pirates Prospects data than in the Wang data. Whereas the pitchers are all somewhat clumped together in Wang’s summation, there are 4 distinct break points in the current data. The same is true for the hitters: there are 4 planes in the current data, versus 3 in Wang’s study. This lack of blending gives the prospect tiers a truer sense of relative value. Otherwise, why give up a Top 10 pitching prospect when you could give up a Top 50 prospect worth the same average value? Now there can be some actual decisions in the process.
This is not to say that there is anything wrong with Wang’s methodology or his results; rather, it speaks more to the usage patterns and value placed on prospects by Major League teams, coupled with a better source of ranking procedures by Baseball America in recent years. Teams are hoarding prospects, especially high-end ones, more and more in recent years. When they debut, they are put in key roles and expected to succeed, leading to higher WAR totals than in the past.
Also included in establishing Prospect Surplus Values are values given to hitters and pitchers based on John Sickels’ letter-grade rankings of each prospect in an individual team’s farm system. With 10 years of necessary data, for 30 teams and 20 players per team, that would be 6,000 more potential data points to assess. We simply did not have the time or inclination to do that, especially since the values of those players are very small in some cases. Additionally, for consistency we wanted to go back to 1994, and Sickels’ online archives only go back to 2005, which was outside the zone for this study. We’ll continue to use those numbers from the Beyond the Box Score piece.
Contributing (greatly) to this article was Steve DiMiceli.
I’m sure this was very challenging and exhaustive, but I’m confused about some of the approach to the statistical analysis and would like more clarification.
1) Why did you have to choose between peak value and latest appearance? Couldn’t you count a player multiple times when he appears on the list more than once? I would think this would just average into the values and be more indicative of what a prospect is worth, as over-hyping is definitely part of this process and not all ~whatever~ ranked prospects are traded at their latest rankings.
2) Doesn’t using a combined present worth conversion (.82) skew the values? Under that system, two players producing equal WAR over six years would be valued the same even if one produced only in year 1 and the other only in year 6. I don’t think there is a true conversion from the exponential scale to a linear one; the only way to get accurate values is to discount each year’s WAR individually and sum them up.
3) I’m not sure the arbitration calculation model accounts for the fact that a player producing consistently for 4 years isn’t worth as much as a free agent as a player producing consistently for 6 years. That would mean the 40% in year four is based on a different salary than the 80% in year 6. That said, I don’t know if that matters. But like the prospect values, are the arbitration calculations and the $5M/WAR model somewhat dated as well?
4) Would it help to even out the numbers, and make them slightly more predictive, to throw out a few of the top and bottom values when calculating the average? This would keep outliers from distorting the mean.
I know I’m being overly negative. I’m not trying to troll and appreciate the work. These are genuine questions, I am by no means an expert. I want to know actual answers to better my statistical understanding, and also the trade value pieces are some of my favorite to read and I’d like to see them as accurate as possible.
1. Peak vs. appearances we labored with, but I definitely didn’t want to do every appearance. I called this The Jackson Melian Factor. Melian was an overhyped Yankee prospect on the list only because he was young. He would have negatively skewed the ranks, while the truly elite guys never lasted as long on the list.
2. A true, true analysis would have taken each year and discounted it individually using the factors described in the analysis. But it took Steve and me 2 weeks to do just this, and I wanted it out for trade season. I still feel the margin of error is acceptable for the way we did it.
3. Using the weighted 2/3 of all value in arb years was the way around this for me. It also averaged the WAR during arb years to get a value.
4. Not sure about this one. I think you need them for integrity.
Thank you for reading it and your GREAT questions.
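To illustrate the difference at issue in question 2 above, here is a quick sketch comparing per-year discounting with the study's flat 0.82 factor. The function names and the WAR lines are purely illustrative, not data from the study:

```python
def discounted_war_per_year(wars_by_year):
    """Discount each year's WAR individually at 8% (year 1 at full value)."""
    return sum(war * 0.92 ** year for year, war in enumerate(wars_by_year))

def discounted_war_flat(wars_by_year):
    """The study's shortcut: total WAR times the averaged 0.82 factor."""
    return sum(wars_by_year) * 0.82

front_loaded = [6, 0, 0, 0, 0, 0]  # all 6 WAR in year 1
back_loaded = [0, 0, 0, 0, 0, 6]   # all 6 WAR in year 6

print(discounted_war_per_year(front_loaded))           # → 6.0
print(round(discounted_war_per_year(back_loaded), 2))  # → 3.95
print(round(discounted_war_flat(front_loaded), 2))     # → 4.92, same for either shape
```

The two approaches agree only when production is spread evenly across the six years; for sharply front- or back-loaded careers, the flat factor over- or under-counts value, which is the margin of error the reply above accepts.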
This is great work, Kevin. Really interesting. I agree with Rory’s questions #1 and #4 above (actually I don’t necessarily disagree with any of it, but I’m quite sure I didn’t think about it to the depth he has). Anyway, I’m wondering a couple of things:
1. what percent of the players in the study were unproductive (either a low WAR, a negative WAR or never even made it to the bigs)? Anytime a team deals for a prospect, the possibility of him not working out for whatever reason has to enter the internal conversation. I mean, Ryan Anderson was ranked as a top 20 pitcher four straight years by BA and never tossed a big league pitch. But he was only counted once in the study (from what I can tell). Matt Riley was top 20 twice. So was John Patterson. On the flip side, so was Matt Cain.
2. How do you account for the absence of WAR for those players who never made it to the Bigs? Did you count it as a zero?
1. I would have to check for a %, but it would depend on what you call a “low WAR”. Is that less than 6 (1 WAR/year) for their 6 year career?
2. Yes, if they never made it they got a zero.
WOW ~ I couldn’t even BEGIN to read this puppy, ’cause there’s a lot of info in it. And it’s 6 a.m. on the Left Coast ~ WAY too early for this amount of brain cell activity. But I can tell Kevin (author) again surpassed the PP high standards.
Spectacular work, Kevin. Do you have any idea what kinds of models the mlb teams would use?
I do not have any idea what they may use. I would like to think it is a variation of this type of model, because if you look at some of the trades this past offseason (Cahill, Gonzalez, etc.) they fall in relative line with this concept.
I want Jason Heyward (I know that will never happen, but that’s what I want), Ryan Lavarnway & Xander Bogaerts (for Hanny, Welker, and Polanco), and Clint Robinson (for Tabata & J. Wilson). Make my dreams come true… or tell me how insane I am.
That is an orgy of a trade proposal(s).
My brain hurts.