|
PECOTA, an acronym for Player Empirical Comparison and Optimization Test Algorithm, is a sabermetric system for predicting Major League Baseball player performance.[1] It was invented by Nate Silver in 2003 and shares the name of former Kansas City Royals player Bill Pecota.
One of several widely used statistical systems for forecasting player performance, PECOTA is marketed by Baseball Prospectus (BP) as a fantasy baseball product.[2] PECOTA forecasts a player's performance in all of the major categories used in typical fantasy baseball games; it also forecasts production in advanced sabermetric categories developed by Baseball Prospectus (e.g., VORP and EqA). In addition, PECOTA forecasts several summary diagnostics such as breakout rates, improve rates, and attrition rates, as well as the market values of the players. The logic and methodology underlying PECOTA have been described in several publications, but the detailed formulas are proprietary and have not been shared with the baseball research community.
Methodology
Silver has described the inspiration for his approach as follows:
The basic idea behind PECOTA is really a fusion of two different things – James's work on similarity scores and Gary Huckabay's work on Vlad, [Baseball Prospectus's] previous projection system, which tried to assign players to a number of different career paths.[3] I think Gary used something like thirteen or fifteen separate career paths, and all that PECOTA is really doing is carrying that to the logical extreme, where there is essentially a separate career path for every player in major league history. The comparability scores are the mechanism by which it picks and chooses from among those career paths.[4]
Comparable players
PECOTA relies on fitting a given player's past performance statistics to the performance of "comparable" Major League ballplayers by means of similarity scores. As described in the Baseball Prospectus website's glossary:[5]
PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II. In addition, it also draws upon a database of roughly 15,000 translated minor league seasons (1997-2006) for players that spent most of their previous season in the minor leagues. . . . PECOTA considers four broad categories of attributes in determining a hitter's comparability:
1. Production metrics – such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.
2. Usage metrics, including career length and plate appearances or innings pitched.
3. Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).
4. Fielding Position (for hitters) or starting/relief role (for pitchers).
. . . In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
Although drawing on the underlying concept of Bill James' similarity scores, PECOTA calculates these scores in a distinct way that leads to a very different set of "comparables" than James' method.[6] Furthermore, Silver describes the following distinct feature:
"The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance. Thus, we might look at what a pitcher did from ages 35-37, and compare that against the most similar age 35-37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at baseball-reference.com or in other places, which attempt to evaluate the totality of a player’s career up to a given age."[7]
Once a set of "comparables" is determined for each player, his future performance forecast is based on the historical performance of his "comparables." For example, a 26-year-old's forecast performance in the coming season will be based on how the most comparable Major League 26-year-olds performed in their subsequent season. Separate sets of predictions are developed for hitters and pitchers. The comparable players are drawn from a database of all major league player-seasons since 1946. The raw statistics in this database are first adjusted to take into account park effects and the era in which a player played.
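The comparable-player idea described above can be illustrated with a minimal sketch. PECOTA's actual similarity formula is proprietary, so everything here is an assumption made for illustration: the Euclidean distance, the attribute names, the default tolerance, and the helper names `distance`, `find_comparables`, and `forecast_from_comparables` are all hypothetical. The sketch only shows the general pattern: pick same-age historical seasons close to the target, widen the tolerance when too few qualify, and average what those players did next.

```python
from math import sqrt
from statistics import mean


def distance(a, b, keys):
    """Euclidean distance over the chosen attributes (an assumed metric,
    not PECOTA's; real inputs would need to be put on a common scale)."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def find_comparables(target, history, keys,
                     tolerance=0.01, min_count=2, step=0.01):
    """Return (comparables, tolerance_used) for same-age historical seasons.

    If fewer than `min_count` seasons fall within `tolerance`, the
    tolerance is widened by `step` until enough are found or the
    same-age pool is exhausted.
    """
    pool = [h for h in history if h["age"] == target["age"]]
    needed = min(min_count, len(pool))
    while True:
        comps = [h for h in pool if distance(target, h, keys) <= tolerance]
        if len(comps) >= needed:
            return comps, tolerance
        tolerance += step


def forecast_from_comparables(target, history, keys, stat, **kwargs):
    """Average the comparables' following-season value of `stat`.

    Each historical record is assumed to carry a hypothetical
    "next_<stat>" field. Returns None when no same-age season exists.
    """
    comps, _ = find_comparables(target, history, keys, **kwargs)
    if not comps:
        return None
    return mean(h["next_" + stat] for h in comps)
```

A real system would weight closer comparables more heavily and compare multi-year windows rather than single seasons; the sketch leaves both out for brevity.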
Peripheral statistics
PECOTA also relies heavily on peripheral statistics to forecast a given player's future performance. For example, drawing on the insights coming out of the use of defense-independent pitching statistics, PECOTA forecasts a pitcher's future performance in a given area by using information about his past performance in other areas.[8] As baseball analyst and journalist Alan Schwarz writes, "Silver . . . designed a sophisticated variance algorithm that has examined every big-league pitcher's statistics since 1946 to determine which numbers best forecast effectiveness, specifically earned run average. His findings are counterintuitive to most fans. 'When you try to predict future E.R.A.'s with past E.R.A.'s, you're making a mistake,' Silver said. Silver found that the most predictive statistics, by a considerable margin, are a pitcher's strikeout rate and walk rate. Home runs allowed, lefty-righty breakdowns and other data tell less about a pitcher's future."[9]
Probability distributions
Instead of focusing on making point estimates of a player's future performance (such as batting average, home runs, and strike-outs), PECOTA relies on the historical performance of the given player's "comparables" to produce a probability distribution of the given player's predicted performance during the next five years. Alan Schwarz has emphasized this feature of PECOTA: "What separates Pecota from the gaggle of projection systems that outsiders have developed over many decades is how it recognizes, even flaunts, the uncertainty of predicting a player's skills. Rather than generate one line of expected statistics, Pecota presents seven – some optimistic, some pessimistic – each with its own confidence level. The system greatly resembles the forecasting of hurricane paths: players can go in many directions, so preparing for just one is foolish."[10] Silver has written:
This procedure requires us to become comfortable with probabilistic thinking. While a majority of players of a certain type may progress a certain way – say, peak early – there will always be exceptions. Moreover, the comparable players may not always perform in accordance with their true level of ability. They will sometimes appear to exceed it in any given season, and other times fall short, because of the sample size problems that we described earlier. PECOTA accounts for these sorts of factors by creating not a single forecast point, as other systems do, but rather a range of possible outcomes that the player could expect to achieve at different levels of probability. Instead of telling you that it's going to rain, we tell you that there's an 80% chance of rain, because 80% of the time that these atmospheric conditions have emerged on Tuesday, it has rained on Wednesday. Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history. However, it is also leaps and bounds more representative of reality, and more accurate to boot.[11]
Team effort
Although Silver is the creator and steward of PECOTA, he acknowledges that PECOTA forecasts are a team product: "I might be 'the PECOTA guy,' but it very much is a team effort," he says of the BP staff. "We all do it. It's my baby, but it takes a village to run a PECOTA."[12] For example, PECOTA draws on Clay Davenport's translations (the so-called Davenport Translations or DT's) of minor league and international baseball statistics to estimate the major league equivalent performance of each player.[13] In this way, PECOTA is able to make projections for more than 1,600 players each year, including many players with little or no prior major league experience.
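The range-of-outcomes idea described under "Probability distributions" can be sketched as reading percentiles off the comparables' observed outcomes instead of reporting only their average. This is a simplified illustration, not PECOTA's published procedure: the function names are hypothetical, and the seven default levels are an assumption chosen only to match the count of seven forecast lines mentioned in the Schwarz quote.

```python
def percentile(values, pct):
    """Percentile of `values` by linear interpolation, for 0 <= pct <= 100."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    frac = rank - low
    return ordered[low] + (ordered[high] - ordered[low]) * frac


def forecast_range(outcomes, levels=(10, 25, 40, 50, 60, 75, 90)):
    """Map each percentile level to the matching outcome among the
    comparables' following-season results (levels are assumed)."""
    return {level: percentile(outcomes, level) for level in levels}
```

Given the following-season home run totals of a player's comparables, `forecast_range` returns one value per level, so a pessimistic, median, and optimistic line can be read side by side.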
Alternative forecasting systems
Most of the other popular forecasting systems do not use a "comparable players" approach. Instead most rely on direct projections from a player's past performance to his future performance, typically by using as a baseline a weighted average of a player's performance in his previous three years. Like PECOTA, many of those systems also adjust the projections for aging, park effects and regression toward the mean. Like PECOTA, they may also adjust for the competitive difficulty of each of the two major leagues.[14] The systems differ from one another, however, in the types and intensities of age adjustments, regression-effect estimates, park adjustments, and league-difficulty adjustments that they may make as well as in whether they use similarity scores.[15] PECOTA also makes projections for many more players than do other systems, because PECOTA relies on adjusted minor league statistics as well as major league statistics and tries to make projections for all of the players on major league expanded rosters (40 players per team) as well as other prospects.
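The baseline used by these alternative systems, a weighted average of the previous three seasons followed by regression toward the mean, can be sketched as below. The weights, the amount of league-average "ballast," and both function names are illustrative assumptions; the source does not give the values any particular system uses.

```python
def weighted_baseline(rates, weights=(5, 4, 3)):
    """Weighted mean of per-season rates, most recent season first.

    The default weights are illustrative only.
    """
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


def regress_to_mean(rate, sample_size, league_rate, ballast):
    """Shrink `rate` toward `league_rate`.

    `ballast` is the assumed number of league-average plate appearances
    mixed in: the smaller the player's own sample, the harder the pull
    toward the league rate.
    """
    if sample_size + ballast <= 0:
        raise ValueError("sample_size + ballast must be positive")
    return (rate * sample_size + league_rate * ballast) / (sample_size + ballast)
```

For example, a hitter with batting averages of .300, .270, and .240 over the last three seasons (most recent first) gets a baseline that leans toward .300, and a hitter with few plate appearances ends up closer to the league rate after regression than one with a full season.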
Updates and revisions
First introduced in 2003,[16] PECOTA projections are produced each year and published both in the Baseball Prospectus annual monographs and on the BaseballProspectus.com website. PECOTA has undergone several improvements since 2003. The 2006 version introduced metrics for the market valuation of players based on the predicted performance levels. The 2007 version introduced adjustments for league effects, to account for differences in the competitive environment of the two major leagues.[17]
Accuracy
Although Baseball Prospectus promotes PECOTA commercially as "deadly accurate," all projection systems are subject to considerable uncertainty. Nonetheless, the test of PECOTA is its ability to make accurate forecasts in comparison with alternative forecasting methods. A comparison for the 2006 season shows that PECOTA outperformed several other forecasting systems in predicting hitting (OPS) and performed nearly as well as the best of the other systems in predicting pitching (ERA).[18]
Although designed primarily for predicting individual player performance, PECOTA has also been applied to predicting team performance. For this purpose, projected team depth charts are established with projected playing times for each team member, drawing on the expert advice of the Baseball Prospectus staff. The number of runs a team will score and allow during the coming season is estimated based on the playing times and PECOTA's predicted individual performance of each player, using a "Marginal Lineup Value" algorithm created by David Tate and further developed by Keith Woolner.[19] A team's expected win total is based on applying an improved version of Bill James' Pythagorean Formula to the estimated number of runs scored and allowed by the roster of players under the given playing-time assumptions.[20]
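The last step, turning projected runs scored and allowed into expected wins, can be sketched with the Pythagorean expectation. The source says only that an "improved version" (Pythagenport, per note 20) is used; the variable-exponent formula below is the commonly cited form of it and is included here as an assumption, as are the function names.

```python
from math import log10


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James' Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("runs scored and allowed cannot both be zero")
    return rs / (rs + ra)


def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Run-environment-dependent exponent, in its commonly cited form
    (assumed here; the source does not state the formula)."""
    return 1.5 * log10((runs_scored + runs_allowed) / games) + 0.45


def expected_wins(runs_scored, runs_allowed, games=162, exponent=None):
    """Expected wins over `games`; with no exponent given, the
    Pythagenport exponent for this run environment is used."""
    if exponent is None:
        exponent = pythagenport_exponent(runs_scored, runs_allowed, games)
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
```

A team projected to score exactly as many runs as it allows comes out at a .500 record under any exponent; the choice of exponent matters only as the gap between runs scored and allowed grows.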
PECOTA has been used in preseason forecasts of how many wins teams will attain and in mid-season simulations of the number of wins each team will attain and its odds of reaching the playoffs.[21] In 2006, PECOTA's preseason forecasts compared favorably to other forecasting systems (including Las Vegas betting line odds) in predicting the number of wins teams would earn during the season.[22]
Notes
- ^ The acronym was actually based on the name of journeyman major league player Bill Pecota,[1] who with a lifetime batting average of .249 is perhaps representative of the typical PECOTA entry. In 2007 the Houston Astros organization created their own system and named it PANKOVITS, an acronym in honor of former utility player Jim Pankovits that stands for Player Analysis with Neutral Knowledge of Offensively Vital Information Tracking Statistics.
- ^ Illustrative PECOTA estimates and "cards" are available for inspection by nonsubscribers here: http://www.baseballprospectus.com/pecota/.
- ^ Gary Huckabay, "6-4-3: Reasonable Person Standard," BaseballProspectus.com, August 2, 2002.
- ^ Rich Lederer, "An Unfiltered Interview with Nate Silver," Baseball Analysts, February 12, 2007.
- ^ http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=38
- ^ This difference is explained and illustrated in Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507-514.
- ^ http://www.baseballprospectus.com/unfiltered/?p=136. Also see Baseball Prospectus' glossary entry for "Comparable Players".
- ^ See PERA for an example of the use of peripheral statistics to estimate a performance.
- ^ Alan Schwarz, "Numbers Suggest Mets Are Gambling on Zambrano," New York Times, August 22, 2004.
- ^ Alan Schwarz, "Predicting Futures in Baseball, and the Downside of Damon," New York Times, November 13, 2005.
- ^ Nate Silver, "Baseball Prospectus Basics: The Science of Forecasting," BaseballProspectus.com, March 11, 2004.
- ^ William Hageman, "Baseball By the Numbers," Chicago Tribune, January 4, 2006.
- ^ See Clay Davenport, "DT's vs. MLEs — A Validation Study," BaseballProspectus.com, January 30, 1998; Clay Davenport, "Winter and Fall League Translations: Just How Good Are These Leagues, Anyway?," BaseballProspectus.com, January 27, 2004; and Clay Davenport, "Over There! A Second Review of Translating Japanese Statistics, and Translating the Mexican League," Baseball Prospectus 2004 (New York: Workman, 2004): 585-590.
- ^ PECOTA's aging adjustment is implicit in the path of "future" performance of the set of historical "comparable" players.
- ^ Among the current major alternative statistically-based projection systems are Tom Tango's Marcel projections (available and documented for 2007 at The Hardball Times); Diamond Mind Baseball, also described in an ESPN.com article on 2007 team projections; Ron Shandler's Baseball HQ and his annual book, Baseball Forecaster; The Hardball Times pre-season forecasts, inaugurated with the 2007 season; Chone Smith's "Chone Projections," reported on the website of Fangraphs.com; Baseball Info Solutions – BIS; and Dan Szymborski's "ZiPS" Projections.
- ^ Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003, cited previously.
- ^ "Baseball Prospectus Chat: Nate Silver," BaseballProspectus.com, January 19, 2007.
- ^ Dan Szymborski, "2006 Projections," BaseballThinkFactory.com, December 14, 2006.
- ^ Keith Woolner, "Marginal Lineup Value," StatHead.com.
- ^ On the Pythagenport formula, see Clay Davenport and Keith Woolner, "Revisiting the Pythagorean Theorem: Putting Bill James' Pythagorean Theorem To the Test," BaseballProspectus.com, June 30, 1999 as well as the Baseball Prospectus glossary entry for "Pythagenport"[2]. On the construction of the depth charts for each team and the application of PECOTA to estimating team wins, see Nate Silver, "PECOTA Projects the American League," BaseballProspectus.com, March 21, 2005; and Nate Silver, "PECOTA Breaks Hearts," BaseballProspectus.com, March 29, 2006.
- ^ See Clay Davenport, "Playoff Odds Report: The Addition of PECOTA," BaseballProspectus.com, May 3, 2006 and Baseball Prospectus Statistics.
- ^ Nate Silver, "Projection Reflection," BaseballProspectus.com, October 11, 2006.
References
- William Hageman, "Baseball by the Numbers," Chicago Tribune, January 4, 2006.
- Jonah Keri, "'Tis the Season to Project Stats," ESPN.com, February 14, 2007[3].
- Rich Lederer, "An Unfiltered Interview with Nate Silver," BaseballAnalysts.com, February 12, 2007[4].
- Alan Schwarz, "Numbers Suggest Mets Are Gambling on Zambrano," New York Times, August 22, 2004.
- Alan Schwarz, "Predicting Futures in Baseball, and the Downside of Damon," New York Times, November 13, 2005.
- Nate Silver, "The Science of Forecasting," BaseballProspectus.com, March 11, 2004[5].
- Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507-514.
- Nate Silver, "PECOTA Takes on the Field: How'd It Fare Against Six Other Projections Systems?" BaseballProspectus.com, January 16, 2004[6].
- Nate Silver, "PECOTA 2004: A Look Back and a Look Ahead," Baseball Prospectus 2004 (New York: Workman Publishers, 2004): 5-10.
- Nate Silver, "Rearranging PECOTA," Baseball Prospectus 2006 (New York: Workman Publishers, 2006): 6-11.
- Nate Silver, "Why Was Kevin Maas a Bust?" Baseball Between the Numbers, Ed. Jonah Keri (New York: Basic Books, 2006): 253-271.
- Childs Walker, "Baseball Prospectus Makes Predicting Future Thing of Past," Baltimore Sun, February 21, 2006.
|