Stealing Baseball

Mathematicians break baseball down to its simplest component – cold, hard stats – in hopes of finding the best players.

Brad King

November 16, 2005

Baseball is near and dear to my heart. I love the game. My first memory is of holding a bat (which explains why I carry a bat around the TR offices). As I grew older, I came to love the statistical analysis inherent in the history of the game. I can go back through 100-plus years of games, seasons, and careers – comparing statistics from modern players with those of all-time greats. I could, I imagine, quantifiably prove that there was such a thing as the Dead Ball Era (there was) and the Steroid Era (there is).

Despite my love of statistics, I have never been one to rest my heart and soul on simple math when it comes to choosing great players. Numbers never tell a complete story. Life is too complex for 2+2 to always equal 4. So my heart skipped a beat yesterday when I read this USA Today article about a husband-and-wife team that developed a computer simulation that predicted the winners of Major League Baseball’s Cy Young Awards, arguably the most coveted pitching honors in the game. The system crunched a variety of numbers, giving weight to each one, and then assigned a final number to each pitcher who played last year.
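For the curious, the general idea of weighting a handful of stats and boiling them down to one number per pitcher can be sketched in a few lines of Python. This is only an illustration of a weighted score, not the couple's actual model: the article doesn't say which stats they used or how they weighted them, so the stat names, the weights, and the two pitchers below are all made up.

```python
# Hypothetical weights: the real model's stats and weights are not given
# in the article, so these are placeholders for illustration.
WEIGHTS = {"wins": 1.0, "strikeouts": 0.05, "era": -3.0}


def score(stats, weights=WEIGHTS):
    """Return the weighted sum of one pitcher's stats."""
    return sum(weights[name] * stats[name] for name in weights)


def rank(pitchers, weights=WEIGHTS):
    """Return pitcher names sorted from highest score to lowest."""
    return sorted(
        pitchers,
        key=lambda name: score(pitchers[name], weights),
        reverse=True,
    )


# Made-up stat lines, not real season numbers.
pitchers = {
    "Pitcher A": {"wins": 20, "strikeouts": 200, "era": 3.0},
    "Pitcher B": {"wins": 15, "strikeouts": 220, "era": 2.5},
}

print(rank(pitchers))
```

A negative weight on ERA reflects that a lower ERA is better; whoever ends up with the highest total is the predicted winner.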

No way could this application be right. For a game built on subtlety, it struck me as odd that a silly computer, relying on cold, hard numbers, could tell a complete story. Formulas and numbers only get you so far. Eventually, you need to make a leap of faith that can’t be explained in 2+2=4. And yet, 2+2 always equals 4. And it’s hard to get away from that.

But there is something about cold, hard numbers that invites distrust, a fact the mathematicians faced as they prepared to announce their prediction:

But being human, and perhaps more importantly, baseball fans, the mathematicians made their own mistake the week before the award announcements. Overriding the model, they instead predicted that the New York Yankees’ Mariano Rivera would win the American League Cy Young. They argued that voters would see it as a “lifetime achievement” award in a year of weak American League contenders for the prize.

They were, of course, completely wrong. The computer model correctly picked Chris Carpenter and Bartolo Colon, the two recipients of the award this year.
