Does the White Sox April Performance Change Anything?
A breakdown of the White Sox month of April performance and a comparison of that performance to other 21st century teams.
There is an old saying about April baseball with which many baseball fans are familiar: “A division can’t be won in April, but it can be lost.” The source of this long-accepted quote seems to be lost to history, but it has been repeated many times.
The next question to ask is how true is this? How important is April performance? What end-of-April winning percentage dooms a team? More importantly for our purposes, what does the White Sox’s current April performance mean in terms of postseason odds, or even the odds of making 2019 a winning year, their first since 2012?
First of all, a basic understanding of correlation in statistics is required. Many of these stats use what is called a correlation coefficient, which measures how closely related two sets of data are. The number can be anywhere from negative one to positive one. The closer the number is to positive one, the stronger the positive correlation.
For example, if two sets of data were “1, 2, 3, 4” and “2, 4, 6, 8” the data is perfectly correlated. This would mean a correlation coefficient of positive one. If a number from the first set increases, the number from the second set is increasing at the same rate consistently. As for a negative correlation, if the numbers were “1, 2, 3, 4” and “-2, -4, -6, -8” then the correlation would be -1. The closer to one or negative one, the more closely related the data is. If two sets of data have a correlation coefficient near 0, there is no real relation between the two sets of data.
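The two toy data sets above can be checked with a few lines of Python. This is a minimal sketch that assumes the coefficient in question is the standard Pearson correlation (the article does not name a specific one); the function name `pearson_r` is just an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# The two examples from the text.
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly correlated
r_neg = pearson_r([1, 2, 3, 4], [-2, -4, -6, -8])  # perfectly inversely correlated
print(r_pos, r_neg)
```

The same function would apply to any pair of columns in the study, for example April winning percentage against end-of-season winning percentage.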
The data set used for this study is every team’s wins, losses, win percentage, runs scored, runs allowed, run differential, expected winning percentage, expected wins and luck. All of these specific stats have been tracked for every team in baseball from 2000-2018, giving us 19 years of data for 30 teams each year. This data analyzes 570 different teams from this century.
These stats are analyzed not only through the end of April, but also for the full season and for the portion of the season played after April. It is, essentially, an insane amount of data.
As for an explanation of some of the more complex stats listed, run differential is quite simply the difference between runs scored and runs allowed. Expected winning percentage (or Pythagorean win percentage) is calculated on Baseball-Reference.com and uses a team’s runs scored and runs allowed to determine how many games a team should be winning. Expected wins is simply expected winning percentage times the number of games played.
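For readers who want to see the arithmetic, here is a rough sketch of the Pythagorean calculation. It assumes the 1.83 exponent commonly attributed to Baseball-Reference (the classic Bill James version uses 2); the function names and the sample run totals are hypothetical and only there for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage from runs scored and runs allowed.

    The exponent of 1.83 is an assumption, not something stated in the article.
    """
    if runs_scored <= 0 and runs_allowed <= 0:
        raise ValueError("need at least some runs scored or allowed")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def expected_wins(runs_scored, runs_allowed, games_played, exponent=1.83):
    """Expected wins = expected winning percentage times games played."""
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games_played

# Hypothetical April line: 120 runs scored, 133 allowed (a -13 differential), 26 games.
april_pct = pythagorean_win_pct(120, 133)
april_wins = expected_wins(120, 133, 26)
print(f"{april_pct:.3f} expected win pct, {april_wins:.1f} expected wins")
```

A team that scores exactly as many runs as it allows comes out at .500, and the expected percentage rises as the gap between runs scored and runs allowed grows.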
“Luck” is a little bit of a strange stat, accounting for the difference between a team’s actual wins and a team’s expected wins (as calculated by run differential). One of the strongest examples of luck is the 2018 Seattle Mariners. At the end of June 2018, the Mariners had a 53-31 record, giving them the fourth-best record in the American League. Their 63.1 percent winning percentage would break their playoff drought very easily.
For anyone that paid attention to run differential in 2018, the Mariners at that point of the season had only scored 21 more runs than they had allowed. Based on run differential, they would be expected to win 44 games at that point. But their 53-31 record showed they were nine games better than their run differential would expect them to be. This is typically not sustainable. By the end of the season, their winning percentage slid to 54.9 percent and they did not make their way into the playoffs.
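The Mariners example boils down to a single subtraction. The sketch below uses only the figures quoted above (53 actual wins, 44 expected wins, 84 games played); the helper name `luck` is an illustrative choice, not a Baseball-Reference function.

```python
def luck(actual_wins, expected_wins):
    """Luck as described above: actual wins minus run-differential-based expected wins."""
    return actual_wins - expected_wins

games_played = 53 + 31                     # 84 games through the end of June 2018
mariners_luck = luck(53, 44)               # nine wins better than expected
actual_pct = 53 / games_played             # the record the standings showed
expected_pct = 44 / games_played           # the record run differential pointed to

print(f"luck: {mariners_luck:+d} wins")
print(f"actual {actual_pct:.1%} vs expected {expected_pct:.1%}")
```

A positive number means a team has won more than its runs suggest, and a negative number means it has won fewer.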
To make the data a little bit easier to digest, hyperlinks that show data and graphs are linked within the article here.
Now that we have the basics established, it’s time for a ton of data. Among the most interesting results: April winning percentage and end-of-year winning percentage have a correlation coefficient of .539. Surprisingly, April run differential correlates more strongly with end-of-season winning percentage. It isn’t a huge difference, but the correlation of .584 is a bit stronger.
Typically run differential at the start of the season can have a small sample size problem. In this instance, however, a small sample size is not an issue because this data includes 570 different Aprils and over 14,000 games. This would imply that outscoring your opponents is a better indicator of end-of-season success than actual wins and losses. A very surprising discovery indeed.
The link is much weaker when comparing winning percentage through the end of April with winning percentage after April. That correlation is only .376, which shows there is a relationship between April performance and after-April performance, but it is not very strong. It is also weaker than the .432 correlation between April run differential and after-April winning percentage. Essentially, end-of-April win percentage says only a little about performance after the month of April, and run differential in April is a better indicator of it.
Now how does the end-of-April performance tie into making the playoffs or even being a winning team? From 2000-2018 there were 108 teams that ended April with a winning percentage of 40 percent or less, and eight of those teams made it to the playoffs. That means that a team under 40 percent at the end of April has a 7.4 percent chance of making the playoffs. As for teams that started truly terribly, not a single team made the playoffs from 2000-2018 with a winning percentage under 30 percent through April.
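The playoff odds quoted here are simple historical rates: the number of teams in a winning-percentage bucket that made the playoffs, divided by the number of teams in that bucket. A minimal sketch of that bookkeeping follows; the `playoff_rate` helper and the five sample rows are hypothetical, while the 8-of-108 figure comes from the text.

```python
def playoff_rate(teams, low, high):
    """Share of teams with low <= April win pct <= high that made the playoffs.

    `teams` is a list of (april_win_pct, made_playoffs) pairs.
    Returns None when no team falls in the bucket.
    """
    bucket = [made for pct, made in teams if low <= pct <= high]
    if not bucket:
        return None
    return sum(bucket) / len(bucket)

# Hypothetical rows, only to show the shape of the data.
sample = [(0.385, False), (0.360, True), (0.462, False), (0.700, True), (0.292, False)]
sample_rate = playoff_rate(sample, 0.0, 0.400)  # one of three teams at .400 or worse

# The article's own counts for teams at 40 percent or less through April.
article_rate = 8 / 108
print(f"{article_rate:.1%}")
```

These are observed frequencies rather than modeled probabilities, so the smaller the bucket, the more a single team can move the rate.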
In this category are a couple of prime candidates who lost their division in April. The 2005 Cleveland Indians and 2016 Houston Astros seem to be the best examples in the 21st century.
That Cleveland team, who put a strong scare in the 2005 World Series champion White Sox, won 60.4 percent of their games after April. They are actually the only team since 2000 to win 60 percent or more of their games after April and not make the playoffs. How did this happen? Well, Cleveland was 9-14 at the end of April, putting them already 7.5 games behind the White Sox. That April truly did lose Cleveland that division.
As for that 2016 Astros team, they started the season 7-17. That is the 20th-worst start in this entire data set. After April, the Astros maintained a 77-61 record (a 55.8 percent winning percentage). That win percentage would be 90 wins over the course of a full season. Most 90-win teams make the playoffs and, if they had reached 90 wins, the Astros would have made the playoffs with a chance to host a home Wild Card game. This team didn’t lose a division in April, but they may have lost a playoff spot during the first month of the season.
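The 90-win figure is just the after-April winning percentage stretched over a 162-game schedule. A quick sketch, using only the 77-61 record quoted above (the function names are illustrative):

```python
def win_pct(wins, losses):
    """Winning percentage as a fraction between 0 and 1."""
    return wins / (wins + losses)

def full_season_pace(wins, losses, games=162):
    """Wins a team would finish with if it held this winning percentage all season."""
    return win_pct(wins, losses) * games

astros_pct = win_pct(77, 61)             # after-April record of the 2016 Astros
astros_pace = full_season_pace(77, 61)   # the same pace over 162 games

print(f"{astros_pct:.1%} winning percentage, about {astros_pace:.0f} wins over 162 games")
```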
On the other hand, teams can dream at the end of April. Of the 17 teams to win at least 70 percent of their April games, six failed to make the playoffs. Only one team has managed to win 70 percent of their April games, win more than half of their games after April and still miss the playoffs. That team? The 2006 White Sox.
End-of-April performance doesn’t seem to have a big impact on postseason success either. Two teams that ended April with a losing record went on to win the World Series. It actually happened in back-to-back years when the 2002 Anaheim Angels and the 2003 Florida Marlins both accomplished this strange feat.
What does all of this data mean for the 2019 White Sox? Well, first of all, the prevailing thought was the White Sox would not be a playoff team in 2019. Does their current record change that? How do teams with their end-of-April performance in winning percentage as well as run differential typically end up?
The White Sox ended this April with a 46.2 percent win percentage. Of the 159 teams from 2000-2018 with an end-of-April winning percentage between 40 and 49.9 percent, 28 made the playoffs. That would give the White Sox a 17.6 percent chance of making the playoffs. While that’s not very good, it is much better than the 0.6 percent chance FanGraphs is currently giving the team.
The run differential data is not exactly encouraging either. The White Sox run differential sits at -13. The data shows 41 total teams with a similar run differential, between -10 and -15, and only six of them (14.6 percent) made the playoffs.
Purely looking at the data though, there is a little room for encouragement. At the end of April, the 2003 Marlins had a 45.9 percent win percentage and a -13 run differential. They were a young team coming off a losing season, but they went on to win the World Series. Of course, that Marlins team was one of the most surprising teams in baseball history. The likelihood of the Sox turning their current situation into a great season is pretty low.
Now, if someone were to be a pessimist, the 2004 Diamondbacks actually had a better winning percentage and run differential than these 2019 White Sox. They had a 46.8 percent winning percentage and a -9 run differential…and managed to lose 111 games.
As with most things, the Sox are probably in the middle. Improvement seems pretty likely. Of the 159 teams with an end-of-April winning percentage of 40-49.9 percent, only five ended with 100 or more losses. That’s a 3.1 percent chance of being a 100-loss team based only on win percentage.
Similarly, 153 teams from 2000-2018 have had a run differential of -1 to -20. Two of those ended the year with 100 or more losses. So that result seems quite unlikely for the White Sox this season.
Overall, the data seems to suggest the White Sox are doing about as expected. They are an improved team compared with last year, but not a very good team overall. For comparison, last year’s Sox team had a 30.8 percent winning percentage at the end of April and a -42 run differential. This season may not be great, but there are some very encouraging signs in terms of the team’s performance going forward.
As for last year’s White Sox team, no 2000-2018 team has made the playoffs with a winning percentage that low. Only one of the 33 teams with a run differential as poor as the 2018 White Sox’s went on to make the playoffs. That was the 2006 Twins, who actually helped stop the White Sox from making the playoffs and having a chance to defend their 2005 World Series championship.
When there is this much data collected, it would be a waste to not use it for some random facts and some comparisons of other teams’ current performances. First, 15 teams have ended April with a run differential larger than 50. All 15 of those teams won at least 60 percent of their games and three of those teams went on to miss the playoffs. All of them managed to be at least .500 at the end of the year.
The 2011 Indians have the highest April run differential in a season where they ended under .500. At the end of April, they were 18-8 with a positive 46 run differential. After April they went 62-74 with a -102 run differential. That’s a great collapse.
Trying to calculate “luck” in baseball can be very difficult. The stats Baseball Reference uses are pretty good though. Across the 570 total team seasons from 2000-2018, only twelve had a team winning or losing 10 more games than its run differential would predict. That means 97.9 percent of teams were within 10 games of their expected wins and losses. It isn’t perfect by any stretch, but that is pretty accurate overall.
Run differential and winning percentage both seem quite concerning for this year’s defending World Series champion Red Sox. Thirty-three teams have had a run differential of -25 through -30 in this data set and just three of those teams have made the playoffs. The Red Sox’s 43.3 percent winning percentage compares much more closely to a bottom-ten team than a defending World Series champ. They have improved lately, but they may be that team that lost a division in the first month of the season.
A team like the Reds should have a lot more hope than their winning percentage would imply. I’m sure they would trade their positive run differential for an over .500 winning percentage, but their run differential seems to point to them being a better team than they have shown. Based purely on runs scored and runs allowed, they are the least-lucky team in baseball through the month of April.
It is kind of amazing that the New York Yankees have stayed where they are in the standings and have maintained such a good run differential given the injuries they have had to deal with. If their performance improves throughout the course of the season, they will be extremely difficult for the Tampa Bay Rays to deal with.
In the end, April winning percentage is obviously important, but April run differential is statistically the better predictor. April performance is only modestly related to performance after the month of April. What April mostly does is set teams up for potential success or crush hopes of a successful season pretty early.