Friday, January 4, 2019

THE SIMPSON PARADOX

Can both of these statements be true?

  • David Justice of the Atlanta Braves had a higher batting average than Derek Jeter of the New York Yankees every year from 1995 to 1997.
  • Derek Jeter had a higher batting average than David Justice over the three years 1995 to 1997 combined.
Think about it. One batter had the higher batting average in each of 3 straight years, but the other batter had the higher batting average over the same 3-year period. Is that possible?

The answer is yes.

It is possible for one player to have a higher batting average than another in each of several years, yet a lower batting average across all of those years combined. This is a common example of the Simpson Paradox. With batting averages, the paradox arises when the number of at-bats differs greatly from year to year, so that each player's combined average is dominated by different seasons.
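One way to see why at-bats matter is to write the combined average as a weighted average of the yearly averages. The symbols below are illustrative notation, not part of the original statistics: h_i is a player's hits in year i, n_i is his at-bats in year i, and N is his total at-bats.

```latex
\[
\text{AVG}_{\text{combined}}
  = \frac{\sum_i h_i}{\sum_i n_i}
  = \sum_i \frac{n_i}{N} \cdot \frac{h_i}{n_i},
  \qquad N = \sum_i n_i
\]
```

Each yearly average h_i / n_i is weighted by that year's share of at-bats, n_i / N. Using the figures given later in this post, Jeter's weakest season (1995) accounts for only 48 of his 1,284 at-bats, about 4 percent, while Justice's weakest season (also 1995) accounts for 411 of his 1,046 at-bats, about 39 percent. Because the two players' weights differ so much, the combined ranking can differ from the yearly rankings.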

Briefly stated, the Simpson Paradox is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when the groups are combined.

Justice’s and Jeter’s statistics are shown in the table below. The higher batting average in each year, and for all 3 years combined, is marked with an asterisk. (Note the differences in at-bats between the two players in 1995 and 1996.)

Derek Jeter
             1995     1996     1997     TOTAL
Hits           12      183      190       385
At Bats        48      582      654     1,284
Average      .250     .314     .291     .300*

David Justice
             1995     1996     1997     TOTAL
Hits          104       45      163       312
At Bats       411      140      495     1,046
Average      .253*    .321*    .329*    .298
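
The figures above can be checked with a few lines of Python. This is only a sketch: the hits and at-bats are copied from the table, while the names `stats`, `average`, and `combined_average` are made up for this illustration.

```python
# Hits and at-bats per season, copied from the table in this post.
# Each entry is (hits, at_bats). The variable and function names
# here are illustrative, not from any library.
stats = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582), 1997: (190, 654)},
    "Justice": {1995: (104, 411), 1996: (45, 140),  1997: (163, 495)},
}


def average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def combined_average(seasons):
    """Pool hits and at-bats over all seasons, then divide once."""
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return average(total_hits, total_at_bats)


# Year-by-year comparison.
for year in (1995, 1996, 1997):
    jeter = average(*stats["Jeter"][year])
    justice = average(*stats["Justice"][year])
    print(f"{year}: Jeter {jeter:.3f}  Justice {justice:.3f}")

# Comparison over all three years combined.
print(
    f"Combined: Jeter {combined_average(stats['Jeter']):.3f}  "
    f"Justice {combined_average(stats['Justice']):.3f}"
)
```

Note that the combined average pools hits and at-bats before dividing. It is not the simple mean of the three yearly averages, and that difference is where the reversal comes from.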

