When the Gini Index Is Redundant

The Gini index as a measure of income inequality is redundant when using the U.S. Census Bureau’s household income statistics, at least at the national and state levels. In that data the Gini index is effectively determined by, and can be replaced with, the household income share of the fifth quintile.

The Census Bureau has measured household income since 1967. Over those 47 years it has calculated the Gini index, a widely used measure of income inequality. Here’s a graph of that Gini index from the beginning of the series to the most recent data.

[Figure: Gini index by year]

You’ve probably already seen a similar graph. But for emphasis, income inequality measured by the Gini index has been increasing on average since 1969. Also note that a step occurred between 1992 and 1993 because of a change in survey methodology related to top-coding.[1] Apart from year-to-year variability, the trend is upward: income inequality is getting worse over time.

In spite of its wide use, one of the problems with the Gini index is that it’s not readily interpretable. It’s a synthetic index, to use Thomas Piketty’s designation in Capital in the Twenty-First Century. It’s just a number. You can compare it to similarly derived numbers or observe that it’s trending, but going beyond that (partitioning, for example) will most likely require learning new quantitative skills. We can easily do better than that.

Among the Census Bureau’s yearly income statistics are percentage shares of household income by quintile, plus the share of the top five percent. It turns out that over those same 47 years, the percentage share of the top (fifth) quintile determines the Gini index.
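One way to get a feel for why the top quintile's share carries so much of the information in the Gini index is to approximate the Gini from the five quintile shares alone. The sketch below uses the standard trapezoid approximation of the Lorenz curve, which ignores inequality within each quintile and so gives a lower bound; it is not the Census Bureau's own calculation. The function name `gini_from_shares` is made up for illustration, and in the example only the 51.0 top share comes from this article; the other four shares are placeholders chosen so the total is 100.

```python
def gini_from_shares(shares):
    """Approximate the Gini index from income shares of equal-sized groups.

    `shares` must be ordered from the poorest group to the richest.
    The Lorenz curve is treated as piecewise linear between group
    boundaries, so inequality inside each group is ignored and the
    result is a lower bound on the true Gini index.
    """
    total = float(sum(shares))
    width = 1.0 / len(shares)      # population fraction in each group
    cumulative = 0.0               # cumulative income share so far
    area = 0.0                     # area under the Lorenz curve
    for share in shares:
        next_cumulative = cumulative + share / total
        # Trapezoid between two consecutive points on the Lorenz curve.
        area += width * (cumulative + next_cumulative) / 2.0
        cumulative = next_cumulative
    # Gini = 1 - 2 * (area under the Lorenz curve).
    return 1.0 - 2.0 * area


# Placeholder shares (percent), poorest to richest. Only the top value,
# 51.0, is taken from the article; the rest are NOT Census figures.
example_shares = [3.0, 8.0, 15.0, 23.0, 51.0]
print(round(gini_from_shares(example_shares), 3))
```

Because the shares must sum to 100, a rise in the top share has to come out of the lower quintiles, which pushes the whole Lorenz curve down. That is at least consistent with the tight relationship described here, though it does not by itself explain an R-square this high.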

[Figure: Gini index versus income share of the fifth quintile]

The R-square value for this relationship is 0.9994 (where 1.0 is perfect determination). In short, the percentage share of the fifth quintile is equivalent to the Gini index. And it’s readily interpretable! For example, in 2013 the fifth quintile’s share was 51.0; that is, the top 20% of the income distribution took in 51% of all income for that year.

I also looked at the same relationship across the 50 states for the year 2013. That relationship was similar but with a bit more variability; the R-square was 0.987. I’d expect that kind of relationship to hold for other years as well. I didn’t try it at the county level, but apart from a further increase in variability due to smaller populations, I’d expect the association to remain robust.

On the other hand, I have no idea how a similar relationship would perform in other countries that use the Gini (many don’t). However, in the U.S., the Census Bureau is the main source of income data, including the Gini index and shares of income, so substituting the top quintile for the Gini is useful because it’s readily available, makes sense, and lends itself to interpretation.
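For anyone who wants to reproduce an R-square like the ones quoted above from the Census tables, here is a minimal sketch of a simple least-squares fit of the Gini index on the fifth quintile's share. It assumes an ordinary straight-line fit, which may not be exactly the fit used for the chart. The function name `fit_line_r_squared` is hypothetical, and the numbers in the usage lines are toy values, not Census data.

```python
def fit_line_r_squared(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared), where r_squared is the
    fraction of the variance in y explained by the fitted line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy


# Toy values only, to show the call. Replace them with the fifth-quintile
# shares and the Gini index from the Census tables.
toy_share = [1.0, 2.0, 3.0, 4.0]
toy_gini = [1.0, 3.0, 2.0, 4.0]
slope, intercept, r2 = fit_line_r_squared(toy_share, toy_gini)
print(slope, intercept, r2)
```

With yearly data, both series trend over time, so a high R-square from a fit like this is worth checking against a fit on year-to-year changes as well.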

In 2013, the mean income for the top quintile was $185K, compared to a mean income for the fourth quintile of $84K, a significant difference of $101K. Also, averaged over the last few years, the fifth quintile captured 51% of total income, compared to an average share of 43% at the beginning of the timeline. As an inequality measure, the percentage share of the highest quintile is accessible and understandable. And it makes sense. The following graph shows the rise of the fifth quintile while the other four quintiles declined; the four are combined in the graph for visual clarity.

[Figure: Income share of the fifth quintile and of the bottom four quintiles combined, by year]

Once graphed, you can readily see what’s happening. And of course, if you want to, you can use the Census tables to break it down further.[2]

The Gini index is commonly used so you can’t just ignore it. But if you use U.S. income data and need to discern inequality and why it is what it is, you are better served using the percentage income share of the fifth quintile and related data.


  1. See Daniel H. Weinberg, “A Brief Look at Postwar U.S. Income Inequality,” June 1996, http://www.census.gov/prod/1/pop/p60-191.pdf. Also of interest, the first page shows family income inequality (instead of household) which begins with 1947: from 1947 to circa 1968 inequality declines.
  2. The data used for the three graphs are available here as an Excel spreadsheet. Search terms for the original Census tables (the text preceding each parenthesis) are: H02AR 2013 (share aggregate income by fifths), H03AR 2013 (mean HH income by fifths), Census H04 2013 (Gini index). State data can be obtained via FactFinder, tables B19081 (mean income) and B19083 (Gini).

12 thoughts on “When the Gini Index Is Redundant”

  3. Typo Alert: First paragraph “…can be replaced by the household income share of the fifth centile.”

    Quintile, centile…but who’s counting? 😉

    Good article and useful analysis. Thanks!

  5. I like this method and it makes sense. I would note that the use of Census data means that the result is an under-representation, or conservative estimate, of inequality. The survey misses high-income earners, such as the top hedge fund managers. They routinely get paid several billion (and yes, that is salary, not wealth – and yes, billion with a B).
    The inequality at the top 1% and .1% is staggering and often gets omitted, thus making inequality seem less severe than it is.

    • Thank you, Paul, for your observation.

      The Current Population Survey still top-codes earnings, I believe, at $1M. That was the change that occurred for 1993, when the CPS began censoring earnings at $1M; previously the limit was $300K. (This is noted in the Weinberg reference given in note 1.) Additionally, the CBO, which estimates a correction for top-coding, gives before-tax income of the highest quintile in 2011 as $246K in Table 1 of its recent report, “The Distribution of Household Income and Federal Taxes, 2011” (compared to $185K in 2013 via the CPS, also pre-tax). So, yes, using Census data understates inequality. However, the tight relationship presented in the second chart above empirically shows that it’s been that way since 1967, regardless of whether one uses the Gini index or the highest quintile. Also, to see a partial effect of income censoring, note the jump in inequality due to the change in top-coding in 1993. We have consistently understated income inequality.

  7. This may sound a bit naive, but I would expect both the Gini coefficient and the share of the fifth quintile to be heavily autoregressive time series. If this is the case, your estimate for the variance of the estimator is biased and your R2 unreliable.
    Does the result still hold when regressing changes in the Gini on changes in the share of the fifth quintile? And when correcting for higher-order autocorrelation?

  8. While this seems like a good idea in spirit, it appears that the 2 variables in your analysis have been directly computed from each other and rounded off. The residual plots against the predictor or fitted values display a pattern of 8 parallel lines with negative slopes – an indication of possible truncation. We’d expect random scatter in these plots.

    While the Gini indices seem correct, I’ve seen other reports that indicate the income share earned by the top 20% is far greater than the 50% shown on your plot. Can you check to make sure these are the correct income share values?