Flawed Happiness Research

One theme of the next few posts will be meaningfulness: is your life significant, does it have value? Pursuing that idea led me to read about happiness, though I’m no stranger to the happiness literature. This post is a critique of one of the happiness papers.

Meaningfulness and happiness can overlap. However, they are not necessarily the same; you can experience one and not the other. A religious missionary might live a meaningful life but not be happy. Conversely, as I’m sure you’ve noticed, there are people who are happy but whose lives lack objective meaning. Furthermore, the similarities between happiness and meaningfulness can confound both survey responses and their interpretation.

This post is a critique of a September 2011 paper in Psychological Science, “Income Inequality and Happiness” by Shigehiro Oishi, et al. (in the following I will refer to the paper as IIH).1 IIH purportedly shows a statistically significant association between happiness and income inequality, which I challenge here. The following is also a demonstration, in both method and value, of letting the data speak.

The usual happiness question in surveys is similar to the one asked in the General Social Survey (GSS), the one used for IIH: “Taken all together, how would you say things are these days– would you say that you are very happy, pretty happy, or not too happy?” with responses coded as 1, 2, and 3. This question, labeled HAPPY, has been asked in every GSS since 1972, so there’s a history.

The responses to HAPPY are ordered but without measurable distance between them (1 Very Happy, 2 Pretty Happy, and 3 Not Too Happy). That is, they are ordinal numbers. How to analyze ordinal numbers has been, and continues to be, contentious.2 The prevailing advice tends toward the pragmatic: treat ordinals as if they were interval data, but check that assumption just to be sure.

About the Research
The authors of IIH averaged the three possible responses to HAPPY for each survey year, weighted by the percentages responding. They then compared those averages with income inequality estimates (the Gini coefficient) and reported that as income inequality increased, average happiness declined. The authors note that the estimated trend between the two measures was statistically significant (p < 0.05). They then looked at income subgroups and determined that the link between declining happiness and increasing income inequality was found among respondents in the lower 40% of income. Investigating possible mediating factors, they point to perceived levels of fairness and trust as significant. In conclusion they write:

…Americans are happier when national wealth is distributed more evenly than when it is distributed less evenly. If the ultimate goal of society is to make its citizens happy…then it is desirable to consider policies that produce more income equality, fairness, and general trust.
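The averaging step described above can be sketched in a few lines. This is only an illustration of a percentage-weighted mean of the 1, 2, 3 codes; the shares and the year labels are hypothetical, and the function name `mean_happy` is mine, not the authors’.

```python
# Hypothetical response shares for two survey years; these are NOT GSS figures.
# Keys are the GSS codes: 1 = Very Happy, 2 = Pretty Happy, 3 = Not Too Happy.
shares_by_year = {
    "year A": {1: 0.35, 2: 0.50, 3: 0.15},
    "year B": {1: 0.30, 2: 0.55, 3: 0.15},
}


def mean_happy(shares):
    """Average of the response codes, weighted by the share giving each response."""
    total = sum(shares.values())
    return sum(code * share for code, share in shares.items()) / total


for year, shares in shares_by_year.items():
    print(year, round(mean_happy(shares), 3))
```

Each survey year collapses to a single number this way, and it is those yearly numbers that are then compared with the Gini coefficient.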

Based on a quick Google search and what’s noted on the Psychological Science website3, IIH received wide circulation and acceptance.

The Critique
As part of my meaningfulness pursuit, I read IIH and was struck by their Fig. 2, which shows “mean American happiness scores as a function of income inequality.”

[Figure: Fig. 2 from IIH]

Looking at their graph, I saw more than the cloud of data with a trend that the authors claim. I also saw three clusters of data, each with zero trend: a possible step function with down-steps occurring around 1980 and 1990. GSS does a sample frame adjustment after each decennial census. Maybe a survey adjustment had something to do with the trend?

My curiosity was piqued, so I downloaded the files from the GSS and Census Bureau and regressed the data pairs, the averages of HAPPY and the Gini coefficient. (The data as an Excel file is here.) I got a result similar to the authors’.4 Abandoning my thoughts about sample frame adjustments, I reflected on the coding of the three responses. It didn’t seem reasonable that the “interval” between Pretty Happy and Not Too Happy was the same as the one between the two happiest responses. For example, why wasn’t there a middle response such as “Average”? I doubled the distance: I recoded Not Too Happy as a 4 instead of a 3. That change is fair, and who’s to say it’s not, since there’s no objective interval measure? Having done that, the trend was no longer statistically significant (p = 0.057).5 Such sensitivity to a coding assumption is not what you want in your analyses.
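The recoding check is easy to repeat on any set of yearly response shares. Below is a minimal sketch, assuming made-up shares and Gini values (not the GSS or Census series) and using helper names of my own choosing. It refits an ordinary least squares line under the 1,2,3, the 1,2,4, and the 1,2,5 codings and prints the slope and its t statistic, which you would then compare against a t table with n - 2 degrees of freedom.

```python
import math


def weighted_mean(shares, codes):
    """Mean response for one year under a chosen numeric coding."""
    return sum(codes[k] * shares[k] for k in codes) / sum(shares[k] for k in codes)


def slope_and_t(x, y):
    """OLS slope of y on x and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Made-up illustration data, NOT the GSS or Census series.
gini = [0.40, 0.41, 0.43, 0.44, 0.46, 0.47]
shares = [
    {"very": 0.35, "pretty": 0.52, "not_too": 0.13},
    {"very": 0.33, "pretty": 0.55, "not_too": 0.12},
    {"very": 0.34, "pretty": 0.53, "not_too": 0.13},
    {"very": 0.31, "pretty": 0.58, "not_too": 0.11},
    {"very": 0.32, "pretty": 0.56, "not_too": 0.12},
    {"very": 0.30, "pretty": 0.58, "not_too": 0.12},
]

# The original GSS coding plus two alternatives that stretch the last interval.
codings = {
    "1,2,3": {"very": 1, "pretty": 2, "not_too": 3},
    "1,2,4": {"very": 1, "pretty": 2, "not_too": 4},
    "1,2,5": {"very": 1, "pretty": 2, "not_too": 5},
}

for label, codes in codings.items():
    means = [weighted_mean(s, codes) for s in shares]
    slope, t_stat = slope_and_t(gini, means)
    print(f"coding {label}: slope = {slope:.3f}, t = {t_stat:.2f}")
```

If the t statistic drops below the critical value under a defensible alternative coding, the significance claim rests on the coding rather than on the data.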

Furthermore, does it even make sense to take the mean of the responses (1, 2, 3), weighted by their percentages? Let’s look at the data.

[Figure: ThreeResponsesHappyInequality]

I focused on 1990 as the break point. It could as well have been 1991 or 1993, but I picked 1990 for data exploration. Note that Pretty Happy, the top set of data points in the above graph, accounts for the majority of responses (i.e., it is the modal response). Suppose it did not, and assume for a moment that there were only two possible responses. If Pretty Happy and the bottom set of data, Not Too Happy, had similar percentages (say 50% each), their trends would cancel when averaged because the responses are mutually exclusive: fewer people responding “happy” necessarily means more replies of “not happy.” The responses are not independent. Unless one response dominates, averaging will mute, and sometimes cancel, their separate effects. What I usually do with ordinal data like these is combine the percentages of similar responses, unless, as in this case, one response category is the clear majority.
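A small check of how averaging can hide movement in the dominant response: because the three shares sum to one, the 1, 2, 3 weighted mean reduces to 2 + (share Not Too Happy - share Very Happy). The sketch below uses hypothetical shares, and the function name is mine. Pretty Happy falls by ten points, split evenly between the other two responses, and the mean does not move.

```python
def mean_123(very, pretty, not_too):
    """Percentage-weighted mean under the 1, 2, 3 coding (shares sum to 1)."""
    return 1 * very + 2 * pretty + 3 * not_too


# Hypothetical shares, for illustration only.
before = mean_123(very=0.30, pretty=0.60, not_too=0.10)
after = mean_123(very=0.35, pretty=0.50, not_too=0.15)

# Both means equal 2 + (not_too - very); the Pretty Happy share drops out.
print(round(before, 3), round(after, 3))
print(round(2 + (0.10 - 0.30), 3), round(2 + (0.15 - 0.35), 3))
```

That is one reason to look at the response percentages separately before trusting a trend in their average.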

It’s Not Income Inequality
The next graph shows the change over time of the Gini coefficient and the percentage responding Pretty Happy. Both sets of data are relative to their respective 1972 values. The broken arrow effect beginning around 1990 is manifest.
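Putting two series on a common footing relative to a base year is a one-line calculation. This sketch assumes “relative to” means a ratio to the 1972 value (a difference from the base would work similarly), and the numbers are placeholders, not the GSS or Census data.

```python
def relative_to_base(series, base_year):
    """Express each value as a ratio to the base-year value (base year = 1.0)."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


# Placeholder numbers for illustration only; not GSS or Census data.
gini = {1972: 0.40, 1990: 0.43, 2008: 0.47}
pretty_happy_pct = {1972: 50.0, 1990: 58.0, 2008: 55.0}

for name, series in [("Gini", gini), ("Pretty Happy", pretty_happy_pct)]:
    indexed = relative_to_base(series, 1972)
    print(name, {year: round(v, 3) for year, v in indexed.items()})
```

Indexing both series to the same year lets a percentage and a coefficient share one axis without either scale dominating the picture.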

[Figure: PrettyHappyInequality]

The data are telling us something affected Pretty Happy midway, and that something, whatever it is, has persisted. So what is it? I conjecture that the broken arrow is caused by growing economic insecurity and is obviously not a function of increasing income inequality. Around the 1990 recession, the risk being pushed down onto many of us finally hit home. Subsequently, except through increased borrowing, we didn’t really partake, not in any solid way, of the boom years of 1993 through 1999. Job and wage insecurity have increased, two-income families have become more common along with the vulnerabilities that creates, guaranteed pensions are becoming a thing of the past, health benefits have eroded, personal debt has greatly increased, training has declined, and precarious employment has grown. On top of all that, the political class became more self-centered and indifferent to the people. This is the great risk shift, the transferring of risk from organizations to individuals. And it makes sense that happiness declines with increasing risk and insecurity.

Here’s another slice of data from the same General Social Survey.

[Figure: JOBSEC]

This graph shows, sparse as the data are, a similar broken arrow, but this time the data come from a question about the importance of job security. Beginning circa 1990, workers feel less secure and, as we’ve seen, less happy.

Further Remarks
OK, there’s more going on here than just analysis using ordinal numbers. Inequality measures, like the Gini coefficient, are broad gauges: one can find many associations for them. Recently I saw a table of commute times, which have increased over the years. That data, I’m sure, would strongly correlate with measures of increasing income inequality. Capacious measures like the Gini can fool. They are useful but require care.

Likewise, the happiness concept can accommodate different views. The other day a neighbor told me of not being happy at work. Does she really mean she’s not happy, or is it that her work is not meaningful? That’s an important distinction if you want to understand her feelings and the work situation. When the question is asked in a survey, “Taken all together, how would you say…,” that question channels a response which makes polling easier (numbered 1, 2, 3) but avoids nuance. Limited self-reported responses might be like the shadows on the wall in Plato’s Cave. Happiness research has produced some revealing results. But the data and metrics need to be treated with respect, just as you treat ingredients for cooking with respect.

By way of epistemic context, there is growing impatience with irreproducible research that is neither flagged nor corrected nor retracted.6 I pursued meaningfulness, then looked to better understand how happiness fits in, which, after hours of work, resulted in this unplanned blog post. It was not my intent to contribute an irreproducibility study. Nevertheless, here it is. I will contact the IIH authors and others about my findings.

Is taking an average of three mutually exclusive ordinal responses conducive to solid analysis? Probably not. Surely researchers should investigate the data and analytical results for robustness, especially if averaging or other modeling is desired. A claim of statistical significance from ordinal data definitely needs to be checked (e.g., what happens if a 3 is changed to a 4 or 5). Other validation should be done. For example, do the results stand, and is the trend still there, if 10% of the data are withheld? If there’s an underlying trend, then a subset of the data should also trend.7
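The withholding check can be sketched as follows. This is one possible way to do it, not a standard recipe: drop a random 10% of the points, refit the slope, repeat many times, and see how often the sign agrees with the full-sample slope. The data are made up, and the function names and the `repeats` and `seed` parameters are my own.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def holdout_slopes(x, y, frac=0.10, repeats=200, seed=0):
    """Refit the slope after withholding a random fraction of the points."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    n = len(x)
    keep = n - max(1, round(n * frac))
    slopes = []
    for _ in range(repeats):
        idx = rng.sample(range(n), keep)
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    return slopes


# Made-up data for illustration only (ten x, y pairs).
x = [0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49]
y = [1.78, 1.80, 1.79, 1.81, 1.80, 1.83, 1.82, 1.82, 1.84, 1.83]

full = ols_slope(x, y)
subsets = holdout_slopes(x, y)
same_sign = sum(1 for s in subsets if s * full > 0) / len(subsets)
print(f"full-sample slope = {full:.3f}")
print(f"share of subsets with the same sign = {same_sign:.2f}")
```

A trend that flips sign or collapses toward zero in a noticeable share of the subsets deserves a closer look before anyone reports it.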

It’s time for renewed study and recommendations regarding methods for analyzing ordinal data. Practical guidelines with examples would be especially helpful.

Researchers, their reviewers, and editors, all three, need to increase their vigilance to further analytical robustness.

Last, the reasons for the observed decline in happiness and the increase in economic insecurity beginning in the early 1990s (the last two graphs) ought to be researched and reported. It was in 1990 that Jensen and Murphy urged a change in the way corporations pay CEOs.8 In 1993 tax laws were rewritten to accommodate the Jensen and Murphy ideas. Concomitantly, CEO pay soared. Did the emphasis on stock price and subsequent CEO pay inflation cause the declining interest in workers’ welfare and human capital? I think so, but that’s just conjecture. For years prior to 1990, long-term employment was emphasized; job hopping was a black mark against a prospective employee. Then that changed. Why?

Actually, a lot changed.


  1. A copy of the IIH paper is here: http://www.factorhappiness.at/downloads/quellen/S13_Oishi.pdf
  2. The 1946 paper by S.S. Stevens remains central in the debates:  http://personal.stevens.edu/~ysakamot/719/week3/Stevens_Measurement.pdf
  3. http://pss.sagepub.com/content/22/9/1095
  4. There were a couple of immaterial differences. I used household Gini rather than family: I believe household is more commonly used and it includes one-person households. The authors reversed the HAPPY codes (using 3,2,1) whereas I kept the original GSS coding (1,2,3).
  5. If Not Too Happy is recoded to 5 instead of 4, the p-value changes to 0.12.
  6. This blog post, http://www.washingtonpost.com/blogs/monkey-cage/wp/2014/01/21/where-to-debunk-political-science-findings//?print=1, and its links are a good place to start.
  7. As an example, in an earlier post (http://www.lettingthedataspeak.com/the-human-cost-of-ideology-iii/) I excluded 20% of the data and the trend held just fine.
  8. A copy of the paper, “Performance Pay and Top-Management Incentives” by Michael C. Jensen & Kevin J. Murphy, can be found here: http://leeds-faculty.colorado.edu/bhagat/Jensen-Murphy.pdf.

2 thoughts on “Flawed Happiness Research”

  1. Once again, Dr. de Libero, you have demonstrated an aspect of your meaningful energy by making simple the complex for the laity, through your poignant research, and writing. Thank you for food for thought. This article of yours inspires as it reminds me to aim for meaning.
    t.k. dawidalle

  2. Pingback: Skepticism about a published claim regarding income inequality and happiness « Statistical Modeling, Causal Inference, and Social Science