• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Effectiveness: Gaussian?

    August 8th, 2015
    giving  [html]
    Robin Hanson writes:
    Effective altruists often claim that big efforts to re-evaluate priorities are justified by large differences in the effectiveness of common options. Concretely, MacAskill, following Ord, suggested in his main conference talk that the distribution looks more like a thick-tailed power law than a Gaussian. He didn't present actual data, but one of the other talks there did: Eva Vivalt showed the actual distribution of estimated effects to be close to Gaussian.

    But youth movements have long motivated members via exaggerated claims ...

    Hanson is right that one of the key ideas in effective altruism is that the best options are much better than the typical ones, and so it makes sense to really research your options. Toby Ord mentions this in his Taking Charity Seriously talk, I wrote about it here, Ben Todd uses it to argue that most charity fundraisers cause harm; it's pretty central to the discourse. But maybe it's actually Gaussian and we've been reasoning from bad data?

    First, what's the difference? We have a range of possible things we could do to improve people's lives. Each one has a benefit, say in DALYs, and each one has a cost, say in thousands of dollars. What does it look like if we line them all up from least benefit per dollar to most? Graphing the DCP2 estimates for global health and nutrition interventions we get the blue line in the chart below:


    source: Disease Control Priorities Project (DCP2) (csv)

    The other two lines are fake data to help show what distributions the observed data most resembles. The red line shows samples from a Gaussian distribution while the orange line samples from a log-normal distribution. [1] The log-normal distribution does seem to be a much better fit than the Gaussian.

    So why is Vivalt saying the distribution is Gaussian? She's not. She's making a similar but distinct claim: the effect sizes, in standard deviations, for the interventions AidGrade has RCT data for are normally distributed. These are impacts like "improved stoves on chest pain," "bed nets on malaria," and "deworming on height". If you find that chest pain drops by 0.51 standard deviations for people who receive improved stoves, then this intervention scores 0.51 (more explanation). Do this for all the interventions and effects we've measured and what you see is roughly Gaussian:


    source: comment posted by Vivalt.

    There are reasons not to take this entirely at face value, however. For example, we're including things like "conditional cash transfers on attendance rates" and "microfinance on probability of owning a business" where there's no logical unit size like "one bednet per bed" or "one stove per house". Say one study gives people $10 as an incentive for their kid to attend school while another gives $100. You would expect more money to give a larger effect on attendance rates, but with this methodology we'll just compute the average effect.

    It's also likely that researchers are doing some amount of effect-size targeting. Very small effect sizes require large studies to detect, which is expensive, and small effect sizes aren't very interesting anyway because that means the intervention isn't so valuable, so researchers might avoid studies they expect to detect small effects.

    Even if effect sizes and everything else involved in an intervention are normally distributed, however, that's still consistent with a log-normal distribution of cost-effectiveness, because when you have many independent things that are all normally distributed their product will be log-normally distributed. Cost effectiveness generally depends on a chain of causality where the links combine multiplicatively. For a worked example of this, see GiveWell's cost-effectiveness estimate for the benefit of bednet distribution (xls). Since when fully broken down these estimates involve multiplying very many sequential effects, it really makes a lot of sense that we'd see a log-normal distribution of bottom-line effectiveness.

    So: MacAskill and Ord don't actually disagree with Vivalt, and effective altruists are reasonable to be prioritizing evaluation and measurement.

    (But: the data we have is really limited here because the DCP2 numbers aren't very good and reasoning about what we should expect the distribution to look like isn't a very reliable approach.)


    [1] The Gaussian distribution was generated from the mean (21.5) and standard deviation (47.7) of observed data. The log-normal distribution was chosen to be one whose underlying Gaussian had the same mean (0.56) and standard deviation (0.93) as the log base 10 of the observed data.

    Comment via: google plus, facebook

    Recent posts on blogs I like:

    What should we do about network-effect monopolies?

    Many large companies today are software monopolies that give their product away for free to get monopoly status, then do horrible things. Can we do anything about this?

    via benkuhn.net July 5, 2020

    More on the Deutschlandtakt

    The Deutschlandtakt plans are out now. They cover investment through 2040, but even beforehand, there’s a plan for something like a national integrated timetable by 2030, with trains connecting the major cities every 30 minutes rather than hourly. But the…

    via Pedestrian Observations July 1, 2020

    How do cars fare in crash tests they're not specifically optimized for?

    Any time you have a benchmark that gets taken seriously, some people will start gaming the benchmark. Some famous examples in computing are the CPU benchmark specfp and video game benchmarks. With specfp, Sun managed to increase its score on 179.art (a su…

    via Posts on Dan Luu June 30, 2020

    more     (via openring)


  • Posts
  • RSS
  • ◂◂RSS
  • Contact