A few comments on cost-effectiveness of water quality interventions
(This is a working-in-public draft of a post that I am still working on. Witold)
Here are some extra notes and details on the GiveWell Change Our Mind contest entry on the cost-effectiveness of water quality interventions by Matthew Romer and Paul Romer Present (henceforth MRPRP). Their entry is here.
I won’t summarise the article, as it already has a very nice and short abstract. The relevant result I want to focus on here is that they estimate a decrease in the cost-effectiveness of dispensers for safe water (DSW) but not of inline chlorination (ILC):
Using our updated approach, the cost-effectiveness of inline chlorination falls by 0.21% and the cost-effectiveness estimate of Dispensers for Safe Water falls by 23%.
This difference is attributable to several factors (see Fig 2 in the MRPRP entry), but for this exploration I care only about the adjustment to deaths. (This is the factor pulling the DSW estimates downwards, and I am not competent to talk about other parts of their analysis.) For deaths, their calculation uses a new approach, which combines an updated meta-analysis^{1} (the direct approach) with the indirect approach (what was the “plausibility cap” for GW) using a probabilistic calculation.
All of the relevant references (links to GW’s and their calculations) are at the end.
Looking at the spreadsheet and calculations
- The main inputs into the new analysis are not much changed, and where they are, they change in favour of water interventions:
  - They get a larger effect estimate than GW: “Pooled relative risk is now 0.81, compared to 0.86 originally”
  - For the indirect approach, “Diarrhea RR, adjusted is now 0.82, compared to 0.81 originally”
  - Moreover, “Internal validity adjustment for over-5 mortality is now 0.46-0.56, compared to 0.51 originally”
- However, they still get a more negative result:
  - For example, for Kenya DSW (which will be my running example), GW started with a meta-analysis estimate of a 6.1% reduction, falling to 5.6% after adjustments. Under this new analysis, they start with a meta-analysis estimate of 8.4%, but it falls to 4.6%.
  - Hence mortality effects estimated under the MRPRP analysis can be ~10-20% lower than the original adjusted values that GW used.
  - There is also a (slightly smaller) drop for ILC, but in this case the subsequent adjustments compensate for the lower mortality estimate.
- The reason for the difference is mainly the new probabilistic analysis, i.e. how direct and indirect evidence are combined.
- In other words, the mortality estimate (direct evidence) gets pulled away from 0.81 toward 0.89. To understand why, it’s necessary to look closely at the model they use.
I will now quickly compare the GW and MRPRP approaches to indirect evidence.
Indirect approach in GW’s case
Calculations for this are captured in the “internal validity adjustments” in the GW spreadsheet (Water quality CEA (ILC and DSW) (public) - Google Sheets). GW assumes the following (Mortality plausibility modeling (public) - Google Sheets):
“All infectious diseases and nutritional deficiencies are affected. X% reduction in mortality. Half of “other neonatal disorders” are directly or indirectly related to infection (my very rough assumption), so these are reduced by X/2%.”
Calculation of the plausibility cap $PC$ then proceeds as follows (this is my notation):
\[PC = S \cdot (1 - RR) \cdot IV \cdot EV\]
- $S$: share of all-cause GBD mortality that could be linked to morbidity affected by water
- $RR$: relative risk in the Clasen et al. meta-analysis of diarrhea reductions
- $IV$: internal validity adjustment^{2}
- $EV$: external validity adjustment (in the case of Kenya DSW, just adherence)
Hence $(1 - RR) \cdot IV \cdot EV$ corresponds to the morbidity reduction. In the case of Kenya DSW^{3} the numbers are:
\[68.6\% \times (1 - 0.81) \times 0.9 \times 0.49 = 68.6\% \times 8.6\% = 5.6\%.\]

MRPRP calculation using the indirect approach
Following the same formula for morbidity reduction, but using the new calculation, we’d get:
$(1 - RR) \cdot IV \cdot EV = (1 - 0.78) \times 0.9 \times 0.52 = 10.2\%$
So this is higher than GiveWell’s (10.2% vs 8.6%). However, rather than being multiplied by $S$ (68.6%), this is now subject to a statistical model. NB the analysis itself introduces additional factors and adjustments, e.g. the relationship between morbidity and mortality reductions (see this sheet). These adjustments seem to me similarly arbitrary to GW’s adjustments (not in a bad way), and in any case I am not enough of an expert to comment on them.
The crucial change seems to be that MRPRP interpret GW’s plausibility cap on the share of deaths affected as “2 SDs away from the mean”. It is worth citing this verbatim:
Mills-Reincke effect: Modeled as the fraction of all deaths that the X% reduction in mortality applies to. Normal distribution with mean equal to the sum of deaths due to enteric infections, respiratory infections, and a quarter of other infections and nutritional deficiencies, and standard deviation such that “Assumption 4” in GiveWell’s mortality plausibility modeling is 2 standard deviations above the mean.
In the case of Kenya DSW this means $S$ has a mean of 42% and a 95% interval from 21% to 67%, compared to the 68.6% cap in GW’s analysis.
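As a quick sketch (using the rounded figures above, so the result lands close to, but not exactly on, the 21%-67% interval quoted), we can derive the SD implied by treating GW’s cap as 2 SDs above the mean:

```python
# Illustrative only: derive the SD implied by treating GW's 68.6% cap
# as mean + 2 SD, using the rounded Kenya DSW figures from the text.
mean_s = 0.42   # mean share of deaths affected (Mills-Reincke)
cap = 0.686     # GW's plausibility cap ("Assumption 4")

sd_s = (cap - mean_s) / 2          # cap interpreted as 2 SD above the mean
lo, hi = mean_s - 1.96 * sd_s, mean_s + 1.96 * sd_s

print(f"SD ~ {sd_s:.3f}, 95% interval ~ ({lo:.2f}, {hi:.2f})")
# With these rounded inputs: SD ~ 0.13, interval roughly (0.16, 0.68),
# in the same ballpark as the interval quoted above.
```

(The small mismatch with the quoted 21%-67% comes from rounding in the inputs, not from the method.)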
While the authors make other adjustments, if we use the previously mentioned formula, $PC = S \cdot (1 - RR) \cdot IV \cdot EV$, we get an average reduction using indirect evidence of
$42\% \times (1 - 0.78) \times 0.9 \times 0.52 = 4.3\%$
(Once again, this is not how the calculation is done exactly, but it serves as an illustration.)
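The two indirect-evidence illustrations above can be reproduced in a few lines. This uses the rounded inputs quoted in the text, so the GW-style total comes out at roughly 5.7% rather than the sheet’s 5.6%:

```python
# Illustrative reproduction of the indirect-evidence arithmetic above,
# using the rounded Kenya DSW inputs quoted in the text.
def indirect_reduction(share, rr, iv, ev):
    """share * (1 - RR) * IV * EV, i.e. the plausibility-cap formula."""
    return share * (1 - rr) * iv * ev

# GW-style: S = 68.6% cap, RR = 0.81, IV = 0.9, EV = 0.49
gw = indirect_reduction(0.686, 0.81, 0.9, 0.49)
# MRPRP-style mean: S = 42% mean, RR = 0.78, IV = 0.9, EV = 0.52
mrprp = indirect_reduction(0.42, 0.78, 0.9, 0.52)

print(f"GW-style: {gw:.1%}, MRPRP-mean-style: {mrprp:.1%}")
# Roughly 5.7% and 4.3% with these rounded inputs.
```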
This estimate of a 4.3% reduction is then averaged (probabilistically) with the estimate of deaths avoided from the direct approach. The combined estimate (incorporating the direct approach, i.e. the meta-analysis on mortality) for Kenya DSW is 4.6%.
Just to reiterate: where GW calculated the “plausibility cap”, the authors here interpret it as the upper bound of a probabilistic quantity and then incorporate it directly into the model. This is likely what drives down the overall effect, although I’d have to look more closely at the other adjustments.
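To see mechanically how a probabilistic average can pull the direct estimate down, here is a toy precision-weighted (inverse-variance) combination. This is not MRPRP’s actual model; both standard deviations here are invented purely for illustration:

```python
# Toy inverse-variance combination of a "direct" and an "indirect" estimate.
# NOT the MRPRP model -- both SDs here are invented for illustration only.
direct, sd_direct = 0.084, 0.040      # meta-analysis mortality estimate (wide)
indirect, sd_indirect = 0.043, 0.015  # indirect-evidence estimate (tighter)

w_d, w_i = 1 / sd_direct**2, 1 / sd_indirect**2
combined = (w_d * direct + w_i * indirect) / (w_d + w_i)

print(f"combined ~ {combined:.1%}")
# The tighter indirect estimate dominates, so the combination sits
# much closer to 4.3% than to 8.4%.
```

The general point: whichever side claims more precision gets more weight, so tightening the indirect model’s uncertainty drags the combined estimate toward it.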
But what happens if we assume more deaths could be affected?
After looking at this model, I was interested in how uncertainty in the new assumptions drives the estimates. So I made one basic type of modification to the new analysis. Rather than assume a Gaussian distribution on $S$ with a mean of 42% and SD set to about 12% (mean + 2 SD = 68%, GW’s plausibility cap)^{4}, I did the following:
(Remember that GW’s original estimate for Kenya DSW was a 5.6% reduction; this new analysis gets 4.6%.)
- Used the same mean and SD, but instead of a Gaussian I assumed a Student’s t distribution (df = 1), obtaining 5.1%.
- Used a Gaussian with the same mean, but doubled the SD to 0.24. I got a 5.5% reduction.
- Used a uniform distribution from 0.1 to 0.75. I got a 5.1% reduction.
- Used a Gaussian with a mean of 68% and SD of 0.05, assuming that the plausibility cap is in fact our “mean” belief and we’re highly confident in it. I got a 6.8% reduction in mortality.
I am not positing that any of these is the right (or wrong) way to go about this. But steps 1-3 clearly show how lowering confidence in the model retrieves a quite different result.^{5}
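The mechanism behind the first modification (swapping a Gaussian for a heavy-tailed Student’s t) can be illustrated with a crude grid approximation of a posterior. Everything here, the likelihood SD in particular, is made up, so only the direction of the effect is meaningful, not the numbers:

```python
import math

# Crude grid approximation of a posterior under two priors with the same
# location/scale: Normal vs Student's t (df=1, i.e. Cauchy). All numbers
# are invented for illustration; only the *direction* of the effect matters.
loc, scale = 0.043, 0.012      # prior centred on the indirect estimate
x, sx = 0.084, 0.020           # "direct" evidence: estimate and assumed SD

def norm_pdf(v, mu, s):
    z = (v - mu) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2 * math.pi))

def cauchy_pdf(v, mu, s):
    z = (v - mu) / s
    return 1.0 / (math.pi * s * (1.0 + z * z))

grid = [i / 10000 for i in range(0, 1501)]   # reductions from 0% to 15%

def posterior_mean(prior_pdf):
    w = [prior_pdf(v, loc, scale) * norm_pdf(v, x, sx) for v in grid]
    return sum(v * wi for v, wi in zip(grid, w)) / sum(w)

m_norm = posterior_mean(norm_pdf)
m_cauchy = posterior_mean(cauchy_pdf)
print(f"Normal prior: {m_norm:.1%}, t(df=1) prior: {m_cauchy:.1%}")
# The heavy-tailed prior concedes more to the conflicting direct evidence,
# so its posterior mean is noticeably higher.
```

This is the general Bayesian intuition: a thin-tailed prior treats the direct evidence as implausible and pulls the posterior toward itself, while a heavy-tailed prior is more willing to be overruled.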
Conclusion
- The MRPRP probabilistic calculation of reductions is a very nice step forward, away from using a plausibility cap and toward averaging over multiple models.
- However, the direct and indirect models are hard to compare, especially in the sense of quantifying uncertainty in each (because the two models are quite different).
- Under the MRPRP model, the benefits of water quality interventions appear to be highly sensitive to the assumption about the share of deaths that can be prevented. The authors interpret GW’s assumption on the share of all-cause mortality that can be affected (Mills-Reincke) as the upper end of a 95% Gaussian interval.
- A lot of the new result seems to be driven by the level of confidence in this belief. In other words, the results of this new analysis may be substantially different even without changing the “average” belief about the Mills-Reincke effect, simply by reducing confidence in this one input.
- In a couple of modifications (intended for illustrative purposes only), I saw the mortality reductions under Kenya DSW change from 4.6% (the authors’ new estimate) to over 5%. E.g. doubling the SD on the fraction of deaths affected approximately retrieves the original GW result.
- All of the above is intended only as a demonstration of how sensitive the model is to assumptions. I do not have the expertise to intuit the particular values that should be used. I may also be biased against the indirect-evidence model simply because I know the direct evidence well and haven’t spent much time thinking about the indirect model.
Links
Contest entry is here: An Examination of GiveWell’s Water Quality Intervention Cost-Effectiveness Analysis - EA Forum (effectivealtruism.org)
Original GW spreadsheet is here: Water quality CEA (ILC and DSW) (public) - Google Sheets
New calculations (spreadsheet): GiveWell Water Quality CEA Examination (ILC and DSW) - Google Sheets
Code behind the Bayesian part of the new calculations: R code and Stan code

They do not change the approach to the meta-analysis of mortality reductions. It’s still a fixed-effects analysis (compared to the random-effects model in our meta-analysis, Kremer et al. 2022). But that’s for another time. ↩

They say: “This is a general internal validity adjustment for limitations of the trials included in the Clasen et al. 2015 metaanalysis, which is the basis for our mortality reduction estimate. We correct for selfreport bias separately, so this reflects our rough guess of the degree to which the effect size estimate is overestimated for other reasons. We have not put much thought into this adjustment because morbidity only makes up a small share of total benefits.” see cell E3 ↩

Cell B12 in this sheet: Water quality CEA (ILC and DSW) (public) - Google Sheets ↩

The parameter I’m talking about is frac_of_deaths_impacted_base in the MRPRP Stan code. ↩
Arguably, we could also choose to model the proportion on a log scale. Using a lognormal distribution with the same 95% interpretation, I get the same value as MRPRP. ↩