4.2.1 One ISM Result
For sites where there is a regulatory requirement to calculate a UCL, at least three replicates should be collected within a DU. For sites where there is no regulatory requirement to calculate a UCL, it is important to understand the potential for decision errors if a decision is to be informed by a single ISM result. Two critical components of a decision error are the likelihood of underestimating the mean and the magnitude of the underestimation.
A single ISM result is likely to underestimate the mean more than 50% of the time for most distributions; the likelihood of a decision error increases as the variance in the distribution increases and the difference between the action level and true mean decreases.
Each ISM sample provides an estimate of the true mean (the actual average concentration within the DU). As such, the distribution of ISM results is related to but conceptually different from the distribution of discrete samples. The two approaches share the same grand mean but can be expected to have different estimates of variance. For ISM, the mean of replicates is analogous to repeated trials of discrete sampling (i.e., the mean of the means, or the “grand mean”), and the standard deviation is analogous to the standard error for the mean in discrete sampling. Even the most comprehensive sampling protocols will introduce some degree of sampling error, and it is possible that a single ISM sample result can be well above or well below the true mean. The magnitude of the under- or overestimate depends on the overall heterogeneity of the underlying distribution, increasing as the heterogeneity increases. Figure 4-2 illustrates the probability and magnitude of underestimation of a single ISM sample of n = 30 increments collected from DUs with underlying lognormal distributions with CVs ranging from 1.0 to 3.0. The following observations are noted:
- A single ISM sample will underestimate the mean more than 50% of the time for all positively skewed distributions.
- The magnitude of the underestimation depends on the degree of variability, as represented by the CV.
- Approximately one-third of the sampling events with a single ISM sample (n = 30 increments) will underestimate the mean by at least 10% for CV = 1 and by at least 20% for CV = 2. For example, if the true mean is 400 ppm, approximately one out of every three ISM samples (n = 30) will yield an estimated mean <360 ppm for CV = 1, and <320 ppm for CV = 2.
- For a distribution with greater dispersion (i.e., CV = 3), approximately one quarter of the sampling events will yield a single ISM result that underestimates the mean by 30%–60%. For example, if the true mean is 400 ppm and CV = 3, approximately one out of every four ISM samples (n = 30) will yield a sample mean of 160–280 ppm.
Figure 4-2. Examples of the probability and magnitude of underestimation of the mean from a single ISM sample.
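The observations above can be explored with a small Monte Carlo sketch. It is illustrative only and rests on assumptions that are not part of this guidance: a single ISM result is modeled as the arithmetic mean of 30 independent lognormal increments, with no subsampling or laboratory error, and the function names are hypothetical. Simulated frequencies will not necessarily match Figure 4-2 exactly.

```python
import math
import random

def lognormal_params(mean, cv):
    """Return (mu, sigma) of the underlying normal for a lognormal
    distribution with the given arithmetic mean and CV."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def simulate_ism_results(true_mean, cv, n_increments=30, n_trials=10000, seed=1):
    """Simulate single ISM results, each modeled (assumption) as the
    arithmetic mean of n_increments independent lognormal increments."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(true_mean, cv)
    results = []
    for _ in range(n_trials):
        total = sum(rng.lognormvariate(mu, sigma) for _ in range(n_increments))
        results.append(total / n_increments)
    return results

def fraction_below(results, threshold):
    """Fraction of simulated ISM results that fall below threshold."""
    return sum(r < threshold for r in results) / len(results)

TRUE_MEAN = 400.0  # ppm, as in the worked example in the text
# Shortfall examined for each CV: 10% (CV = 1), 20% (CV = 2), 30% (CV = 3)
SHORTFALL = {1.0: 0.10, 2.0: 0.20, 3.0: 0.30}

summary = {}
for cv, shortfall in SHORTFALL.items():
    results = simulate_ism_results(TRUE_MEAN, cv)
    p_below_mean = fraction_below(results, TRUE_MEAN)
    p_shortfall = fraction_below(results, (1.0 - shortfall) * TRUE_MEAN)
    summary[cv] = (p_below_mean, p_shortfall)
    print(f"CV={cv}: P(result < mean)={p_below_mean:.2f}, "
          f"P(underestimate by >= {shortfall:.0%})={p_shortfall:.2f}")
```

Changing `n_increments` or the CV values shows how the probability and size of an underestimate respond to the sampling design and to heterogeneity.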
The same issues with underestimation apply when discrete sampling is used to estimate the mean. As heterogeneity of the soil concentrations increases and sample size decreases, the potential magnitude of error in the estimate increases. Consider what would happen if you sent crews out to the same DU 100 times to collect an ISM sample of 30 increments or a series of discrete samples with which to calculate a mean concentration. If the separate estimates of the mean from these sampling events were plotted as a histogram, they might fit the distributions shown in Figure 4-3. The top panel shows estimates of the mean that are normally distributed around the true mean of 100. Because the distribution is normal, approximately half of the sampling efforts produce an estimated mean below the true mean and half produce an estimated mean above it. The spread of the distribution gives an indication of how far away from the true mean some of the estimates were (i.e., an indication of the potential magnitude of error). As the top panel illustrates, although both distributions are unbiased (centered at the true mean), variability in estimates of the mean is generally less for ISM than for comparable discrete samples due to differences in the number of samples collected.² The lower panel in Figure 4-3 shows that the potential magnitude of error increases as the distribution of the estimates of the mean becomes skewed due to heterogeneity.
Figure 4-3. Dispersion of means from ISM (based on n = 30 increments) applied to a lognormal distribution (mean = 100) with CVs ranging 0.5–3.0.³
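The repeated-sampling thought experiment can be sketched in code. The example below is a simplified illustration under stated assumptions, not a reproduction of Figure 4-3: each ISM result is modeled as the mean of 30 independent lognormal increments, each discrete-sampling event as the mean of 10 discrete samples (a hypothetical sample size), and 1,000 events are simulated rather than 100 so that the summary statistics are stable. All function names are hypothetical.

```python
import math
import random
import statistics

def draw_mean(rng, mu, sigma, n):
    """Mean of n independent lognormal draws (one simulated sampling event)."""
    return sum(rng.lognormvariate(mu, sigma) for _ in range(n)) / n

def repeated_events(true_mean, cv, n_per_event, n_events, seed):
    """Simulate n_events sampling events and return the estimated means."""
    sigma2 = math.log(1.0 + cv ** 2)          # variance of log-concentration
    mu = math.log(true_mean) - 0.5 * sigma2   # mean of log-concentration
    sigma = math.sqrt(sigma2)
    rng = random.Random(seed)
    return [draw_mean(rng, mu, sigma, n_per_event) for _ in range(n_events)]

# True mean of 100 as in the text; CV = 1 and 10 discrete samples are assumed.
ism_means = repeated_events(100.0, 1.0, n_per_event=30, n_events=1000, seed=7)
discrete_means = repeated_events(100.0, 1.0, n_per_event=10, n_events=1000, seed=8)

for label, means in (("ISM, 30 increments", ism_means),
                     ("discrete, 10 samples", discrete_means)):
    print(f"{label}: average estimate = {statistics.mean(means):.1f}, "
          f"SD of estimates = {statistics.stdev(means):.1f}")
```

Both sets of estimates center near the true mean, while the ISM estimates are less dispersed because each one averages more material. Setting `n_per_event` to the same value for both cases makes the two spreads comparable, consistent with footnote 2.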
Discrete and ISM samples yield different distributions for the mean. They share the same (grand) mean but have different shapes and variances.
From a statistical standpoint, it should be noted that analysis of multiple ISM samples collected with the same sampling protocol (i.e., sampling method and number of increments) provides a direct measure of the variance in the mean. It is important to recognize that the distribution of replicate results is different from, but related to, the distribution of discrete results (Xi) obtained from the same population. As shown in Figure 4-3, both sampling types share the same estimate of the population mean (i.e., 100) but not the same variance. The variance of ISM samples composed of n increments is lower than the variance of discrete samples comprised of n discrete measurements. While this example is an oversimplification of the differences between ISM and discrete sampling, it highlights an important statistical concept related to sampling from populations.
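The relationship between the two variances can be written out under an idealized model. The assumptions here are simplifications and are not stated in this guidance: increments are independent, equal-mass draws Xi from a population with mean μ and standard deviation σ, and subsampling and analytical errors are ignored. Under those assumptions, an ISM result behaves like the sample mean of n increments:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathrm{E}\left[\bar{X}_n\right] = \mu, \qquad
\mathrm{Var}\left[\bar{X}_n\right] = \frac{\sigma^2}{n}, \qquad
\mathrm{CV}\left[\bar{X}_n\right] = \frac{\sigma/\mu}{\sqrt{n}}
```

For n = 30 increments, a population CV of 1 gives a CV of about 0.18 for single ISM results, and a population CV of 3 gives about 0.55. The mean is unchanged, but the spread shrinks by a factor of √n relative to individual discrete measurements.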
By collecting multiple ISM samples within a DU (i.e., ≥3 replicates), we can obtain a direct measure of variability in sample means and calculate a UCL for the mean with an acceptable decision error rate.
In practice, you can’t send a crew out to sample the same DU 100 times and assess the variability. Instead, you typically have to obtain a reliable estimate of the mean through a single sampling exercise. Through understanding the concept of variability in estimates of the mean and the influence of heterogeneity, the limitation of basing a decision on a single ISM sample becomes apparent. Without additional sampling data to assess the heterogeneity of concentrations within the DU, there is no way to know whether the estimate provided by a single sampling event is above or below the actual mean in the DU, or how far it deviates from the actual mean.
Recognizing that variability and errors in estimates of the mean exist, regulatory agencies often require a 95% UCL to represent an EPC or to assess compliance with decision criteria. Just as with discrete sampling, the variance for replicate ISM samples can be used to estimate the standard error for the mean, which is one factor in the calculation of a UCL as discussed below. Similar to the difference in discrete and ISM variance estimates described above, the UCL calculated from ISM replicates is generally different from (and lower than) the UCL calculated from discrete samples with typical sample sizes. In the case of ISM, the UCL can be thought of as a measure of the confidence in the estimate of the “grand mean,” or the mean of the means given by replicate samples. In practice, it is expected that a typical ISM sampling protocol will consist of a relatively small number of replicates (e.g., three to five replicates). The small number of samples may have several implications for the performance of the ISM sampling effort, depending on the properties of the contaminant distribution at a site (e.g., heterogeneities, spatial patterns in the distribution, etc.).
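As a minimal sketch of how replicate variance feeds into a UCL, the example below computes a one-sided 95% UCL with the Student's t formula (grand mean plus a t critical value times the standard error). Student's t is used here only as one familiar option and is an assumption of this sketch; other UCL methods exist and may be more appropriate for skewed data. The replicate concentrations and the function name are hypothetical.

```python
import math
import statistics

# One-sided 95% Student's t critical values, keyed by degrees of freedom (r - 1).
# Only the typical ISM case of three to five replicates is tabulated here.
T_CRIT_95 = {2: 2.920, 3: 2.353, 4: 2.132}

def student_t_ucl95(replicates):
    """Return the one-sided 95% Student's t UCL for the mean of ISM replicates."""
    r = len(replicates)
    if r - 1 not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 3 to 5 replicates")
    grand_mean = statistics.mean(replicates)
    # The SD of the replicates estimates the SD of ISM results;
    # dividing by sqrt(r) gives the standard error of the grand mean.
    std_error = statistics.stdev(replicates) / math.sqrt(r)
    return grand_mean + T_CRIT_95[r - 1] * std_error

replicates_ppm = [310.0, 420.0, 365.0]  # hypothetical replicate results, ppm
ucl = student_t_ucl95(replicates_ppm)
print(f"grand mean = {statistics.mean(replicates_ppm):.1f} ppm, 95% UCL = {ucl:.1f} ppm")
```

With only three replicates the critical value is large (2.920), so the UCL sits well above the grand mean; additional replicates reduce both the critical value and the standard error.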
² Note that the distribution of ISM means (from repeated trials of one ISM sample) and means estimated from discrete samples would be expected to be equivalent if the number of discrete samples was the same as the number of increments.
³ ISM generates a distribution of means that approaches normality, as predicted by the central limit theorem. However, the ISM distribution can also be asymmetric, and the asymmetry increases with increasing dispersion of the underlying distribution.