In survey methodology, the design effect (generally denoted as $\text{Deff}$, $D_{\text{eff}}$, or $D_{\text{eft}}^{2}$) is a measure of the expected impact of a sampling design on the variance of an estimator for some parameter of a population. It is calculated as the ratio of the variance of an estimator based on a sample from an (often) complex sampling design, to the variance of an alternative estimator based on a simple random sample (SRS) of the same number of elements.[1]: 258 The $\text{Deff}$ (be it estimated, or known a priori) can be used to evaluate the variance of an estimator in cases where the sample is not drawn using simple random sampling. It may also be useful in sample size calculations[2] and for quantifying the representativeness of samples collected with various sampling designs.
The design effect is a positive real number that indicates an inflation ($\text{Deff}>1$) or deflation ($\text{Deff}<1$) in the variance of an estimator for some parameter, due to the study not using SRS (with $\text{Deff}=1$ when the variances are identical).[3]: 53, 54 Intuitively, we can get $\text{Deff}<1$ when we have some a priori knowledge that we can exploit during the sampling process (which is somewhat rare). In contrast, we often get $\text{Deff}>1$ when we need to compensate for some limitation in our ability to collect data (which is more common). Some sampling designs that generally introduce a $\text{Deff}$ greater than 1 include: cluster sampling (such as when there is correlation between observations), stratified sampling (with allocation that is disproportionate to the strata sizes), cluster randomized controlled trials, disproportional (unequal probability) sampling (e.g. Poisson sampling), statistical adjustments of the data for non-coverage or non-response, and many others. Stratified sampling can yield a $\text{Deff}$ smaller than 1 when using proportionate allocation to strata sizes (when these are known a priori and correlated with the outcome of interest) or optimum allocation (when the variance differs between strata and is known a priori).
Many calculations (and estimators) have been proposed in the literature for how a known sampling design influences the variance of estimators of interest, either increasing or decreasing it. Generally, the design effect varies among different statistics of interest, such as the total or the ratio mean. It also matters whether the sampling design is correlated with the outcome of interest. For example, a sampling design might give each element in the sample a different probability of being selected. In such cases, the level of correlation between the probability of selection for an element and its measured outcome can have a direct influence on the subsequent design effect. Lastly, the design effect can be influenced by the distribution of the outcome itself. All of these factors should be considered when estimating and using the design effect in practice.[4]: 13
The term "design effect" was coined by Leslie Kish in his 1965 book "Survey Sampling."[5]: 88, 258 In it, Kish proposed the general definition for the design effect,[6] as well as formulas for the design effect of cluster sampling (with intraclass correlation)[7]: 162 and the famous design effect formula for unequal probability sampling.[8]: 427 These are often known as "Kish's design effect", and were later combined into a single formula.
In a 1995 paper,[9]: 73 Kish mentions that a similar concept, termed the "Lexis ratio", was described at the end of the 19th century. The closely related intraclass correlation was described by Fisher in 1950, while computations of ratios of variances were already published by Kish and others from the late 1940s to the 1950s. One of the precursors to Kish's definition was work done by Cornfield in 1951.[10][11]
In his 1995 paper, Kish proposed that considering the design effect is necessary when averaging the same measured quantity from multiple surveys conducted over a period of time.[12]: 57–62 He also suggested that the design effect should be considered when extrapolating from the error of simple statistics (e.g. the mean) to more complex ones (e.g. regression coefficients). However, when analyzing data (e.g., using survey data to fit models), $\text{Deff}$ values are less useful nowadays due to the availability of specialized software for analyzing survey data. Prior to the development of software that computes standard errors for many types of designs and estimates, analysts would adjust standard errors produced by software that assumed all records in a dataset were i.i.d. by multiplying them by a $\text{Deft}$ (see the Deft definition below).
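The design effect, commonly denoted by $\text{Deff}$ (or $D_{\text{eff}}$, sometimes with additional subscripts), is the ratio of two theoretical variances for estimators of some parameter ($\theta$):[13][14] the numerator is the variance of the estimator under the actual sampling design ($\hat{\theta}_{w}$), and the denominator is the variance of the estimator under a simple random sample without replacement (SRSWOR) of the same number of elements.
So that:
$$\text{Deff}=\frac{\text{var}(\hat{\theta}_{w})}{\text{var}(\hat{\theta}_{SRSWOR})}$$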
In other words, $\text{Deff}$ measures the extent to which the variance has increased (or, in some cases, decreased) because the sample was drawn and adjusted according to a specific sampling design (e.g., using weights or other measures), compared to a sample drawn as a simple random sample (without replacement). Notice how the definition of $\text{Deff}$ is based on parameters of the population that are often unknown and hard to estimate directly. Specifically, the definition involves the variances of estimators under two different sampling designs, even though only a single sampling design is used in practice.
For example, when estimating the population mean, the $\text{Deff}$ (for some sampling design p) is:[15]: 4 [16]: 54 [17]
$$\text{Deff}_{p}=\frac{var_{p}(\bar{y}_{p})}{(1-f)\,S_{y}^{2}/n}$$
Where $n$ is the sample size, $f=n/N$ is the sampling fraction (the share of the population that is included in the sample), $(1-f)$ is the (squared) finite population correction (FPC), $S_{y}^{2}$ is the unbiased sample variance, and $var_{p}(\bar{y}_{p})$ is some estimator of the variance of the mean under the sampling design. The issue with the above formula is that it is extremely rare to be able to directly estimate the variance of the estimated mean under two different sampling designs, since most studies rely on only a single sampling design.
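The ratio described above can be sketched numerically. The following is a minimal illustration, assuming that a design-based variance estimate for the mean is already available; the function name and all input values are hypothetical and chosen only for the example.

```python
def design_effect_of_mean(var_design: float, s2: float, n: int, N: int) -> float:
    """Design effect of a mean: design-based variance over the SRSWOR variance."""
    f = n / N                         # sampling fraction
    var_srswor = (1.0 - f) * s2 / n   # variance of the mean under SRSWOR
    return var_design / var_srswor


# Hypothetical inputs: a design-based variance estimate of 0.05 for the mean,
# an element variance S_y^2 of 25, and n = 1,000 sampled out of N = 100,000.
deff = design_effect_of_mean(var_design=0.05, s2=25.0, n=1_000, N=100_000)
print(round(deff, 3))
```

In this sketch the design roughly doubles the variance relative to SRSWOR. In practice the numerator would come from survey software or a replication method rather than being supplied by hand.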
There are many ways of calculating $\text{Deff}$, depending on the parameter of interest (e.g. population total, population mean, quantiles, ratio of quantities etc.), the estimator used, and the sampling design (e.g. clustered sampling, stratified sampling, post-stratification, multi-stage sampling, etc.).[18]: 98 The process of estimating $\text{Deff}$ for specific designs will be described in the following section.
A related quantity to $\text{Deff}$, proposed by Kish in 1995, is the Design Effect Factor, abbreviated as $\text{Deft}$ (or $D_{\text{eft}}$).[19]: 56 [20] It is defined as the square root of the ratio of variances, with the denominator based on a simple random sample with replacement (SRSWR) instead of without replacement (SRSWOR):
$$\text{Deft}=\sqrt{\frac{\text{var}(\hat{\theta}_{w})}{\text{var}(\hat{\theta}_{SRSWR})}}$$
In this later definition (proposed in 1995, versus 1965), Kish argued in favor of using $\text{Deft}^{2}$ over $\text{Deff}$ for several reasons. It was argued that SRS "without replacement" (with its positive effect on the variance) should be captured in the definition of the design effect, since it is part of the sampling design. Also, since the factor is often used in confidence intervals, it was claimed that using $\text{Deft}$ is simpler than writing $\sqrt{\text{Deff}}$. It is also said that in many cases when the population is very large, $\text{Deft}$ is (almost) the square root of $\text{Deff}$ ($\text{Deft}\approx\sqrt{\text{Deff}}$), hence it is easier to use than exactly calculating the finite population correction (FPC).[21]
Even so, in various cases a researcher might approximate the $\text{Deft}$ by calculating the variance in the numerator while assuming SRS with replacement (SRSWR) instead of SRS without replacement (SRSWOR), even if it is not precise. For example, consider a multistage design with primary sampling units (PSUs) selected systematically with probability proportional to some measure of size from a list sorted in a particular way (say, by number of households in each PSU). Also, let it be combined with an estimator that uses raking to match the totals for several demographic variables. In such a design, the joint selection probabilities for the PSUs, which are needed for a without-replacement variance estimator, are 0 for some pairs of PSUs, implying that an exact design-based (i.e., repeated sampling) variance estimator does not exist. Another example is when a public use file issued by some government agency is used for analysis. In such a case the information on joint selection probabilities of first-stage units is almost never released. As a result, an analyst cannot estimate a without-replacement variance for the numerator even if desired. The standard workaround is to compute a variance estimator as if the PSUs were selected with replacement. This is the default choice in software packages such as Stata, the R survey package, and the SAS survey procedures.
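The closeness of $\text{Deft}^{2}$ and $\text{Deff}$ for large populations can be seen in the special case of the sample mean. The following is a short sketch, assuming the standard textbook variances of the mean under SRSWR and SRSWOR, with $\sigma^{2}$ denoting the population variance (divisor $N$); the symbol $\bar{y}_{w}$ for the design-based estimator of the mean is used only for this illustration.

$$\text{var}(\bar{y}_{SRSWR})=\frac{\sigma^{2}}{n},\qquad \text{var}(\bar{y}_{SRSWOR})=\frac{N-n}{N-1}\cdot\frac{\sigma^{2}}{n}$$

$$\text{Deft}^{2}=\frac{\text{var}(\bar{y}_{w})}{\text{var}(\bar{y}_{SRSWR})}=\frac{N-n}{N-1}\cdot\frac{\text{var}(\bar{y}_{w})}{\text{var}(\bar{y}_{SRSWOR})}=\frac{N-n}{N-1}\,\text{Deff}\approx(1-f)\,\text{Deff}$$

When the sampling fraction $f=n/N$ is close to 0, the factor $(1-f)$ is close to 1, so $\text{Deft}\approx\sqrt{\text{Deff}}$.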
The effective sample size, defined by Kish in 1965, is calculated by dividing the original sample size by the design effect.[22]: 162, 259 [23]: 190, 192 Namely:
$$n_{\text{eff}}=\frac{n}{\text{Deff}}$$
This quantity reflects the sample size that would be needed to achieve the current variance of the estimator (for some parameter) with the existing design, if the sample design (and its relevant parameter estimator) were based on a simple random sample.[24]
A related quantity is the effective sample size ratio, which can be calculated by simply taking the inverse of $\text{Deff}$ (i.e., $\frac{n_{\text{eff}}}{n}=\frac{1}{\text{Deff}}$).
For example, let the design effect, for estimating the population mean based on some sampling design, be 2. If the sample size is 1,000, then the effective sample size will be 500. This means that the variance of the weighted mean based on a sample of 1,000 elements will be the same as that of a simple mean based on 500 elements obtained using a simple random sample.
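The arithmetic of this example can be written as a small helper. This is only an illustrative sketch; the function name is hypothetical.

```python
def effective_sample_size(n: float, deff: float) -> float:
    """Effective sample size: the nominal sample size divided by the design effect."""
    if deff <= 0:
        raise ValueError("the design effect must be a positive number")
    return n / deff


n = 1_000
n_eff = effective_sample_size(n, deff=2.0)  # 500.0
ratio = n_eff / n                           # effective sample size ratio, 1 / Deff = 0.5
```

The same helper can be used in reverse during planning: multiplying a sample size computed under SRS by an anticipated design effect gives the nominal sample size required under the complex design.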
Different sampling designs and statistical adjustments may have substantially different impact on the bias and variance of estimators (such as the mean).
An example of a design which can lead to estimation efficiency, compared to simple random sampling, is Stratified sampling. This efficiency is gained by leveraging information about the composition of the population. For example, if it is known that gender is correlated with the outcome of interest, and also that the male-female ratio for some population is (say) 50%-50%, then sampling exactly half of the sample from each gender will reduce the variance of the outcome's estimator. Similarly, if a particular sub-population is of special interest, deliberately over-sampling from that sub-population will decrease the variance for estimations made about it.
Improvement in variance efficiency might sometimes be sacrificed for convenience or cost. For example, in the cluster sampling case the units may have equal or unequal selection probabilities, irrespective of their intra-class correlation (and their negative effect of increasing the variance of the estimators). We might decide (for practical reasons) to collect responses from only 2 people of each household (i.e., a sampled cluster), which could lead to more complex post-sampling adjustment to deal with unequal selection probabilities. Also, such decisions could lead to less efficient estimators than just taking a fixed proportion of responses from a cluster.
When the sampling design isn’t set in advance and needs to be figured out from the data we have, this can lead to an increase of both the variance and bias of the weighted estimator. This might happen when making adjustments for issues like non-coverage, non-response, or an unexpected strata split of the population that wasn’t available during the initial sampling stage. In these cases, we might use statistical procedures such as post-stratification, raking, or inverse propensity score weighting (where the propensity scores are estimated), among other methods. Using these methods requires assumptions about the initial design model. For example, when we use post-stratification based on age and gender, it is assumed that these variables can explain a significant portion of the bias in the sample. The quality of these estimators is closely tied to the quality of the additional information and the missing at random assumptions used when making them. Either way, even when estimators (like propensity score models) do a good job capturing most of the sampling design, using the weights can make a small or a large difference, depending on the specific data-set.
Due to the large variety in sampling designs (with or without an effect on unequal selection probabilities), different formulas have been developed to capture the potential design effect, as well as to estimate the variance of estimators when accounting for the sampling designs.[25] Sometimes, these different design effects can be compounded together (as in the case of unequal selection probability and cluster sampling, more details in the following sections). Whether or not to use these formulas, or just assume SRS, depends on the expected amount of bias reduction vs. the increase in estimator variance (and in the overhead of methodological and technical complexity).[26]: 426
There are various ways to sample units so that each unit would have the exact same probability of selection. Such methods are called equal probability sampling (EPSEM) methods. Some of the more basic methods include simple random sampling (SRS, with or without replacement) and systematic sampling for getting a fixed sample size. There is also Bernoulli sampling with a random sample size. More advanced techniques such as stratified sampling and cluster sampling can also be designed to be EPSEM. For example, in cluster sampling we can use a two-stage sampling in which we sample each cluster (which may be of different sizes) with equal probability, and then sample from each cluster at the second stage using SRS with a fixed proportion (e.g. sample half of the cluster, the whole cluster, etc.). This method will yield EPSEM, but the specific number of elements we end up with is stochastic (i.e., non-deterministic).[27][28]: 3–8 Another strategy for cluster sampling that leads to EPSEM is to sample clusters in a way that is proportional to their sizes, and then sample a fixed number of elements inside each cluster.[29]
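The two cluster-sampling strategies just described can be checked with exact arithmetic. The sketch below is illustrative only: the cluster sizes are hypothetical, and for the second strategy it assumes that the first-stage inclusion probability of a cluster of size $M_i$ is $m M_i / M$ (with $m$ clusters selected and $M$ the total number of elements), which requires that no cluster is larger than $M/m$.

```python
from fractions import Fraction

cluster_sizes = [20, 50, 30, 100]   # hypothetical cluster sizes
C = len(cluster_sizes)              # number of clusters
M = sum(cluster_sizes)              # total number of elements


def equal_prob_then_fraction(m: int, frac: Fraction) -> list:
    """Select m of C clusters with equal probability, then a fixed fraction of each."""
    return [Fraction(m, C) * frac for _ in cluster_sizes]


def pps_then_fixed_count(m: int, k: int) -> list:
    """Select m clusters proportionally to size, then k elements in each by SRS."""
    probs = []
    for size in cluster_sizes:
        first_stage = Fraction(m * size, M)   # assumed cluster inclusion probability
        second_stage = Fraction(k, size)      # element probability within the cluster
        probs.append(first_stage * second_stage)
    return probs


scheme_a = equal_prob_then_fraction(m=2, frac=Fraction(1, 2))
scheme_b = pps_then_fixed_count(m=2, k=10)
print(set(scheme_a), set(scheme_b))   # each set holds a single probability
```

In both schemes the cluster size cancels out of the overall selection probability, which is what makes the design EPSEM. Only the second scheme fixes the final number of sampled elements (here $m \times k = 20$).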
In their works, Kish and others highlight several known reasons that lead to unequal selection probabilities:[30]: 425 [31]: 185 [32]: 69 [33]: 50, 395 [34]: 306
Adjusting for unequal probability selection through "individual case weights" (e.g. inverse probability weighting) yields various types of estimators for quantities of interest. Estimators such as the Horvitz–Thompson estimator are unbiased (if the selection probabilities are indeed known, or approximately known) for the total and the mean of the population. Deville and Särndal (1992) coined the term "calibration estimator" for estimators using weights such that they satisfy some condition, such as having the sum of weights equal the population size. More generally, the condition is that the weighted sum of an auxiliary variable over the sample equals its known population total: $\sum w_{i}x_{i}=X$ (e.g., that the weighted count of respondents in each age group is equal to the population size of that age group).[53][54]: 132 [55]: 1
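A simple instance of such a calibration condition is post-stratification, where the weights within each group are scaled so that the weighted count matches a known population count. The following is a minimal sketch with hypothetical group labels and population counts; it is not a general calibration routine.

```python
from collections import Counter, defaultdict


def poststratification_weights(sample_groups, population_counts):
    """Weight each respondent by (population count) / (sample count) of their group."""
    sample_counts = Counter(sample_groups)
    return [population_counts[g] / sample_counts[g] for g in sample_groups]


# Hypothetical sample of four respondents and known population counts per group.
sample_groups = ["young", "young", "young", "old"]
population_counts = {"young": 600, "old": 400}

weights = poststratification_weights(sample_groups, population_counts)

# Check the calibration condition: weighted group counts equal the population counts.
weighted_counts = defaultdict(float)
for group, weight in zip(sample_groups, weights):
    weighted_counts[group] += weight
```

Here the auxiliary variable $x_i$ is the group-membership indicator, so the condition $\sum w_{i}x_{i}=X$ holds separately for each group, and the weights sum to the population size.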
The two primary ways to argue about the properties of calibration estimators are:[56]: 133–134 [57]
As we will see later, some proofs in the literature rely on the randomization-based framework, while others focus on the model-based perspective. When moving from the mean to the weighted mean, more complexity is added. For example, in the context of survey methodology, the population size itself is often considered an unknown quantity that is estimated. So the calculation of the weighted mean is in fact based on a ratio estimator, with an estimator of the total in the numerator and an estimator of the population size in the denominator (making the variance calculation more complex).[58][59]: 182
There are many types (and subtypes) of weights, with different ways to use and interpret them. With some weights their absolute value has some important meaning, while with other weights the important part is the relative values of the weights to each other. This section introduces some of the more common types of weights so that they can be referenced in follow-up sections.
There are also indirect ways of applying "weighted" adjustments. For example, the existing cases may be duplicated to impute missing observations (e.g. from non-response), with variance estimated using methods such as multiple imputation. An alternative approach is to remove (assign a weight of 0 to) some cases, for example when wanting to reduce the influence of over-sampled groups that are less essential for some analysis. Both cases are similar in nature to inverse probability weighting, but the application in practice gives more or fewer rows of data (making the input potentially simpler to use in some software implementations) instead of applying an extra column of weights. Nevertheless, the consequences of such implementations are similar to just using weights. So while in the case of removing observations the data can easily be handled by common software implementations, the case of adding rows requires special adjustments for the uncertainty estimations. Not doing so may lead to erroneous conclusions (i.e., there is no free lunch when using an alternative representation of the underlying issues).[67]: 189, 190
The term "haphazard weights", coined by Kish, is used to refer to weights that correspond to unequal selection probabilities, but ones that are not related to the expectancy or variance of the selected elements.[68]: 190, 191
When taking an unrestricted sample of $n$ elements, we can then randomly split these elements into $H$ disjoint strata, each of them containing $n_{h}$ elements, so that $\sum\limits_{h=1}^{H}n_{h}=n$. All elements in each stratum $h$ have some (known) non-negative weight assigned to them ($w_{h}$). The weight $w_{h}$ can be produced by the inverse of some unequal selection probability for elements in each stratum $h$ (i.e., inverse probability weighting following a procedure such as post-stratification). In this setting, Kish's design effect, for the increase in variance of the sample weighted mean due to this design (reflected in the weights), versus SRS of some outcome variable y (when there is no correlation between the weights and the outcome, i.e. haphazard weights) is:[69]: 427 [70]: 191(4.2)
$$\text{Deff}_{\text{Kish}}=\frac{n\sum\limits_{h=1}^{H}n_{h}w_{h}^{2}}{\left(\sum\limits_{h=1}^{H}n_{h}w_{h}\right)^{2}}$$
By treating each item as coming from its own stratum ($\forall h:n_{h}=1$), Kish (in 1992) simplified the above formula to the following (well-known) version:[71]: 191(4.3) [72]: 318 [73]: 8
$$\text{Deff}_{\text{Kish}}=\frac{n\sum\limits_{i=1}^{n}w_{i}^{2}}{\left(\sum\limits_{i=1}^{n}w_{i}\right)^{2}}$$
This version of the formula is valid when one stratum has several observations taken from it (i.e., each having the same weight), or when there are many strata where each one has one observation taken from it, but several of them have the same probability of selection. While the interpretation is slightly different, the calculation in the two scenarios comes out to be the same.
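When using Kish's design effect for unequal weights, you may use the following simplified formula for "Kish's Effective Sample Size":[74][75]: 162, 259
$$n_{\text{eff}}=\frac{\left(\sum\limits_{i=1}^{n}w_{i}\right)^{2}}{\sum\limits_{i=1}^{n}w_{i}^{2}}$$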
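Both quantities are straightforward to compute from a vector of weights. The following is a minimal sketch, assuming the commonly cited forms $n\sum w_{i}^{2}/(\sum w_{i})^{2}$ for the design effect and $(\sum w_{i})^{2}/\sum w_{i}^{2}$ for the effective sample size; the function names and example weights are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's design effect for unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return n * sum_w2 / sum_w ** 2


def kish_effective_sample_size(weights):
    """Kish's effective sample size: (sum(w))^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


weights = [1.0, 1.0, 2.0, 4.0]                # hypothetical case weights
deff = kish_design_effect(weights)            # 4 * 22 / 64 = 1.375
n_eff = kish_effective_sample_size(weights)   # 64 / 22, the same as n / deff
```

Both functions are invariant to rescaling of the weights, so it does not matter whether the weights sum to the sample size, to the population size, or to 1. With equal weights the design effect is exactly 1.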
The above formula, by Kish, gives the increase in the variance of the weighted mean based on "haphazard" weights. This can also be written as the following formula, where y are observations selected using unequal selection probabilities (with no within-cluster correlation, and no relationship to the expectancy or variance of the outcome measurement),[76]: 190, 191 and y' are the observations we would have had if we got them from a simple random sample:
$$\text{Deff}_{\text{Kish}}=\frac{\text{var}\left(\bar{y}_{w}\right)}{\text{var}\left(\bar{y}'\right)}=\frac{\text{var}\left(\frac{\sum\limits_{i=1}^{n}w_{i}y_{i}}{\sum\limits_{i=1}^{n}w_{i}}\right)}{\text{var}\left(\frac{\sum\limits_{i=1}^{n}y_{i}'}{n}\right)}$$
It can be shown that the ratio of variances formula can be reduced to Kish's formula by using a model-based perspective.[77] In it, Kish's formula will hold when all n observations ($y_{1},...,y_{n}$) are (at least approximately) uncorrelated ($\forall (i\neq j):\text{cor}(y_{i},y_{j})=0$), with the same variance ($\sigma^{2}$) in the response variable of interest (y). It is also required to assume that the weights themselves are not random variables but rather some known constants (e.g. the inverse of the probability of selection, for some pre-determined and known sampling design).
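The reduction can be outlined in a few steps. This is a sketch under the assumptions just listed (fixed weights, uncorrelated observations, common variance $\sigma^{2}$), additionally assuming that the $y_{i}'$ are uncorrelated with the same variance $\sigma^{2}$:

$$\text{var}\left(\bar{y}_{w}\right)=\text{var}\left(\frac{\sum_{i=1}^{n}w_{i}y_{i}}{\sum_{i=1}^{n}w_{i}}\right)=\frac{\sum_{i=1}^{n}w_{i}^{2}\,\text{var}(y_{i})}{\left(\sum_{i=1}^{n}w_{i}\right)^{2}}=\sigma^{2}\frac{\sum_{i=1}^{n}w_{i}^{2}}{\left(\sum_{i=1}^{n}w_{i}\right)^{2}},\qquad \text{var}\left(\bar{y}'\right)=\frac{\sigma^{2}}{n}$$

$$\frac{\text{var}\left(\bar{y}_{w}\right)}{\text{var}\left(\bar{y}'\right)}=\frac{n\sum_{i=1}^{n}w_{i}^{2}}{\left(\sum_{i=1}^{n}w_{i}\right)^{2}}$$

The common variance $\sigma^{2}$ cancels in the ratio, which is why the resulting expression depends only on the weights.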
The conditions on y hold trivially if the y observations are IID with the same expectation and variance. In such cases, $y=y'$, and we can estimate $var\left(\bar{y}_{w}\right)$ by using $\overline{\text{var}\left(\bar{y}_{w}\right)}=\overline{\text{var}\left(\bar{y}\right)}\times\text{Deff}$.[78][79] If the y's do not all have the same expectation, then we cannot use the estimated variance for the calculation, since that estimation assumes that all $y_{i}$ have the same expectation. Specifically, if there is a correlation between the weights and the outcome variable y, then the expectation of y is not the same for all observations (but rather depends on the specific weight value of each observation). In such a case, while the design effect formula might still be correct (if the other conditions are met), it would require a different estimator for the variance of the weighted mean. For example, it might be better to use a weighted variance estimator.
If different $y_{i}$ values have different variances, then while the weighted variance could capture the correct population-level variance, Kish's formula for the design effect may no longer be true.
A similar issue happens if there is some correlation structure in the samples (such as when using cluster sampling).
Notice that Kish's definition of the design effect is closely tied to the squared coefficient of variation of the weights (Kish also calls it relvariance, or relvar for short[80]), when using the uncorrected (population level) sample standard deviation for estimation. This has several notations in the literature:[81]: 191 [82]: 396
$$\text{Deff}_{\text{Kish}}=1+L=1+{C_{V}}^{2}=1+\text{relvar}(w)=1+\frac{V(w)}{\bar{w}^{2}}$$
Where $V(w)=\frac{\sum (w_{i}-\bar{w})^{2}}{n}$ is the population variance of $w$, and $\bar{w}=\frac{\sum w_{i}}{n}$ is the mean. When the weights are normalized to the sample size (so that their sum is equal to n and their mean is equal to 1), then ${C_{V}}^{2}=V(w)$ and the formula reduces to $\text{Deff}=1+V(w)$. While it is true that we assume the weights are fixed, we can think of their variance as the variance of an empirical distribution defined by sampling (with equal probability) one weight from our set of weights (similar to how we would think about the correlation of x and y in a simple linear regression).
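The equivalence between the relvariance form and the direct form of Kish's formula can be verified numerically. The sketch below is illustrative, with hypothetical weights; it uses the population variance (divisor $n$), as in the definition of $V(w)$ above.

```python
from statistics import fmean, pvariance


def kish_deff_from_relvariance(weights):
    """1 + V(w) / mean(w)^2, with V(w) the population variance of the weights."""
    w_bar = fmean(weights)
    return 1.0 + pvariance(weights) / w_bar ** 2


def kish_deff_direct(weights):
    """n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


weights = [1.0, 1.0, 2.0, 4.0]                            # hypothetical weights
normalized = [w / fmean(weights) for w in weights]        # mean 1, sum equal to n

deff_relvar = kish_deff_from_relvariance(weights)
deff_direct = kish_deff_direct(weights)
deff_normalized = 1.0 + pvariance(normalized)             # Deff = 1 + V(w) after normalizing
```

All three values agree (up to floating-point error), since $\frac{1}{n}\sum w_{i}^{2}=V(w)+\bar{w}^{2}$.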
Kish's original definition compared the variance under some sampling design to the variance achieved through a simple random sample. Some of the literature provides the following alternative definition for Kish's design effect: "the ratio of the variance of the weighted survey mean under disproportionate stratified sampling to the variance under proportionate stratified sampling when all stratum unit variances are equal".[83]: 318 [84]: 396 Reflecting on this, Park and Lee (2006) stated that "The rationale behind [...][Kish's] derivation is that the loss in precision of [the weighted mean] due to haphazard unequal weighting can be approximated by the ratio of the variance under disproportionate stratified sampling to that under the proportionate stratified sampling".[85]: 8
Note that this alternative definition is only an approximation: if the denominator is based on "proportionate stratified sampling" (achieved via stratified sampling), then such a selection will yield a reduced variance compared with a simple random sample. This is because stratified sampling removes some of the variability in the specific number of elements per stratum, which occurs under SRS.
Relatedly, Cochran (1977) provides a formula for the proportional increase in variance due to deviation from optimum allocation (what, in Kish's formulas, would be called L).[86]: 116
Early papers used the term $\text{Deff}$.[87]: 192 As more definitions of the design effect appeared, Kish's design effect for unequal selection probabilities was denoted $\text{Deff}_{\text{Kish}}$ (or $\text{Deft}_{\text{Kish}}^{2}$) or simply $\text{Deff}_{K}$ for short.[88]: 8 [89]: 396 [90]: 318 Kish's design effect is also known as the "Unequal Weighting Effect" (or just UWE), termed by Liu et al. in 2002.[91]: 2124
The estimator for the total is the "p-expanded with replacement" estimator (a.k.a. the pwr-estimator, or the Hansen and Hurwitz estimator). It is based on a simple random sample (with replacement, denoted SIR) of n items ($y_{k}$) from a population of size N.[92] Each item has a probability of $p_{k}$ (k from 1 to N) to be drawn in a single draw ($\sum_{U}p_{k}=1$, i.e. it is a multinomial distribution). The probability that a specific $y_{k}$ will be selected in a given draw is $p_{k}$. The "p-expanded with replacement" value is $Z_{i}=\frac{y_{k}}{p_{k}}$ with the following expectancy: $E[Z_{i}]=E[I_{i}\frac{y_{k}}{p_{k}}]=\frac{y_{k}}{p_{k}}E[I_{i}]=\frac{y_{k}}{p_{k}}p_{k}=y_{k}$. Hence $\hat{Y}_{pwr}=\frac{1}{n}\sum_{i}^{n}Z_{i}$, the pwr-estimator, is an unbiased estimator for the sum total of y.[93]: 51
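The unbiasedness of the pwr-estimator can be illustrated with a small simulation. The following is a sketch with a hypothetical three-element population and hypothetical draw probabilities; the function name is not taken from any library.

```python
import random


def pwr_estimate(y, p, n, rng):
    """p-expanded with-replacement estimate of the total of y from n draws."""
    draws = rng.choices(range(len(y)), weights=p, k=n)   # draws with replacement
    return sum(y[k] / p[k] for k in draws) / n


y = [10.0, 20.0, 70.0]   # hypothetical population values (total = 100)
p = [0.2, 0.3, 0.5]      # hypothetical single-draw selection probabilities

# Exact expectation of a single expanded value: sum over k of p_k * (y_k / p_k).
expected_single_draw = sum(p_k * (y_k / p_k) for y_k, p_k in zip(y, p))

rng = random.Random(0)   # fixed seed so the simulation is reproducible
estimates = [pwr_estimate(y, p, n=50, rng=rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
```

The exact expectation equals the population total, and the average of the simulated estimates should be close to it. Individual estimates vary around the total, and that variability is what the design effect formulas in this section quantify.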
In 2000, Bruce D. Spencer proposed a formula for estimating the design effect for the variance of estimating the total (not the mean) of some quantity ($\hat{Y}$), when there is correlation between the selection probabilities of the elements and the outcome variable of interest.[94]
In this setup, a sample of size n is drawn (with replacement) from a population of size N. Each item is drawn with probability $P_{i}$ (where $\sum_{i=1}^{N}P_{i}=1$, i.e. a multinomial distribution). The selection probabilities are used to define the normalized (convex) weights: $w_{i}=\frac{1}{nP_{i}}$. Notice that for some random set of n items, the sum of weights will be equal to 1 only by expectation ($E[w_{i}]=1$), with some variability of the sum around it (i.e., the sum of elements from a Poisson binomial distribution). The relationship between $y_{i}$ and $P_{i}$ is defined by the following (population) simple linear regression:
$$y_{i}=\alpha+\beta P_{i}+\epsilon_{i}$$
Where $y_{i}$ is the outcome of element i, which linearly depends on $P_{i}$ with the intercept $\alpha$ and slope $\beta$. The residual from the fitted line is $\epsilon_{i}=y_{i}-(\alpha+\beta P_{i})$. We can also define the population variances of the outcome and the residuals as $\sigma_{y}^{2}$ and $\sigma_{\epsilon}^{2}$. The correlation between $P_{i}$ and $y_{i}$ is $\rho_{y,P}$.
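Spencer's (approximate) design effect for estimating the total of y is:[95]: 138 [96]: 4 [97]: 401
$$\text{Deff}_{Spencer}=(1-\hat{\rho}_{y,P}^{2})(1+L)+\left(\frac{\hat{\alpha}}{\hat{\sigma}_{y}}\right)^{2}L$$
Where $\hat{\rho}_{y,P}$, $\hat{\alpha}$, and $\hat{\sigma}_{y}$ are estimates of the population quantities defined above, and $L$ is the relvariance of the weights, as in Kish's formula ($\text{Deff}_{\text{Kish}}=1+L$).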
This assumes that the regression model fits well, so that the probability of selection and the residuals are independent. This in turn implies that the residuals, and the squared residuals, are uncorrelated with the weights, i.e., that $\rho_{\epsilon,W}=0$ and also $\rho_{\epsilon^{2},W}=0$.[98]: 138
When the population size (N) is very large, the formula can be written as:[99]: 319
$$\text{Deff}_{Spencer}\approx(1-\hat{\rho}_{y,P}^{2})(1+L)+\frac{L}{cv_{Y}^{2}}$$
(since $\alpha=\bar{Y}-\beta\times\bar{P}=\bar{Y}-\beta\times\frac{1}{N}\approx\bar{Y}$, where $cv_{Y}^{2}=\frac{\sigma_{Y}^{2}}{\bar{Y}^{2}}$)
This approximation assumes that the linear relationship between P and y holds, and also that the correlations of the weights with the errors, and with the errors squared, are both zero, i.e., $\rho_{w,e}=0$ and $\rho_{w,e^{2}}=0$.[100]: 4
We notice that if $\hat{\rho}_{y,P}\approx 0$, then $\hat{\alpha}\approx\bar{y}$ (i.e., the average of y). In such a case, the formula reduces to
$$\text{Deff}_{Spencer}\approx(1+L)+\frac{1}{\text{relvar}(y)}L$$
Only if the variance of y is much larger than its squared mean is the right-most term close to 0 (i.e., $\frac{1}{\text{relvar}(y)}=\frac{\bar{Y}^{2}}{\sigma_{y}^{2}}\approx 0$), which reduces Spencer's design effect (for the estimated total) to Kish's design effect (for the ratio mean):[101]: 5 $\text{Deff}_{Spencer}\approx(1+L)=\text{Deff}_{\text{Kish}}$. Otherwise, the two formulas will yield different results, which demonstrates the difference between the design effect of the total and the design effect of the mean.
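The behaviour described above can be explored numerically. The sketch below assumes the commonly cited form of Spencer's approximation, $(1-\rho^{2})(1+L)+(\alpha/\sigma_{y})^{2}L$, where $L$ is the relvariance of the weights; the function name and the input values are hypothetical, and the inputs are taken as given rather than estimated from data.

```python
def spencer_design_effect(rho: float, alpha: float, sigma_y: float, L: float) -> float:
    """Assumed form of Spencer's approximation for the design effect of a total.

    rho:     correlation between the outcome y and the selection probability P
    alpha:   intercept of the regression of y on P
    sigma_y: standard deviation of y
    L:       relvariance of the weights (Kish's design effect is 1 + L)
    """
    return (1.0 - rho ** 2) * (1.0 + L) + (alpha / sigma_y) ** 2 * L


L = 0.375   # hypothetical relvariance of the weights, so Kish's Deff is 1.375

# No correlation, and a mean (intercept) that is negligible relative to the spread of y:
deff_small_mean = spencer_design_effect(rho=0.0, alpha=0.0, sigma_y=5.0, L=L)

# No correlation, but a mean that is large relative to the spread of y:
deff_large_mean = spencer_design_effect(rho=0.0, alpha=10.0, sigma_y=5.0, L=L)
```

In the first case the result coincides with Kish's design effect, while in the second case the extra term $(\alpha/\sigma_{y})^{2}L$ makes the design effect of the total considerably larger.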
In 2001, Park and Lee extended Spencer's formula to the case of the ratio mean (i.e., estimating the mean by dividing the estimator of the total by the estimator of the population size). It is:[102]: 4
Park and Lee's formula is exactly equal to Kish's formula when $\hat{\rho}_{y,P}^{2}=0$. Both formulas relate to the design effect of the mean of y, while Spencer's $\text{Deff}$ relates to the estimation of the population total.
In general, the $\text{Deff}$ for the total ($\hat{Y}$) tends to be less efficient than the $\text{Deff}$ for the ratio mean ($\hat{\bar{Y}}$) when $\rho_{y,P}$ is small. And in general, $\rho_{y,P}$ impacts the efficiency of both design effects.[103]: 8
For data collected using cluster sampling we assume the following structure:
When clusters are all of the same size $n^{*}$, the design effect Deff, proposed by Kish in 1965 (and later revisited by others), is given by:[105]: 162 [106]: 399 [107]: 9 [108][109][110]: 241
It is sometimes also denoted as $\text{Deff}_{C}$.[111]: 2124
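A minimal sketch of the equal-cluster-size case, assuming the standard form of Kish's cluster formula, $\text{Deff} = 1 + (n^{*} - 1)\rho$, where $\rho$ is the intraclass correlation. The function names and the example numbers are hypothetical.

```python
import math


def cluster_deff(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (n* - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not -1.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [-1, 1]")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n, deff):
    """Size of the simple random sample with the same variance as the actual sample."""
    return n / deff


def inflated_sample_size(n_srs, deff):
    """Sample size needed under the clustered design to match an SRS of size n_srs."""
    # The small tolerance guards against floating-point noise pushing ceil() up by one.
    return math.ceil(n_srs * deff - 1e-9)


deff = cluster_deff(cluster_size=5, icc=0.25)   # 1 + 4 * 0.25 = 2.0
n_eff = effective_sample_size(1000, deff)        # 1000 / 2.0 = 500.0
n_needed = inflated_sample_size(400, deff)       # 400 * 2.0 = 800
```

Even a modest intraclass correlation inflates the variance noticeably once clusters are large, because the correlation is multiplied by $n^{*} - 1$.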
In various papers, when cluster sizes are not equal, the above formula is also used with $n^{*}$ as the average cluster size (sometimes also denoted $\bar{b}$).[112][113]: 105 In such cases, Kish's formula (using the average cluster weight) serves as a conservative estimate (an upper bound) of the exact design effect.[114]: 106
Alternative formulas exist for unequal cluster sizes.[115]: 193 Follow-up work has discussed the sensitivity of using the average cluster size under various assumptions.[116]
In a 1987 paper, Kish proposed a combined design effect that incorporates both the effect of weighting to account for unequal selection probabilities and the effect of cluster sampling:[117]: 16 [118]: 105 [119]: 4 [120]: 2
The above uses notation similar to that used in this article (the original 1987 publication used different notation).[121] A model-based justification for this formula was provided by Gabler et al.[122]
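The combined design effect can be sketched as the product of a weighting factor and a clustering factor, following the original-notation form quoted in the footnotes, $\left(1 + \rho(\bar{b} - 1)\right) \cdot n\sum k_{j}^{2} / \left(\sum k_{j}\right)^{2}$. The function name and inputs below are hypothetical, and the intraclass correlation is assumed to be known or estimated separately.

```python
def kish_combined_deff(weights, avg_cluster_size, icc):
    """Combined design effect: (weighting factor) * (clustering factor)."""
    weights = [float(w) for w in weights]
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-empty, non-negative, and not all zero")
    if avg_cluster_size < 1:
        raise ValueError("avg_cluster_size must be at least 1")
    n = len(weights)
    # Weighting factor: n * sum(w^2) / (sum(w))^2
    weighting = n * sum(w * w for w in weights) / sum(weights) ** 2
    # Clustering factor: 1 + rho * (b_bar - 1)
    clustering = 1.0 + icc * (avg_cluster_size - 1)
    return weighting * clustering


# Illustrative inputs: weighting factor 1.375, clustering factor 2.0
combined = kish_combined_deff([1.0, 1.0, 2.0, 4.0], avg_cluster_size=5, icc=0.25)
```

With equal weights the result collapses to the clustering factor alone, and with clusters of size 1 it collapses to the weighting factor alone.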
In 2000, Liu and Aragon proposed a decomposition of the unequal-selection-probability design effect across strata in stratified sampling.[123] In 2002, Liu et al. extended that work to stratified samples in which each stratum has its own set of unequal selection probability weights, with cluster sampling applied either globally or per stratum.[124] Similar work was also done by Park et al. in 2003.[125]
The Chen-Rust $\text{Deff}$ extends the model-based justification of Kish's 1987 formula for design effects proposed by Gabler et al.,[126] applying it to two-stage designs with stratification at the first stage and to three-stage designs without stratification.[127] The modified formulae define the overall design effect using survey weights and population intracluster correlations. These formulae allow for insightful interpretations of design effects from various sources and can be used to estimate intracluster correlations in completed surveys or to predict design effects in future surveys.
Henry's $\text{Deff}$[128] is an extended model-assisted weighting design-effect measure for single-stage sampling and calibration weight adjustments, for the case where $y_{i} = \alpha + \beta x_{i} + \epsilon_{i}$, where $x_{i}$ is a vector of covariates, the model errors are independent, and the estimator of the population total is the general regression estimator (GREG) of Särndal, Swensson, and Wretman (1992).[129] The measure considers the combined effects of a non-epsem sampling design, unequal weights from calibration adjustments, and the correlation between an analysis variable and the auxiliaries used in calibration.
Lohr's $\text{Deff}$[130] is for ordinary least squares (OLS) and generalized least squares (GLS) estimators in the context of cluster sampling, using a random coefficient regression model. Lohr presents conditions under which the GLS estimator of the regression slope has a design effect less than 1, indicating higher efficiency. However, the design effect of the GLS estimator is highly sensitive to model specification. If an underlying random coefficient model is incorrectly specified as a random intercept model, the design effect can be seriously understated. In contrast, the OLS estimator of the regression slope and the design effect calculated from a design-based perspective are robust to misspecification of the variance structure, making them more reliable in situations where the model specification may not be accurate.
$\text{Deff}$ may be used when planning a future data collection, as well as a diagnostic tool:[131]: 85
Considering the design effect is unnecessary when the source population is approximately IID, or when the sample was drawn as a simple random sample.[137]: 57–62 It is also less useful when the sample size is relatively small (at least partially, for practical reasons).
While Kish originally hoped the design effect would be as agnostic as possible to the underlying distribution of the data, the sampling probabilities, their correlations, and the statistics of interest, follow-up research has shown that these do influence the design effect. Hence, these properties should be carefully considered when deciding which $\text{Deff}$ calculation to use, and how to use it.[138]: 13 [139]: 6
The design effect is rarely applied when constructing confidence intervals. Ideally, one would be able to determine, for an estimator of a particular parameter, both the variance under simple random sampling (SRS) with replacement and the design effect (which accounts for all elements of the sampling design that change the variance). In such scenarios, the basic variance and the design effect could be multiplied to compute the variance of the estimator for the specific design.[140]: 259 This computed value can then be used to form confidence intervals. However, in real-world applications, it is uncommon to estimate both values simultaneously. As a result, other methods are favored. For instance, Taylor linearization is used to construct confidence intervals based on the variance of the weighted mean. More broadly, the bootstrap method, also known as replication weights, is applied for a range of weighted statistics.
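When both quantities are available, the multiplication described above is straightforward. The sketch below assumes a normal approximation and a design effect that is already known; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def deff_adjusted_ci(estimate, srs_variance, deff, confidence=0.95):
    """Confidence interval using variance = Deff * (variance under SRS)."""
    if srs_variance < 0 or deff <= 0:
        raise ValueError("srs_variance must be >= 0 and deff must be > 0")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    design_variance = deff * srs_variance
    # Two-sided normal quantile, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * math.sqrt(design_variance)
    return estimate - half_width, estimate + half_width


# Illustrative numbers: a proportion of 0.4 from n = 1000 with Deff = 2.0
p, n = 0.4, 1000
srs_var = p * (1 - p) / n
low, high = deff_adjusted_ci(p, srs_var, deff=2.0)
```

The interval is wider than the SRS interval by a factor of $\sqrt{\text{Deff}}$, which is why the square root of the design effect is sometimes reported separately.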
Kish's design effect is implemented in various statistical software packages:
This article was submitted to WikiJournal of Science for external academic peer review in 2023 (reviewer reports). The updated content was reintegrated into the Wikipedia page under a CC-BY-SA-3.0 license (2024). The version of record as reviewed is: Tal Galili; et al. (5 May 2024). "Design effect" (PDF). WikiJournal of Science. 7 (1): 4. doi:10.15347/WJS/2024.004. ISSN 2470-6345. Wikidata Q116768211.
Kish, Leslie (1965). Survey Sampling. New York: John Wiley & Sons, Inc. ISBN 0-471-10949-5.
Heo, Moonseong; Kim, Yongman; Xue, Xiaonan; Kim, Mimi Y. (2010). "Sample size requirement to detect an intervention effect at the end of follow-up in a longitudinal cluster randomized trial". Statistics in Medicine. 29 (3): 382–390. doi:10.1002/sim.3806. ISSN 1097-0258. PMID 20014353. S2CID 30001378. Archived from the original on 5 January 2013. https://archive.today/20130105190734/http://www3.interscience.wiley.com/journal/123212319/abstract ↩
Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan (1992). Model Assisted Survey Sampling. Springer. doi:10.1007/978-1-4612-4378-6 (inactive 1 November 2024). ISBN 9780387975283.
Park, Inho; Lee, Hyunshik (2004). "Design effects for the weighted mean and total estimators under complex survey sampling" (PDF). Survey Methodology. 30 (2): 183–193. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2004002/article/7751-eng.pdf?st=DUPH-397 ↩
I.e., that the design effect is the ratio of variances of two estimators, one from a sample with some design and the other from a simple random sample ↩
Kish, Leslie (1995). "Methods for design effects" (PDF). Journal of Official Statistics. 11 (1): 55. ISSN 0282-423X. https://www.scb.se/contentassets/ca21efb41fee47d293bbee5bf7be7fb3/methods-for-design-effects.pdf ↩
Cochran, William G. (June 1951). "General Principles in the Selection of a Sample". American Journal of Public Health and the Nation's Health. 41 (6): 647–653. doi:10.2105/AJPH.41.6.647. ISSN 0090-0036. PMC 1525569. PMID 14838186. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525569 ↩
Everitt, B.S. (2002). The Cambridge Dictionary of Statistics (2nd ed.). Cambridge University Press. ISBN 0-521-81099-X.
A general formula for the (theoretical) design effect of estimating a total (not the mean), for some design, is given in Cochran 1977.[3]: 54 ↩
Kalton, Graham; Brick, J. Michael; Lȇ, Thanh (2005). Estimating components of design effects for use in sample design (PDF). Household Sample Surveys in Developing and Transition Countries (Report). New York: Department of Economic and Social Affairs, Statistics Division, United Nations. pp. 95–121. ISBN 92-1-161481-3. ST/ESA/STAT/SER.F/96.
The original intention of Kish for Deft {\displaystyle {\text{Deft}}} was to have it "express the effects of sample design beyond the elemental variability S n 2 n {\displaystyle {\frac {S_{n}^{2}}{n}}} , removing both the unit of measurement and sample size as nuisance parameters". The hope was to have the design effect generalizable (relevant for) many statistics and variables within the same survey (and even between surveys).[5]: 55 However, followup works have shown that the design effect depends on the specific sampling design, the outcome, and the statistic of interest (E.g. population total versus the mean). Especially, the Deft {\displaystyle {\text{Deft}}} depends on the association between some specific outcome with a specific design (e.g. the correlation between y i {\displaystyle y_{i}} and the selection probability p i {\displaystyle p_{i}} ).[4]: 5 Hence, current literature does not support the generalizability of the Deft {\displaystyle {\text{Deft}}} across many statistics and outcome measures. ↩
Kish, Leslie (1992). "Weighting for unequal Pi" (PDF). Journal of Official Statistics. 8 (2): 183–200. ISSN 0282-423X. https://www.scb.se/contentassets/f6bcee6f397c4fd68db6452fc9643e68/weighting-for-unequal-empemsubemiemsub.pdf ↩
Leinster, Tom (18 December 2014). "Effective Sample Size". The n-Category Café. https://golem.ph.utexas.edu/category/2014/12/effective_sample_size.html ↩
Wolter, Kirk M. (2007). Introduction to Variance Estimation. Statistics for Social and Behavioral Sciences (2nd ed.). Springer. doi:10.1007/978-0-387-35099-8. ISBN 978-0387329178.
As a simple illustration of this, imagine we have clusters of different sizes, and we sample only one cluster (using SRS) and measure all the elements in it. This will lead to EPSEM, but the number of observations we'll get will depend on the cluster size. ↩
Frerichs, R. R. (2004). "Equal Probability of Selection". Rapid Surveys (PDF). unpublished. https://www.ph.ucla.edu/epi/rapidsurveys/RScourse/chap4rapid_2004.pdf ↩
To be more precise: suppose that S i {\displaystyle S_{i}} is the measure of size for cluster i {\displaystyle i} . One common method of PPS (probability proportional to size) sampling is to sample each cluster with selection probability that is proportional to its size as follows: P ( Selecting cluster i ) = m S i ∑ U i ∈ U S i {\displaystyle P({\text{Selecting cluster }}i)={\frac {mS_{i}}{\sum _{U_{i}\in U}S_{i}}}} where m {\displaystyle m} is the number of clusters that we want to sample and U {\displaystyle U} is the frame used for sampling clusters. If we subsampled an equal number, n ¯ {\displaystyle {\bar {n}}} , of elements within each sample cluster using some equal probability method, and S i {\displaystyle S_{i}} is the correct number of elements in cluster i {\displaystyle i} , then the selection probability of element j {\displaystyle j} (in some cluster i {\displaystyle i} ) will be the same for every element across all clusters (i.e., EPSEM): π j = m S i ∑ U i ∈ U S i n ¯ S i = m n ¯ ∑ U i ∈ U S i {\displaystyle \pi _{j}={\frac {mS_{i}}{\sum _{U_{i}\in U}S_{i}}}{\frac {\bar {n}}{S_{i}}}={\frac {m{\bar {n}}}{\sum _{U_{i}\in U}S_{i}}}} . If S i {\displaystyle S_{i}} turns out not to be the correct size, sampling at the rate of n ¯ S i {\displaystyle {\frac {\bar {n}}{S_{i}}}} will still yield EPSEM (equal probability selection method). Notice that if we enumerate (take measurement of) all units in a sample cluster (instead of some fixed number n ¯ {\displaystyle {\bar {n}}} , or a fixed proportion n ¯ S i {\displaystyle {\frac {\bar {n}}{S_{i}}}} ), then each unit in cluster i {\displaystyle i} has the selection probability of the cluster, which will lead to unequal probability of selections between elements of different clusters (i.e., π j ( i ) = m S i ∑ U i ∈ U S i {\displaystyle \pi _{j}(i)={\frac {mS_{i}}{\sum _{U_{i}\in U}S_{i}}}} ). ↩
Valliant, Richard; Dever, Jill A.; Kreuter, Frauke (2013). Practical Tools for Designing and Weighting Survey Samples. New York: Springer. doi:10.1007/978-1-4614-6449-5. ISBN 978-1-4899-9381-6.
Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Nashville, TN: John Wiley & Sons. ISBN 978-0-471-16240-7.
Neyman, Jerzy (1934). "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection". Journal of the Royal Statistical Society. 97 (4): 558–625. doi:10.2307/2342192. ISSN 0952-8385. JSTOR 2342192.
For example, say that we assume for each cluster i {\displaystyle i} that its size is S i {\displaystyle S_{i}} , we can sample m {\displaystyle m} clusters with the following probability of selection: P ( Selecting cluster i ) = m S i ∑ U i ∈ U S i {\displaystyle P({\text{Selecting cluster }}i)={\frac {mS_{i}}{\sum _{U_{i}\in U}S_{i}}}} . And then, we take a fixed number of n ¯ {\displaystyle {\bar {n}}} elements from each cluster. In such a case, if we say that the real cluster size is, say, S i ∗ {\displaystyle S_{i}^{*}} , then the selection probability for each element j {\displaystyle j} taken from cluster i {\displaystyle i} , will be: π j ( j ) = m S i ∑ U i ∈ U S i n ¯ S i ∗ {\displaystyle \pi _{j}(j)={\frac {mS_{i}}{\sum _{U_{i}\in U}S_{i}}}{\frac {\bar {n}}{S_{i}^{*}}}} . Note that this could be mitigated at the sampling stage if we sample from each cluster using the rate n ¯ S i {\displaystyle {\frac {\bar {n}}{S_{i}}}} , then the selection probability will be EPSEM (even though the real cluster size was S i ∗ {\displaystyle S_{i}^{*}} and not S i {\displaystyle S_{i}} ). ↩
Dever, Jill A.; Valliant, Richard (2010). "A comparison of variance estimators for post-stratification to estimated control totals" (PDF). Survey Methodology. 36 (1): 45–56. ISSN 1492-0921. https://www.rti.org/publication/comparison-variance-estimators-poststratification-estimated-control-totals/fulltext.pdf ↩
Kott, Phillip S. (2006). "Using calibration weighting to adjust for nonresponse and coverage errors" (PDF). Survey Methodology. 32 (2): 133. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006002/article/9547-eng.pdf ↩
Holt, D.; Smith, T. M. F. (1979). "Post Stratification". Journal of the Royal Statistical Society. Series A (General). 142 (1): 33–46. doi:10.2307/2344652. ISSN 0035-9238. JSTOR 2344652.
This formula would apply only if an equal probability sample were selected in stratum h and each element has the same probability of responding. ↩
Ghosh, Dhiren; Vogt, Andrew (2002). "Sampling methods related to Bernoulli and Poisson Sampling" (PDF). Proceedings of the Section on Survey Research Methods. 2002: 3569–3570. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2002/Files/JSM2002-001080.pdf ↩
Deville, Jean-Claude; Särndal, Carl-Erik (1992). "Calibration Estimators in Survey Sampling". Journal of the American Statistical Association. 87 (418): 376–382. doi:10.1080/01621459.1992.10475217. ISSN 0162-1459.
Brick, J. Michael; Montaquila, Jill; Roth, Shelley (2003). "Identifying problems with raking estimators" (PDF). Proceedings of the Section on Survey Research Methods. 2003: 710–717. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2003/Files/JSM2003-000472.pdf ↩
Keiding, Niels; Clayton, David (2014). "Standardization and Control for Confounding in Observational Studies: A Historical Perspective". Statistical Science. 29 (4): 529–558. arXiv:1503.02853. doi:10.1214/13-STS453. ISSN 0883-4237.
Lumley, Thomas (25 May 2021). "How to estimate the (approximate) variance of the weighted mean?". Stack Exchange. https://stats.stackexchange.com/q/525770 ↩
"What types of weights do SAS, Stata and SPSS support?". UCLA Statistical Consulting Group. 2021. Archived from the original on 2 September 2023. Retrieved 2 September 2023. https://web.archive.org/web/20230902175545/https://stats.oarc.ucla.edu/other/mult-pkg/faq/what-types-of-weights-do-sas-stata-and-spss-support/ ↩
Kalton, Graham (1968). "Standardization: A Technique to Control for Extraneous Variables". Journal of the Royal Statistical Society. Series C (Applied Statistics). 17 (2): 118–136. doi:10.2307/2985676. ISSN 0035-9254. JSTOR 2985676.
Henry, Kimberly A.; Valliant, Richard (2015). "A design effect measure for calibration weighting in single-stage samples" (PDF). Survey Methodology. 41 (2): 315–331. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/pub/12-001-x/2015002/article/14236-eng.pdf ↩
Bock, Tim (24 March 2017). "Design Effects and Effective Sample Size". Displayr. http://docs.displayr.com/wiki/Design_Effects_and_Effective_Sample_Size ↩
Gabler, Siegfried; Häder, Sabine; Lahiri, Partha (1999). "A model based justification of Kish's formula for design effects for weighting and clustering" (PDF). Survey Methodology. 25: 105–106. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1999001/article/4718-eng.pdf?st=kP7KrrRP ↩
Little, Roderick J.; Vartivarian, Sonya (2005). "Does weighting for nonresponse increase the variance of survey means?" (PDF). Survey Methodology. 31 (2): 161. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9046-eng.pdf ↩
Notice that there is another term called relative variance, which is different. It is the ratio of the variance to the mean (the index of dispersion), while Kish's relvariance is the ratio of the variance to the squared mean.
Liu, Jun; Iannacchione, Vince; Byron, Margie (2002). "Decomposing design effects for stratified sampling" (PDF). Proceedings of the Section on Survey Research Methods. 2002: 2124–2126. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2002/Files/JSM2002-001069.pdf ↩
In the literature, the sample and the population sizes are sometimes marked as n and N, and sometimes m and M. In this article we used n and N. ↩
Spencer, Bruce D. (2000). "An approximate design effect for unequal weighting when measurements may correlate with selection probabilities" (PDF). Survey Methodology. 26: 137–138. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2000002/article/5533-eng.pdf?st=t-Ccnb4p ↩
Park, Inho; Lee, Hyunshik (2001). "The design effect: do we know all about it" (PDF). Proceedings of the Section on Survey Research Methods. 2001. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2001/Proceed/00144.pdf ↩
Rowe, Alexander K.; Lama, Marcel; Onikpo, Faustin; Deming, Michael S. (2002). "Design effects and intraclass correlation coefficients from a health facility cluster survey in Benin". International Journal for Quality in Health Care. 14 (6): 521–523. doi:10.1093/intqhc/14.6.521. ISSN 1353-4505. PMID 12515339.
Bland, Michael (2005). "Cluster randomised trials in the medical literature". University of York. http://www-users.york.ac.uk/~mb55/talks/clusml.htm ↩
Ahmed, Saifuddin (2009). "Methods in Sample Surveys" (PDF). Johns Hopkins University Bloomberg School of Public Health. pp. 5–6. Archived from the original (PDF) on 28 September 2013. https://web.archive.org/web/20130928180152/https://ocw.jhsph.edu/courses/StatMethodsForSampleSurveys/PDFs/Lecture5.pdf ↩
Kish, Leslie (1987). "Questions/Answers" (PDF). The Survey Statistician. Vol. 17. pp. 13–17. ISSN 0214-3240. https://commons.wikimedia.org/wiki/File:Leslie_Kish,_1987,_Survey_Statistician_volume_17.pdf ↩
Lynn, Peter; Gabler, Siegfried (2005). "Approximations to b* in the prediction of design effects due to clustering" (PDF). Survey Methodology. 31 (1): 101–104. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2005001/article/8093-eng.pdf?st=J-njxreT ↩
Gabler, Siegfried; Häder, Sabine; Lynn, Peter (2006). "Design effects for multiple design samples" (PDF). Survey Methodology. 32 (1): 115–120. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006001/article/9256-eng.pdf?st=YXTS--Q-
The formula for Kish's design effect using the original notation:[36]: 16 Deff Kish = Deft 2 = d e f t u s 2 ( 1 + L ) = ( 1 + ρ ( b ¯ − 1 ) ) n ∑ k j 2 ( ∑ k j ) 2 {\displaystyle {\text{Deff}}_{\text{Kish}}={\text{Deft}}^{2}=deftu_{s}^{2}(1+L)=\left(1+\rho ({\bar {b}}-1)\right){\frac {n\sum {k_{j}^{2}}}{\left(\sum {k_{j}}\right)^{2}}}} ↩
Liu, Jun; Aragon, Elvessa (2000). "Subsampling strategies in longitudinal surveys" (PDF). Proceedings of the Section on Survey Research Methods. 2000: 307–312. ISSN 0733-5830. http://www.asasrms.org/Proceedings/papers/2000_048.pdf ↩
Park, Inho; Winglee, Marianne; Clark, Jay; Rust, Keith; Sedlak, Andrea; Morganstein, David (2003). "Design effects and survey planning" (PDF). Proceedings of the Section on Survey Research Methods. 2003: 3179–3186. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2003/Files/JSM2003-000820.pdf ↩
Chen, Sixia; Rust, Keith (2017). "An extension of Kish's formula for design effects to two-and three-stage designs with stratification". Journal of Survey Statistics and Methodology. 5 (2): 111–130. doi:10.1093/jssam/smw036. ISSN 2325-0984. PMC 10426793. PMID 37583392. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10426793 ↩
Lohr, Sharon L. (2014). "Design Effects for a Regression Slope in a Cluster Sample". Journal of Survey Statistics and Methodology. 2 (2): 97–125. doi:10.1093/jssam/smu003. ISSN 2325-0984.
Zins, Stefan; Burgard, Jan Pablo (2020). "Considering interviewer and design effects when planning sample sizes". Survey Methodology. 46 (1): 93–119. ISSN 1492-0921. https://www150.statcan.gc.ca/n1/pub/12-001-x/2020001/article/00005-eng.htm ↩
Potter, Frank; Zheng, Yuhong (2015). "Methods and issues in trimming extreme weights in sample surveys" (PDF). Proceedings of the Section on Survey Research Methods. 2015: 2707–2719. ISSN 0733-5830. http://www.asasrms.org/Proceedings/y2015/files/234115.pdf ↩
Lumley, Thomas (2004). "Analysis of Complex Survey Samples". Journal of Statistical Software. 9 (1): 1–19. doi:10.18637/jss.v009.i08. ISSN 1548-7660. R package version 2.2 https://doi.org/10.18637%2Fjss.v009.i08 ↩
Pew Research Center. "pewmethods". GitHub. Retrieved 28 November 2023. https://github.com/pewresearch/pewmethods ↩
Gutierrez Rojas, Hugo Andres (17 January 2020). "samplesize4surveys". The Comprehensive R Archive Network (CRAN). Retrieved 28 November 2023. https://cran.r-project.org/web/packages/samplesize4surveys/index.html ↩
Sarig, Tal; Galili, Tal; Eilat, Roee (2023). "balance -- a Python package for balancing biased data samples". arXiv:2307.06024 [stat.CO].
Buskirk, Trent D. (2011). Estimating Design Effects for Means, Proportions and Totals from Complex Sample Survey Data Using SAS® Proc Surveymeans (PDF). Midwest SAS Users Group Conference 2011. Saint Louis, MO: Saint Louis University School of Public Health. pp. 1–13. Archived from the original (PDF) on 11 May 2015. Retrieved 28 November 2023. https://web.archive.org/web/20150511160802/https://www.mwsug.org/proceedings/2011/stats/MWSUG-2011-SA11.pdf ↩
"Survey Data Analysis in Stata 17". UCLA Statistical Consulting Group. 2021. Archived from the original on 7 June 2023. Retrieved 28 November 2023. https://web.archive.org/web/20230607050827/https://stats.oarc.ucla.edu/stata/seminars/survey-data-analysis-in-stata-17/ ↩
"DESCRIPT Example 1" (PDF). RTI International. Retrieved 28 November 2023. https://sudaanorder.rti.org/examples/DESCRIPT%20Example%201.pdf ↩
Choudhry, G. Hussain; Valliant, Richard (2002). WesVar: Software for complex survey data analysis (PDF). Statistics Canada Symposium. Ottawa: Statistics Canada. Retrieved 28 November 2023. https://www150.statcan.gc.ca/n1/en/pub/11-522-x/2002001/session6/6728-eng.pdf?st=do0zuOhp ↩