**Stratified Sampling** is a commonly used probability sampling method that is often preferred over simple random sampling because it can reduce sampling error. A stratum is a subset of the population whose members share at least one common characteristic. Examples of strata might be males and females, or managers and non-managers. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select a *sufficient* number of subjects from each stratum. "*Sufficient*" refers to a sample size large enough for us to be reasonably confident that the subjects drawn from a stratum represent that stratum in the population. Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
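To make "representation in the population" concrete, here is a minimal sketch in R. The population vector `pop`, the 80/20 split, and the total sample size `n` are all made up for illustration, and proportional allocation is only one of several ways to choose the sample size per stratum.

```r
# Hypothetical population: 80 non-managers and 20 managers (illustrative only)
pop = factor(rep(c("non-manager", "manager"), times = c(80, 20)))

# Share of each stratum in the population
(share = prop.table(table(pop)))

# Proportional allocation of an assumed total sample size n = 10;
# with other numbers, rounding may need a manual fix so the sizes sum to n
n = 10
(n_h = round(n * share))
```

When a stratum has a very low incidence, proportional allocation may give it too few subjects, and a fixed number per stratum can be a reasonable alternative.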

As before, the function *sample()* is all we need. The remaining work is to define the strata and carry out simple random sampling without replacement (SRSWOR) within each stratum. If the data are well organized, *tapply()* can do the sampling. For example:

    > (dat = data.frame(x = 1:15, stratum = gl(3, 5)))
        x stratum
    1   1       1
    2   2       1
    3   3       1
    4   4       1
    5   5       1
    6   6       2
    7   7       2
    8   8       2
    9   9       2
    10 10       2
    11 11       3
    12 12       3
    13 13       3
    14 14       3
    15 15       3
    > attach(dat)
    > (tapply(x, stratum, sample, size = 2))
    $`1`
    [1] 1 4

    $`2`
    [1]  9 10

    $`3`
    [1] 12 11

    > detach(dat)

I just sampled 2 elements from each stratum in the above example.
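The same idea can be written without *attach()*, and it can return whole rows rather than only the values of `x`. The following is a minimal sketch, not the only way to do it: `dat` is rebuilt so the block is self-contained, and the seed value and the names `idx` and `rows` are arbitrary choices for illustration.

```r
dat = data.frame(x = 1:15, stratum = gl(3, 5))
set.seed(1)  # arbitrary seed, only to make the draw reproducible

# Split the row indices by stratum, then draw 2 of them without replacement
# within each stratum
idx = lapply(split(seq_len(nrow(dat)), dat$stratum),
    function(i) i[sample.int(length(i), 2)])

# Keep the sampled rows with all of their columns
(rows = dat[unlist(idx), ])
```

Indexing with `sample.int(length(i), 2)` instead of calling `sample(i, 2)` sidesteps a known quirk of *sample()*: when its first argument is a single number, it samples from `1:i` rather than from that one value.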

In the plot drawn by the R code below, every rectangle (a horizontal band of 10 points) stands for a stratum; 3 elements are sampled from each stratum and circled in red.

R code:

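    # 100 points on a 10 x 10 grid; the second column (row of the grid) is the stratum
    x = cbind(rep(1:10, 10), gl(10, 10))
    par(mar = rep(0.5, 4), xaxs = "i", yaxs = "i")
    for (i in 1:100) {
        # empty canvas
        plot(x, axes = FALSE, ann = FALSE, type = "n", xlim = c(0.5, 10.5),
            ylim = c(0.5, 10.5))
        # one horizontal band per stratum, in alternating colours
        rect(rep(0.5, 10), seq(0.5, 10, 1), rep(10.5, 10), seq(1.5, 11, 1),
            col = c("beige", "white")[rep(1:2, 5)])
        # the whole population
        points(x, pch = 19, col = "blue")
        # circle the 3 points sampled in each stratum
        points(x[as.vector(replicate(10, sample(10, 3))) + rep(seq(0, 90, 10),
            each = 3), ], col = "red", cex = 3, lwd = 2)
        Sys.sleep(1)
    }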

Please take some time to consider what this expression means: as.vector(replicate(10, sample(10, 3))) + rep(seq(0, 90, 10), each = 3).
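One way to unpack the expression is to evaluate it piece by piece. This is a minimal sketch: the names `within`, `offset`, and `idx` are introduced here for illustration and do not appear in the original code, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# 3 x 10 matrix: column j holds 3 distinct positions (1..10) drawn in stratum j
within = replicate(10, sample(10, 3))

# Row offset of each stratum in x: stratum j occupies rows 10*(j-1)+1 .. 10*j
offset = rep(seq(0, 90, 10), each = 3)

# as.vector() flattens the matrix column by column, so its elements line up
# with the offsets: the first 3 indices fall in stratum 1, the next 3 in
# stratum 2, and so on
idx = as.vector(within) + offset

# Each stratum (block of 10 rows) contributes exactly 3 row indices
table(ceiling(idx / 10))
```

So the expression is a vectorised way of doing SRSWOR of size 3 inside each of the 10 strata and converting the within-stratum positions into row numbers of `x`.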