Sampling

  1. Random Samples
  2. Non-random Samples
  3. Sample Size

The population, or target population, is the entire group of items or cases from which you want to gather data. Approaches to defining it are discussed on a separate page, titled Demarcating the Study.

In an empirical study, the population usually consists of physical objects like people or artifacts, or of events. In a case study it contains just one object or event, but in theoretically oriented basic research it can be infinite, i.e. you want to know something that is true for every object or event of the given type in the universe.

In some projects every specimen or event of the population is actually measured or recorded. Such a total study gives an excellent description of the population, but it is possible only if the population is not too large and if all the objects are available for study.

A total study is a relatively expensive method, because empirical work takes time and often involves apparatus, travel and other costs. Remember, too, that the objectives of a research project do not always require an absolutely exact account of the entire population; a trustworthy approximation often suffices. It is therefore quite common to measure or record only as many units of the population as you can afford and as are necessary for reaching the goals of the project. To this end, several strategies are available. Some are listed below.

Note that sampling does not mean that you are not equally interested in all the items in the population. On the contrary, you would like to study all of them, but you pick the sample for practical reasons. Perhaps you have a population of millions of objects and it is impossible to reach even a major part of them. Even in those cases (with populations of, say, up to 10,000) where you might choose to study every object, a sampling study may be a prudent choice, because it saves time, which you can then use to study the sampled items more carefully.

As stated above, in sampling research we are always interested not in the sample but in the population; more exactly, in the properties of the items of the population. When studying the items in the sample, we would like the average of their attributes to be the same as, or very near, the average in the population. If that is the case, our sample is representative.

There are two alternative principles which you may use when selecting a sample: random sampling and non-random sampling.

The act of sampling itself generates two types of disagreement between the target population and the sample:

You might wonder why anyone would use non-random sampling at all, since it involves the risk of bias, a seemingly unnecessary source of disagreement with the population. There are several possible motivations for it:

Random Samples

If a random sample is properly made, it contains no bias and is therefore relatively representative of the population. Of course, you can never be 100% certain that the results measured from the sample are also true in the population. However, for practical purposes it is often enough if you know that the risk of a deviation from the population is, say, 1%. (Or 5%, or 0.1%.) You will be able to make such statements, which are based on probability calculus, if you have used a random sample.

The principle in selecting the items for the random sample is the same as when casting lots. All the objects of the population shall have an equal probability of being selected into the sample. This probability is called the sampling ratio, and it is equal to the number of items in the sample divided by the number of items in the population.
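The lot-casting principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1000 numbered items, the sample size of 50 and the function names are all hypothetical, and the standard library's random.sample is used to draw without replacement so that every item has the same chance of selection.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size items without replacement, each equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, sample_size)

def sampling_ratio(sample_size, population_size):
    """Number of items in the sample divided by the number in the population."""
    return sample_size / population_size

# Hypothetical population: 1000 items identified by running numbers.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=1)
ratio = sampling_ratio(len(sample), len(population))
print(len(sample), ratio)
```

Fixing the seed is a convenience for documenting how the sample was drawn; it does not change the equal-probability property.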
There are alternative methods of creating a random sample (in other words, a "probability sample"), one of which is the simple random sample. In the following diagrams, items of the original population are presented as small dots or as other small symbols, and items selected in the sample are shown as bold symbols.

Non-random Samples

Non-random (or "nonprobability") samples are selected at the discretion of the researcher. They are often quick and cheap to create, even though they are usually less representative than random ones.

In informative studies the presence of bias is usually a grave handicap, because it can prohibit generalizing the results. This is a difficulty that you will meet later in your project, when Assessing Non-Random Sampling and when writing the final chapter of your report, so it can be prudent to think about it in advance, when selecting the sampling method.

In research and development projects the risks in using non-random samples are smaller, because the possible bias can be compensated later. For example, it is common to use convenience sampling when selecting potential customers to a think-tank in order to develop an early product concept. The selection of persons will probably be biased, as well as the proposals from the think-tank, but the proposals will be rectified at a later stage when they are evaluated anew by another, larger group of people.

Common types of non-random samples include the convenience sample and the snowball sample.

When designing a non-random sample you should always keep in mind the original population. Is the sample representative? Are the results valid in the population? Is it certain that the criterion that you have used in selecting the sample (e.g. the willingness of people to participate) has no correlation with those variables that you want to record from the sample? If there is correlation, your sample will be biased.

Inappropriate methods of sampling

Overstepping the limits of the population. You must not include in your sample items that are not members of the defined population. For example, in snowball sampling it often happens that some interviewed people nominate candidates that do not belong to the same population. Of course, you often have the option of altering your original delimitations.

Sample of specialists. It might look like a sensible idea to ask directly those, usually few, people that know a lot about the topic, instead of asking a large sample of randomly selected laymen whose knowledge can be sporadic and opinions may diverge. In this way, we might, for example:

All the examples above are from real life, and they are no doubt rational, quick and effective methods, because you need to interview just a few people and in the discussion you get quickly to the point. Nevertheless, you should not think that a sample of "specialists" could be taken as a sample of "non-specialists". These are two different populations. You should not generalize the results from "specialists" to any other population than just the population of "specialists" whoever they may be.

If you nevertheless choose to interview specialists, you can do it, of course. If you then additionally want to gather the opinions of average consumers, you should define these as a second population and select a suitable sample of it, too. One possibility is to make these two surveys in succession. You could perhaps use the results from the specialists as new hypotheses to be tested with another sample of the consumer population. In other words, you would use the interview of the specialists as a preliminary study only. Or the other way round: you can first ask consumers and then the specialists.

Normative sampling. A normative aspect is acceptable in development projects which aim at improving similar objects in the future, but it is better to keep it out of sampling because it upsets the principles of representativeness and generalization.

Sample of the best. A "sample of the best" cases is quite a tradition in art history: you only take into account the great masters. The idea is that the best cases are closest to the ideals that artists had in their time and in this sense they represent the truest art of the era. They also had the greatest influence on later development. However, it is self-evident that the best works are not typical of the era and they do not represent average works of art. This does not suggest that you should not study them, but if you do, do not call it a "sample" if you mean that the population of your study is the great masters. Cf. the discussion under Demarcating the Study.

Later in the project, when analyzing the data, you can easily uphold the normative aspect if it is needed, cf. Normative Case Study, Normative Comparison, Normative Classification, Normative History, and Normative Study of Variables, so there is no need to mix up the sampling procedure with normative considerations.

Sample Size

The main purpose of sampling is to reduce the need for empirical operations which entail labor and cost. How small can a sample then be without losing its usability? In other words, what is the smallest number of cases that still give us reliable enough data about the population?

Random Samples

Data that we get from the sample are normally slightly different from those of the population. The reason is that random selection has brought into the sample not only average items of the population but also some more or less exceptional items. How many of them there are likely to be can be anticipated by using the theory of probability, which can also tell us how large the risk is of getting erroneous data because of these exceptional cases. The risk is roughly proportional to the variance of the variables and inversely related to sample size.

If we use the formula the other way round and know the desired level of statistical significance of the data we are going to record from the sample, we can calculate the required sample size on the basis of the number of variables, and their variances. The variances are often not known in advance, but in that case an approximation can be used.

Suppose, for example, that you have measured two variables from a small sample and found that their correlation is 0.26. It is always possible that such a correlation has arisen in the sample just by accident and does not hold in the population. You want the probability of such an accident to be less than 1%. If you consult the table presented under t-test, you will find that a sample of 100 cases is needed before the probability of accidentally getting a 0.26 correlation diminishes to 1%.
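The correlation example can be checked roughly in Python. Note that this sketch does not reproduce the t-test table mentioned in the text: it swaps in the Fisher z approximation, which needs only the standard library, so its answer may differ from the table by a few cases. The function names and the two-sided test are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-sided probability of seeing a correlation at least
    as strong as r in n cases when the true correlation is zero
    (Fisher z approximation, not the exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def smallest_sample_for(r, alpha, max_n=100_000):
    """Smallest n at which the approximate p-value drops below alpha."""
    for n in range(4, max_n + 1):
        if correlation_p_value(r, n) < alpha:
            return n
    return None  # not reached within max_n

needed = smallest_sample_for(0.26, 0.01)
print(needed, round(correlation_p_value(0.26, 100), 4))
```

Under this approximation the required sample size should land close to the figure of 100 quoted from the table.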

Another example: you are studying percentages, and you want to be 95% certain that the percentage you have measured from a sample is also true in the population. You can then use the formula for the confidence interval:

$$p \pm 1.96 \sqrt{\frac{p(100 - p)}{n}}$$

where

p = percentage calculated from a sample
n = sample size.

If the confidence interval, according to the formula, is too wide, you can narrow it by using a larger sample. From the formula you can infer that if you multiply the sample size by four, the confidence interval will shrink to half its width. Note that the formula is independent of the size of the population.
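The effect of sample size on the interval can be tried out with a short sketch. It assumes the usual normal-approximation interval for a percentage, p ± 1.96·sqrt(p(100 − p)/n), where 1.96 corresponds to 95% certainty; the function names and the figures in the example (40% measured from 100 and from 400 cases) are hypothetical.

```python
import math

Z_95 = 1.96  # multiplier for 95% certainty under the normal approximation

def half_width(p, n, z=Z_95):
    """Half-width of the interval, in percentage points, for percentage p."""
    return z * math.sqrt(p * (100 - p) / n)

def confidence_interval(p, n, z=Z_95):
    """Lower and upper limits, clipped to the range 0..100."""
    h = half_width(p, n, z)
    return max(0.0, p - h), min(100.0, p + h)

def required_sample_size(p, max_half_width, z=Z_95):
    """Smallest n whose half-width does not exceed max_half_width."""
    return math.ceil(z ** 2 * p * (100 - p) / max_half_width ** 2)

print(confidence_interval(40, 100))   # wider interval
print(confidence_interval(40, 400))   # four times the cases, half the width
print(required_sample_size(50, 5))    # cases needed for +/- 5 points at p = 50
```

Because the half-width depends on n only through its square root, quadrupling the sample halves the interval, and the population size never enters the calculation.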

The formulas for calculating statistical significances are exact but somewhat cumbersome to use because almost every type of statistic has a different formula for evaluating its statistical significance. That is why they are not presented here. In important projects with ample resources, a statistician is usually consulted for the calculations.
In a research project with limited resources, the rule of thumb is: Use as large a sample as you can afford.

Non-random Samples

There is no formula to determine the size of a non-random sample. Often, especially in qualitative research, you may simply enlarge your sample gradually and analyse the results as they come. When new cases no longer yield new information, you may conclude that your sample is saturated, and you finish the job. This method is however very sensitive to biased sampling, so you should be careful and make sure that you do not omit any groups from your population.
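The gradual enlargement described above can be sketched as a loop that stops when a new batch of cases adds nothing new. This is a simplified illustration, not a prescribed procedure: the idea of coding each case into a set of themes, the batch size, the function names and the interview data are all assumptions.

```python
def enlarge_until_saturated(cases, batch_size, extract_themes):
    """Study cases batch by batch; stop when a batch yields no new theme.

    Returns the number of cases studied and the themes found so far.
    """
    seen = set()
    studied = 0
    for start in range(0, len(cases), batch_size):
        batch = cases[start:start + batch_size]
        new = set()
        for case in batch:
            new |= set(extract_themes(case)) - seen
        studied += len(batch)
        if not new:  # the batch repeated only what was already known
            break
        seen |= new
    return studied, seen

# Hypothetical interviews, each already coded into a list of themes.
interviews = [
    ["price", "comfort"], ["comfort"],
    ["style", "price"], ["durability"],
    ["price"], ["comfort", "style"],
    ["style"], ["durability"], ["warranty"],
]
studied, themes = enlarge_until_saturated(interviews, 2, lambda case: case)
print(studied, sorted(themes))
```

In this made-up data the loop stops after six interviews and never meets the "warranty" theme held by the last interviewee, which illustrates why the method is sensitive to the order and selection of cases.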

Remember also that if a sample is biased it does not help to increase the sample size. The added sample will be just as biased if you use the same method of selection as for the original sample.

If you can afford to make a second sample, try creating it with another method of selection. Keep initially separate the data from each of the samples. By comparing them you have an excellent means of judging the presence of bias in either of them.

Before deciding the size of a non-random sample, you might want to read how to assess the results from a non-random sample. Otherwise you might experience quite a nasty surprise when trying, too late, to define the field where your results could be declared valid.

Failing Cases

It often happens that some cases in the sample turn out fruitless because they cannot be reached, measurements fail, interviewees refuse to co-operate etc. If you want to do the sampling very carefully, you should ask yourself: Is it probable or possible that the failing cases differ from the successful ones in any respect that is interesting in your project? Only when the answer is no can you rest assured that the absence of these cases will not introduce bias in the results. The normal method is then to overdimension the sample slightly, and then simply forget the failing cases.
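How much to overdimension can be estimated with simple arithmetic, if you are willing to guess the share of cases that will succeed. The sketch below is an assumption-laden illustration: the expected success percentage is your own estimate, not something the sampling method supplies, and the function name is hypothetical.

```python
def oversized_sample(needed_cases, expected_success_percent):
    """Cases to select so that about needed_cases remain after failures.

    expected_success_percent is an integer guess between 1 and 100.
    """
    if not 1 <= expected_success_percent <= 100:
        raise ValueError("expected_success_percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (needed_cases * 100 + expected_success_percent - 1) // expected_success_percent

print(oversized_sample(100, 80))  # select this many if about 80% succeed
```

This only protects the sample size; it does nothing against bias if the failing cases differ systematically from the successful ones.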

If you, on the contrary, think that the failing cases systematically differ from the rest, you can try to neutralize the bias by giving different weights to the data that come from these two groups. The method is explained in The Problem of No-Reply.


February 6, 2005. Original location: http://www2.uiah.fi/projects/metodi