Statistics


Statistics is a mathematical science pertaining to the collection, tabulation, classification, analysis and explanation of quantitative data. The analysis and explanation of this data may involve making predictions and forecasts and drawing conclusions.

This discussion is just a very brief introduction to statistics.

Descriptive and Inferential Statistics

Descriptive statistics involves the collection, organization, summarization, and presentation of data. Inferential statistics involves generalizing from samples to populations using probabilities. Inferential statistics is a more advanced topic that includes performing hypothesis tests, determining relationships between variables, and making predictions.

Population and Sample

The population includes all objects of interest, whereas the sample is only a portion of the population. Parameters are associated with populations and statistics with samples. Parameters are usually denoted using Greek letters (mu, sigma), while statistics are usually denoted using Roman letters (x-bar, s). There are several reasons why we usually work with samples rather than whole populations. Populations are usually large, and it is often impossible to get data for every object we’re studying. Sampling also has a cost, and the more items surveyed, the larger the cost.
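The distinction between a parameter and a statistic can be sketched with simulated data. The population below is hypothetical (10,000 made-up heights drawn from a normal distribution), and the names `mu`, `sigma`, `x_bar`, and `s` are chosen here only to mirror the usual notation.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in centimetres.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameters describe the whole population.
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics describe a sample drawn from that population.
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)        # sample mean
s = statistics.stdev(sample)            # sample standard deviation

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values will usually be close to, but not equal to, the population values, which is why inferential statistics works with probabilities.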

Discrete and Continuous

Discrete variables are usually obtained by counting. There is a finite or countable number of possible values with discrete data. You can’t have 2.63 people in the room. Continuous variables are usually obtained by measuring. Length, weight, and time are all examples of continuous variables. Since continuous variables can take any real value, recorded measurements are usually rounded. This implies a boundary that depends on the number of decimal places. For example, a data element of value 64 may really represent anything greater than or equal to 63.5 and less than 64.5. Boundaries always have one more decimal place than the data and end in a 5.
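The boundary rule can be sketched in a few lines of Python. The helper name `boundaries` is made up for this illustration, and it assumes the recorded value is passed as a string so that the number of decimal places is preserved.

```python
from decimal import Decimal

def boundaries(recorded):
    """Return the (lower, upper) boundaries implied by a rounded value.

    `recorded` is a string such as "64" or "2.3"; the true value is
    taken to lie in the interval lower <= value < upper.
    """
    value = Decimal(recorded)
    # exponent is 0 for "64", -1 for "2.3", -2 for "2.35", and so on.
    exponent = value.as_tuple().exponent
    # Half a unit in the last recorded place: one more decimal, ending in 5.
    half_unit = Decimal((0, (5,), exponent - 1))
    return value - half_unit, value + half_unit

print(boundaries("64"))   # (Decimal('63.5'), Decimal('64.5'))
print(boundaries("2.3"))  # (Decimal('2.25'), Decimal('2.35'))
```

`Decimal` is used instead of `float` so that values such as 2.25 are represented exactly.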

Levels of Measurement

There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level. Data is classified according to the highest level which it fits. Each additional level adds something the previous level didn’t have.

Nominal is the lowest level. Only names are meaningful here. Nominal is a level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.

Ordinal adds an order to the names. Ordinal is a level of measurement which classifies data into categories that can be ranked or “put in order”. Differences between the ranks are not meaningful.

Interval adds meaningful differences. Interval is a level of measurement which classifies data that can be ranked and for which differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.

Ratio adds a zero so that ratios are meaningful. Ratio is a level of measurement which classifies data that can be ranked, for which differences are meaningful, and for which there is a true zero. True ratios exist between values measured at this level.
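As an illustration of the difference between interval and ratio data, consider temperature. This example is not from the discussion above; it assumes the common classification of Celsius as an interval scale (its zero is arbitrary) and Kelvin as a ratio scale (its zero is a true zero).

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# The Celsius ratio is 2.0, but 20 C is not "twice as hot" as 10 C,
# because 0 C is not a true zero. The Kelvin ratio is the meaningful one.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
```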

Types of Sampling

There are five types of sampling: Random, Systematic, Convenience, Cluster, and Stratified.

Random sampling is analogous to putting everyone’s name into a hat and drawing out several names. Each element in the population has an equal chance of being selected. While this is the preferred way of sampling, it is often difficult to do. It requires that a complete list of every element in the population be obtained. Computer-generated lists are often used with random sampling.
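A minimal sketch of drawing names out of a hat, using Python's standard `random` module. The function name `simple_random_sample` and the list of names are made up for this illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that each is equally likely to be chosen.

    `population` must be a complete list of the elements, which is the
    practical difficulty with this method.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population list.
names = ["Ann", "Ben", "Cal", "Dee", "Eve", "Fay", "Gus", "Hal"]
print(simple_random_sample(names, 3, seed=1))
```

Passing a `seed` makes the draw repeatable; leaving it as `None` gives a different sample on each run.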

Systematic sampling is easier to do than random sampling. In systematic sampling, the list of elements is “counted off”. That is, every kth element is taken. This is similar to lining everyone up and numbering off “1,2,3,4,5; 1,2,3,4,5; and so on”. When you are done numbering, data would be collected from all people numbered 5, for example.
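The counting-off procedure can be sketched with list slicing. The function name `systematic_sample` is hypothetical, and choosing the starting position at random when none is given is an assumption added here; the description above simply picks one of the numbers.

```python
import random

def systematic_sample(population, k, start=None, seed=None):
    """Take every kth element of the list.

    `start` is the zero-based position of the first element taken.
    If it is omitted, a random position in the first k is used.
    """
    if start is None:
        start = random.Random(seed).randrange(k)
    return population[start::k]

people = list(range(1, 21))  # twenty people, numbered 1 to 20

# Everyone who counted off "5" (zero-based position 4, then every 5th).
print(systematic_sample(people, 5, start=4))  # [5, 10, 15, 20]
```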

Convenience sampling is very easy to do, but it’s probably the worst technique to use. In convenience sampling, readily available data is used. That is, the first people the surveyor runs into are included in the study.

Cluster sampling is accomplished by dividing the population into groups, usually geographically. These groups are called clusters or blocks. The clusters are randomly selected, and each element in the selected clusters is used.

Stratified sampling also divides the population into groups called strata. However, this time it is by some characteristic, not geographically. For instance, the population might be separated into males and females. A sample is taken from each of these strata using either random, systematic, or convenience sampling.
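The contrast between cluster and stratified sampling can be sketched as follows. The function names, the towns, and the people are all made up for this illustration, and the stratified version assumes random sampling within each stratum.

```python
import random
from collections import defaultdict

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every element in them.

    `clusters` maps a cluster name to the list of its elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group elements by a characteristic, then sample within each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        # Take at most n_per_stratum elements from every stratum.
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical clusters: households grouped by town.
towns = {"North": [1, 2, 3], "South": [4, 5], "East": [6, 7, 8, 9]}
print(cluster_sample(towns, 2, seed=0))

# Hypothetical population: (name, sex) pairs, stratified by sex.
people = [("Ann", "F"), ("Ben", "M"), ("Cal", "M"),
          ("Dee", "F"), ("Eve", "F"), ("Gus", "M")]
print(stratified_sample(people, lambda p: p[1], 2, seed=0))
```

Cluster sampling takes everything from a few groups, while stratified sampling takes a few elements from every group.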

Measures of Central Tendency

The term average can have four different meanings: mean, median, mode or midrange.

Mean: The mean is calculated by summing all of the data elements and dividing by the number of elements. The mean of the set 2, 3, 4, and 5 is (2 + 3 + 4 + 5) / 4 = 3.5.

Median: The median is the middle value after the data has been sorted. The median of the set 3, 4, 5, 6, 8 is 5.

Mode: The mode is the most frequently occurring data value. The mode of the set 2, 3, 3, 6, 6, 6, 7 is 6.

Midrange: The midrange is the midpoint between the highest and lowest values. The midrange of the set 2, 4, 5, 7, 10 is 6, which is calculated as (2 + 10) / 2.
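The four worked examples can be checked with Python's standard `statistics` module. The module has no midrange function, so the small helper `midrange` below is written for this sketch.

```python
import statistics

def midrange(data):
    """Midpoint between the lowest and highest values."""
    return (min(data) + max(data)) / 2

print(statistics.mean([2, 3, 4, 5]))           # 3.5
print(statistics.median([3, 4, 5, 6, 8]))      # 5
print(statistics.mode([2, 3, 3, 6, 6, 6, 7]))  # 6
print(midrange([2, 4, 5, 7, 10]))              # 6.0
```

`statistics.median` sorts the data itself, so the input does not have to be in order.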
