GMAT: Statistical Analysis With Categorical Data
Types of Statistical Analysis

This lesson introduces the concept of statistical analysis and its uses. Then it explores the broad categories and various types of analysis that are used.

Statistical Analysis

In the modern world, we are surrounded by data, whether about our shopping behavior, eating habits, sleeping patterns, education, or jobs. Data can be captured from almost anything. So how are these data used, and what do they tell us? Data by themselves do not tell us much. To get meaningful information out of them, some analysis needs to be carried out.

The science of analyzing large amounts of data to uncover underlying patterns, trends, and hidden insights is called statistical analysis.

Types of Statistical Analysis

  • Descriptive analysis summarizes the available data by describing its structure and distribution. Its insights are limited to a summary of the data at hand.
  • Inferential analysis is used to draw conclusions that are not directly visible in the data. It can be used to make judgements and infer insights from the data.

Now we will look at the various types of analysis within each of these categories with the help of an example. Let's say Marie is a math teacher with a class of 50 students, and she wants to analyze their test scores. We'll see which statistical analysis techniques she can use.

Measures of Central Tendency

Measures of central tendency are a type of descriptive analysis used to represent the central cluster (or typical value) of the data. The most commonly used measures are the median, the mean, and the mode.

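Median is calculated by arranging the data points in ascending order and then taking the middle number. If there are two middle numbers, their average is taken. For a set of n data points in ascending order, the median is the ((n + 1)/2)th value when n is odd, and the average of the (n/2)th and (n/2 + 1)th values when n is even.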

In our example, if Marie finds out that the median test score in the class is 60, it means that half of the students scored higher than 60 and half scored below 60.
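The median rule can be sketched in Python as follows. This is a minimal illustration: the helper name `median` and the scores are hypothetical and are not taken from Marie's class.

```python
def median(values):
    """Return the middle value, averaging the two middle values when needed."""
    ordered = sorted(values)              # arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:                        # odd count: a single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical test scores, for illustration only.
odd_scores = [72, 45, 60, 88, 51]
even_scores = [72, 45, 60, 88, 51, 64]

print(median(odd_scores))
print(median(even_scores))
```

With five scores the middle (third) value is returned directly. With six scores the third and fourth values are averaged.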

Mean represents the average of the data points. It is calculated by dividing the sum of all the data points by the number of data points.
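As a small illustration of this calculation, the sketch below computes a mean in Python. The helper name `mean` and the scores are hypothetical.

```python
def mean(values):
    """Return the sum of the data points divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical test scores, for illustration only.
scores = [72, 45, 60, 88, 51]

print(mean(scores))
```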

Mode represents the most frequently occurring data point in the sample of data. In our example, Marie finds out that four students scored a perfect 100, and this was the most frequently occurring score. In such a case, 100 would be the mode.
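One possible way to find the mode in Python is to count how often each value occurs. The helper name `modes` and the scores are hypothetical. The sketch returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    if not values:
        raise ValueError("mode requires at least one data point")
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical scores in which 100 occurs four times.
scores = [100, 85, 100, 70, 100, 85, 100, 60]

print(modes(scores))
```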

Measures of Dispersion

Measures of dispersion are a type of descriptive analysis used to explain how spread out the data points are. The most commonly used measure is standard deviation.

Standard deviation is calculated by taking the difference between each data point and the mean, squaring each difference, adding the squares up, dividing the sum by the number of data points, and then taking the square root. If that sounds like a headache to calculate by hand, computer software can do it. With big data sets, that is the only practical option.
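The steps above can be sketched in Python as follows. The function name `population_std_dev` and the scores are hypothetical. Because the text divides by the number of data points, the sketch computes the population standard deviation. The sample version divides by one less than the number of data points instead.

```python
import math


def population_std_dev(values):
    """Standard deviation that divides by n, following the steps in the text."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation requires at least one data point")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]   # difference from the mean, squared
    variance = sum(squared_diffs) / n                   # add up, divide by the count
    return math.sqrt(variance)                          # take the square root


# Hypothetical scores chosen so that the mean is 55.
scores = [40, 70, 40, 70]

print(population_std_dev(scores))
```

Every score here is 15 points from the mean of 55, so the standard deviation is 15.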

Standard deviation is especially useful with a normal distribution. A variable is normally distributed when its values are spread symmetrically about the mean, producing a bell-shaped curve.

In a normal distribution, approximately two-thirds of the data points (about 68%) lie within one standard deviation above or below the mean.

[Figure: Approximately 2/3 of all data are within one standard deviation above and below the mean in a normal distribution.]

For example, if the scores in Marie's class are normally distributed with a mean of 55 and a standard deviation of 15, then about two-thirds of the students scored between 40 and 70.
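The two-thirds figure can be checked with Python's standard library. This sketch assumes the scores follow an exact normal distribution with mean 55 and standard deviation 15, which real class scores would only approximate. The variable names are illustrative.

```python
from statistics import NormalDist

# Idealized model of the class scores: mean 55, standard deviation 15.
score_model = NormalDist(mu=55, sigma=15)

# Share of the distribution between one standard deviation below (40)
# and one standard deviation above (70) the mean.
share_within_one_sd = score_model.cdf(70) - score_model.cdf(40)

print(round(share_within_one_sd, 4))
```

The result is close to 0.68, which is why "two-thirds" is an approximation rather than an exact fraction.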

Tests of Difference

Tests of difference are a type of inferential statistical analysis that helps determine whether the difference between groups in a data sample occurs randomly or is due to another variable.

Two commonly used tests for this are:

  • T-test determines whether the difference between the averages of two groups in the data is statistically significant, that is, unlikely to be due to random chance. It calculates a ratio called a t-value to judge whether the difference is large enough to be significant.
  • One-way ANOVA (analysis of variance) tests whether the difference in averages among several groups in the data is significant. It is similar to a t-test, but it only indicates that a difference exists, not which groups differ.

For example, if the girls in the class scored an average of 60 and the boys an average of 50, the gap may reflect a real difference between the groups or may be a chance occurrence. To check, Marie would run a t-test on the test scores of the two groups. If the t-value turns out to be significant, she can conclude that the girls' higher average is unlikely to be a chance occurrence.
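To make the t-value and the ANOVA comparison concrete, here is a minimal sketch in plain Python. It assumes the pooled-variance (Student's) form of the two-sample t-test, which the lesson does not specify. The function names and the scores are hypothetical, chosen so that the group averages are 60 and 50.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Variance that divides by n - 1, as used in the t-test."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def pooled_t_value(group_a, group_b):
    """Two-sample t-value, assuming both groups share a common variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * sample_variance(group_a) + (n_b - 1) * sample_variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error


def anova_f_value(*groups):
    """One-way ANOVA F-value: between-group variation over within-group variation."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical scores, for illustration only.
girls = [55, 60, 65, 58, 62]   # average 60
boys = [45, 50, 55, 48, 52]    # average 50

t_value = pooled_t_value(girls, boys)
f_value = anova_f_value(girls, boys)

print(t_value)
print(f_value)
```

The sketch stops at the test statistics. Deciding whether they are significant means comparing them against the appropriate t or F distribution, which statistical software normally handles. With exactly two groups, the F-value equals the square of the pooled t-value, which shows how closely the two tests are related.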

Tests of Relationship

Tests of relationship are also a type of inferential statistical analysis, used to establish the relationship between variables. The most commonly used measures are correlation and regression.

Correlation is a measure that indicates the extent to which two variables move in tandem. Positive correlation means that the variables increase and decrease together, and negative correlation means that the variables move in opposite directions.

[Figure: Positive correlation]
[Figure: Negative correlation]

In our example, if Marie wants to determine the relationship between the number of hours a student studied and that student's score, she can run a correlation test on the hours-studied and test-score data.
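A correlation of this kind can be sketched in Python as follows. The sketch assumes the Pearson correlation coefficient, a common choice that the lesson does not name. The function name and the data are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: if either variable is constant, its spread is zero and this division fails.
    return co_movement / math.sqrt(spread_x * spread_y)


# Hypothetical data: hours studied and the matching test scores.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [50, 55, 65, 70, 80]

r = pearson_correlation(hours_studied, test_scores)
print(r)
```

A value near 1 indicates a strong positive correlation, a value near -1 a strong negative correlation, and a value near 0 little linear relationship.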

Regression is a technique used to model how a dependent variable changes in response to changes in an independent variable. It quantifies the relationship between the two, but on its own it does not prove that one causes the other.
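As an illustration, the sketch below fits a straight line to one independent variable using ordinary least squares, one common form of regression. The lesson does not prescribe a method, so this choice, the function name `fit_line`, and the data are all assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (independent) and test score (dependent).
hours_studied = [1, 2, 3, 4, 5]
test_scores = [50, 55, 65, 70, 80]

slope, intercept = fit_line(hours_studied, test_scores)
print(slope, intercept)

# Predicted score for a student who studies 6 hours, under this fitted line.
predicted = intercept + slope * 6
print(predicted)
```

Here the slope estimates how many points the score changes for each additional hour of study in this made-up data set.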
