Thursday, December 29, 2011

Correlation Measurements with Microsoft Excel

Excel provides useful statistical functions for measuring correlation between two variables. As a reminder, the benefit of using a correlation coefficient to measure the relationship between two variables as opposed to using covariance is that the unit of measurement doesn't matter.
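To see why the unit of measurement matters for covariance but not for correlation, here is a minimal Python sketch. The height and weight values are hypothetical, invented only for illustration, and the helper functions are not Excel's implementation. Converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation coefficient unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    # Sample covariance: sum of products of deviations, divided by n - 1.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance divided by the product of the two sample standard deviations.
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)

# Hypothetical data, not taken from the article.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55.0, 65.0, 68.0, 80.0, 85.0]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)

print("covariance (m, kg): ", cov_m)
print("covariance (cm, kg):", cov_cm)
print("correlation (m, kg): ", r_m)
print("correlation (cm, kg):", r_cm)
```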

But a caution: remember that correlation does not show causation. For example, you could easily show that as the number of ice cream cones consumed during a year increases, so does the number of drownings. This does not mean that eating ice cream causes people to drown; more likely, both variables are independently related to a third variable, temperature. Correlation is also symmetrical, so you get the same coefficient if you switch the variables. Don't calculate a correlation coefficient if you manipulated one of the variables; use linear regression instead.
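The ice cream example can be sketched in Python. All numbers below are made up for illustration: both series are deliberately built from the same temperature values plus small unrelated offsets, so neither one influences the other. They still correlate strongly, and swapping the arguments gives the same coefficient.

```python
import math

def pearson_r(xs, ys):
    # Correlation coefficient from deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly temperatures (the shared driver).
temperature_c = [2, 4, 9, 14, 19, 24, 27, 26, 21, 15, 8, 3]

# Small invented offsets so the two series are not exact copies of temperature.
cone_noise = [5, -3, 4, -6, 2, 7, -4, 3, -5, 6, -2, 1]
drowning_noise = [1, -1, 2, 0, -2, 1, 2, -1, 0, 1, -2, 0]

# Each series depends only on temperature, never on the other series.
ice_cream_cones = [100 + 20 * t + e for t, e in zip(temperature_c, cone_noise)]
drownings = [2 + t + e for t, e in zip(temperature_c, drowning_noise)]

r_forward = pearson_r(ice_cream_cones, drownings)
r_swapped = pearson_r(drownings, ice_cream_cones)

print("cones vs drownings:", r_forward)
print("drownings vs cones:", r_swapped)
```

A high coefficient here says nothing about which variable, if any, drives the other.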

Software

CORREL

You use the CORREL function in Excel to determine whether two data sets are related, and if so, how strongly. The correlation coefficient ranges from +1, indicating a perfect positive linear relationship, to -1, indicating a perfect negative linear relationship; a value near 0 indicates little or no linear relationship. To calculate a correlation coefficient for a sample, Excel uses the covariance of the samples and the standard deviation of each sample. To use the CORREL function in Excel, just select the two sets of data to use as the arguments and use the following syntax:

=CORREL(data set 1,data set 2)
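In symbols, the sample calculation described above (covariance divided by the product of the standard deviations) can be written as follows. The notation is an assumption for illustration: x and y stand for the two data sets, n for the number of pairs, and bars for sample means.

```latex
r = \frac{s_{xy}}{s_x \, s_y},
\qquad
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),
\qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}
```

with s_y defined like s_x. The factors of 1/(n-1) cancel, which leaves

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```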

For example, if you have a set of preliminary test scores for a sample of employees in column A and a set of performance feedback scores in column B, as shown in Figure 4-6, and you want to find out whether they're related and if so, how strongly, you can use Excel to find the correlation coefficient for the samples.

The function returns the value 0.87, indicating that the sets are positively related (as the value of one goes up, the value of the other also increases), but the relationship isn't perfect.

PEARSON

The Pearson product moment correlation coefficient function, PEARSON, uses a different equation for calculating the correlation coefficient. This formula doesn't require the computation of each deviation from the mean, but it is algebraically equivalent to the deviation-based one. The coefficient still ranges from +1, indicating a perfect positive linear relationship, to -1, indicating a perfect negative linear relationship. The PEARSON function uses the following syntax:

=PEARSON(data set 1,data set 2)

Using the PEARSON function on the data shown in Figure 4-6 to compute the correlation coefficient returns the same value as the CORREL function does.
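To illustrate why the two functions agree, here is a minimal Python sketch comparing a deviation-based formula with a raw-sums formula that never computes a deviation from the mean. The raw-sums form is a common textbook version and is assumed here to be the kind of equation meant; it is not confirmed to be Excel's internal implementation. The scores are hypothetical and are not the Figure 4-6 data.

```python
import math

def r_deviation_form(xs, ys):
    # Uses each value's deviation from its mean.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_raw_sums_form(xs, ys):
    # Uses only running sums of x, y, x*y, x^2 and y^2; no deviations needed.
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical test and feedback scores, invented for illustration.
test_scores = [62, 70, 75, 81, 88, 93]
feedback_scores = [3.1, 3.4, 3.3, 4.0, 4.2, 4.6]

r_dev = r_deviation_form(test_scores, feedback_scores)
r_raw = r_raw_sums_form(test_scores, feedback_scores)

print("deviation form:", r_dev)
print("raw-sums form: ", r_raw)
```

The two results differ only by floating-point rounding, if at all.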

RSQ

The RSQ function calculates the square of the Pearson product moment correlation coefficient for the data points in the two data sets. You can interpret the r-squared value as the proportion of the variance in y attributable to the variance in x. The RSQ function uses the following syntax:

=RSQ(data set 1,data set 2)
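The "proportion of variance" reading can be checked with a short Python sketch. It squares the correlation coefficient and compares the result with 1 minus the ratio of unexplained to total variation from a least-squares line fitted to the same points. The data and function names are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    # Correlation coefficient from deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    # Fit y = intercept + slope * x by least squares, then compare the
    # leftover (residual) variation with the total variation in y.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data, invented for illustration.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 64, 70, 79]

r = pearson_r(hours_studied, exam_scores)
rsq_from_r = r ** 2
rsq_from_fit = r_squared_from_fit(hours_studied, exam_scores)

print("r squared:          ", rsq_from_r)
print("1 - SS_res / SS_tot:", rsq_from_fit)
```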
