Exploring Correlations: Understanding Statistical Methods

When it comes to analyzing data, one of the most crucial aspects is understanding the relationship between different variables. Statistical methods allow us to measure and interpret these correlations, providing insights that can help us make informed decisions. In this article, we will explore the concept of correlation and the statistical method most commonly used to measure it: Pearson correlation.

What is Pearson Correlation?

Pearson correlation is a statistical method that measures the strength of the linear relationship between two continuous numerical variables. It is also referred to as the Pearson product-moment correlation coefficient, named after its creator, Karl Pearson. The Pearson correlation coefficient measures the degree of correlation between two variables, with values ranging from -1 to 1.

Interpreting Pearson Correlation Coefficient and P-Value

The Pearson correlation method provides two values: the correlation coefficient and the P-value. The correlation coefficient represents the strength and direction of the relationship between the variables. A value close to 1 implies a large positive correlation, a value close to -1 implies a large negative correlation, and a value close to zero implies no linear correlation between the variables.

The P-value, on the other hand, tells us how certain we can be about the correlation that we calculated: it is the probability of seeing a correlation at least this strong if the two variables were in fact uncorrelated. A P-value less than 0.001 gives us strong certainty about the correlation coefficient that we calculated. A value between 0.001 and 0.05 gives us moderate certainty. A value between 0.05 and 0.1 gives us weak certainty. A P-value larger than 0.1 gives us no certainty of correlation at all.
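
To make the two values concrete, here is a minimal sketch that computes the coefficient directly from its definition (covariance divided by the product of the standard deviations) and compares it with scipy.stats.pearsonr, which also returns the P-value. The data are made up for illustration, and the helper names pearson_r and certainty_label are hypothetical; certainty_label simply encodes the rule-of-thumb thresholds described above.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson coefficient from its definition:
    covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def certainty_label(p_value):
    """Map a P-value to the rule-of-thumb certainty levels used in this article."""
    if p_value < 0.001:
        return "strong"
    if p_value < 0.05:
        return "moderate"
    if p_value < 0.1:
        return "weak"
    return "none"


# Made-up measurements, for illustration only (not real data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)  # returns (coefficient, P-value)

print(f"r (from definition) = {r_manual:.4f}")
print(f"r (SciPy)           = {r_scipy:.4f}, P-value = {p_value:.2e}")
print("certainty:", certainty_label(p_value))
```

The two coefficients should agree to within floating-point error, which is a quick way to check that the definition and the library call describe the same quantity.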
When determining the strength of a correlation, we can say that there is a strong correlation when the correlation coefficient is close to 1 or -1 and the P-value is less than 0.001. A moderate correlation is when the correlation coefficient is between 0.5 and 0.8 and the P-value is between 0.001 and 0.05. A weak correlation is when the correlation coefficient is between 0.3 and 0.5 and the P-value is between 0.05 and 0.1. Finally, there is no correlation when the correlation coefficient is close to zero or when the P-value is larger than 0.1.

To illustrate this, let's look at an example of the correlation between the variables horsepower and car price.
Using the SciPy stats package, we can easily calculate the Pearson correlation. For horsepower and car price, the correlation coefficient is approximately 0.8, which is close to 1, indicating a strong positive correlation between the two variables. The P-value is also very small, much less than 0.001, giving us strong certainty about the correlation.

Creating a Correlation Heatmap

Once we have calculated the Pearson correlation coefficient for each pair of variables, we can create a correlation heatmap to visualize the relationships between the variables. A correlation heatmap is a graphical representation of the correlation matrix, where each cell represents the correlation coefficient between two variables.

The heatmap uses a color scheme to indicate the strength of the correlation between two variables. A dark red color indicates a high positive correlation, while a dark blue color indicates a high negative correlation. A white color indicates no correlation.

When we create a correlation heatmap for all the variables, we can see a diagonal line in dark red. This makes sense because the values on the diagonal are the correlations of the variables with themselves, which are always 1.

The correlation heatmap gives us a good overview of how the different variables are related to one another and, most importantly, how these variables are related to price. We can see that the variables horsepower and car weight have a high positive correlation with car price, while the variable engine displacement has a moderate positive correlation.
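
A heatmap like the one described can be sketched with NumPy and Matplotlib. This is only an illustration under stated assumptions: the data below are randomly generated stand-ins, not the article's car data set, so the resulting coefficients will not match the figures quoted above. The variable names and the helper plot_correlation_heatmap are hypothetical, and the "RdBu_r" colormap is one choice that gives dark red for values near 1, dark blue for values near -1, and white near 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data; NOT the article's car data set.
rng = np.random.default_rng(0)
n = 200
horsepower = rng.normal(100, 30, n)
weight = 2000 + 8 * horsepower + rng.normal(0, 200, n)
displacement = 60 + 0.5 * horsepower + rng.normal(0, 25, n)
price = 100 * horsepower + 3 * weight + rng.normal(0, 3000, n)

names = ["horsepower", "weight", "displacement", "price"]
data = np.vstack([horsepower, weight, displacement, price])

# np.corrcoef treats each row as one variable and returns the
# matrix of pairwise Pearson correlation coefficients.
corr = np.corrcoef(data)


def plot_correlation_heatmap(corr, names):
    """Draw the correlation matrix with blue = negative, white = zero, red = positive."""
    fig, ax = plt.subplots(figsize=(5, 4))
    # Fix the color scale to [-1, 1] so that white always means zero correlation.
    im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation=45, ha="right")
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(im, ax=ax, label="Pearson correlation coefficient")
    fig.tight_layout()
    return fig


fig = plot_correlation_heatmap(corr, names)
fig.savefig("correlation_heatmap.png")
```

In an interactive session, calling plt.show() instead of saving the figure displays the heatmap directly. The diagonal of corr is all ones, which is the dark red diagonal described above.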