By Team Escalera
on 26 Jul 2018 11:26 AM
  • Data Analytics, Covariance, Correlation, Coefficient of Determination

Measures of linear relationship in Data Analytics

(Covariance, coefficient of correlation and coefficient of determination)

Here we introduce three numerical measures that describe the linear relationship between two variables:

1-Covariance

2-Coefficient of correlation

3-Coefficient of determination

Covariance

Covariance measures how two variables vary together, that is, the direction of their linear relationship.

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu_1)(Y_i - \mu_2)$$

Where X and Y are two different variables

Xi is the i-th observation of the X variable

Yi is the i-th observation of the Y variable

µ1 and µ2 are the means of all observations of X and Y

N is the total number of observations of X and Y, which is always the same for both.

  • When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number.
  • When two variables move in opposite directions, the covariance will be a large negative number.
  • When there is no particular pattern, the covariance will be a small number.
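The three cases above can be checked with a small sketch. It is written in Python purely as an illustration (the walkthrough itself uses Excel and R), the helper name population_covariance is hypothetical, and the numbers are made-up toy data rather than values from this article.

```python
def population_covariance(xs, ys):
    """Average product of deviations from the means (divides by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Made-up illustrative data, not taken from the article.
x = [1, 2, 3, 4, 5]
y_same = [2, 4, 6, 8, 10]        # rises as x rises
y_opposite = [10, 8, 6, 4, 2]    # falls as x rises
y_no_pattern = [5, 1, 6, 1, 5]   # no linear trend

cov_same = population_covariance(x, y_same)
cov_opposite = population_covariance(x, y_opposite)
cov_none = population_covariance(x, y_no_pattern)

# Expect a positive value, a negative value, and a value close to zero.
print(cov_same, cov_opposite, cov_none)
```

Keep in mind that what counts as "large" depends on the units of the data, which is one reason the coefficient of correlation is often preferred.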

There are different ways to calculate the covariance of two variables. To make this easier to understand, consider a dataset named 'tool costs'. It shows how many electrical tools were sold, and the corresponding costs, on different days.

Day    No. of tools    Costs
1      7               23.8
2      3               11.89
3      2               15.98
4      5               26.11
5      8               31.79
6      11              39.93
7      5               12.27
8      15              40.06
9      3               21.38
10     6               18.65

In the above example, the number of tools and the costs are the two variables, named X and Y.

1-Find the covariance in Excel by formula:

Type or import the data into two columns, then type the following formula into any empty cell.

=COVAR(input range of one variable, input range of second variable)

In this example we would enter =COVAR(B2:B11, C2:C11) and get the result 32.453. COVAR divides by N, so this is the population covariance; in Excel 2010 and later the equivalent function is COVARIANCE.P.

See the picture below for a better understanding.

<Image>
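As a cross-check on the Excel result, the population covariance can also be computed step by step from its definition. The following is a minimal sketch in Python (not part of the original walkthrough), using the ten rows of the tool costs table; the variable names are chosen here for illustration.

```python
# The ten observations from the 'tool costs' table.
tools = [7, 3, 2, 5, 8, 11, 5, 15, 3, 6]
costs = [23.8, 11.89, 15.98, 26.11, 31.79, 39.93, 12.27, 40.06, 21.38, 18.65]

n = len(tools)
mean_tools = sum(tools) / n   # mean of X
mean_costs = sum(costs) / n   # mean of Y

# Sum of the products of deviations from the two means.
sum_of_products = sum(
    (x - mean_tools) * (y - mean_costs) for x, y in zip(tools, costs)
)

# Dividing by N gives the population covariance, as COVAR does.
cov_population = sum_of_products / n
print(round(cov_population, 3))  # 32.453
```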

2-Find the covariance in Excel through "Data Analysis":

  • Click Data > Data Analysis.
  • Select Covariance in the Data Analysis dialog box and click OK.
  • The Covariance dialog box will open.
  • Select the input range, the "Grouped By" option and the output range, then click OK.

<Image>

  • After clicking OK in the Covariance dialog box, we get the following result.
  • The picture below explains how to interpret the result.

<Image>

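3-Find the covariance through R:

We can use the cov() command in R to get the covariance of two variables of any data set.

Here we will calculate the covariance between the number of tools and the electrical costs through R.

First we import the data set, then run the covariance command on the desired variables.

Import the data set:

> data1 = read.csv(file.choose())

Run the command to get the covariance. Its general form is:

cov(dataset_name$variable1, dataset_name$variable2)

For this data set:

> Result = cov(data1$Number.of.tools, data1$Electrical.costs)

Note that cov() in R divides by N - 1 (the sample covariance), so for this data it returns about 36.059 rather than the 32.453 given by COVAR in Excel, which divides by N. Multiplying the R result by (N - 1)/N, here 9/10, gives the Excel value.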
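Because Excel's COVAR and R's cov() use different denominators, it can help to see both side by side. The sketch below is in Python and is only an illustration; the function covariance and its sample flag are hypothetical names, not part of R or Excel.

```python
# The ten observations from the 'tool costs' table.
tools = [7, 3, 2, 5, 8, 11, 5, 15, 3, 6]
costs = [23.8, 11.89, 15.98, 26.11, 31.79, 39.93, 12.27, 40.06, 21.38, 18.65]

def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists.

    sample=True divides by N - 1 (what R's cov() does);
    sample=False divides by N (what Excel's COVAR does).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

cov_sample = covariance(tools, costs, sample=True)
cov_population = covariance(tools, costs, sample=False)
print(round(cov_sample, 3), round(cov_population, 3))  # 36.059 32.453

# The two differ only by the factor (N - 1) / N.
n = len(tools)
assert abs(cov_sample * (n - 1) / n - cov_population) < 1e-9
```

With only ten observations the gap between the two values is noticeable; it shrinks as N grows.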

Coefficient of correlation

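The coefficient of correlation describes the strength and direction of the linear relationship between two variables. Mathematically, it is defined as the covariance divided by the product of the standard deviations of the two variables.

Coefficient of correlation: $\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$

where σX and σY are the standard deviations of X and Y. The population parameter is denoted by the Greek letter ρ.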

The advantage the coefficient of correlation has over the covariance is that it has a set lower and upper limit. The limits are -1 and +1.

  • When the coefficient of correlation equals -1, there is a perfect negative linear relationship.
  • When the coefficient of correlation equals +1, there is a perfect positive linear relationship.
  • When the coefficient of correlation equals 0, there is no linear relationship.

The drawback of the coefficient of correlation is that, apart from the three values -1, 0 and +1, it has no precise interpretation.
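To make the definition concrete, here is a minimal sketch that computes the coefficient of correlation for the tool costs data. It is written in Python as an illustration only and uses population formulas (dividing by N) throughout; the variable names are chosen here and do not come from Excel or R.

```python
import math

# The ten observations from the 'tool costs' table.
tools = [7, 3, 2, 5, 8, 11, 5, 15, 3, 6]
costs = [23.8, 11.89, 15.98, 26.11, 31.79, 39.93, 12.27, 40.06, 21.38, 18.65]

n = len(tools)
mean_x = sum(tools) / n
mean_y = sum(costs) / n

# Population covariance and population standard deviations.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tools, costs)) / n
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in tools) / n)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in costs) / n)

# Coefficient of correlation: covariance over the product of the
# two standard deviations. It always lies between -1 and +1.
rho = cov_xy / (sd_x * sd_y)
print(round(rho, 3))  # 0.871
```

The result is the same whether N or N - 1 is used, as long as the same denominator is used for the covariance and both standard deviations, because the factor cancels in the ratio.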

 

 

 

 

 
