Tuesday, 5 June 2018

Advanced Statistics Quiz 8 - Concepts from Multivariate Linear Regression

Hello, good to see you!
I made this post to help you prepare for oral and written quizzes or exams on advanced or mathematical statistics.
So basically, I will explain the concepts in a very easy-to-understand way, so that you can understand them and answer in your own words.


Question 1: What is the Partial F-Test and When is it Applied?
The partial F-test is a statistical test used in multivariate linear regression when extra variables have been added to a model, to determine whether those extra variables, as a group, provide enough additional explanatory power.

It is applied when the statistical significance of a group of variables is tested simultaneously, and it requires two regression models: a reduced model without the extra variables and a full model with them.
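As a rough sketch, the partial F-statistic can be computed from the residual sums of squares of the two nested models. The data, variable names, and helper function below are illustrative, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)  # x3 is irrelevant by construction

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (intercept included)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid, Z.shape[1]

rss_reduced, p_reduced = rss(np.column_stack([x1]), y)       # reduced model
rss_full, p_full = rss(np.column_stack([x1, x2, x3]), y)     # full model

q = p_full - p_reduced                                       # number of extra variables
F = ((rss_reduced - rss_full) / q) / (rss_full / (n - p_full))
print(F)  # a large F suggests the extra variables add explanatory power as a group
```

The statistic would then be compared against an F distribution with (q, n − p_full) degrees of freedom.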

Question 2: Outline and Explain the 3 Modelling Algorithms
The three modelling algorithms are:
  • Forward Selection
  • Backward Selection
  • Stepwise Selection

Forward Selection
Step 1: From the list of independent variables, select the one with the highest correlation coefficient (in absolute value) with the target variable. Then calculate the F-statistic to see whether a strong linear relationship exists between the variables. If there is no strong relationship, stop; otherwise include the variable and go to Step 2.
Step 2: Take the next variable: the one with the highest partial correlation coefficient with the target variable, given the current residuals. Then calculate F-change for the extended linear regression. If the calculated value no longer shows a strong relationship, stop; otherwise repeat this step.
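The two steps above can be sketched as a greedy loop. This is a minimal illustration, not the canonical implementation; the `f_enter` threshold, function name, and test data are assumptions:

```python
import numpy as np

def forward_selection(X, y, f_enter=4.0):
    """Greedy forward selection; f_enter is an assumed F-to-enter threshold."""
    n, k = X.shape
    selected, remaining = [], list(range(k))

    def fit_rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ beta
        return r @ r

    rss_cur = fit_rss(selected)
    while remaining:
        # candidate giving the biggest drop in RSS (equivalent to the highest
        # partial correlation with the current residuals)
        rss_new, best = min((fit_rss(selected + [c]), c) for c in remaining)
        p = len(selected) + 2                  # intercept + selected + candidate
        f_change = (rss_cur - rss_new) / (rss_new / (n - p))
        if f_change < f_enter:
            break                              # stop rule: no candidate is significant
        selected.append(best)
        remaining.remove(best)
        rss_cur = rss_new
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=200)  # column 2 is noise
chosen = forward_selection(X, y)
print(chosen)  # typically picks column 0 first, then column 1
```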

Backward Elimination
This algorithm works in the opposite direction to Forward Selection. In this case, we start by including all the variables, then identify the least suitable ones. We calculate the error and, based on its value, remove the least suitable variable from the selection. We repeat the calculation until the error reaches its minimum value, i.e. until every remaining variable is significant.
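A minimal sketch of this elimination loop, with an assumed F-to-remove stop rule (`f_remove`) and illustrative test data:

```python
import numpy as np

def backward_elimination(X, y, f_remove=4.0):
    """Start from the full model; repeatedly drop the variable whose removal
    hurts least, while its partial F stays below the assumed f_remove threshold."""
    n, k = X.shape
    selected = list(range(k))

    def fit_rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ beta
        return r @ r

    while selected:
        rss_cur = fit_rss(selected)
        p = len(selected) + 1                  # intercept + current variables
        # variable whose removal increases the RSS the least
        rss_new, worst = min(
            (fit_rss([c for c in selected if c != d]), d) for d in selected
        )
        f_drop = (rss_new - rss_cur) / (rss_cur / (n - p))
        if f_drop >= f_remove:
            break                              # every remaining variable is significant
        selected.remove(worst)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=200)  # column 2 is noise
kept = backward_elimination(X, y)
print(kept)  # the informative columns 0 and 1 should survive
```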

Stepwise Selection
This algorithm is a combination of Forward Selection and Backward Elimination. Variables are repeatedly added to and removed from the list of selected independent variables. The stop rule now has both a minimum and a maximum threshold: we stop when no candidate change reaches a significant F-statistic.


Question 3: What is the Adjusted Coefficient of Determination (Adjusted R2) and What is it Used For? What is the Difference Between R2 and Adjusted R2?
The adjusted coefficient of determination is used in multiple regression to measure how well a multiple regression equation fits the sample data.
The main difference between R2 and adjusted R2 is that R2 automatically increases as new independent variables are added to the regression equation (even if they contribute nothing to its explanatory power).
The adjusted R2, however, increases only if the newly added independent variables actually contribute to the explanatory power of the regression equation. Therefore, the adjusted R2 is a more useful measure of regression fit than R2 alone.
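Using the standard correction adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1), a small numpy sketch shows how adding pure-noise predictors inflates R2 but is penalized by adjusted R2 (all data and names illustrative):

```python
import numpy as np

def r2_and_adjusted(X, y):
    """R2 and adjusted R2 for an OLS fit of y on X (p predictors plus intercept)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=(n, 1))
y = 1 + 2 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 5))              # five pure-noise predictors

r2_small, adj_small = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)
print(r2_small, r2_big)   # R2 never decreases when predictors are added
print(adj_small, adj_big) # adjusted R2 is pulled back down by useless predictors
```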

Question 4: What is Heteroscedasticity?
Heteroscedasticity is a concept in regression that means unequal scatter: a systematic variation in the spread of the residuals over the range of the measured values. Remember that a residual is the error measured between the regression line and a data point. It is the opposite of homoscedasticity (which means constant variance of the residuals).
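A quick way to see heteroscedasticity numerically: simulate data whose noise grows with x, fit a line, and compare the spread of the residuals in the lower and upper halves (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = np.sort(rng.uniform(1, 10, size=n))
y = 2 + 3 * x + rng.normal(scale=x, size=n)   # noise standard deviation grows with x

Z = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta

lo, hi = resid[: n // 2], resid[n // 2:]
print(lo.std(), hi.std())  # the upper half scatters much more widely: unequal scatter
```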

Question 5: What are the Types of Residuals?
Common Residual
Deleted Residual
Standardized Residual
Studentized Residual
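Assuming the usual textbook definitions via the hat matrix (the leave-one-out formulas below are standard ones, not taken from this post), the four types can be computed as:

```python
import numpy as np

def residual_types(X, y):
    """Common, deleted, standardized, and (externally) studentized residuals
    of an OLS fit of y on X with an intercept."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T          # hat (projection) matrix
    h = np.diag(H)                                # leverages
    e = y - H @ y                                 # common residuals
    p = Z.shape[1]
    s2 = (e @ e) / (n - p)                        # residual variance estimate
    deleted = e / (1 - h)                         # prediction error with case i left out
    standardized = e / np.sqrt(s2 * (1 - h))
    # leave-one-out variance estimate used for the studentized residual
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    studentized = e / np.sqrt(s2_i * (1 - h))
    return e, deleted, standardized, studentized

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=60)
e, deleted, standardized, studentized = residual_types(X, y)
```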

Question 6: What is the Covariance Ratio?
The covariance ratio is given by:
CovR = CovTL / CovTLj
where:
CovTL is the covariance between the target variable and the linear regression,
CovTLj is the covariance between the target variable and the linear regression obtained by omitting the jth case.
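Following the definition above literally, a small sketch (note that the standard COVRATIO diagnostic in many texts is instead defined via determinants of coefficient covariance matrices; the version here follows this post's formula, and all names are illustrative):

```python
import numpy as np

def cov_ratio(x, y, j):
    """Covariance between y and the fitted values from the full fit, divided by
    the same covariance after refitting with case j omitted. A value far from 1
    flags case j as influential."""
    def fitted(x, y):
        Z = np.column_stack([np.ones(len(x)), x])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return Z @ beta

    cov_full = np.cov(y, fitted(x, y))[0, 1]            # CovTL
    mask = np.arange(len(x)) != j
    cov_j = np.cov(y[mask], fitted(x[mask], y[mask]))[0, 1]  # CovTLj
    return cov_full / cov_j

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
r = cov_ratio(x, y, 0)  # close to 1 for a typical, non-influential case
```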

Question 7: What is Multicollinearity in Regression?
Multicollinearity means a linear correlation relationship between two or more of the explanatory variables.

Question 8: Mention Some Problems Caused by Multicollinearity
  • the effects of the explanatory variables on the dependent variable cannot really be separated
  • an explanatory variable can assume the role of other explanatory variables
  • the estimation of the regression coefficients becomes unreliable
  • in extreme situations, the analysis cannot be performed on the data at all
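A standard numeric check for multicollinearity, not named in the post but directly related, is the variance inflation factor (VIF): regress each explanatory variable on the others and compute 1/(1 − R2). A rough sketch with illustrative data:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R2_j), where
    R2_j comes from regressing column j on the remaining columns.
    A common rule of thumb treats VIF > 10 as a sign of multicollinearity."""
    n, k = X.shape
    out = []
    for j in range(k):
        yj = X[:, j]
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in range(k) if c != j])
        beta, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        resid = yj - Z @ beta
        r2 = 1 - (resid @ resid) / ((yj - yj.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)      # nearly a copy of a: strong collinearity
c = rng.normal(size=200)                 # independent of both
v = vif(np.column_stack([a, b, c]))
print(v)  # the first two VIFs are huge, the third is close to 1
```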
We continue with the next part. You can ask a question in the box below or in the form on the left of this page.


Thank you!