Saturday, 23 June 2018

AngularJS Tutorial for Beginners - Course Outline - 1

Hello! Today, you have a choice to make: whether or not to learn AngularJS, one of the most widely used JavaScript frameworks for building web applications. If you decide to learn it, this tutorial is for you. But first, let me tell you a bit about how this tutorial is organized.


This AngularJS tutorial we are about to kick off is specially designed for beginners who want to learn a new technology in a simple and easy way.

The interesting thing about AngularJS is the fact that it was developed by Google, not for experts but for everyone. You don't need any programming knowledge to learn and use it.

This tutorial is based on 3 strategies: Practice-While-Learning, Small Bits at a Time, and Homework.

1. Practice-While-Learning Approach
This tutorial uses a PWL approach. PWL means Practice While Learning: throughout the course, you will have examples to practice what you learn. Sample code will be provided for you.

2. Small Bits of Lessons at a Time
This means that the lessons will be short and interesting. Every week there will be a new lesson; you simply need to follow 'Learn Computer Programming' to get notified each time a lesson is published. The idea is that you will learn without much effort.

3. Homework
At the end of each lesson, you will have homework. It is expected to be completed within one week, that is, before the next lesson is published.




The Course Outline - 1
  1. Introduction to AngularJS
  2. Setting up AngularJS
  3. Your First AngularJS Application
  4. AngularJS Directives
  5. AngularJS Expressions
  6. AngularJS Controllers
  7. AngularJS Filters
  8. Working With Tables in AngularJS
  9. Document Object Model in AngularJS
  10. AngularJS Modules
  11. Working With Forms in AngularJS
  12. AngularJS Includes
  13. Views in AngularJS
  14. Scopes in AngularJS
  15. Services in AngularJS





We hope to cover this outline in 15 classes and, trust me, by the time we are done you will be amazed that you have learnt something new without difficulty.

For those who already know AngularJS, I'm working on another outline: the AngularJS Tutorial for Programmers - Course Outline 2.

I will be consulting a number of programmers to complete this outline and start the lessons.

Then the last tutorial would be AngularJS Tutorials for SharePoint Developers. This is very interesting because it allows you to build complete intranet applications if you already have access to SharePoint and have some knowledge of SharePoint.

For this third group, I would challenge you to examine the lessons in the outline and give me recommendations on what you expect as a SharePoint Developer.






So let's brace up, as the first part of the series, 'AngularJS Tutorial for Beginners - Course Outline - 1', kicks off by the first week of July 2018.

If you have any comments or recommendations, leave them in the comment box below or to the left of this page.

Thanks and best wishes in your learning effort!





Thursday, 21 June 2018

Introduction to Partial F-Test in Multivariate Linear Regression

Good to see the effort you put into learning. I assure you that these concepts are really not hard to understand.

In this lesson, I will explain the concept of the Partial F-Test in Multivariate Linear Regression. I assume that you have a basic knowledge of regression, so we will begin with a recap of Multivariate Linear Regression.
I will try to keep it very simple and clear.





Content
  1. What is Multivariate Linear Regression?
  2. What is Partial F-Test?
  3. How it Works
  4. The Fchange Statistic
  5. Automated Modelling Algorithms
  6. Final Notes


1. What is Multivariate Linear Regression?

Remember that regression has to do with trying to find a relationship between one variable, called the dependent (or target) variable, and another, called the independent (or explanatory) variable.
The typical equation of a simple linear regression line is given by:

y = ax + b

where y is the dependent variable
x is the independent variable
a is the slope of the line and b is its intercept

In the case of multivariate linear regression, the dependent variable y depends on two or more independent variables x1, x2, ..., xk.

The equation for multivariate linear regression is given as:

y = β0 + β1x1 + ... + βkxk + Ɛ

where x1, x2, ..., xk are the independent variables
y is the dependent variable
β0, β1, ..., βk are the regression coefficients
Ɛ is the error term



2. What is Partial F-Test?

The Partial F-Test is a statistical analysis used in multivariate linear regression to determine which independent variables are to be considered when fitting a multivariate linear regression model. In other words, how many variables are to be considered to create a good fit.
This is necessary because if too many variables are considered, then the model would be too complex. On the other hand, if too few variables are used, then we may get a very weak fit.


3. How it works

Let's assume that we have created a multivariate regression model between the dependent variable y and the independent variables x1, x2, ..., xf.
Now we would like to improve the model by adding additional independent variables xf+1, xf+2, ..., xk.
The question is whether adding these additional variables would improve the model or not.
So we are going to perform a Partial F-Test to determine this, and for that we need to calculate the Fchange statistic.

Learn about Overfitting and Underfitting here.




4. The Fchange Statistic

In this section, I'm going to explain how to carry out the Partial F-Test.
The first step is to assume that the model would not be improved by adding the new independent variables. This assumption is our null hypothesis. It also means that the coefficients of the terms in the regression model containing the additional variables are equal to zero.
So, first, we can state the null hypothesis this way:

H0 : βf+1 = βf+2 = ... = βk = 0 (the additional new variables do not improve the model)

The next step is to calculate the Fchange statistic.



where
n-k-1 is the degrees of freedom
vi is the part correlation coefficient of the ith variable
q is the number of variables left out (q = k - f)
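
For reference, a widely used textbook form of the Fchange statistic is written with the coefficients of determination of the two models instead of the vi terms. The symbols R_f^2 (model with the first f variables) and R_k^2 (model with all k variables) are introduced here as assumptions and may not match the notation of the original formula:

$$F_{change} = \frac{(R_k^2 - R_f^2)/q}{(1 - R_k^2)/(n - k - 1)}$$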

If the null hypothesis is true, the calculated Fchange follows an F distribution with q and n-k-1 degrees of freedom.

If the p-value of the test is larger than the chosen level of significance, we cannot reject the null hypothesis, meaning that the additional variables do not add any significant improvement to the model.
However, if the p-value is close to 0 (below the chosen level of significance), then the additional variables need to be included in the model.
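
To make the arithmetic concrete, here is a minimal Python sketch of the Fchange calculation. It assumes the common textbook form of the statistic based on the R2 values of the full and the reduced model; the function name `f_change` and the numbers used are hypothetical and only serve as an illustration.

```python
def f_change(r2_full, r2_reduced, n, k, q):
    """Partial F statistic for comparing a reduced and a full model.

    r2_full    -- R^2 of the model with all k independent variables
    r2_reduced -- R^2 of the model with only the first f = k - q variables
    n          -- number of observations
    k          -- number of independent variables in the full model
    q          -- number of additional variables being tested (q = k - f)
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - k - 1)
    return numerator / denominator


# Hypothetical numbers, chosen only to illustrate the arithmetic.
f_value = f_change(r2_full=0.80, r2_reduced=0.70, n=50, k=5, q=2)
print(round(f_value, 2))
```

The resulting value would then be compared with the F distribution with q and n-k-1 degrees of freedom (here 2 and 44) to decide whether the additional variables are worth keeping.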





5. Automated Modelling Algorithms

As you may have figured out, it would be very difficult to perform the Partial F-Test manually. You're right! This is especially true if the number of independent variables being considered is large. So there are automated methods that can be used to produce a good fit.

How they work: These algorithms select the list of independent variables in the final model according to different strategies.

(a) Forward Selection


This algorithm follows two steps
Step 1: In this step, the independent variable that has the highest correlation coefficient in absolute value with the dependent variable is selected. Then calculate the F-statistic for the linear regression with this variable to see whether the linear relationship is strong enough to give a measurable fit. If the test is not significant, then the algorithm stops. Otherwise, continue to Step 2.

Step 2: Take the next variable, the one that has the highest partial correlation coefficient with the dependent variable among the remaining variables. Also calculate the F-statistic with this new variable. If the calculated value is larger than a required set value, then we keep the variable and repeat Step 2. Otherwise the algorithm stops.

(b) Backward Elimination


This algorithm is the opposite of the Forward Selection procedure.
In this case, we begin by including all the variables in the regression model from the start. Then we remove the least suitable ones.
Beginning with the variable that has the smallest beta coefficient, we examine the F-statistic of the reduced model.
F must be larger than the required set minimum value. If, after a reduction step, the criterion is no longer met, then the algorithm stops and the reduced model is the final result.

(c) Stepwise Selection


This algorithm combines the features of Forward Selection and Backward Elimination. It repeatedly adds variables to or removes variables from the list of independent variables. The algorithm terminates based on either a minimum or a maximum set threshold.
If the F-statistic or the significance level falls outside the set interval, then the algorithm stops.





6. Final Notes

I will make a video explanation of the Partial F-Test in Multivariate Linear Regression. Also check my channel for lessons on Introduction to Linear Regression and Introduction to Multivariate Linear Regression.
If you have any challenges following this lesson, let me know in the comment box below or to the left of this page.
Thank you for reading and well done for your efforts in learning the Partial F-Test in Multivariate Linear Regression Analysis.

Watch Linear Regression Video lesson here




Wednesday, 20 June 2018

How to Perform One-Sample t-Test Step by Step

I am going to teach you how to perform a one-sample t-Test, step by step, using an example.
 

Question
A researcher believes that the mean income of workers in Akokwa town is 20,000 dollars. She wants to test this hypothesis against the alternative that the mean income is not equal to 20,000 dollars. A random sample of 9 workers in the town is chosen, and their incomes (in dollars) turn out to be the following:
24000, 13400, 18400, 22900, 13800, 8200, 11100, 9300, 14600.

(a) If the researcher sets a 5 percent significance level, should she accept or reject this hypothesis?
(b) If she sets a 1 percent significance level, should she accept or reject the null hypothesis?

Solution
We would start with the first step and progress through the last step. These are the steps:
  • Define the null and alternative hypothesis
  • State the alpha
  • Calculate the degrees of freedom
  • State the decision rule (acceptance criteria)
  • Calculate the test statistic
  • State the results
  • State your conclusion



Step 1: State the null and alternate hypothesis
The null hypothesis is what is currently believed and needs to be tested. From the question, the belief is that the mean income is 20,000 dollars. Therefore

H0: mean = 20,000 dollars
H1: mean ≠ 20,000 dollars

Step 2: State the alpha
From the question, we have 5% significance level. This is the same as alpha level of 0.05 ( or 95% confidence). So we can state the alpha level as:
α = 0.05

Step 3: Calculate the degrees of freedom
Degrees of freedom is given as df = n-1. So in this case:
Degrees of freedom df = 9 -1 = 8

Step 4: State the decision rule (acceptance criteria)
To do this we need to look up the critical value of t from a table of the t-distribution (t-Table).
In the table, the alpha levels are placed in the columns while the degrees of freedom are on the rows. In our case we look for df of 8 under a two-tailed alpha of 0.05. It is normally written as:
Tcrit or T8, 0.05 = 2.306

Decision rule: Accept the null hypothesis if t lies between -2.306 and 2.306 (that is, if the absolute value of t is less than 2.306); otherwise reject it.

We can do a little sketch to illustrate this in Figure 1.0. Note that the 5% has been divided into two parts of 2.5% on both sides of the graph.





Figure 1.0: Sketch of Accept and Reject region

Step 5: Calculate the test statistic (value of t)
The formula for the test statistic is given by:

$$t = \frac{x_n - \mu_0}{sd / \sqrt{n}}$$

where:
xn is the sample mean (we need to calculate it)
μ0 is the mean to be tested (given in the question - 20,000)
sd is the standard deviation of the sample (we need to calculate it)
n is the number of elements, in this case it is 9

xn = Sum/9

Sum =  24000+ 13400+18400+22900+13800+ 8200+11100+9300+14600
Sum =  135700
xn = 135700/9 = 15077.78
sd =  5623.78

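Note: The sample standard deviation is the square root of the sum of squared deviations from the mean divided by n - 1.

Using the formula for t, we can calculate the value of t as:

t = (15077.78 - 20000) / (5623.78 / √9) = -4922.22 / 1874.59 = -2.63

Step 6: State your results
The value of t is -2.63, and the decision rule states that the null hypothesis is accepted if t lies between -2.306 and 2.306.
In this case the absolute value of t (2.63) is greater than 2.306, so t falls in the rejection region.
Result: We reject the null hypothesis, H0, at the 5 percent significance level.

For part (b), the critical value for df = 8 at the 1 percent significance level is 3.355. Since 2.63 is less than 3.355, the null hypothesis is not rejected at the 1 percent level.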
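
If you would like to check the hand calculation, the steps can be sketched in a few lines of Python using only the standard library. The variable names are my own choice, and the critical value 2.306 is simply typed in from the t-Table rather than computed.

```python
import math
import statistics

incomes = [24000, 13400, 18400, 22900, 13800, 8200, 11100, 9300, 14600]
mu0 = 20000      # hypothesised mean from the question
t_crit = 2.306   # two-tailed critical value for df = 8, alpha = 0.05 (from the t-Table)

n = len(incomes)
mean = statistics.mean(incomes)
sd = statistics.stdev(incomes)  # sample standard deviation (divides by n - 1)
t = (mean - mu0) / (sd / math.sqrt(n))

# Reject H0 when t falls outside the interval (-t_crit, t_crit)
reject_h0 = abs(t) > t_crit
print(f"mean = {mean:.2f}, sd = {sd:.2f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Note that `statistics.stdev` divides by n - 1, which is what the one-sample t-Test requires; `statistics.pstdev` would divide by n instead.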




Step 7: State your conclusion
From the result, our conclusion is that, at the 5 percent significance level, the mean income of workers in Akokwa town is not 20,000 dollars.

If you follow these steps in your statistics quiz, then you are sure to make the maximum grade for How to Perform t-Test.
Success in your exams and thanks for your efforts!




Tuesday, 19 June 2018

Is Time Travel Possible? Do Time Machines Exist? - An Expert Explains

I have seen many videos of persons claiming to be time travelers from different years telling their stories. Some say they came from 2100, 4897, 2050, 2935 etc. And they tell stories of what the future looks like: cars driving themselves, no governments, almost everything being automated, digital cities and so on.




First, all these are imaginations, or rather science fiction videos simply meant to entertain the viewers, and I do think that is exactly what they are meant for: entertainment.

This is similar to the movie Interstellar, in which the character Cooper travels with his team to distant planets, and by the time he comes back, his daughter has aged so much that she looks like a grandmother to him.

Now on the real side of things, science has shown theoretically that it is possible to travel forward in time, even if only by a fraction of a second. This follows from Einstein's theory of relativity.

The Principle of Causality
This is also called the Cause and Effect principle. It simply states that observed or unobserved effects have causes.
In simple terms, if the device you are reading this blog with is put into fire (cause), then it gets damaged (effect). Another example is that if there is an egg, then it was laid by a chicken.


Let's now apply this to Time Travel. Let's assume Time Travel is possible. Let's say you have a car you have been using for 6 years, from 2012 to 2018. So today, in June 2018, you decide to travel six years back in time to when the car was new, say January 2012. When you arrive, you find some angry youths setting the car ablaze and burning it to ashes. Afraid for your life, you decide to travel back to June 2018.

The question now is:
Where is the car? Remember you have used this car for six years. Now if it was set ablaze in January 2012, then it is not possible that you have been using it from 2012 to 2018.
In other words, the principle of causality has it that if the car has been used for six years (effect), then it has existed for those years (cause). You simply can't change it: you can't go back in time to remove the cause of an outcome that has already happened.

Therefore, the universal principle of causality makes it impossible to travel back and forth in time.

Other important questions I would discuss are:

  • Is it possible to travel to the future, just to see what it would look like, without doing anything there?
  • What would the world be like in 200 years' time, say in 2218?
  • Are there some living creatures in other worlds?
  • Are aliens real? Do they exist?
  • Are there some humans with super powers (that is superheroes)?

I will discuss these topics and you will know the clear answers to them. If you have any other question you would like me to answer, leave it in the comment box below.


Monday, 18 June 2018

What are Heteroscedasticity and Multicollinearity in Regression Analysis?

Hello! Good to see you. I'll be very happy to explain to you these two important concepts:
  • Multicollinearity
  • Heteroscedasticity

These are big words, but trust me, they are very easy to understand. These two terms are not directly related, but they are two situations that create problems during regression analysis.

So just as the two words are difficult to pronounce, they also make the analysis procedure difficult!

As you know, I normally explain maths and statistics concepts in a simple way so that you clearly understand them and are able to present them in your own words during a quiz or an exam.

What is Heteroscedasticity?
Heteroscedasticity means unequal scatter. This means that the variability (or scatter) of a variable is unequal across the range of values of the other variable that is used to predict it.
It is a systematic change in the spread of the residuals over the range of the measured values. This is illustrated in Figure 1.0.


The Problem of Heteroscedasticity
In Linear Regression, it is assumed that the error (or residual) between the measured value and the regression line maintains a constant variance (homoscedasticity).
But in the case of heteroscedasticity, the residuals have an unequal variance, which violates this assumption and makes the results of the regression unreliable.

What is Multicollinearity?
To understand this term, you need to keep in mind that in multivariate linear regression, the objective is to examine the effect of more than one independent variable, x1, x2, ..., xk, on a dependent variable Y.
So the regression function would be as shown below:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

Where
Y is the dependent variable
x1, x2, ..., xk are the independent variables
β0, β1, ..., βk are the regression coefficients
Ɛ is the error term

Now, these independent variables not only affect the dependent variable but may also have a strengthening or weakening effect on each other. Therefore, multicollinearity means a linear correlation between two or more independent variables.

Effects of Multicollinearity
Multicollinearity has unpleasant effects, which are highlighted below:
  • the independent variables may assume each other's role
  • the effect of the independent variables on the dependent variable cannot be distinguished
  • estimation of the regression coefficients is rendered unreliable
  • in some cases, analysis becomes very difficult to perform

Measure of Multicollinearity
There are three metrics used to examine the effects of multicollinearity. They are:

  • VIF (Variance Inflation Factor)
  • Tolerance
  • Condition Indices


Variance Inflation Factor (VIF): The VIF shows how many times larger the actual variance of the estimated coefficient of the ith variable is than it would be without the effect of multicollinearity. The same is calculated for the other variables.
As a rule of thumb, if the VIF is between 1 and 2, there is weak multicollinearity. If it is between 2 and 5, it is strong. For values above 5, there is very strong multicollinearity.

VIF is given by the formula

$$VIF_i = \frac{1}{1 - R_i^2}$$

where Ri2 is the coefficient of determination obtained when the ith independent variable is regressed on all the other independent variables.

Tolerance: This is given as the inverse of the Variance Inflation Factor, that is 1/VIF. A low tolerance value indicates high multicollinearity.
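
Here is a small Python sketch of VIF and tolerance for the simplest case of exactly two independent variables. In that special case the R2 obtained by regressing one variable on the other is just the square of their correlation coefficient, so both variables share the same VIF. The data and the function names are made up for illustration only.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)


# Made-up values for two independent variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

vif = vif_two_predictors(x1, x2)
tolerance = 1.0 / vif
print(round(vif, 2), round(tolerance, 2))
```

With more than two independent variables, each Ri2 has to come from a full regression of the ith variable on all the others, which this sketch does not attempt.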


Condition Index: This is a measure of the relative amount of variation associated with an eigenvalue. A large value of the condition index indicates a high degree of collinearity.

Quiz: Try to read more on Condition Indices and write the formula. Leave a comment if you succeed in doing this.

For now, I would like to thank you for reading.


What is Coefficient of Determination in Linear Regression

If you are learning linear regression, then you need to clearly understand the concept of Coefficient of Determination R2 and the Adjusted Coefficient of Determination R2adj.

I am going to explain these concepts in a very easy way.

We are going to cover the following:


  1. What is Coefficient of Determination
  2. Properties of Coefficient of Determination
  3. Adjusted Coefficient of Determination
  4. Final Notes





1. What is Coefficient of Determination?


The coefficient of determination is used to determine the proportion of the variation of one of the variables that is predictable from the other variable.

Look at the table below. What do you think is the relationship between X and Y?
It seems that Y equals X/2. But carefully looking at the table, we see that this is not exactly true for two of the data points.

So the rule Y = X/2 holds for 8 of the 10 points. Note, however, that the coefficient of determination is not simply the share of points that fall exactly on the line (80% here). It measures how much of the variation in Y is explained by X, and because the two exceptions deviate only slightly, R2 for this table is about 0.997.

Y       X
1.0     2.0
2.0     4.0
3.0     7.0
4.0     8.0
5.0     10.0
6.0     12.3
7.0     14.0
8.0     16.0
9.0     18.0
10.0    20.0

Table 1: For 8 out of the 10 points, y = x/2

The coefficient of determination is a measure of how certain we can be in making predictions from a certain model. It is the ratio of the explained variation to the total variation.

The value of R2 ranges from 0 to 1, that is:
0 ≤ R2 ≤ 1

It denotes the strength of the linear association between x and y. When we are using a line of best fit, the coefficient of determination represents the proportion of the variation in y that is accounted for by the line of best fit.

For example, if R = 0.89 then R2 = 0.792, which means that 79.2% of the total variation in y can be explained by the linear relationship between y and x (as described by the regression equation; in our case it is y = x/2).

The other 20.8% of the variation remains unexplained.

So we can say that the coefficient of determination is a measure of how well the regression line represents the data.

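Formula for R2 is given by:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where SSR is the explained (regression) sum of squares, SSE is the unexplained (error) sum of squares and SST = SSR + SSE is the total sum of squares.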
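
To see the ratio of explained to total variation in numbers, here is a short Python sketch that fits a least-squares line to the data of Table 1 and computes R2 as one minus the unexplained share of the variation. The variable names are my own, and the fitted line is the ordinary least-squares line rather than exactly y = x/2.

```python
# Data from Table 1
x = [2.0, 4.0, 7.0, 8.0, 10.0, 12.3, 14.0, 16.0, 18.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line: y_hat = slope * x + intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Unexplained (error) and total sums of squares
sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1.0 - sse / sst
print(round(slope, 3), round(r_squared, 4))
```

The slope comes out close to 0.5, in line with the Y = X/2 pattern in the table, and R2 is close to 1 because the two off-pattern points deviate only slightly.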




2. Properties of Coefficient of Determination


Let's now outline some of the properties of R2 that you need to know. To get used to these properties, take some time to write them out in your notes.

  • 0 ≤ R2 ≤ 1 if f(X) = r(X) = E(Y | X)
  • if X and Y are independent, then R2 = 0
  • if Y = f(X), then R2 = 1
  • if f(X) = a*X + b (the theoretical linear regression), then R2 = (R(X,Y))2
  • if the joint distribution of X and Y is normal, then R2 = (R(X,Y))2



3. Adjusted Coefficient of Determination R2adj


Just like the coefficient of determination, the adjusted coefficient of determination R2adj is used to determine how well a multiple regression equation fits the sample data.

The difference between R2 and R2adj is that R2 increases automatically (or at least never decreases) as new independent variables are added to the regression equation, even if they don't contribute any new explanatory power to the equation.
However, R2adj increases ONLY IF the new independent variables added increase the explanatory power of the regression equation. This makes R2adj more reliable in measuring how well a multiple regression equation fits the sample data.
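
The lesson does not write out the formula for R2adj, so the Python sketch below assumes the standard definition R2adj = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The R2 values are hypothetical and only illustrate how a tiny gain in R2 can still lower R2adj.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: a sixth variable raises R^2 only from 0.800 to 0.801
r2_adj_5 = adjusted_r2(0.800, n=50, k=5)
r2_adj_6 = adjusted_r2(0.801, n=50, k=6)

print(round(r2_adj_5, 4), round(r2_adj_6, 4))
```

In this made-up case R2 goes up slightly when the sixth variable is added, while R2adj goes down, which is the behaviour described above.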


4. Final Notes


I hope this brief discussion has helped you understand the concepts of Coefficient of Determination and Adjusted Coefficient of Determination as they apply to Regression Analysis. Especially take note of the difference between the two, as this often appears in statistics quizzes and exams.

Thank you for reading and remember to leave a comment below if you have any challenges following the explanation.

Monday, 11 June 2018

How to Perform Mann-Whitney U Test (Step by Step) - Hypothesis Testing

This is an interesting procedure and really very straightforward if you follow through. I am happy to know that you are making an effort to learn hypothesis testing.

Feel free to leave me a comment below if you have any challenges. I would be happy to give you needed support.


Content
  1. What is Mann-Whitney u-Test?
  2. Exercise 1
  3. Solution Steps

1. What is Mann-Whitney u-Test?


The Mann-Whitney u-Test is a non-parametric test used to test whether two independent samples were selected from populations having the same distribution. Another name for the Mann-Whitney U Test is the Wilcoxon Rank Sum Test.
Note: This is not the same as the Wilcoxon Signed Rank Test, which is used for dependent samples.

As you know, the easiest way to understand a statistical test is to just perform the test yourself. So now we are going to go through an example, and be sure to follow along with a pen, notebook and a calculator. It is really easy and fun!

2. Exercise 1


A researcher gave an aptitude test to 24 respondents; 12 were men and 12 were women. He recorded the scores for each of the respondents and tabulated them in the table below:

Men     80  79  92  65  83  84  95  78  81  85  73  52
Women   82  87  89  91  93  76  74  70  88  99  61  94

Use the data provided to test the null hypothesis that the distribution of scores is the same for men as for women. Use a significance level of 0.05. (Use the Wilcoxon Rank Sum Test)

3. Solution Steps


We would follow the step by step procedure. We would also use Excel to tabulate our data to make it easier to perform the calculations.

Step 1: State the null and alternate hypothesis and rejection criteria


The null hypothesis states that the scores of the two groups come from the same distribution (that is, there is no difference in the typical ranks of the two groups of observations), and the alternate hypothesis states that the two distributions differ.

These are stated below.

H0: μm = μw
H1: μm ≠ μw

Alpha = 0.05

Rejection Criteria: Reject the null hypothesis if Ustat < Ucrit

Step 2: Perform a ranking of all the observation


In this case, we pool the observations from the two groups and rank them together. The two samples are independent, so no pairwise differences are needed.
We would use MS Excel to help us do it faster. So, I have transferred the table to Excel.

The total number of observations is 24, so we need to rank the observations from 1 to 24. In the figure, I have colored the observations for Men blue and for Women red. This is so that, when I sort the data, I would know which one belongs to which group.

I sorted the data to make it easy to assign the ranks to each of the observations.

Step 3: Calculate the Rank Sums


We calculate the sum of the ranks for the two groups

Sum for Men (R1) = 10+9+20+3+13+14+23+8+11+15+5+1 = 132

Sum for Women (R2) = 12+16+18+19+21+7+6+4+17+24+2+22 = 168


Step 4: Calculate the U Statistic for the Two Groups


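The formula for the U statistic is given as:

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$$

where n1 and n2 are the sizes of the two groups and R1 and R2 are their rank sums.

For the Men observation, we have

U1 = 12×12 + (12×13)/2 - 132 = 144 + 78 - 132 = 90

We also calculate the U for the Women observation, we have

U2 = 12×12 + (12×13)/2 - 168 = 144 + 78 - 168 = 54

The U-stat is the smaller value of the two and that would be

Ustat = 54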



Step 5:Determine the Critical value from Table


From the Mann-Whitney U-test table, we check the value under column 12 and row 12.
We have a critical value of U to be

Ucrit = 37

Since the calculated value of U is greater than the critical value, we do not reject the null hypothesis and conclude that there is no significant difference between the distributions of the two groups.
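
The whole procedure can also be sketched in plain Python instead of Excel. The helper `rank_sums` is my own illustration (it gives tied values their average rank, although there are no ties in this data set), and the critical value 37 is typed in from the table rather than computed.

```python
men = [80, 79, 92, 65, 83, 84, 95, 78, 81, 85, 73, 52]
women = [82, 87, 89, 91, 93, 76, 74, 70, 88, 99, 61, 94]


def rank_sums(a, b):
    """Rank the pooled observations and return the rank sum of each group."""
    pooled = sorted(a + b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i+1 .. j share the same value: give them the average rank
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(rank_of[v] for v in a), sum(rank_of[v] for v in b)


r_men, r_women = rank_sums(men, women)
n1, n2 = len(men), len(women)

u_men = n1 * n2 + n1 * (n1 + 1) / 2 - r_men
u_women = n1 * n2 + n2 * (n2 + 1) / 2 - r_women
u_stat = min(u_men, u_women)

u_crit = 37  # critical value for n1 = n2 = 12 at alpha = 0.05 (from the table)
reject_h0 = u_stat < u_crit
print(r_men, r_women, u_stat, reject_h0)
```

A quick check on the ranking is that the two rank sums must add up to 24 × 25 / 2 = 300, and the two U values must add up to n1 × n2 = 144.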


Thanks for learning. Leave a comment if you have any challenges following this lesson.