Friday, 13 July 2018

What are Dialogs in Chatbots? (Like Forms in a Web Application)

If you have been following my lessons on How to Build Chatbots or the MS Bot Framework Tutorials, you will have realized that to develop a useful chatbot application, you simply need the right information from the right person. That is exactly what I provide for my subscribers.

  1. Introduction
  2. What are Dialogs in Chatbots?
  3. Where to Start
  4. Adding New Dialogs
  5. Redirecting to Another Dialog
  6. Final Notes

1. Introduction

Honestly, it's really very easy to develop applications, no matter how complex they are. Ask me why. The reason is that the tools are already there. Also, bits and pieces of almost any application have already been developed; you simply need to find them and piece them together. And that's it.

Today, I will tell you about dialogs in chatbots. Once you understand the concept of dialogs, you will see that developing chatbots is really easy if you have a basic knowledge of programming.

2. What are Dialogs in Chatbots?

A dialog is simply a single flow of communication between the user and the chatbot. For example:

User: Please play me a song
Bot: Which song would you like me to play
User: Ada Ada by Flavour
Bot: I could not get Ada Ada by Flavour. Would you like Golibe by Flavour?
User: Ok

You can see from the above that this is a single flow of conversation that follows a particular logic. If, along the way, the user would like to know about the weather, then another dialog would be needed. Similarly, if the user wants the bot to get stock market reports, another dialog would be needed.

3. Where to Start

To start developing a chatbot, you simply need to download the template. This template is already a complete chatbot, but with just one dialog. You can build and run this template and it works perfectly.
This is the same as having a single form when you start a new web application in Visual Studio.
So go ahead and download the template from this link. It is in a zip file. Unzip it, open it in Visual Studio and then run it.
To actually test it by sending messages and getting responses, you need an emulator application. You can also get it free from here and install it.
Refer to this tutorial: Build Your First Chatbot in Visual Studio
Video Lessons here

4. Adding New Dialogs

Just like you can add several forms or pages to your web application, you can also add several dialogs to your chatbot application. Once you have the template with the root dialog, you can add a new dialog by adding a new class file. You could also copy and modify the existing dialog file.

5. Redirecting to Another Dialog

You can easily redirect from one dialog to another in a chatbot application. But unlike in a web application, you don't use hyperlinks; you simply write the code to do this.
When you redirect to another dialog, you are adding it to the dialog stack. This means that you can return to the previous dialog later.
Here are two ways to redirect to another dialog:

1. Calling the child dialog: this simply invokes the new child dialog without passing any parameter.

2. Using context.Forward(): This method allows us to invoke a child dialog and pass a parameter (which could be a message) to it.
For example:

        // In a LuisDialog: runs when LUIS finds no matching intent, and forwards the message to the QnA dialog
        [LuisIntent("None")]
        public async Task None(IDialogContext context, IAwaitable<IMessageActivity> message, LuisResult result)
        {
            var qnaDialog = new qnaDialog();
            var messageToSend = await message;
            await context.Forward(qnaDialog, AfterQNADialog, messageToSend, CancellationToken.None);
        }
Listing 1: Forward a message to a QnA dialog

Listing 1 shows how to forward program execution to a QnA dialog when no intent is identified by the LUIS dialog.

Here, a new instance of qnaDialog is created, and the Forward method takes the input message and passes it to the new dialog. It also specifies a callback to run when the new child dialog completes. In this case, the function to call after completion is AfterQNADialog.

Once it has completed, the AfterQNADialog will call context.Done.
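To make the stack idea concrete, here is a toy model written in Python. DialogStack, forward and done are hypothetical names that only mimic the behaviour described above: Forward pushes a child dialog, and Done pops it and resumes the parent through the callback. This is not the Bot Framework API.

```python
class DialogStack:
    """Toy model of a dialog stack (hypothetical; not the Bot Framework API)."""

    def __init__(self, root):
        # Each frame is (dialog name, callback to run when that dialog completes)
        self.frames = [(root, None)]

    @property
    def active(self):
        # The dialog on top of the stack receives the user's messages
        return self.frames[-1][0]

    def forward(self, child, on_done, message):
        # Like context.Forward: push the child and hand it the current message
        self.frames.append((child, on_done))
        return f"{child} handles: {message}"

    def done(self, result):
        # Like context.Done: pop the finished dialog, then resume its parent via the callback
        _, on_done = self.frames.pop()
        return on_done(self.active, result) if on_done else None


stack = DialogStack("LuisDialog")
print(stack.forward("qnaDialog", lambda parent, r: f"back in {parent} with {r!r}", "What is a dialog?"))
print(stack.active)                 # qnaDialog
print(stack.done("answer sent"))    # back in LuisDialog with 'answer sent'
print(stack.active)                 # LuisDialog
```

Because the parent stays on the stack underneath the child, finishing the child automatically returns control to the parent. That is why, in the post, you can "return to the previous dialog later".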

Bot Framework Tutorial Series

6. Final Notes

As I mentioned previously, I just wanted to give an overview of what dialogs are, and I hope that by now you clearly understand the concept. I suggest you take some time to work through the beginner tutorials on MS Bot Framework so that it becomes clearer.

Thank you for learning. If you have any comments or questions, leave them in the comment box below or to the left of this page.

Sunday, 8 July 2018

Build Your First ChatBot in Visual Studio (Step by Step)

In this short tutorial, I will take you through how to build a chatbot using Visual Studio (2015 or 2017).
Watch my video on How to Create a Bot Application using Bot Framework here

  1. Have Visual Studio Installed
  2. Download the Bot Application Template from this link (Skip this if you are using Visual Studio 2017)
  3. Download the Bot Framework Emulator from this link

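Step 1: Unzip the Bot Template you downloaded into this location:
C:\Users\Kindson\Documents\Visual Studio 2013\Templates\ProjectTemplates\Visual C#
(Replace Kindson with your own user name, and use the folder for your Visual Studio version, for example Visual Studio 2015.)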

Note: Skip Step 1 if you are using Visual Studio 2017. Visual Studio 2017 comes with a Bot Template.

Step 2: Create a Bot Application in Visual Studio. You can find Bot Application as shown in Figure 1.

Figure 1: Bot Application

Step 3: Run the application. Click on the Run button to run the project. If you are successful, a browser window opens showing the URL of the application. Look at Figure 2.

Figure 2: Successful run of Bot Application

Step 4: Test in an emulator. The Bot Framework Emulator simulates messaging applications like Skype.
Open the Emulator. It opens as shown in Figure 3.

Figure 3: Bot Framework Emulator

Enter the messages endpoint as shown in the Figure 3. Make sure the port number matches the one on your browser.
  • Microsoft App Id: Leave it blank
  • Microsoft App Password: Leave it blank
  • Locale: Leave it blank

Click on Connect

Now you can enter a message and, hurray! You will get a response.

So this is how to create your First Chatbot in Visual Studio using Bot Template.

In the next post, we will talk about how to troubleshoot common errors that may occur along the way. If you receive any errors, mention them in the comment box below.


Friday, 6 July 2018

How to Generate Stored Procedure Automatically From Visual Studio

In this tutorial, I will explain how to generate stored procedures automatically from Visual Studio without having to write any code.

Let's assume you have created the Customers table in Microsoft SQL Server Management Studio. This is shown in Figure 1.

Figure 1: Customer Table Definition in MS SQL Server

You already know that it would not be easy to write all four stored procedures manually. So we are going to generate the following stored procedures:
  • SelectCustomers
  • UpdateCustomer
  • DeleteCustomer
  • InsertCustomer
Now let's go to Visual Studio and start a new project. I have already done that. I am using Visual Studio 2015, but it would also work with Visual Studio 2013 and 2017.

  1. Right-click on your project in Solution Explorer
  2. Select Add, then New Item
  3. In the dialog box, Select Data on the left and Select DataSet from the list of items. This is shown in Figure 2

Figure 2: Add a DataSet to the Project

Give it a name and click on Add (I named it CustomerData). The DataSet designer opens as shown in Figure 4.

Right-click in the designer, select Add, then select TableAdapter.

In the Database Connection window, create a new database connection (or select an existing one if you have previously created one).

Click on Next. Click on Next again

Choose Create new stored procedures and click Next.

In this window, you can either enter the statement (Select * From Customers) or you can use the Query Builder to select the table

If you click on Query Builder, you can select the Customers table and then click on the * to select all columns.

After you close the Query Builder, you should have your Select Statement written out for you.

Click on Next
The next window gives you the option to name your stored procedures.

About Naming Stored Procedures

Use SelectCustomers, InsertCustomer, UpdateCustomer and DeleteCustomer. Note that the select procedure uses the plural of the table name, while the others use the singular.
This is because the select statement retrieves multiple records at a time, while the other procedures act on a single record at a time.

Click on Next

Make sure to uncheck the last checkbox that says: 'Create methods to send updates directly to the database'

Click on Next to generate the stored procedures.

If you have gotten to this point, then all four stored procedures have been generated in the SQL database.
Click on Finish to go back to the Dataset designer window

Now, let's head back to SQL Server Management Studio to check whether the stored procedures have been generated.
If you expand Databases, then your database, then Programmability > Stored Procedures, you will see that the four procedures have been created successfully.

So this is how to generate all your stored procedures from Visual Studio without writing a single line of code! Isn't it amazing?

Do leave a comment if you have any challenges following this tutorial

Saturday, 23 June 2018

AngularJS Tutorial for Beginners - Course Outline - 1

Hello! Today, you have a choice to make: whether or not to learn AngularJS, one of the most popular JavaScript frameworks in the world. If you choose to, this tutorial is for you. But let me tell you a bit about how this tutorial is organized.

This AngularJS tutorial we are about to kick-off is specially designed for beginners who want to learn a new technology in a very simple and easy way.

The interesting thing about AngularJS is the fact that it was developed by Google, not for experts but for everyone. You don't need any programming knowledge to learn and use it.

This tutorial is based on 3 strategies: Practice-While-Learning, Small Bits at a Time, and Homework.

1. Practice-While-Learning Approach
This tutorial uses a PWL approach. PWL means Practice While Learning: throughout the course, you will have examples to practice what you learn. Sample code will be provided for you.

2. Small Bit of Lessons at a Time
This means that the lessons will be short and interesting. There will be a new lesson every week; you simply need to follow 'Learn Computer Programming' to get notified each time a lesson is published. The idea is that you will learn without much effort.

3. Homework
At the end of each lesson, you will have a homework. This homework is expected to be completed in one week, that is before the next lesson is published.

The Course Outline - 1
  1. Introduction to AngularJS
  2. Setting up AngularJS
  3. Your First AngularJS Application
  4. AngularJS Directives
  5. AngularJS Expressions
  6. AngularJS Controllers
  7. AngularJS Filters
  8. Working With Tables in AngularJS
  9. Document Object Model in AngularJS
  10. AngularJS Modules
  11. Working With Forms in AngularJS
  12. AngularJS Includes
  13. Views in AngularJS
  14. Scopes in AngularJS
  15. Services in AngularJS

We hope to cover this outline in 15 classes, and trust me, by the time we are done you will be amazed that you have learnt something new without difficulty.

For those who already have some knowledge of AngularJS, I'm working on another outline: AngularJS Tutorial for Programmers - Course Outline 2.

I will be working with a number of programmers to complete this outline and start the lessons.

Then the last tutorial would be AngularJS Tutorials for SharePoint Developers. This is very interesting because it allows you to build complete intranet applications if you already have access to SharePoint and have some knowledge of SharePoint.

For this third group, I challenge you to examine the following lessons and give me recommendations on what you expect as a SharePoint developer.

So let's brace up, as the first part of the series, 'AngularJS Tutorial For Beginners - Course Outline - 1', kicks off in the first week of July 2018.

If you have any comments or recommendations, leave them in the comment box below or to the left of this page.

Thanks and best wishes in your learning effort!

Thursday, 21 June 2018

Introduction to Partial F-Test in Multivariate Linear Regression

Good to see the effort you put into learning. I assure you that these concepts are really not hard to understand.

In this lesson, I will explain the concept of the Partial F-Test in Multivariate Linear Regression. I assume that you have a basic knowledge of regression, so we will begin with a recap of Multivariate Linear Regression.
I will try to keep it very simple and clear.

  1. What is Multivariate Linear Regression?
  2. What is Partial F-Test?
  3. How it Works
  4. The Fchange Statistic
  5. Automated Modelling Algorithms
  6. Final Notes

1. What is Multivariate Linear Regression?

Remember that regression is about finding a relationship between one variable, called the dependent (or target) variable, and one or more independent (or explanatory) variables.
The typical equation of a simple linear regression line is:

y = ax + b

where y is the dependent variable
x is the independent variable
a is the slope of the line and b is the intercept

In the case of multivariate linear regression, the dependent variable y depends on two or more independent variables x1, x2, ..., xk

The equation for multivariate linear regression is given as:

y = β0 + β1x1 + ... + βkxk + Ɛ

where x1, x2,...xk are the independent variables
y is the dependent variable
β0, β1,...βk are the regression coefficients
Ɛ is the error term

2. What is Partial F-Test?

The Partial F-Test is a statistical test used in multivariate linear regression to determine which independent variables should be included when fitting a multivariate linear regression model. In other words, it helps decide how many variables are needed to get a good fit.
This is necessary because if too many variables are included, the model becomes too complex. On the other hand, if too few variables are used, we may get a very weak fit.

3. How it works

Let's assume that we have created a multivariate regression model between the dependent variable y and the independent variables x1, x2,...xf.
Now we would like to improve the model by adding the independent variables xf+1, xf+2,...xk.
The question is whether adding these variables would improve the model or not.
To answer it, we perform a Partial F-Test, for which we need to calculate the Fchange statistic.

Learn about Overfitting and Underfitting here.

4. The Fchange Statistic

In this section, I'm going to explain how to carry out the Partial F-Test.
The first step is to assume that the model would not be improved by adding the new independent variables. This assumption is our null hypothesis. It also means that the coefficients of the terms containing the additional variables in the regression model are equal to zero.
So, first, we can state the null hypothesis this way:

H0 : βf+1 = βf+2 = ... = βk = 0 (the additional variables do not improve the model)

This is our null hypothesis

The next step is to calculate the Fchange statistic

n-k-1 is the degrees of freedom
vi is part of the correlation coefficient of the ith variable
q is the number of variables left out (q = k - f)
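A commonly used way to write the Fchange statistic uses the coefficient of determination of the reduced model (x1 to xf), written R²_f, and of the full model (x1 to xk), written R²_k. This R²-based form is a standard textbook version and an assumption about notation: it does not use the vi terms listed above.

```latex
F_{change} = \frac{\left(R^2_{k} - R^2_{f}\right) / q}{\left(1 - R^2_{k}\right) / (n - k - 1)},
\qquad q = k - f
```

The numerator is the extra variation explained per added variable. The denominator is the unexplained variation per residual degree of freedom of the full model.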

If the null hypothesis is true, the calculated Fchange follows an F distribution with q and n-k-1 degrees of freedom.

If the significance level (p-value) of Fchange is high enough (for example, above 0.05), we accept the null hypothesis that the additional variables do not improve the model.
However, if the p-value is close to 0, the additional variables should be included in the model.
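As a quick arithmetic check, the sketch below computes Fchange from the R² of a reduced and a full model using the R²-based form of the statistic. The inputs are hypothetical: n = 30 observations, f = 2 variables in the reduced model, k = 4 in the full model, R² rising from 0.60 to 0.70. The critical value 3.39 is assumed to be read from an F table for (2, 25) degrees of freedom at the 5% level.

```python
def f_change(r2_reduced, r2_full, n, k, f):
    """Partial F statistic for adding q = k - f variables to a model with f variables."""
    q = k - f                  # number of added variables
    df2 = n - k - 1            # residual degrees of freedom of the full model
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df2)


F = f_change(0.60, 0.70, n=30, k=4, f=2)
F_CRIT = 3.39  # F(2, 25) at the 5% level, taken from an F table (assumed)
print(round(F, 2), F > F_CRIT)  # 4.17 True -> reject H0: the added variables help
```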

5. Automated Modelling Algorithms

As you may have figured out, it would be very difficult to perform the Partial F-Test manually, especially when many independent variables are being considered. So there are automated methods that can be used to produce a good fit.

How they work: These algorithms decide which independent variables end up in the final model, each following a different strategy.

(a) Forward Selection

This algorithm follows two steps:
Step 1: Select the independent variable that has the highest correlation coefficient (in absolute value) with the dependent variable. Then calculate the F-statistic for the linear regression with this variable, to see whether the linear relationship is strong enough to give a measurable fit. If the test is not significant, the algorithm stops. Otherwise, continue to Step 2.

Step 2: Take the next variable: the one with the highest partial correlation coefficient with the dependent variable (that is, the highest correlation with the residuals of the current model). Calculate the F-statistic with this new variable. If the calculated value is larger than a required minimum value, add the variable, take another variable and repeat Step 2. Otherwise, the algorithm stops.

(b) Backward Elimination

This algorithm is the opposite of the Forward Selection procedure.
In this case, we begin by including all the variables in the regression model from the start. Then we remove the least suitable ones.
At each step, we take the variable that contributes least (the one whose beta coefficient has the smallest partial F-statistic) and examine the F-statistic of the model reduced by this variable.
That F must be larger than the required minimum value for the variable to stay. If it is smaller, the variable is removed and the step is repeated. Once every remaining variable meets the criterion, the current reduced model is the final result.

(c) Stepwise Selection

This algorithm combines the features of Forward Selection and Backward Elimination. It repeatedly adds or removes variables from the list of independent variables, and it terminates based on a set minimum or maximum threshold.
If the F-statistic or the significance level leaves the set interval, the algorithm stops.
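The two main procedures above can be sketched in a few lines. This sketch makes several assumptions:

- Ordinary least squares is fitted by solving the normal equations with the standard library only.
- The partial F-statistic for a single variable is used as the entry and removal criterion.
- The threshold is a fixed F value of 4.0, a hypothetical choice. Real software often uses p-value thresholds instead.
- The toy data is made up: y depends strongly on x1, weakly on x3 and not at all on x2.

The function names are illustrative.

```python
def sse(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def partial_f(sse_small, sse_big, n, k_big):
    # F for one variable: drop in SSE divided by the full model's residual mean square
    # (assumes the bigger model does not fit perfectly, i.e. sse_big > 0)
    return (sse_small - sse_big) / (sse_big / (n - k_big - 1))


def forward_selection(candidates, y, f_enter=4.0):
    n, chosen, current = len(y), [], sse([], y)
    while len(chosen) < len(candidates):
        best = None
        for j in range(len(candidates)):
            if j in chosen:
                continue
            new = sse([candidates[i] for i in chosen + [j]], y)
            f = partial_f(current, new, n, len(chosen) + 1)
            if best is None or f > best[0]:
                best = (f, j, new)
        if best[0] < f_enter:      # best remaining variable is not significant: stop
            break
        chosen.append(best[1])
        current = best[2]
    return chosen


def backward_elimination(candidates, y, f_remove=4.0):
    n, kept = len(y), list(range(len(candidates)))
    while kept:
        full = sse([candidates[i] for i in kept], y)
        worst = None
        for j in kept:
            reduced = sse([candidates[i] for i in kept if i != j], y)
            f = partial_f(reduced, full, n, len(kept))
            if worst is None or f < worst[0]:
                worst = (f, j)
        if worst[0] >= f_remove:   # every remaining variable meets the criterion: stop
            break
        kept.remove(worst[1])
    return kept


# Hypothetical toy data
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 1, -1, -1, -1, -1, 1, 1]    # unrelated to y
x3 = [1, -1, -1, 1, 1, -1, -1, 1]    # explains part of what x1 leaves over
noise = [1, -1, 1, -1, -1, 1, -1, 1]
y = [2 * a + 0.3 * b + 0.2 * c for a, b, c in zip(x1, x3, noise)]
candidates = [x1, x2, x3]

print(forward_selection(candidates, y), backward_elimination(candidates, y))  # [0, 2] [0, 2]
```

Here both procedures keep x1 and x3 (indices 0 and 2) and leave out x2. Stepwise selection would alternate the two moves: after each addition, it re-checks whether any variable already in the model should now be removed.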

6. Final Notes

I will make a video explanation of the Partial F-Test in Multivariate Linear Regression. Also check my channel for lessons on Introduction to Linear Regression and Introduction to Multivariate Linear Regression.
If you have any challenges following this lesson let me know in the comment box below or by the left of this page.
Thank you for reading, and well done for your efforts in learning the Partial F-Test in Multivariate Linear Regression Analysis.

Watch Linear Regression Video lesson here

Wednesday, 20 June 2018

How to Perform One-Sample t-Test Step by Step

I am going to teach you how to perform a one-sample t-test step by step, using an example.

A researcher believes that the mean income of workers in Akokwa town is 20,000 dollars. She wants to test this hypothesis against the alternative that the mean income is not equal to 20,000 dollars. A random sample of 9 workers in the town is chosen, and their incomes (in dollars) turn out to be the following:
24000, 13400, 18400, 22900, 13800, 8200, 11100, 9300, 14600.

(a) If the researcher sets a 5 percent significance level, should she accept or reject the null hypothesis?
(b) If she sets a 1 percent significance level, should she accept or reject the null hypothesis?

We would start with the first step and progress through the last step. These are the steps:
  • Define the null and alternative hypothesis
  • State the alpha
  • Calculate the degrees of freedom
  • State the decision rule (acceptance criteria)
  • Calculate the test statistic
  • State the results
  • State your conclusion

Step 1: State the null and alternate hypothesis
The null hypothesis is what is currently believed and needs to be tested. From the question, the belief is that the mean income is 20,000 dollars. Therefore:

H0: mean = 20,000 dollars
H1: mean ≠ 20,000 dollars

Step 2: State the alpha
From the question, we have 5% significance level. This is the same as alpha level of 0.05 ( or 95% confidence). So we can state the alpha level as:
α = 0.05

Step 3: Calculate the degrees of freedom
Degrees of freedom is given as df = n-1. So in this case:
Degrees of freedom df = 9 -1 = 8

Step 4: State the decision rule (acceptance criteria)
To do this, we look up the critical value of t in a table of the t-distribution (t-table).
In the table, the significance levels head the columns while the degrees of freedom label the rows. Since this is a two-tailed test, we look for df = 8 in the column for a two-tailed alpha of 0.05 (0.025 in each tail). It is normally written as:
Tcrit or T8, 0.05 = 2.306

Decision rule: Accept the null hypothesis if t lies between -2.306 and 2.306 (that is, if the absolute value of t is less than 2.306). Otherwise, reject it.

We can make a little sketch to illustrate this, as in Figure 1.0. Note that the 5% has been divided into two parts of 2.5% on both sides of the graph.

Figure 1.0: Sketch of Accept and Reject region

Step 5: Calculate the test statistic (t)
The formula for the test statistic is given by:

t = (xn - μ0) / (sd / √n)

where xn is the sample mean (we need to calculate it)
μ0 is the mean to be tested (given in the question: 20,000)
sd is the standard deviation of the sample (we need to calculate it)
n is the number of elements; in this case it is 9

xn = Sum/9

Sum =  24000+ 13400+18400+22900+13800+ 8200+11100+9300+14600
Sum =  135700
xn = 135700/9 = 15077.78
sd =  5623.78

Note: The sample standard deviation is the square root of the sum of squared deviations from the mean, divided by n - 1.

Using the formula for t, we can calculate the value of t as:

t = (15077.78 - 20000) / (5623.78 / √9) = -4922.22 / 1874.59 ≈ -2.63

Step 6: State your results
The value of t is about -2.63, and the decision rule states that the null hypothesis is accepted only if t lies between -2.306 and 2.306.
In this case, the absolute value of t (2.63) is greater than 2.306, so t falls in the rejection region.
Result: We reject the null hypothesis, H0

Step 7: State your conclusion
From the result, our conclusion is that the mean income of workers in Akokwa town is not 20,000 dollars.
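The whole calculation can be checked with Python's statistics module. Note that statistics.stdev divides by n - 1, which matches the sample standard deviation used above. The critical values are assumed to be read from a t-table for df = 8: 2.306 for a two-tailed 5% level and 3.355 for a two-tailed 1% level. The 1% row also answers part (b) of the question.

```python
import statistics

incomes = [24000, 13400, 18400, 22900, 13800, 8200, 11100, 9300, 14600]
mu0 = 20000                      # mean under the null hypothesis

n = len(incomes)
mean = statistics.mean(incomes)  # sample mean
sd = statistics.stdev(incomes)   # sample standard deviation (divides by n - 1)
t = (mean - mu0) / (sd / n ** 0.5)

# Two-tailed critical values for df = 8, read from a t-table
critical = {0.05: 2.306, 0.01: 3.355}
for alpha, t_crit in critical.items():
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"alpha={alpha}: t={t:.2f}, critical={t_crit} -> {decision}")
# alpha=0.05: t=-2.63, critical=2.306 -> reject H0
# alpha=0.01: t=-2.63, critical=3.355 -> do not reject H0
```

So at the 5% level the null hypothesis is rejected. At the stricter 1% level, |t| = 2.63 is below 3.355, so the null hypothesis would be accepted.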

If you follow these steps in your statistics quiz, you should be able to earn full marks on questions about how to perform a t-test.
Success in your exams and thanks for your efforts!

Tuesday, 19 June 2018

Is Time Travel Possible? Does Time Machines Exist? - An Expert Explains

I have seen many videos of persons claiming to be time travelers from different years telling their stories. Some say they came from 2100, 4897, 2050, 2935 etc. And they tell stories of what the future looks like: cars driving themselves, no governments, almost everything being automated, digital cities and so on.

First of all, these are products of imagination, or rather science fiction videos simply meant to entertain viewers, and I think that is exactly what they are: entertainment.

This is similar to the movie Interstellar, which featured the character Joseph Cooper traveling with his team to a distant planet; by the time he came back, his daughter had aged so much that she looked like a grandmother to him.

Now, on the real side of things, science has shown that it is possible to travel forward in time, but only by tiny amounts, such as a fraction of a second. This is possible due to Einstein's theory of relativity.

The Principle of Causality
This is also called the Cause and Effect principle. It simply states that observed or unobserved effects have causes.
In simple terms, if the device you are reading this blog with is put into fire(cause), then it gets damaged (effect). Another example is that if there is an egg, then it was laid by a chicken.

Let's now apply this to time travel. Assume time travel is possible. Let's say you have a car you have been using for 6 years, from 2012 to 2018. Today being June 2018, you decide to travel six years back in time to when the car was new, say January 2012. When you get there, you meet some angry youths who set the car ablaze and burn it to ashes. Afraid for your life, you travel back to June 2018.

The question now is:
Where is the car? Remember you have used this car for six years. Now if it was set ablaze in January 2012, then it is not possible you have been using it from 2012 to 2018.
In other words, the principle of causality has it that if the car has been used for six years (effect), then it must have existed for those years (cause). You simply can't change it: you can't go back in time to remove the cause of an outcome that has already happened.

Therefore, the universal principle of causality makes it impossible to travel back in time.

Other important questions I would discuss are:

  • Is it possible to travel to the future, just to see how it would look like, without doing anything there?
  • How would the world be like in 200 years time, say in 2218?
  • Are there some living creatures in other worlds?
  • Are aliens real? Do they exist?
  • Are there some humans with super powers (that is superheroes)?

I would discuss these topics and you would know the clear answer to them. If you have any other question you would like me to answer, leave it in the comment box below.

Monday, 18 June 2018

What is Heteroscedasticity and Multicollinearity in Regression Analysis?

Hello! Good to see you. I'll be very happy to explain to you these two important concepts
  • Multicollinearity
  • Heteroscedasticity

These are big words, but trust me, they are very easy to understand. The two terms are not directly related, but both describe situations that create problems during regression analysis.

So just as the two words are difficult to pronounce, they also make the analysis difficult!

As you know, I normally explain maths and statistics concepts in a simple way to allow you clearly understand it and be able to present it in your own words during a quiz or an exam.

What is Heteroscedasticity?
Heteroscedasticity means unequal scatter. This means that the variability (or scatter) of a variable is unequal across the range of values of the other variable that is used to predict it.
It is a systematic change in spread of the residual over the range of the measured values. This is illustrated in Figure 1.0

The Problem of Heteroscedasticity
In Linear Regression, it is assumed that the measured error (or residual) between the measured value and the regression line maintains a constant variance (homoscedasticity).
But in the case of heteroscedasticity, the residuals have unequal variance, which violates this assumption and makes the standard errors, and therefore the significance tests, of the regression unreliable.
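A rough way to see unequal scatter in numbers is to sort the residuals by the predictor and compare their variance in the upper half of the range with the lower half. This is only a simplified check in the spirit of the Goldfeld-Quandt test, not the full test. The function name spread_ratio and the residual values are hypothetical.

```python
import statistics


def spread_ratio(x, residuals):
    """Residual variance for the larger half of x divided by that for the smaller half."""
    pairs = sorted(zip(x, residuals))        # order residuals by the predictor
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)


# Hypothetical residuals whose scatter grows with x (a "funnel" shape)
x = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.0, -1.0]
print(round(spread_ratio(x, residuals), 1))  # 32.8
```

A ratio close to 1 is what constant variance (homoscedasticity) would suggest. A ratio this far from 1 points to heteroscedasticity.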

What is Multicollinearity?
To understand this term, keep in mind that in multivariate linear regression, the objective is to examine the effect of more than one independent variable, x1, x2,... xk, on a dependent variable Y.
So the regression function would be as shown below:

Y = β0 + β1x1 + β2x2 + ... + βkxk + Ɛ

Y is the dependent variable
x1, x2,... xk are the independent variables
β0, β1,... βk are the regression coefficients
Ɛ is the error term

Now, these independent variables not only affect the dependent variable but may also strengthen or weaken each other's effects. Multicollinearity therefore means a linear correlation between two or more of the independent variables.

Effects of Multicollinearity
Multicollinearity has unpleasant effects, which are highlighted below:
  • the independent variables may assume each other's role
  • the effect of each independent variable on the dependent variable cannot be distinguished
  • estimation of the regression coefficients becomes unreliable
  • in some cases, the analysis becomes very difficult to perform

Measures of Multicollinearity
There are three metrics used to examine the effects of multicollinearity. They are:

  • VIF (Variance Inflation Factor)
  • Tolerance
  • Condition Indices

Variance Inflation Factor (VIF): The VIF of the ith variable shows how many times larger the variance of its estimated coefficient is than it would be without multicollinearity. The same is calculated for each of the other variables.
A VIF of 1 means there is no multicollinearity. As a common rule of thumb, values between 1 and 5 indicate moderate multicollinearity, and values above 5 (some authors use 10) indicate high multicollinearity.

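VIF is given by the formula

VIFi = 1 / (1 - Ri²)

where Ri² is the coefficient of determination obtained by regressing the ith independent variable on all the other independent variables.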

Tolerance: This is the inverse of the Variance Inflation Factor (Tolerance = 1/VIF). A low tolerance value indicates increasing multicollinearity.
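With only two independent variables, regressing one on the other gives an R² equal to their squared Pearson correlation. The VIF and tolerance can then be computed directly. The sketch below assumes exactly that two-predictor case. The function names and data values are hypothetical.

```python
def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def vif_two_predictors(x1, x2):
    # With only two independent variables, the R^2 from regressing x1 on x2 is r^2
    r2 = pearson_r(x1, x2) ** 2
    return 1 / (1 - r2)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
tolerance = 1 / vif
print(round(vif, 2), round(tolerance, 2))  # 2.5 0.4
```

Here r² = 0.6, so the variance of each coefficient is inflated 2.5 times, which counts as moderate multicollinearity. With more than two predictors, Ri² has to come from a full multiple regression of xi on all the others.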

Condition Index: This is a measure of the relative amount of variation associated with an eigenvalue. A large value of the condition index indicates a high degree of collinearity.

Quiz: Try to read more on Condition Indices and write the formula. Leave a comment if you succeed in doing this.

For now, I would like to thank you for reading

What is Coefficient of Determination in Linear Regression

If you are learning linear regression, then you need to clearly understand the concept of Coefficient of Determination R2 and the Adjusted Coefficient of Determination R2adj.

I am going to explain these concepts in a very easy way.

We are going to cover the following:

  1. What is Coefficient of Determination
  2. Properties of Coefficient of Determination
  3. Adjusted Coefficient of Determination
  4. Final Notes

1. What is Coefficient of Determination?

The coefficient of determination is used to determine the proportion of the variation in one variable that is predictable from the other variable.

Look at the table below. What do you think is the relationship between X and Y?
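It seems that Y equals X/2. But looking carefully at the table, we see that this is not exactly true for two of the data points.

So, roughly speaking, Y is X/2 for 80% of the points, which gives the intuition of a coefficient of determination of about 0.8 (or 80%). Strictly, R2 is computed from the explained and total variation, not by counting the points that lie on the line.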


Table 1: For 8 out of the 10 points, y=x/2

The coefficient of determination is a measure of how certain we are in making predictions from a certain model. It is the ratio of the explained variation to the total variation.

The value of R2 ranges from 0 to 1, that is:
0 ≤ R2 ≤ 1

It denotes the strength of the linear association between x and y. When we are using a line of best fit, the coefficient of determination represents the percentage of the variation in y that is explained by the line of best fit.

For example, if R = 0.89 then R2 = 0.792, which means that 79.2% of the total variation in y can be explained by the linear relationship between y and x (as described by the regression equation; in our case it is y = x/2).

The other 20.8% of the variation remains unexplained.

So we can say that the coefficient of determination is a measure of how well the regression line represents the data.

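Formula for R2 is given by:

R2 = SSR / SST = 1 - SSE / SST

where SST = Σ(yi - ȳ)² is the total variation, SSR = Σ(ŷi - ȳ)² is the explained variation, and SSE = Σ(yi - ŷi)² is the unexplained (residual) variation.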
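To see the ratio of explained to total variation in action, the sketch below fits a least-squares line to a small made-up data set. It then computes R2 as 1 minus the residual variation divided by the total variation. The function name and the data are hypothetical.

```python
def r_squared(x, y):
    """Fit y = slope * x + intercept by least squares and return R^2 = 1 - SSE/SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                  # total variation
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # unexplained variation
    return 1 - sse / sst


x = [1, 2, 3, 4]
y = [2, 4, 5, 4]
print(round(r_squared(x, y), 4))  # 0.5158
```

Here the fitted line is y = 0.7x + 2, SSE = 2.3 and SST = 4.75. So about 51.6% of the variation in y is explained by the line.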

2. Properties of Coefficient of Determination

Let's now outline some of the properties of R2 that you need to know. To get used to these properties, take some time to write them out in your notes.

  • 0 ≤ R2 ≤ 1 if f(X) = r(X) = E(Y | X)
  • if X and Y are independent, then R2 = 0
  • if Y = f(X), then R2 = 1
  • if f(X) = aX + b is the theoretical linear regression, then R2 = (R(X,Y))²
  • if the joint distribution of X and Y is normal, then R2 = (R(X,Y))²

3. Adjusted Coefficient of Determination R2adj

Just like the Coefficient of determination, the adjusted Coefficient of Determination R2adj is used to determine how well a multiple regression equation fits the sample data.

The difference between R2 and R2adj is that R2 increases automatically as new independent variables are added to the regression equation, even if they add no new explanatory power to the equation.
However, R2adj increases ONLY IF the new independent variables increase the explanatory power of the regression equation. This makes R2adj more reliable in measuring how well a multiple regression equation fits the sample data.
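A common formula for the adjusted coefficient is R2adj = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below uses it with hypothetical numbers. A third variable nudges R2 up slightly, but R2adj goes down, because the penalty for the extra variable outweighs the small gain.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical models fitted on n = 20 observations
with_two = adjusted_r_squared(0.800, n=20, k=2)
with_three = adjusted_r_squared(0.805, n=20, k=3)   # R2 rose a little after adding a variable
print(round(with_two, 4), round(with_three, 4))     # 0.7765 0.7684
```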

4. Final Notes

I hope this brief discussion has helped you understand the concept of Coefficient of Determination and Adjusted Coefficient of Determination as it applies to Regression Analysis. Especially take note of the difference between the two, as this always appears in statistics quizzes and exams.

Thank you for reading and remember to leave a comment below if you have any challenges following the explanation.

Monday, 11 June 2018

How to Perform Mann-Whitney U Test (Step by Step) - Hypothesis Testing

This is an interesting procedure and really very straightforward if you follow through. I am happy to know that you are making an effort to learn hypothesis testing.

Feel free to leave me a comment below if you have any challenges. I would be happy to give you needed support.

  1. What is Mann-Whitney U Test?
  2. Exercise 1
  3. Solution Steps

1. What is Mann-Whitney U Test?

Mann-Whitney U Test is a non-parametric test used to test whether two independent samples were selected from populations having the same distribution. Another name for the Mann-Whitney U Test is the Wilcoxon Rank Sum Test.
Note: This is not the same as the Wilcoxon Signed Rank Test, which is used for dependent (paired) samples.

As you know, the easiest way to understand a statistical test is to perform the test yourself. So now we are going to go through an example; be sure to follow along with a pen, notebook and calculator. It is really easy and fun!

2. Exercise 1

A researcher gave an aptitude test to 24 respondents: 12 were men and 12 were women. He recorded the score of each respondent and tabulated the scores in the table below:


Use the data provided to test the null hypothesis that the distribution of scores is the same for men as for women. Use a significance level of 0.05. (Use the Wilcoxon Rank Sum Test.)

3. Solution Steps

We will follow the step-by-step procedure. We will also use Excel to tabulate our data to make the calculations easier.

Step 1: State the null and alternate hypothesis and rejection criteria

The null hypothesis states that the scores of men and women come from populations with the same distribution (so neither group tends to score higher than the other), and the alternate hypothesis states that the two distributions differ.

These are stated below.

H0: μm = μw
H1: μm ≠ μw

Alpha = 0.05

Rejection Criteria: Reject the null hypothesis if Ustat ≤ Ucrit

Step 2: Perform a ranking of all the observations

In this case, we pool the scores of both groups and rank them together, from the smallest score (rank 1) to the largest.
We will use MS Excel to help us do it faster. So, I have transferred the table to Excel, keeping track of which group each score belongs to.

The total number of observations is 24, so we need to rank the observations from 1 to 24. In the figure, I have colored the observations for Men blue and for Women red, so that when I sort the data, I still know which group each observation belongs to.

I sorted the data to make it easy to assign a rank to each observation.
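Sorting and ranking can also be done in code. The sketch below pools two groups, ranks all the observations together, and sums the ranks for each group. It assumes the common convention that tied scores each receive the average of the ranks they occupy; the function names and the scores are hypothetical, since the exercise table is not reproduced here.

```python
def rank_with_ties(values):
    """Return ranks 1..n for values; tied values get the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of tied values that starts at position pos
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks

def rank_sums(group1, group2):
    """Pool both groups, rank them together, and return the rank sum of each group."""
    ranks = rank_with_ties(list(group1) + list(group2))
    return sum(ranks[:len(group1)]), sum(ranks[len(group1):])

men = [14, 22, 9]     # hypothetical scores
women = [17, 9, 30]   # hypothetical scores
print(rank_sums(men, women))  # (9.5, 11.5)
```

The two tied scores of 9 share ranks 1 and 2, so each gets 1.5. As a check, the rank sums always add up to n(n + 1)/2 for n pooled observations (here 21).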

Step 3: Calculate the Rank Sums

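We calculate the sum of the ranks for each of the two groups (n1 = n2 = 12).

Sum of ranks for Men, R1 = 10+9+20+3+13+14+23+8+11+15+5+1 = 132

Sum of ranks for Women, R2 = 12+16+18+19+21+7+6+4+17+24+2+22 = 168

As a check, R1 + R2 = 300 = (24 × 25)/2, which is the sum of the ranks 1 to 24.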

Step 4: Calculate the U Statistic for the Two Groups

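The formula for the U statistic is given as:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

For the Men observations, we have

$$U_1 = 12 \times 12 + \frac{12 \times 13}{2} - 132 = 144 + 78 - 132 = 90$$

We also calculate U for the Women observations:

$$U_2 = 12 \times 12 + \frac{12 \times 13}{2} - 168 = 144 + 78 - 168 = 54$$

As a check, U1 + U2 = 144 = n1 × n2. The U statistic is the smaller of the two values, and that would be

Ustat = 54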
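The U computation can be checked in a few lines. This sketch recomputes U from the ranks listed in Step 3 using U_i = n1·n2 + n_i(n_i + 1)/2 − R_i. That is one of two common textbook forms (an assumption on my part); the other form, U_i = R_i − n_i(n_i + 1)/2, swaps U1 and U2 but gives the same smaller value. The critical value 37 is the table value quoted in Step 5.

```python
def mann_whitney_u(r1, n1, r2, n2):
    """Return (U1, U2, U) from the rank sums, where U is the smaller of U1 and U2."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)

men_ranks = [10, 9, 20, 3, 13, 14, 23, 8, 11, 15, 5, 1]
women_ranks = [12, 16, 18, 19, 21, 7, 6, 4, 17, 24, 2, 22]

u1, u2, u_stat = mann_whitney_u(sum(men_ranks), len(men_ranks),
                                sum(women_ranks), len(women_ranks))
print(u1, u2, u_stat)  # 90.0 54.0 54.0

u_crit = 37  # table value for n1 = n2 = 12, two-tailed, alpha = 0.05
print("reject H0" if u_stat <= u_crit else "fail to reject H0")  # fail to reject H0
```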

Step 5: Determine the Critical Value from the Table

From the Mann-Whitney U table (two-tailed, α = 0.05), we check the value under column 12 and row 12 (n1 = 12, n2 = 12).
This gives a critical value of U:

Ucrit = 37

Since the calculated value of U is greater than the critical value, we fail to reject the null hypothesis: there is not enough evidence at the 0.05 level to conclude that the distribution of scores differs between men and women.

Thanks for learning. Leave a comment if you have any challenges following this lesson.

How to Perform Wald-Wolfowitz Test - Testing for Homogeneity with Run Test

Hello! Good to see your interest in learning statistics.

Today, we are going to go through the steps of performing the Wald-Wolfowitz run test. Remember, the easiest way to understand hypothesis testing is to solve an example. So instead of boring you with explanations, we will solve an example together, and I will explain as we go.

  1. What is Wald-Wolfowitz Test?
  2. Formula for the Wald-Wolfowitz Test
  3. Example 1
  4. Solution Steps

What is Wald-Wolfowitz Test?

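Wald-Wolfowitz Test (also called the Wald-Wolfowitz run test) is a non-parametric hypothesis test used to test the randomness of a two-valued data sequence, that is, whether the elements of the sequence are mutually independent. It can also be used to test whether two samples come from the same distribution: pool and sort the two samples, label each value by the sample it came from, and count the runs of labels. Too few runs suggests that the two distributions differ.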

Formula for Wald-Wolfowitz Test

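The three formulas for the Wald-Wolfowitz run test are given below. With R the number of runs and n1, n2 the sizes of the two groups, the test statistic is

$$Z = \frac{R - \mu_R}{\sigma_R}$$

where the mean is given by the formula

$$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and the variance is given by the formula

$$\sigma_R^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$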

Note: Variance is the square of the standard deviation. The formula above gives the variance, so to get the standard deviation we must take the square root of the variance.
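The formulas translate directly into a small function. This is a minimal sketch: the name `runs_test_z` and the `continuity` flag are hypothetical, and the half-unit continuity correction (moving R half a unit toward the mean) follows the convention used in the bus example below.

```python
import math

def runs_test_z(r, n1, n2, continuity=True):
    """Normal approximation for the Wald-Wolfowitz runs test: returns (mean, variance, Z)."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    sd = math.sqrt(variance)  # standard deviation = square root of the variance
    if continuity:
        # Half-unit continuity correction: move R half a unit toward the mean
        if r < mean:
            r += 0.5
        elif r > mean:
            r -= 0.5
    return mean, variance, (r - mean) / sd

# R = 9 runs with n1 = 10 and n2 = 11, as in the bus example
mean, variance, z = runs_test_z(9, 10, 11)
print(round(mean, 3), round(variance, 3), round(z, 3))  # 11.476 4.964 -0.887
```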

Let's now solve an example!

Example 1

There are two IVM (Innoson Vehicle Manufacturers) buses, one with 48 passengers and another with 38 passengers.
Let X and Y denote the number of miles travelled per day for the 48-passenger and 38-passenger buses respectively. Innoson would like to test the equality of the two distributions.
That is, if:

H0: F(z) = G(z)

The company observed the following data on a random sample of n1 = 10 buses carrying 48 passengers and n2 = 11 buses carrying 38 passengers.

X: 104 253 300 308 315 323 331 396 414 452
Y:  184 196 197 248 260 279 355 386 393 432 450

Using the normal approximation to the distribution of R, conduct a Wald-Wolfowitz test at the 0.05 level of significance.


We will solve this problem step by step.

Step 1: State the null and the alternate hypothesis and rejection criteria

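H0: F(z) = G(z)
H1: F(z) ≠ G(z)

Rejection criteria: Reject the null hypothesis if the P value is less than 0.05. Because too few runs is the evidence against H0, this is a lower-tailed test, which is the same as rejecting when Z ≤ −1.645.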

Step 2: Merge the two lists and sort in ascending order

104(X)  184(Y)  196(Y)  197(Y)  248(Y)  253(X)  260(Y)  279(Y)  300(X)  308(X)  315(X)  323(X)  331(X)  355(Y)  386(Y)  393(Y)  396(X)  414(X)  432(Y)  450(Y)  452(X)

Step 3: Count the number of runs: R, n1 and n2

Reading the labels in order gives the runs X | YYYY | X | YY | XXXXX | YYY | XX | YY | X, so:

Number of runs R = 9
n1 = 10
n2 = 11
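Merging, sorting and counting runs is easy to get wrong by hand, so here is a sketch that does it for the bus data. The function name `count_runs` is hypothetical, and the sketch assumes no value appears in both samples (ties across samples would need a tie-breaking rule, which this sketch does not handle).

```python
def count_runs(x, y):
    """Pool two samples, sort them, label each value by its sample, and count runs of labels."""
    labelled = sorted([(v, "X") for v in x] + [(v, "Y") for v in y])
    labels = [label for _, label in labelled]
    # A new run starts every time the label changes
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return labels, runs

x = [104, 253, 300, 308, 315, 323, 331, 396, 414, 452]
y = [184, 196, 197, 248, 260, 279, 355, 386, 393, 432, 450]
labels, r = count_runs(x, y)
print("".join(labels))    # XYYYYXYYXXXXXYYYXXYYX
print(r, len(x), len(y))  # 9 10 11
```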

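Step 4: Calculate the mean

We calculate the mean using the formula, and we have the result below:

$$\mu_R = \frac{2(10)(11)}{10 + 11} + 1 = \frac{220}{21} + 1 \approx 11.476$$

Step 5: Calculate the variance

We calculate the variance using the calculation steps below:

$$\sigma_R^2 = \frac{2(10)(11)\,\big(2(10)(11) - 10 - 11\big)}{(10 + 11)^2 (10 + 11 - 1)} = \frac{220 \times 199}{441 \times 20} = \frac{43780}{8820} \approx 4.964$$

so the standard deviation is σR = √4.964 ≈ 2.228.

Step 6: Calculate Z

We calculate the value of Z following the formula below:

$$Z = \frac{(R + 0.5) - \mu_R}{\sigma_R} = \frac{9.5 - 11.476}{2.228} \approx -0.887$$

Note: We used 9.5 instead of 9 because we applied a half-unit correction for continuity.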
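To turn Z into a P value, we need the standard normal CDF, which Python's `math.erf` gives us. This sketch assumes the lower-tailed form of the test (too few runs counts against H0) and takes Z ≈ −0.887, which is what the formulas above give for R = 9, n1 = 10 and n2 = 11 with the continuity correction. The helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = -0.887       # Z for the bus data, rounded to three decimals
alpha = 0.05
p_value = normal_cdf(z)  # lower tail: probability of this few runs or fewer under H0
print(round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```

The P value comes out at roughly 0.19, well above 0.05.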

Step 7: Draw your conclusion

We fail to reject the null hypothesis at the 0.05 level because the P value (about 0.19 for the lower tail) is greater than 0.05. This means that there is not sufficient evidence at the 0.05 level to conclude that the two distribution functions are not equal.

Thanks for your effort in learning statistics. If you have any challenge, let me know in the comment box below.

Thursday, 7 June 2018

Advanced Statistics Quiz 11 - Cluster Analysis

Good to see you here!
Today's quiz will be based on Cluster Analysis. So let's get started.

Question 1: What is Cluster Analysis?
Cluster Analysis is the statistical procedure aimed at grouping data objects based on the information found in the data set that describes the objects and their attributes.

Question 2: What is the Goal of Cluster Analysis?
The objective of cluster analysis is to group objects with similar characteristics into the same cluster.

Question 3: What are the two types of Clustering?
The two types of clustering are:
Hierarchical Clustering: Clusters are arranged in a hierarchical tree
Partitioning Clustering: Data are grouped into distinct subsets that do not overlap

Question 4: Describe the k-Means Clustering
K-Means clustering is a partitioning clustering approach where each cluster is associated with a centroid (center point) and each data point is assigned to the centroid that is closest to it. The number of clusters, K, is specified in advance.

Question 5: Write the k-Means Clustering Algorithm?
i. Choose the value of K and select K initial centroids
ii. Repeat:
iii. Form K clusters by assigning each point to the closest centroid
iv. Recalculate the centroid of each cluster as the mean of its points
v. Move the centroid to the newly computed position
vi. Until the centroid positions do not change
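The algorithm above can be sketched in plain Python for 2-D points. This is an illustration under stated assumptions: the initial centroids are simply the first K points (one of many possible choices, see Question 7), and a cluster that ends up empty keeps its old centroid. The function name `k_means` and the six points are hypothetical.

```python
import math

def k_means(points, k, max_iter=100):
    """Basic k-means for 2-D points; returns (centroids, clusters)."""
    # Assumption: use the first k points as the initial centroids
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Form k clusters by assigning each point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recalculate each centroid as the mean of its cluster (an empty cluster keeps its old centroid)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])
        # Stop when the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print([len(c) for c in clusters])  # [3, 3]
```

With this starting choice, the second centroid is first dragged toward the middle, and the algorithm settles after three passes with the two obvious groups.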

Question 6: How do you Choose the Initial Value of K for k-Means Clustering?
  • Use another clustering method to estimate it
  • Run the algorithm with different values of K and then choose the one that is optimal
  • Use the prior knowledge about the characteristics of the data

Question 7: How do you choose the centroid for the cluster?
  • Random selection from the feature space
  • Random selection from the data set
  • Look for dense regions of space
  • Space them uniformly around the feature space

Question 8: How is the quality of a cluster measured?
  • The size of the cluster vs the distance between the clusters
  • The distance between members of the cluster
  • The diameter of the smallest sphere that contains the cluster

Question 9: What are some limitations of k-Means Clustering?
Not effective if the data contains outliers, because each centroid (a mean) is pulled toward them
Fails for non-convex (non-round) clusters

Question 10: What is MacQueen's Algorithm used for?
MacQueen's Algorithm is used for measuring the goodness of the clustering and for minimizing the compactness function in a finite number of steps

Question 11: Outline and explain the two types of Hierarchical Clustering
The two types of hierarchical clustering are:
Top-Down (Divisive) Clustering
Bottom-Up (Agglomerative) Clustering

How Bottom-Up or Agglomerative Clustering Works
  • Start with each of the data points in its own cluster
  • Merge the two clusters that are most similar
  • Repeat the merging until there is a single cluster of all the data points

How Top-Down or Divisive Clustering Works
  • Start with all examples in one big cluster
  • Split off the data point that seems too far away from the other points
  • Repeat the process until each point is in its own cluster

Question 12: Mention three ways to compute dissimilarity between clusters
  • Single Link (distance between the closest pair of points in the two clusters)
  • Complete Link (distance between the farthest pair of points)
  • Group Average (average distance over all pairs of points)

Question 13: Compare k-Means and Hierarchical Clustering
k-Means produces a single partition, while hierarchical clustering produces different partitions (one at each level of the hierarchy)
k-Means needs the number of clusters specified in advance, while hierarchical clustering does not
k-Means has a more efficient run-time than hierarchical clustering

Question 14: What is a Dendrogram?
A dendrogram is a tree diagram used to illustrate the arrangement of clusters in hierarchical clustering.
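The bottom-up procedure and the three linkage rules can be sketched together. Each merge recorded below is one join in the dendrogram, and its distance is the height at which that join is drawn. This is a minimal sketch for 1-D points with hypothetical names and data; it recomputes all pairwise distances at every step, so it is only suitable for small examples.

```python
from itertools import combinations

def cluster_distance(a, b, linkage):
    """Dissimilarity between two clusters of 1-D points under a linkage rule."""
    dists = [abs(x - y) for x in a for y in b]
    if linkage == "single":      # closest pair of points
        return min(dists)
    if linkage == "complete":    # farthest pair of points
        return max(dists)
    if linkage == "average":     # average over all pairs (group average)
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, linkage="single"):
    """Bottom-up clustering; returns the merges (cluster_a, cluster_b, distance) in order."""
    clusters = [[p] for p in points]  # start with each point in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest dissimilarity
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage))
        d = cluster_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]      # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return merges

points = [1.0, 2.0, 6.0, 7.0, 15.0]
for rule in ("single", "complete", "average"):
    print(rule, [d for _, _, d in agglomerative(points, rule)])
# single [1.0, 1.0, 4.0, 8.0]
# complete [1.0, 1.0, 6.0, 14.0]
# average [1.0, 1.0, 5.0, 11.0]
```

All three rules merge the same pairs here, but the dendrogram heights differ: single link joins clusters at their closest points, so its tree is the flattest.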

I will stop here so that you have some time to get your head around these concepts.
Thank you for reading!
Feel free to check out the quiz on other Statistics topics.