Wednesday, 5 December 2018

20 Cool Machine Learning and Data Science Concepts (Simple Definitions)

Hello everyone, I'm Kindson The Genius. I would like to share with you these 20 cool Machine Learning and Data Science concepts, along with a brief explanation of each.



I assume you are learning Machine Learning, and I would like to encourage you to keep going and not give up, even if it appears a bit tough initially. Just keep moving. In the end, your efforts will pay off.

I made this article because it is easier to read and understand a concept if you already have an overview of what it is all about.

Feel free to reach me via my website www.kindsonthegenius.com

1. Supervised Learning: This is a class of Machine Learning problem where a dataset (the training dataset) is provided and each observation has a corresponding class, label or target vector.

2. Unsupervised Learning: This is a class of Machine Learning problem where a dataset is provided but no classes are provided.

3. Reinforcement Learning: This is a class of Machine Learning problem concerned with finding the best action to take in a given situation so as to maximize the expected cumulative reward. (Its foundations are older than the often-quoted date of 1998, which is the year the standard textbook by Sutton and Barto was published.) One application area of reinforcement learning is game playing, where the objective at each point is to take an action that increases the chances of winning.

4. Deep Learning: This is a class of machine learning algorithms, which may be either supervised or unsupervised, that cascades multiple layers of computational units to extract features from input data. Deep Learning is a subset of Machine Learning. Also check Deep Neural Networks.



5. Ensemble Learning: This is a learning process that tries to combine a number of learning algorithms to improve the performance of a model

6. Classification: Classification is a supervised learning task that attempts to assign a series of observations to two or more classes. In classification, a dataset of observations is provided along with their classes. The goal is to use the given dataset to create a model that can predict the class of a new observation supplied without its class. When there are just two classes (say 1 and 0), it is called binary classification.

7. Regression: This is a class of supervised learning task similar to classification, but in this case the target is a continuous value rather than a class, and the task is to find the function that relates the feature set to the target. Example: given X = X1, X2, . . . , Xp for p features and the corresponding target values Y = Y1, Y2, . . . , Yn for n observations, the goal is to find f(.) such that f(X) ≈ Y.

8. Clustering (Cluster Analysis): Clustering is a class of unsupervised learning tasks that is concerned with finding subsets or clusters within a dataset. We try to group elements in such a way that elements within the same cluster have certain similarity while being different from elements outside the cluster.  Two main clustering methods are:

9. Dimensionality Reduction: This is a statistical process of reducing the number of variables being considered to a smaller number of variables that can still explain the given dataset. Types of dimensionality reduction include:
  • feature extraction
  • feature selection.



10. Principal Components Analysis (PCA): PCA is a statistical technique used to map data from a higher-dimensional space to a lower-dimensional space such that the lower-dimensional data retains the maximum variance of the original data. In PCA, the principal components are obtained by eigen-decomposition of the covariance matrix of the data.

11. Singular Value Decomposition (SVD): This is a powerful matrix factorization, often used for dimensionality reduction, in which the original dataset is decomposed into three matrices, one of which holds the 'singular values'.
Given an m x n matrix X, we can decompose X into three matrices:
X = UΣV*
where:

  • X is the original matrix
  • U is an m x m unitary matrix
  • Σ is an m x n rectangular diagonal matrix with the non-negative singular values on its diagonal
  • V* is the conjugate transpose of an n x n unitary matrix V


12. Support Vector Machine (SVM): Support Vector Machine is a tool used for classification. SVM performs classification by determining a line or a plane (hyperplane) that separates the dataset into two classes such that the margin between the classes is maximum.

13. Hyperplane: A hyperplane is a subspace that has one dimension less than the ambient space. This simply means that if we have a set of points in a 2d plot, then the hyperplane would be a 1d line. Similarly, the hyperplane for a 3d space is a 2d plane, and so on.

14. Neural Network: This is also called an Artificial Neural Network (ANN). It is a network made up of interconnected nodes called neurons that tries to mimic the functioning of the neural network (nervous system) of living things. The nodes in a neural network are not scattered randomly but arranged in different layers, as shown in the figure:

  • the first layer is the input layer
  • the last layer is the output layer
  • in between, there are 1 or more layers called hidden layers





15. Neuron: This is the basic computing unit of a Neural Network. The neurons are the nodes of the network (the weighted connections between them are the edges). Each neuron combines its inputs and passes the result on as its output.

16. Perceptron: A perceptron is the simplest neural network you can think of. It is a neural network made up of a single node. A perceptron has an input (or set of inputs), a neuron, and an output (or set of outputs).
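
To make the perceptron definition concrete, here is a minimal Java sketch of a single node. It assumes a step activation and hand-picked weights that happen to implement logical AND; the class name PerceptronSketch, the weights and the bias are illustrative assumptions and not part of the definition above.

```java
// Illustrative sketch: one perceptron (a single neuron) with a step activation.
class PerceptronSketch {

    // Weighted sum of the inputs plus the bias, passed through a step function.
    static int predict(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum >= 0 ? 1 : 0; // outputs 1 only when the sum reaches the threshold 0
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0}; // assumed weights, chosen by hand
        double bias = -1.5;            // assumed bias: both inputs must be 1 to fire
        int[][] cases = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int[] c : cases) {
            double[] x = {c[0], c[1]};
            System.out.println(c[0] + " AND " + c[1] + " = " + predict(x, weights, bias));
        }
    }
}
```

In a trained perceptron the weights and bias would be learned from data rather than typed in by hand.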

17. Deep Neural Network: A neural network is referred to as a deep neural network if it is made up of more than one hidden layer.

18. Recurrent Neural Network (RNN): This is a class of neural networks where the connections between nodes form a directed graph along a sequence, so the output of one step is fed back in at the next step. This property makes RNNs suitable for modelling temporal behavior.

19. Convolutional Neural Network (CNN): CNN is a class of neural networks that is a bit more complex, built around convolution layers, and is widely applied in image recognition and analysis.



20. Activation Function: An activation function is the function that produces the output of a neuron in a neural network. The weighted sum of the inputs plus the bias is passed into the activation function. With a step function, the neuron fires only if a threshold is reached; other activation functions give a graded output.
Some common activation functions include:

  • Step Function
  • Linear Function
  • Sigmoid Function
  • Hyperbolic Tan Function(Tanh)
  • Rectified Linear Unit(ReLU)
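
The five functions listed above can be sketched in a few lines of Java. This is only an illustration: the class name ActivationSketch, the threshold of 0 for the step function, and the sample inputs, weights and bias are all assumptions chosen for the example.

```java
// Illustrative sketch of the common activation functions listed above.
class ActivationSketch {

    static double step(double z)    { return z >= 0 ? 1.0 : 0.0; }           // fires at threshold 0
    static double linear(double z)  { return z; }                            // identity
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }   // squashes to (0, 1)
    static double tanh(double z)    { return Math.tanh(z); }                 // squashes to (-1, 1)
    static double relu(double z)    { return Math.max(0.0, z); }             // zero for negative z

    public static void main(String[] args) {
        double[] inputs  = {0.5, -1.0}; // assumed sample inputs
        double[] weights = {0.8, 0.2};  // assumed weights
        double bias = 0.1;              // assumed bias

        // The neuron first forms the weighted sum of its inputs plus the bias...
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += inputs[i] * weights[i];
        }

        // ...and then the chosen activation function turns z into the output.
        System.out.println("step:    " + step(z));
        System.out.println("linear:  " + linear(z));
        System.out.println("sigmoid: " + sigmoid(z));
        System.out.println("tanh:    " + tanh(z));
        System.out.println("relu:    " + relu(z));
    }
}
```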






Sunday, 25 November 2018

Simple Explanation of Tensors 1 - An Introduction

I have spent some time trying to understand tensors, but I have spent more time trying to find a simple way to explain them to you.
The fact remains that to understand tensors, you need to take some time to think. So I will try to make it easy by breaking it down into chunks of short tutorials.



What you are reading now is the very first part (Simple Explanation of Tensors 1 - An Introduction). So you have a good start. In this first part, we would cover the following:


  1. What is a Tensor(A Simple Definition)
  2. Note About Representation
  3. Review of Vectors
  4. Vector Transformation
  5. Representing Transformations


1. What is a Tensor(A Simple Definition)


Tensors are a type of data structure used in machine learning to represent various kinds of objects including  scalars, vectors, arrays, matrices and other tensors.
Some define tensors as multidimensional arrays. We would use this definition later, when we would be performing some tensor operation using Python and R.



2. Note About Representation


We would represent a scalar with a variable in square brackets. For example [a] is a scalar, not just a number.
We would represent the coordinate system using a variable with subscripts. For example, the two-dimensional axes would be the x1, x2 axes and not x, y. Similarly, the three-dimensional axes would be the x1, x2, x3 axes and not x, y, z.
The reason for this is that the number of dimensions in data is not limited, so the 26 letters of the alphabet would not be enough to name the axes.

This is illustrated in the figure:



To represent a scalar we need just a single number [x1], e.g. 15
To represent a vector in 2 dimensions we need two numbers [x1, x2], e.g. [1, 3]
To represent an object in 3 dimensions we need [x1, x2, x3], e.g. [2, 3, 1]
The same applies in four dimensions, and so on.

Now also note that a scalar is also a tensor, of rank 0 (we would explain this later). A vector can be viewed as a combination of scalars, so a scalar is like a building block. (Remember from the definition of tensors that tensors are made up of other tensors.)




3. Review of Vectors


A vector is normally viewed as a point in space. For a two-dimensional space we would use two numbers, e.g. [2, 3].
For a three-dimensional space, we would use three numbers, e.g. [3, 4, 2].
Vectors can also be represented as a column of numbers.

What do the numbers represent?
The numbers do not represent a fixed position in the coordinate system. Rather, they represent a direction and a length: they are the coordinates of the endpoint the vector would reach if it started at the origin. This means the same numbers describe the vector wherever it is placed in space.




4. Vector Transformation


Transformation refers to changing the coordinate system in which a vector is described. In general, the numbers representing the vector change.
We would consider a transformation from X to X'.
The coordinate system X' has its origin at the point (5,0,0) of the first system X.
Assume we have a vector [1,1,0] whose starting point is at (0,0,0) of the coordinate system X'.
If we transform the coordinates from X' to X, what would happen to the vector [1,1,0]?
What would happen is that it would remain the same, [1,1,0]. The vector does not change; only the coordinates of its position change (its starting point is (5,0,0) in X).

Why so?
Simply because we did not move the vector! We only transformed the coordinates, and this transformation is a pure shift of the origin. The vector remains where it is! Surely this is fair enough.
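
The example can be checked numerically with a small Java sketch. It assumes the transformation is a pure shift of the origin, as in the example (no rotation or scaling); the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Illustrative sketch: shifting the origin changes point coordinates
// but not the components of a vector (head minus tail).
class TranslationSketch {

    // Coordinates in X of a point given in X', where X' is X shifted by 'origin'.
    static double[] toX(double[] pointInXPrime, double[] origin) {
        double[] p = new double[pointInXPrime.length];
        for (int i = 0; i < p.length; i++) {
            p[i] = pointInXPrime[i] + origin[i];
        }
        return p;
    }

    // Vector components are the difference between the head and the tail.
    static double[] components(double[] tail, double[] head) {
        double[] v = new double[tail.length];
        for (int i = 0; i < v.length; i++) {
            v[i] = head[i] - tail[i];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] origin = {5, 0, 0};    // origin of X' expressed in X
        double[] tailPrime = {0, 0, 0}; // the vector starts at the origin of X'
        double[] headPrime = {1, 1, 0}; // and ends at (1,1,0) in X'

        double[] tailX = toX(tailPrime, origin);
        double[] headX = toX(headPrime, origin);

        System.out.println("Tail in X:        " + Arrays.toString(tailX));
        System.out.println("Head in X:        " + Arrays.toString(headX));
        System.out.println("Components in X': " + Arrays.toString(components(tailPrime, headPrime)));
        System.out.println("Components in X:  " + Arrays.toString(components(tailX, headX)));
    }
}
```

The tail and head get new coordinates in X, while the difference between them stays the same in both systems.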




5. Representing Transformations


Let's represent a transformation of the coordinates with T and let's call the vector V.
Let the transformation from X to X' be called T'.
Let the transformation from X' to X be called T.
Let's call the vector V' after the transformation from X to X'.

So we can write all of this in notations as:

  • V' = T'(V) 
  • V = T(V')

You can read this, right?

What has all of this got to do with tensors?
This is to let you know that there are transformation rules that apply to vectors.
We have played around with scalars and vectors. So I would like to state the following for you:

  • Scalars are tensors of rank 0
  • Vectors are tensors of rank 1

Then comes the big question.
What the heck is a rank? We would continue this in Tutorial 2.

Note: If you have any challenge following, let me know in the comment box below or to the left of this page.

Saturday, 24 November 2018

What is Principal Component Analysis (PCA) - A Simple Tutorial

In this simple tutorial, I would explain the concept of Principal Components Analysis (PCA) in Machine Learning. I would try to be as simple and clear as possible.
Then we would use Python in Tutorial 2 to actually do some hands-on work, performing principal components analysis.

What is Principal Components Analysis?
Principal Components Analysis is an unsupervised learning technique from statistics, used to explain high-dimensional data using a smaller number of variables called the principal components.
In PCA, we compute the principal components and use them to explain the data.

How Does PCA Work?
Assume we have a set X made up of n measurements, each represented by a set of p features X1, X2, … , Xp. If we want to plot this data in a 2-dimensional plane, we can plot the n measurements using two features at a time. If the number of features is more than three or four, then plotting this in two dimensions becomes a challenge, as the number of plots would be p(p-1)/2. For example, p = 10 features would already need 45 plots.
We would like to visualize this data in two dimensions while losing as little of the information contained in the data as possible. This is what PCA allows us to do.



How to Compute Principal Components?
Given a dataset X of dimension n x p, how do we compute the first principal component?
To do this we look for the linear combination of the feature values of the form:

$$z_{i1} = \Phi_{11} x_{i1} + \Phi_{21} x_{i2} + \dots + \Phi_{p1} x_{ip}$$

that has the largest sample variance subject to the constraint that:

$$\sum_{j=1}^{p} \Phi_{j1}^2 = 1$$

This means that the first principal component loading vector solves an optimization problem: we maximize an objective function subject to a constraint.
The objective function is given by:

$$\max_{\Phi_{11}, \dots, \Phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{p} \Phi_{j1} x_{ij} \right)^2$$

And this is subject to the constraint:

$$\sum_{j=1}^{p} \Phi_{j1}^2 = 1$$

The objective function (function to maximize) can be rewritten as:

$$\frac{1}{n} \sum_{i=1}^{n} z_{i1}^2$$

Since this also holds (each feature has been centered to have mean zero):

$$\frac{1}{n} \sum_{i=1}^{n} x_{ij} = 0$$

the average of z11,..., zn1 will also be zero. Therefore the objective function being maximized is simply the sample variance of the n values of zi1.
z11, z21,...,zn1 are referred to as the scores of the first principal component.



How then do we maximize the given objective function? 
We do this by performing eigen decomposition of the covariance matrix. Details of how to perform eigen decomposition are explained here.

Explaining the Principal Components
The loading vector Ф1 with elements Ф11, Ф21,...,Фp1  defines a direction in the feature space along which there is maximum variance in the data.
Thus, if we are to project the n data points x1, x2,..., xn onto this direction, then projected values are the actual principal component scores z11, z21, …, zn1.
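
The projection step can be sketched in Java as follows. This is only an illustration under stated assumptions: the small dataset is made up and already centered, the two unit-length directions are picked by hand rather than computed by eigen decomposition, and the class and method names are hypothetical.

```java
// Illustrative sketch: projecting centered data onto a unit-length direction
// and measuring the variance of the resulting scores.
class PcaScoresSketch {

    // Score of each observation: z_i = sum over j of loading[j] * x[i][j].
    static double[] scores(double[][] x, double[] loading) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < loading.length; j++) {
                z[i] += loading[j] * x[i][j];
            }
        }
        return z;
    }

    // Sample variance (1/n) * sum of z_i^2, valid because centered data gives scores with mean zero.
    static double variance(double[] z) {
        double sum = 0;
        for (double v : z) {
            sum += v * v;
        }
        return sum / z.length;
    }

    public static void main(String[] args) {
        // Assumed toy data: 4 observations, 2 features, each column has mean zero.
        double[][] x = {{-2, -1}, {-1, -1}, {1, 1}, {2, 1}};

        double r = Math.sqrt(0.5);
        double[] along  = {r, r};  // unit vector along the diagonal
        double[] across = {r, -r}; // unit vector orthogonal to it

        System.out.println("Variance along the diagonal:  " + variance(scores(x, along)));
        System.out.println("Variance across the diagonal: " + variance(scores(x, across)));
    }
}
```

The direction with the larger score variance captures more of the spread in the data; PCA searches over all unit-length directions for the one where this variance is largest.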

After the first principal component Z1 of the features has been determined, the second principal component is the linear combination of X1, X2,..., Xp that has the highest variance out of all the linear combinations that are uncorrelated with Z1. The second principal component scores z12, z22,...,zn2 take the form

$$z_{i2} = \Phi_{12} x_{i1} + \Phi_{22} x_{i2} + \dots + \Phi_{p2} x_{ip}$$

where Ф2 is the second principal component loading vector, with elements Ф12, Ф22, … ,Фp2. It turns out that constraining Z2 to be uncorrelated with Z1 is the same as constraining the direction of Ф2 to be orthogonal to the direction of Ф1.
We would now take an example to see how PCA works.





Thursday, 22 November 2018

Difference Between Prediction and Inference in Machine Learning

Hello friend, I'd like to share with you this brief explanation of the difference between Prediction and Inference. They appear similar, but to us researchers and data scientists the difference should be clearly drawn.
Let's begin with Prediction.


Content
1. What is Prediction?
2. What is Inference?
3. The Mathematical Model




1. What is Prediction?


Let's illustrate with an example.
Assume an organization wants to conduct a marketing survey. The objective of the survey is to determine the response of people in a given area to an advertising campaign being carried out.
The company gathers the demographic features of the particular area where the campaign is to be conducted. These features make up the set of predictors (variables). These variables may include:
  • population density
  • average education level
  • class of people
  • area type (rural, sub-urban, urban)
  • media type to be used, etc.
The company would like to know whether the campaign will succeed or fail, where success is measured by the response of the people to the campaign against a certain threshold.
A situation like this, where the objective is to predict either success or failure (classification) given a set of input variables, is an example of prediction modelling.
In this case the focus is not on understanding the relationship between each individual predictor variable and the output.




2. What is Inference?


We are still using the advertising campaign example. If the objective of the survey is to determine the relationship between each individual predictor and the output, this would be inference. In this case the following questions would be answered:
  • Which media type has the most effect on the response?
  • How does the population density affect the outcome?
  • What class of people responds most to the campaign?
In this case, the goal is not just to get a classification of success or failure but to infer the relationship between each predictor and the outcome.
Let's look at the  mathematical model


3. The Mathematical Model


Let the input predictor variables be X1, …, Xp and the target variable be Y. Then we would be interested in estimating how the output Y is affected by variation in X.
The goal is to create a relationship between X and Y such that
f(X) = Y
First we take a simple approach, which is to assume that the model is linear. Then we use a parametric method.
Assuming the function f(X) is linear, we have:

$$f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

Next we train the model by using a procedure that finds the values of the βs such that:

$$Y \approx \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

Here you can see that the approximately-equal sign is used, which means that some error is incurred in the process, and we would like to minimize it.
More details on this are discussed in Linear Models of Regression.
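
As a rough illustration of what such a training procedure can look like, here is a Java sketch for the simplest case of a single predictor. It assumes ordinary least squares as the fitting procedure, which the text above does not name, and the data points are made up for the example; the class and method names are hypothetical.

```java
// Illustrative sketch: fitting y ~ beta0 + beta1 * x by ordinary least squares
// (an assumed training procedure) for a single predictor.
class LeastSquaresSketch {

    // Returns {beta0, beta1} that minimize the sum of squared errors.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double beta1 = sxy / sxx;             // slope
        double beta0 = meanY - beta1 * meanX; // intercept
        return new double[]{beta0, beta1};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};         // made-up predictor values
        double[] y = {3.1, 4.9, 7.2, 8.8}; // made-up responses, roughly 1 + 2x with some error
        double[] beta = fit(x, y);
        System.out.println("beta0 = " + beta[0] + ", beta1 = " + beta[1]);
    }
}
```

For prediction we would only care how close beta0 + beta1 * x gets to new values of y; for inference we would look at beta1 itself, since it describes how y changes with x.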

Wednesday, 21 November 2018

Datarmatics Researchers Decimate Biscuit-and-Banana Puzzle





Dr. Wonu Nduka
Mathematically!😁

Let the biscuit be "x"
let banana be "y" and
Let lollipop be "z"

Such that the pictorial equations can be transformed into algebraic equations,  thus:

x+x+x=30.....(1)
y+y+x=14.....(2)
y+z+z=8.......(3)
'z'+0.5y+0.5y'x'=?.....(4)

Equations 1, 2 &3 could further be reduced to:

3x=30........(5)
x+2y=14....(6)
y+2z=8......(7)

From equation (5) we have:
3x=30
x=10
substitute x=10 in equation (6)
2y+10=14
2y+10-10=14-10
2y=4
y=2

Now, substitute y=2 in equation (7)
y+2z=8
2-2+2z=8-2
2z=6
z=3

To evaluate the expression (4), where x=10, y=2 and z=3.
This is the overarching part of the problem-solving episode.

Considering the time, the fingers of the banana and the number of dots on the biscuits.

If x=10 when dots were 10, then x=7, when dots became 7.

If y=2 when we had two fingers of banana, then y=1 when the fingers of banana reduced to 1.

If z=3, when the time was 3 o'clock, then z=2, when the time became 2 o'clock.

When the logic above is being applied to the expression (4) we have:
👇
y+yx+z
1+1*7+2
1+7+2=10

EVALUATION
Check using the previous equations, with x=10, y=2 and z=3

x+x+x=30.....(1)
y+y+x=14.....(2)
y+z+z=8.......(3)

From equation (1)
10+10+10=30✅

From equation (2)
2+2+10=14✅

From equation (3)
2+3+3=8✅







Prof. Ebi Efebo
We need to think outside the box
Each of the biscuits have 10 dots
3×10=30
Each dot represents 1
2nd equation
Each finger of banana is  1
2+ 2+10=14
Equation 3
2+3+3=8
2 fingers of banana and the 2 clocks at 3 o'clock  each
Equation 4
2 o'clock+ 1 finger of banana + another finger of banana × 7 dots on the biscuits
2+1+1×7
Applying BODMAS
7×1+1+2
7+1+2=10
That is the answer





Saturday, 17 November 2018

Datarmatics International Coding Festival Coming up...

Datarmatics International Coding Festival is coming up in December.
The program will be provided soon.




Thursday, 15 November 2018

How Bubble Sort Algorithm Works (Implementation in Java)

This would be a very simple explanation, with a program in Java, of how the Bubble Sort Algorithm works.

Bubble Sort works by repeatedly iterating through the elements of the array and doing a pairwise swap of adjacent elements that are out of order. After each pass, the largest remaining element has moved to its final position at the end of the array.


We would examine the program in Java and then explain it line by line. Then we would actually run the program using Netbeans IDE to see how it works.

The Java program is explained as follows: there are three functions in addition to the main method.

BubbleSort(): This is the function that performs the sorting on an array. This function takes an array as its parameter.


    //THE BUBBLESORT FUNCTION IN JAVA
    public static void BubbleSort(int[] a)
    {
        int i, j;
        int N = a.length;
        
        for(j=N-1; j>0; j--) {
            for(i=0; i<j; i++){
                if(a[i]> a[i+1])
                    swap(a, i, i+1);
            }
        }       
    }
Listing 1: BubbleSort function



Swap(): This function performs a pairwise swap of two array elements. It takes three parameters: the first is the array, and the second and third are the indices of the elements to be swapped.


    //FUNCTION TO PERFORM SWAP OF TWO ITEMS
    public static void swap(int[] ar, int x, int y)
    {
        int k = ar[x];
        ar[x] = ar[y];
        ar[y] = k;
    }
Listing 2: Swap Function

Display(): This function displays the elements of the array to the output by looping through the array elements and calling System.out.println().


    //FUNCTION TO DISPLAY ARRAY ITEMS TO THE OUTPUT
    public static void Display(int a[])
    {
        int i=0;
        while(i != a.length)
        {
            System.out.println(a[i]);
            i++;
        }       
    }
Listing 3: Display Function




Main(): Runs the program by calling each of the functions.


    //MAIN PROGRAM
    public static void main(String[] args) {
      int a[] = {5,7,3,8,0,2};    //INITIALIZE AN ARRAY WE ARE GOING TO SORT
      Display(a);                 //DISPLAY THE UNSORTED ARRAY
      BubbleSort(a);              //SORT THE ARRAY USING BUBBLESORT     
      System.out.println("The Sorted list is given below:");
      Display(a);                 //DISPLAY THE SORTED ARRAY
    }  
Listing 4: Main Program
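
A common variation, not part of the listings above, is to stop as soon as a full pass makes no swaps, since the array must then already be sorted. The sketch below is self-contained so it can be compiled on its own; the class name BubbleSortEarlyExit and the swapped flag are assumptions made for this illustration.

```java
import java.util.Arrays;

// Illustrative variation: bubble sort that stops early once a pass makes no swaps.
class BubbleSortEarlyExit {

    static void bubbleSort(int[] a) {
        for (int j = a.length - 1; j > 0; j--) {
            boolean swapped = false;      // tracks whether this pass changed anything
            for (int i = 0; i < j; i++) {
                if (a[i] > a[i + 1]) {    // adjacent pair out of order: swap it
                    int k = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = k;
                    swapped = true;
                }
            }
            if (!swapped) {
                break;                    // no swaps in this pass, so the array is sorted
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 7, 3, 8, 0, 2};     // same array as in the main program above
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // prints [0, 2, 3, 5, 7, 8]
    }
}
```

On an array that is already sorted, this version finishes after a single pass instead of running every pass.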

This short video explains how to put everything together and run it in Netbeans