M04 - Regression, Classification, and Unsupervised Learning

1. Regression and Classification process modeling

As a reminder, here is what we mean by regression and classification:

Both are Supervised Learning models.

Regression - Regression models predict a numeric value for a label based on a function that applies coefficients to a set of known feature values. Regression is a form of supervised learning, so the function and coefficients are determined by training a regression algorithm with a training dataset, and evaluating it against a testing data set, in which the label values are known.
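To make the idea of "a function that applies coefficients to feature values" concrete, here is a minimal sketch of a one-feature linear regression fitted by ordinary least squares and then checked against held-out test data. The data values and the function names (`fit_simple_linear_regression`, `predict`, `rmse`) are made up for illustration; a real experiment would normally use a library or an Azure ML module instead.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimise squared error for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Apply the learned coefficients to a single feature value."""
    return intercept + slope * x

def rmse(actual, predicted):
    """Root mean squared error between known labels and predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: the label is roughly 2 * feature + 1.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
test_x = [7.0, 8.0]
test_y = [15.0, 17.1]

# Train on the training set, evaluate on the test set with known labels.
intercept, slope = fit_simple_linear_regression(train_x, train_y)
test_predictions = [predict(intercept, slope, x) for x in test_x]
test_rmse = rmse(test_y, test_predictions)
print(f"intercept={intercept:.3f} slope={slope:.3f} test RMSE={test_rmse:.3f}")
```

The learned slope and intercept are the "coefficients" from the definition above, and the test RMSE tells us how far the predictions are from the known labels.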

Classification - Classification models predict a categorical label value based on a function that applies coefficients to a set of feature values. The simplest classification models predict True or False (1 or 0), but you can also create multi-class classification models that are used to classify entities into a set of defined classes.
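As a rough illustration of the simplest True/False case, the sketch below trains a tiny logistic regression with plain gradient descent. Logistic regression is only one possible classifier, and the data, learning rate, epoch count, and function names are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Learn weights and a bias with batch gradient descent on log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(features, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            error = p - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def predict_label(weights, bias, row):
    """Return 1 (True) when the predicted probability is at least 0.5."""
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical data: two numeric features, label 1 for the "large" examples.
features = [[0.5, 1.0], [1.0, 1.5], [1.5, 0.5],
            [3.0, 3.5], [3.5, 3.0], [4.0, 4.5]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
train_predictions = [predict_label(weights, bias, row) for row in features]
print(train_predictions)
print(predict_label(weights, bias, [0.8, 1.0]), predict_label(weights, bias, [4.0, 4.0]))
```

A multi-class model follows the same idea, but produces one score per class instead of a single probability.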

What the process looks like:

1.1 We have to understand the relationships in the data: what kind of data it is and where it comes from

1.2 In the second step we have to select only those features which are important for our study
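One simple way to approach this step is a correlation filter: keep only the features whose absolute Pearson correlation with the label is above some threshold. This is just one of many feature selection techniques, and the column names, values, and the 0.5 threshold below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def select_features(columns, label, threshold=0.5):
    """Return feature names with |correlation| >= threshold, strongest first."""
    scores = {name: abs(pearson(values, label)) for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

# Hypothetical housing data: two useful features and one irrelevant one.
columns = {
    "area_m2": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "door_colour_code": [3, 1, 2, 1, 3],
}
price = [150, 180, 240, 310, 360]

selected = select_features(columns, price)
print(selected)
```

A filter like this only catches linear relationships with the label, so it is a starting point rather than a replacement for understanding the data.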

1.3 Next, we should select the metric that best suits our needs
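Why the choice of metric matters can be shown with a small sketch. On the hypothetical imbalanced labels below, a model that always predicts "negative" and a model that actually finds some positives get the same accuracy, but very different precision and recall. The data and names are assumptions for illustration only.

```python
def classification_metrics(actual, predicted):
    """Return accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical imbalanced test set: only 2 of 10 examples are positive.
actual = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
always_negative = [0] * 10
some_model = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

baseline_metrics = classification_metrics(actual, always_negative)
model_metrics = classification_metrics(actual, some_model)
print(baseline_metrics)
print(model_metrics)
```

If missing a positive case is expensive, recall is probably the metric to watch; if false alarms are expensive, precision matters more.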

1.4 Modeling our experiment:

Modeling usually requires the following actions to improve its results:

Cross Validation steps:
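As a generic illustration of k-fold cross validation (not necessarily the exact steps used in Azure ML), the sketch below splits the data into k folds, trains on k-1 of them, scores on the remaining one, and repeats so that every sample is tested exactly once. The data, the simple line-fitting model, and all function names are assumptions. The folds are contiguous and unshuffled to keep the example deterministic; in practice the rows are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cross_validate(xs, ys, k):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        intercept, slope = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
        scores.append(sum(errors) / len(errors))
    return scores

# Hypothetical data: label is roughly 2 * feature + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [3.2, 4.9, 7.1, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2]

fold_scores = cross_validate(xs, ys, k=3)
mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_score)
```

Looking at the spread of the per-fold scores, and not only their mean, gives a hint of how stable the model is across different subsets of the data.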

Azure ML modules:

2. Unsupervised Learning models

As a reminder, here is what we mean by unsupervised learning models:

Unsupervised Learning models - Unsupervised learning models are based on a function that categorizes entities by applying coefficients to numeric feature values. The main difference between supervised learning (like regression and classification) and unsupervised learning (like clustering) is that in unsupervised learning, there are no known label values with which to train the model. The model simply groups entities together into a specified number of clusters based on similarities, which are usually determined by calculating the mathematical distance between the entities.
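The grouping by distance described above can be sketched with a very small k-means implementation. K-means is only one clustering algorithm, and the points, the deterministic farthest-point initialisation, and the function name below are assumptions chosen to keep the example reproducible. Note that no labels are passed in: the model only sees the feature values and the requested number of clusters.

```python
import math

def k_means(points, k, max_iterations=100):
    """Group points into k clusters by Euclidean distance to the centroids."""
    # Assumed initialisation: start from the first point, then repeatedly
    # add the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids

# Hypothetical 2-D data with two visible groups and no labels.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
          (8.0, 8.1), (8.3, 7.9), (7.9, 8.4)]

assignments, centroids = k_means(points, k=2)
print(assignments)
print(centroids)
```

The cluster numbers themselves carry no meaning; deciding what each group represents is up to us.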

Before we start the process, it helps to know:

2.1 What is the business problem

2.2 There is no ground truth because there are no labels

2.3 Evaluation is a huge challenge (visualizations are the main help here)

2.4 What does the structure of the data tell us

2.5 Do different models yield different results

2.6 How many clusters do we expect, and how many of those will be useful
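For the last question, one common heuristic is the "elbow" method: run the clustering for several values of k and watch where the within-cluster sum of squares stops dropping sharply. The sketch below uses a tiny one-dimensional k-means with an assumed deterministic initialisation and made-up values; it illustrates the idea and is not a recommendation for a specific k.

```python
def k_means_1d(values, k, max_iterations=100):
    """Tiny 1-D k-means with deterministic farthest-point initialisation."""
    centroids = [values[0]]
    while len(centroids) < k:
        centroids.append(max(values, key=lambda v: min(abs(v - c) for c in centroids)))
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(k), key=lambda i: abs(v - centroids[i])) for v in values
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for i in range(k):
            members = [v for v, a in zip(values, assignments) if a == i]
            if members:
                centroids[i] = sum(members) / len(members)
    return assignments, centroids

def inertia(values, assignments, centroids):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum((v - centroids[a]) ** 2 for v, a in zip(values, assignments))

# Hypothetical values that form three visible groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

inertias = []
for k in range(1, 5):
    assignments, centroids = k_means_1d(values, k)
    inertias.append(inertia(values, assignments, centroids))
    print(f"k={k} inertia={inertias[-1]:.2f}")
```

On this data the inertia falls steeply up to k=3 and barely changes afterwards, which suggests three clusters. Whether three clusters are useful for the business problem is a separate question.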

Most popular algorithms:

How to evaluate:

Maybe the clusters overlap each other, or maybe they are just big random ellipses.

When we look at plots or projections, do they tell us anything?
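Besides looking at plots, one way to put a number on "do the clusters overlap" is the silhouette coefficient, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. Values near 1 suggest well-separated clusters, values near 0 or below suggest overlapping or arbitrary ones. The points and both label assignments below are hypothetical, and the function assumes at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    cluster_ids = sorted(set(labels))
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the same cluster.
        same = [math.dist(point, other)
                for j, (other, other_label) in enumerate(zip(points, labels))
                if other_label == label and j != i]
        if not same:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other)
                for other, other_label in zip(points, labels)
                if other_label == cluster_id) / labels.count(cluster_id)
            for cluster_id in cluster_ids if cluster_id != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points with two visible groups.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
          (8.0, 8.1), (8.3, 7.9), (7.9, 8.4)]
sensible_labels = [0, 0, 0, 1, 1, 1]   # matches the visible groups
arbitrary_labels = [0, 1, 0, 1, 0, 1]  # mixes the groups together

good_score = silhouette_score(points, sensible_labels)
poor_score = silhouette_score(points, arbitrary_labels)
print(f"sensible={good_score:.3f} arbitrary={poor_score:.3f}")
```

A score like this does not replace the visual check, but it makes it easier to compare different models or different numbers of clusters.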

Azure ML modules:
