# M04 - Regression, Classification, and Unsupervised Learning

22 Oct 2015

# 1. Regression and Classification process modeling

As a reminder, here is what we mean by regression and classification:

Both are supervised learning models.

**Regression** - Regression models predict a numeric value for a label based on a function that applies coefficients to a set of known feature values. Regression is a form of supervised learning, so the function and coefficients are determined by training a regression algorithm with a training dataset, and evaluating it against a testing data set, in which the label values are known.
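To make the regression definition concrete, here is a minimal sketch of one-feature least-squares regression in plain Python. The toy data and the names `fit_line` and `predict` are assumptions made for illustration; they are not taken from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(coeffs, x):
    """Apply the learned coefficients to a known feature value."""
    intercept, slope = coeffs
    return intercept + slope * x


# Hypothetical toy data where the label is exactly 2 * feature + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
test_x = [5.0, 6.0]
test_y = [11.0, 13.0]

# Train on the training set, then evaluate on the held-out test set.
coeffs = fit_line(train_x, train_y)
test_errors = [abs(predict(coeffs, x) - y) for x, y in zip(test_x, test_y)]
mean_abs_error = sum(test_errors) / len(test_errors)
print(coeffs, mean_abs_error)
```

Because the labels are known for the test set, the mean absolute error gives a direct measure of how well the learned coefficients generalize.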

**Classification** - Classification models predict a categorical label value based on a function that applies coefficients to a set of feature values. The simplest classification models predict True or False (1 or 0), but you can also create multi-class classification models that are used to classify entities into a set of defined classes.
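As a rough illustration of a True/False classifier, the sketch below trains a logistic regression model with stochastic gradient descent in plain Python. Logistic regression is just one possible choice here, and the toy data, learning rate, epoch count, and function names are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a bias and one weight per feature by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            p = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
            error = p - label  # gradient of the log loss w.r.t. the linear score
            bias -= lr * error
            weights = [w - lr * error * f for w, f in zip(weights, features)]
    return bias, weights


def classify(model, features, threshold=0.5):
    """Return 1 (True) or 0 (False) by thresholding the predicted probability."""
    bias, weights = model
    p = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
    return 1 if p >= threshold else 0


# Hypothetical toy data: one feature, label 1 for the larger values.
rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]

model = train_logistic(rows, labels)
print(classify(model, [0.75]), classify(model, [3.75]))
```

A multi-class model follows the same idea, but produces one score per class and picks the class with the highest score.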

### What the process looks like:

1.1 Understand the data relationships: we have to know what kind of data it is and where it comes from

1.2 Select only those features that are important for our study

1.3 Select the metric that best suits our needs

1.4 Model our experiment:

- Create model
- Evaluate model
- Improve model
- Cross-validate model

### Modeling usually requires the following actions to improve its results:

- understanding residuals
- filtering and transforming the data
- feature engineering
- better feature selection
- using a different type of model
- choosing different model parameters

### Cross-validation steps:

- Divide the data into 10 parts of approximately equal size
- Train the algorithm on 9 parts and compute the evaluation measure on the remaining part
- Repeat this 10 times, using each part in turn as the test part
- Report the mean and standard deviation of the evaluation measure over the 10 parts
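The cross-validation steps above can be sketched in plain Python as follows. The helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mean_abs_error`) and the toy data are hypothetical, and the "model" is a deliberately trivial mean predictor that stands in for a real algorithm.

```python
import random
import statistics


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and split them into k parts of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Part i takes every k-th index starting at i, so sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=10, seed=0):
    """Train on k-1 parts, score on the remaining part, repeat for every part."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        scores.append(score(model, test_x, test_y))
    return statistics.mean(scores), statistics.stdev(scores)


def fit_mean(train_x, train_y):
    """Stand-in model: always predict the mean training label."""
    return statistics.mean(train_y)


def mean_abs_error(model, test_x, test_y):
    return statistics.mean([abs(model - y) for y in test_y])


# Hypothetical toy data with 50 rows.
xs = [float(i) for i in range(50)]
ys = [float(i % 5) for i in range(50)]

mean_score, std_score = cross_validate(xs, ys, fit_mean, mean_abs_error)
print(f"MAE over 10 folds: {mean_score:.3f} +/- {std_score:.3f}")
```

Reporting the standard deviation alongside the mean shows how much the evaluation measure varies from one held-out part to another.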

Azure ML modules:

# 2. Unsupervised Learning models

As a reminder, here is what we mean by unsupervised learning models:

**Unsupervised Learning models** - Unsupervised learning models are based on a function that categorizes entities by applying coefficients to numeric feature values. The main difference between supervised learning (like regression and classification) and unsupervised learning (like clustering) is that in unsupervised learning, there are no known label values with which to train the model. The model simply groups entities together into a specified number of clusters based on similarities, which are usually determined by calculating the mathematical distance between the entities.
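As an illustration of grouping entities into a specified number of clusters by distance, here is a minimal k-means sketch in plain Python (it uses `math.dist`, so it assumes Python 3.8 or later). The toy points, the fixed iteration count, and the random initialization are assumptions made for the example.

```python
import math
import random


def k_means(points, k, iterations=20, seed=0):
    """Basic k-means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Hypothetical toy data: two visibly separate groups, with no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

assignments, centroids = k_means(points, k=2)
print(assignments)
print(centroids)
```

Note that the cluster numbers themselves carry no meaning; only the grouping does, because no labels were used for training.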

### Before we start the process, it helps to know:

2.1 What the business problem is

2.2 There is no ground truth because there are no labels

2.3 Evaluation is a huge challenge (visualizations are the main help here)

2.4 What the structure of the data tells us

2.5 Whether different models yield different results

2.6 How many clusters we expect, and how many of those will be useful

Most popular algorithms:

- K-means
- Hierarchical Agglomerative clustering
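For comparison with K-means, here is a small sketch of hierarchical agglomerative clustering in plain Python (Python 3.8 or later for `math.dist`). It assumes single linkage, which is only one of several possible linkage rules, and the function name and toy points are hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; returns a list of clusters."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

clusters = agglomerative(points, n_clusters=2)
print(clusters)
```

Unlike K-means, this approach builds clusters bottom-up by repeated merging, so stopping at different cluster counts yields a hierarchy of groupings.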

How to evaluate:

- Are the clusters well separated?

They may overlap each other, or they may just be large, random ellipses.

- Does the structure tell us anything?

When we look at plots or projections, do they tell us anything?
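Visual inspection can be complemented by a numeric check of separation. One common option, not mentioned in the notes above, is the silhouette coefficient. The sketch below is a plain-Python version (Python 3.8 or later) that assumes at least two clusters; the toy points and both labelings are made up for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; values near 1 suggest well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # common convention for a single-member cluster
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data with a sensible labeling and an arbitrary one.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

A clearly higher score for one labeling than for another suggests that its clusters overlap less, which matches the "well separated" question above.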

Azure ML modules:
