# M01 - Introduction and Data Science Theory

30 Sep 2015

## 1. Introduction to Data Science

Doing Data Science means using data to make decisions that drive action.

Data Science mainly involves:

- Finding data
- Acquiring data
- Cleaning and transforming data
- Understanding data and its relationships
- Delivering value from data

*Predictive analytics* is about using past data to predict future values.

*Prescriptive analytics* is about using those predictions to drive decisions.

## 2. Data Science Process

The Data Science process is iterative and involves these steps:

- Data selection
- Preprocessing
- Transformation
- Data Mining
- Interpretation and evaluation

Historical approach to Data Science

These documents were written in different years in the past but have many things in common.

## 3. Introduction to Machine Learning

*supervised learning* - a machine learning model is trained using a set of existing, known data values.

- **Classification** - used to predict category labels, in the simplest case Boolean (True/False) values.
- **Regression** - used to predict real numeric values.

**Terms to know:** *feature, label, over-fitted (works only with training data), under-fitted (too general)*
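One way to see the difference between an over-fitted and an under-fitted model is to compare the error on the training data with the error on data the model has not seen. The sketch below is only an illustration: the data values, the `memorizer` model (which returns the label of the closest training example) and the `constant_model` (which always predicts the mean label) are assumptions made up for this example.

```python
def mean_squared_error(model, data):
    """Average squared difference between predictions and known labels."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical (feature, label) pairs: the label is roughly 2 * feature,
# with some noise added to the training set.
train = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5), (5.0, 9.5)]
test = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

def memorizer(x):
    """Over-fitted: returns the label of the closest training example."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

mean_label = sum(y for _, y in train) / len(train)

def constant_model(x):
    """Under-fitted: ignores the feature and always predicts the mean label."""
    return mean_label

for name, model in [("memorizer", memorizer), ("constant", constant_model)]:
    print(name,
          "train error:", mean_squared_error(model, train),
          "test error:", mean_squared_error(model, test))
```

The memorizer reproduces the training labels exactly (including their noise) but does worse on the held-out pairs, while the constant model has a large error on both sets.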

*unsupervised learning* - data without known labels is analyzed to find patterns.

- **Clustering** - machine learning is used to group (or cluster) data entities based on similar features.
- **Recommendations** - machine learning solutions that match individuals to items based on the preferences of other similar individuals, or other similar items that the individual is already known to like.

Occam's Razor: The best models are simple models that fit the data well.

## 4. Regression

Algorithms

- SLR (Simple Linear Regression)
- Ridge Regression
- SVM (Support Vector Machine)
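As a minimal sketch of the first algorithm in the list, Simple Linear Regression can be fitted in closed form with ordinary least squares. The function name `fit_simple_linear_regression` and the sample data are assumptions chosen for illustration; Ridge Regression and SVM are not covered here.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor,
    # which cancels in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept; with noisy data it returns the line that minimizes the sum of squared errors.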

Evaluation of algorithms

- Cross-Validation
- Nested Cross-Validation
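Plain k-fold Cross-Validation can be sketched as follows: the data is split into k folds, and each fold is used once as the test set while the remaining folds form the training set. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the samples have already been shuffled; Nested Cross-Validation would wrap a second loop of this kind inside each training set.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_indices = list(range(start, stop))
        train_indices = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_indices, test_indices
        start = stop

folds = list(k_fold_indices(6, 3))
for train_indices, test_indices in folds:
    print("train:", train_indices, "test:", test_indices)
```

A model would be trained and scored once per fold, and the k scores averaged to estimate how well it generalizes.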

## 5. Classification

Algorithms

- Decision Trees
- Multi Class Classification

**Terms to know:** *Loss function, Imbalanced data, TPR (True Positive Rate), FPR (False Positive Rate), ROC (Receiver Operating Characteristic)*
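To make TPR and FPR concrete, the sketch below computes both for a binary classifier that outputs scores, using a chosen decision threshold. The labels, scores, and the function name `tpr_fpr` are assumptions invented for the example. Repeating the calculation over many thresholds and plotting TPR against FPR gives the ROC curve.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at the given threshold."""
    y_pred = [score >= threshold for score in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and not p)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical imbalanced data: 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

# Each threshold gives one point on the ROC curve.
for threshold in [0.75, 0.5, 0.25]:
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(threshold, tpr, fpr)
```

Lowering the threshold catches more of the positives (higher TPR) at the cost of more false alarms (higher FPR).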

## 6. Clustering

Algorithms

- K-Means
- Hierarchical Agglomerative Clustering

**Terms to know:** *distance metric*
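A minimal K-Means sketch, assuming squared Euclidean distance as the distance metric and caller-supplied starting centroids (real implementations usually pick them randomly). The function names and the sample points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def k_means(points, initial_centroids, n_iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    centroids = list(initial_centroids)
    for _ in range(n_iterations):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = mean_point(members)
    return centroids, assign(points, centroids)

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Swapping `squared_distance` for another metric changes which points count as similar, which is why the choice of distance metric matters for clustering.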

## 7. Recommendation

**Terms to know:** *Matrix Factorization*
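As a rough sketch of Matrix Factorization, each user and each item is given a small vector of latent factors, and a rating is predicted as the dot product of the two vectors. The version below fits the factors with stochastic gradient descent on the known ratings. The ratings, hyperparameters, and function names are all assumptions for illustration, and this is only one of several ways to fit such a model.

```python
import random

def predict(user_factors, item_factors, user, item):
    """Predicted rating: dot product of the user's and the item's factor vectors."""
    return sum(p * q for p, q in zip(user_factors[user], item_factors[item]))

def factorize(ratings, n_users, n_items, n_factors=2,
              learning_rate=0.02, regularization=0.02, n_epochs=500, seed=0):
    """Fit user and item factors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    user_factors = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
    item_factors = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_epochs):
        for user, item, rating in ratings:
            error = rating - predict(user_factors, item_factors, user, item)
            for f in range(n_factors):
                p, q = user_factors[user][f], item_factors[item][f]
                # Gradient step on squared error with L2 regularization.
                user_factors[user][f] += learning_rate * (error * q - regularization * p)
                item_factors[item][f] += learning_rate * (error * p - regularization * q)
    return user_factors, item_factors

def rmse(ratings, user_factors, item_factors):
    """Root mean squared error over the known ratings."""
    squared = [(r - predict(user_factors, item_factors, u, i)) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical (user, item, rating) triples; unlisted pairs are unknown.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
user_factors, item_factors = factorize(ratings, n_users=3, n_items=3)
print("training RMSE:", rmse(ratings, user_factors, item_factors))
print("predicted rating for user 0, item 2:", predict(user_factors, item_factors, 0, 2))
```

The last line predicts a rating for a user and item pair that is missing from the data, which is the basis for making a recommendation.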
