# Systematized Predictive Modeling

*Preprocessing*

- Zero mean: subtract the mean from each predictor to center the data
- Divide by the standard deviation to scale the data
- Extract DateTime components (year, month, day of week, hour)
- One-hot encode categorical variables
- Look for skewness; apply a log, square-root, or Box-Cox transform if necessary
- Resolve outliers (and understand their meaning); apply the spatial sign transform if the model is sensitive to outliers
- Eliminate missing data (can be problematic if missingness is itself predictive; tree-based models can handle missing data)
- Imputation/Interpolation (KNN or an intermediate regression model)
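The steps above can be wired into a single preprocessing pipeline. A minimal sketch with scikit-learn, on a made-up toy frame (the columns `age`, `income`, `city` are hypothetical):

```python
# Sketch: impute + center/scale numeric predictors, one-hot encode
# categoricals, all in one ColumnTransformer.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 47.0],
    "income": [40000.0, 52000.0, 61000.0, np.nan],
    "city": ["NY", "SF", "NY", "LA"],
})

numeric = ["age", "income"]
categorical = ["city"]

preprocess = ColumnTransformer([
    # impute missing values, then zero-mean / unit-variance scale
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # one-hot encode; ignore unseen categories at predict time
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numerics + 3 city dummies = 5 columns
```

Fitting this inside a `Pipeline` with the model keeps the transform parameters learned on training folds only.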

*Exploratory Data Analysis*

- Maximal Information Coefficient Matrix / Correlation Matrix
- Box-Plot Everything
- Scatter Every Combination of Features
- Pivot Tables
- Group by particular features
- Histogram Everything
- Outlier Analysis
- Transform Variables (Square, Cube, Inverse, Log) and Plot
- Summary (Mean, Mode, Minimum, Maximum, Upper/Lower Quartiles, Identify Outliers)
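A few of these EDA passes in pandas, on synthetic data (the columns here are invented for illustration):

```python
# Sketch: summary statistics, correlation matrix, group-by, skewness check.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.exponential(size=100),  # skewed; candidate for a log transform
    "group": rng.choice(["a", "b"], size=100),
})

summary = df.describe()                      # mean, quartiles, min/max
corr = df[["x1", "x2"]].corr()               # correlation matrix
by_group = df.groupby("group")["x1"].mean()  # group by a feature
skew = df["x2"].skew()                       # check skewness before transforming
print(summary.loc["mean"])
print("skew of x2:", round(skew, 3))
```

`df.hist()` and `pd.plotting.scatter_matrix(df)` cover the "histogram everything" and "scatter every combination" items.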

*Data Reduction*

- Principal Component Analysis
- Linear Discriminant Analysis (for classification)
- Feature Selection (only use the components that account for the majority of the information when modeling)
- Remove Low/Zero Variance Predictors
- Remove multicollinear / heavily correlated features
- Isomap
- Lasso
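A minimal reduction sketch combining two of the items above: drop zero-variance predictors, then keep only the principal components that explain most of the variance (the 95% threshold is an arbitrary choice here):

```python
# Sketch: zero-variance filter followed by PCA retaining 95% of variance.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] = 0.0  # a zero-variance predictor

X = VarianceThreshold().fit_transform(X)  # drops the constant column
pca = PCA(n_components=0.95)              # keep components covering 95% of variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], "components kept out of", X.shape[1])
```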

*Algorithms for Regression*

- Linear Regression
- Ridge Regression / Lasso / Elastic Net
- Best Subset Selection
- Forward and Backward Stepwise, Stagewise

- Partial Least Squares
- Principal Components Regression
- Neural Networks
- CNN
- RNN
- LSTM

- Multivariate Adaptive Regression Splines
- Support Vector Regressor
- K-Nearest Neighbors
- Regression Decision Trees
- Bagged Trees
- Random Forests
- Extremely Randomized Trees (Extra-Trees)
- Gradient Boosted Trees
- Generalized Linear Model
- Generalized Additive Model
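A quick way to get a feel for these is to fit several on the same data and compare held-out scores. A sketch with a few of the scikit-learn regressors from the list, on synthetic data:

```python
# Sketch: baseline comparison of several regressors on one dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "elastic_net": ElasticNet(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gbt": GradientBoostingRegressor(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: test R^2 = {r2:.3f}")
```

On real data the ranking should come from cross-validation, not a single split.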

*Evaluating Regression*

- RMSE
- MAE
- Median Absolute Error
- R2
- Visualization
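A minimal sketch computing these metrics with scikit-learn (median absolute error included) on hand-made predictions:

```python
# Sketch: RMSE, MAE, median absolute error, and R^2 for a toy case.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
medae = median_absolute_error(y_true, y_pred)  # robust to a few large errors
r2 = r2_score(y_true, y_pred)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} MedAE={medae:.3f} R2={r2:.3f}")
```

For visualization, a predicted-vs-observed scatter and a residual plot are the usual companions.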

*Algorithms for Classification*

- Logistic Regression
- L1, L2, Elastic Net regularization

- Discriminant Analysis
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- Neural Networks
- CNN
- RNN
- LSTM

- Support Vector Classifier
- K-Nearest Neighbors
- Naive Bayes
- Classification Trees
- Bagged Trees
- Random Forests
- Extremely Randomized Trees (Extra-Trees)
- Gradient Boosted Trees
- Generalized Additive Model
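As with regression, a baseline pass over several of these classifiers on one dataset gives a starting point. A sketch with scikit-learn on synthetic data:

```python
# Sketch: baseline accuracy of several classifiers from the list.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic (L2)": LogisticRegression(max_iter=1000),
    "lda": LinearDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=0),
}
for name, m in models.items():
    print(name, round(m.fit(X_tr, y_tr).score(X_te, y_te), 3))
```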

*Evaluating Classification*

- ROC Curve
- Confusion Matrix
- F1 Score
- Heat Map
- Overall accuracy rate
- Kappa Statistic
- Sensitivity
- Specificity
- AUC
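Most of these metrics fall out of the confusion matrix plus the ranking of scores. A sketch on hand-made labels and scores, with sensitivity and specificity derived explicitly:

```python
# Sketch: confusion matrix, accuracy, F1, kappa, sensitivity/specificity, AUC.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.55])
y_pred = (y_score >= 0.5).astype(int)  # "alternate cutoffs" move this threshold

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
auc = roc_auc_score(y_true, y_score)  # threshold-free, uses scores not labels
print(cm)
print("acc", accuracy_score(y_true, y_pred),
      "f1", f1_score(y_true, y_pred),
      "kappa", cohen_kappa_score(y_true, y_pred),
      "sens", sensitivity, "spec", specificity, "auc", auc)
```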

*Unsupervised Learning*

- K-Means
- K-Means++
- K-Medoids

- Hierarchical Agglomerative Clustering
- Single Linkage, Complete Linkage, Average Linkage, Centroid Criterion

- Principal Components Analysis
- Spectral Clustering
- Affinity Propagation
- Biclustering
- Gaussian Mixture Model
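A sketch of two of the methods above on the same synthetic blobs; note that k-means++ initialization is scikit-learn's default for `KMeans`:

```python
# Sketch: k-means (k-means++ init) and a Gaussian mixture on 3 blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print("k-means inertia:", round(km.inertia_, 1))
print("GMM mean log-likelihood:", round(gmm.score(X), 3))
```

The GMM gives soft assignments (`gmm.predict_proba`), which k-means does not.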

*Classification Class Imbalance*

- Model Tuning (Tune Parameters For Sensitivity)
- Alternate Cutoffs (Using ROC Curve)
- Adjusting Prior Probability
- Unequal Case Weights
- Down Sampling
- Up Sampling
- Alter Cost Function
- Dynamic Structure (Cascade of classifiers)
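Two of the cheaper remedies sketched side by side: unequal case weights via `class_weight`, and up-sampling the minority class with replacement:

```python
# Sketch: class weights and minority up-sampling for an imbalanced problem.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# 1) unequal case weights: errors on the rare class cost more
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# 2) up-sampling: redraw minority rows with replacement to match the majority
n_majority = int((y == 0).sum())
upsampled = resample(X[y == 1], n_samples=n_majority, random_state=0)
X_bal = np.vstack([X[y == 0], upsampled])
y_bal = np.array([0] * n_majority + [1] * n_majority)
print("balanced class counts:", np.bincount(y_bal))
```

Up-sampling must happen inside each training fold only; resampling before splitting leaks duplicates into the test set.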

*Feature Evaluation*

- Coefficients in Linear Models
- Random Forest Importances (impurity decrease: variance for regression, Gini/information gain for classification)
- Pearson Correlation with Outcome
- Maximal Information Coefficient (MIC)
- Distance Correlation
- Model with/without feature
- Randomly shuffle the feature between data points, check difference in model quality
- Lasso Automatic Selection
- Mean Decrease Accuracy
- Stability Selection
- Recursive Feature Elimination
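The shuffle-a-feature check above (permutation importance) is short enough to implement by hand. A sketch on synthetic data with only two informative features:

```python
# Sketch: permutation importance — shuffle one feature at a time and
# measure the drop in held-out R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)

rng = np.random.default_rng(0)
drops = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    rng.shuffle(X_perm[:, j])  # break this feature's link to the target
    drops.append(baseline - model.score(X_perm, y_te))
print("importance (R^2 drop) per feature:", np.round(drops, 3))
```

`sklearn.inspection.permutation_importance` does the same with repeats and averaging.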

*Parameter Tuning*

- Cross Validation
- Bootstrap
- Grid Search
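Grid search and cross-validation combine directly in scikit-learn. A sketch tuning a ridge penalty (the alpha grid here is arbitrary):

```python
# Sketch: 5-fold cross-validated grid search over the ridge penalty.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print("best alpha:", grid.best_params_["alpha"],
      "mean CV R^2:", round(grid.best_score_, 3))
```

Swapping `cv=5` for a bootstrap-style splitter (e.g. `ShuffleSplit`) covers the bootstrap item.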

*Text Features*

- n-Grams
- Word Vector Representations (word2vec)
- Bag of words
- Word counts
- Lengths
- Tf-idf
- Term frequency weighted by the term's rarity across documents (inverse document frequency)
- Topic Modeling (LDA)
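Several of these text features in one pass on a toy corpus (the sentences are made up):

```python
# Sketch: word counts, n-grams, tf-idf, and a simple length feature.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs"]

counts = CountVectorizer().fit_transform(corpus)                      # bag of words
bigrams = CountVectorizer(ngram_range=(1, 2)).fit_transform(corpus)   # uni+bigrams
tfidf = TfidfVectorizer().fit_transform(corpus)                       # tf-idf weights
lengths = [len(doc.split()) for doc in corpus]                        # length feature

print("counts:", counts.shape, "bigrams:", bigrams.shape,
      "tfidf:", tfidf.shape, "lengths:", lengths)
```

For topic modeling, `sklearn.decomposition.LatentDirichletAllocation` takes the count matrix as input.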

*Modeling Techniques*

- Feature Engineering
- Basis Expansions
- Combine Features
- average values, median values, variances, sums, differences, maximums or minimums, and counts.

- Stacking (using output of one algorithm as input to the next)
- Internal Prediction
- Blending (Especially with differentiated models)
- Account For Missing Data (It can be information)
- External Data
- Acquire Domain Knowledge for Feature Engineering
- Tree-based importances (random forests, boosted trees) for feature exploration
- Clustering for feature creation
- Distance to Class Centroid
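The last two items can be sketched together: append each point's distance to its class centroids and to unsupervised k-means centers as new features (the feature counts here are arbitrary):

```python
# Sketch: feature creation from clustering — distances to class centroids
# and to k-means centers, stacked onto the original features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# distance from every point to every class centroid
centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
dist_to_centroid = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# distances to k-means centers as extra unsupervised features
km_dist = KMeans(n_clusters=3, n_init=10, random_state=0).fit_transform(X)

X_aug = np.hstack([X, dist_to_centroid, km_dist])
print("original:", X.shape[1], "augmented:", X_aug.shape[1])
```

Note the class-centroid distances use labels, so in a real pipeline the centroids must be computed on training folds only.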