Data Science Project Charter
1. Data Gathering
Data Gathering and dictionary creation. Data understanding plays a key role to understand what the data means.
2. Exploratory Data Analysis
EDA is divided into 2 phases
Univariate Analysis
Analyze and understand the ranges and categories of variables and check their distributions.
Bivariate Analysis
Plot the variables of interest against the dependent variable.
Correlation
Generate a correlation heatmap to figure out the highly correlated independent variables.
3. Data Wrangling
Clean/Impute Missing Variables
Data with a lot of missing variables generate noise within the model. It is advised to clean the data by either removing the variable if a high proportion of missing values exist.
Outlier Treatment
Take a call whether to keep the outliers or cap them at a certain level.
Feature Engineering
Generate new features based on existing variables.
Standardizing
Standard or Normalize the data to run the model
3. Modeling
Train Test Split
Split the data into training and testing datasets (ideally 70:30)
Initial Modeling
Run initial model to check for feature importance and prediction accuracy. Run multiple models to check variances across.
Hyperparameter Tuning
Using grid search run a combination of hyperparameter tuning and check for best results
Finalize Data and Model
Based on the initial modeling, drop/keep variables of interest/importance and run the final model.
Generate Insights
Present the generated insights as a flowing story that connects the business requirement and the data modeling.