This repo contains basic tools for data science.
- connect_to_mysql
- send_statement
- get_tables
- get_columns
- get_table_as_df
- get_quarry
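
The helpers above could be shaped roughly as follows. This is a minimal sketch, not the repo's actual code: the signatures are assumptions, and the standard-library `sqlite3` module stands in for a MySQL connection so the example runs without a server. Against MySQL, listing tables would typically use `SHOW TABLES` rather than the `sqlite_master` query shown here.

```python
import sqlite3


def send_statement(conn, sql, params=()):
    """Run one SQL statement and return any fetched rows (hypothetical signature)."""
    cur = conn.cursor()
    cur.execute(sql, params)
    rows = cur.fetchall()  # empty list for statements that return no rows
    conn.commit()
    return rows


def get_tables(conn):
    """Return table names. SQLite-specific query used as a stand-in."""
    rows = send_statement(
        conn, "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def get_columns(conn, table):
    """Return column names of one table. SQLite-specific PRAGMA used as a stand-in."""
    # Table names cannot be bound as parameters, so check against known tables first.
    if table not in get_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    rows = send_statement(conn, f'PRAGMA table_info("{table}")')
    return [row[1] for row in rows]


# Demo on an in-memory database.
conn = sqlite3.connect(":memory:")
send_statement(conn, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
send_statement(conn, "INSERT INTO users (name) VALUES (?)", ("Ada",))

tables = get_tables(conn)
columns = get_columns(conn, "users")
rows = send_statement(conn, "SELECT id, name FROM users")
```

If pandas is installed, `pandas.read_sql` can load a query result into a DataFrame, which is presumably what `get_table_as_df` does.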
Basic EDA workflow
get_numerical_categorical - basic separation of numerical/categorical features
tools for assumption testing (all_values_in_list and all_values_in_range)
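
A rough illustration of what these EDA helpers might look like. The function names come from the list above, but the signatures and the dict-of-columns input are assumptions made for this sketch; the real helpers most likely operate on pandas objects.

```python
from numbers import Number


def get_numerical_categorical(table):
    """Split column names into numerical and categorical (assumed behaviour).

    `table` is a hypothetical dict mapping column name -> list of values.
    """
    numerical, categorical = [], []
    for name, values in table.items():
        # bool is a subclass of int, so exclude it from the numeric check.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (numerical if is_numeric else categorical).append(name)
    return numerical, categorical


def all_values_in_list(values, allowed):
    """True if every value is one of the allowed values."""
    allowed_set = set(allowed)
    return all(v in allowed_set for v in values)


def all_values_in_range(values, low, high):
    """True if every value lies in the closed interval [low, high]."""
    # NaN compares False with everything, so a NaN makes this check fail.
    return all(low <= v <= high for v in values)


table = {
    "age": [23, 41, 35],
    "height_cm": [170.5, 182.0, 165.2],
    "city": ["Oslo", "Rome", "Oslo"],
}

numerical, categorical = get_numerical_categorical(table)
ages_ok = all_values_in_range(table["age"], 0, 120)
cities_ok = all_values_in_list(table["city"], ["Oslo", "Rome"])
cities_bad = all_values_in_list(table["city"], ["Oslo"])
```

Because the two checks only iterate over their input, they also accept a pandas Series.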
plotting functions
- multi numerical box
- multi numerical EDA
- multi categorical countplot
- multi numerical corr
- one numerical multi categorical KDE
- numerical + categorical bars
- numerical + numerical scatter
- categorical + categorical
- random split
- one-hot encoding
- ordinal encoding
- Normalize feature
- Dimensionality Reduction with PCA
- feature selection
- cross validation
- grid search
- random search
- roc curve
- confusion matrix
- simple random forest classifier
- Linear Regression
- Polynomial Regression
- print metrics
- number clusters (Elbow method)
- print clusters on 2 PCA
- Anomaly detection (local outlier and Isolation forest)
- A walkthrough on solving a time series problem.
- ARIMA model
- results evaluation
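
For the results evaluation step of the time series walkthrough, forecasts are commonly compared with held-out actual values using error metrics. The sketch below is an assumption about what such an evaluation could look like: the list above does not say which metrics the repo uses, and the function names and sample numbers here are hypothetical.

```python
import math


def _check_lengths(actual, predicted):
    """Guard against silently truncated comparisons."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")


def mae(actual, predicted):
    """Mean absolute error."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large errors more than MAE."""
    _check_lengths(actual, predicted)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual is 0."""
    _check_lengths(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers purely for illustration.
actual = [10, 20, 30]
forecast = [12, 18, 33]

scores = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
}
```

The same functions apply to any forecast, for example the output of an ARIMA model, as long as the forecast and the actual values cover the same time steps.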