Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated and re-launched a Real Estate Business Intelligence Tool, creating one of the leading business intelligence tools for property price analysis in 2012. He also writes, researches, and shares knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, and Python.
Tutorial on Automated Machine Learning using MLBox


- What is MLBox?
- MLBox in comparison to other Machine Learning libraries.
- Installing MLBox
- Layout/Pipeline of the MLBox
- Building a Machine Learning Regressor using MLBox
- Basic Understanding of Drift
- Basic Understanding of Entity Embedding
- Pros and Cons of MLBox
- End Notes
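Before following along, MLBox needs to be installed. It is distributed on PyPI, so the usual route is pip (check the project's documentation for OS-specific prerequisites, as early releases targeted Linux):

```shell
pip install mlbox
```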
- Fast reading and distributed data preprocessing/cleaning/formatting
- Highly robust feature selection and leak detection
- Accurate hyper-parameter optimisation in high-dimensional space
- State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,...)
- Prediction with model interpretation
- Drift Identification - a method to detect and remove features whose distribution differs between the train and test data.
- Entity Embedding - a categorical feature encoding technique inspired by word2vec.
- Hyperparameter Optimization
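The idea behind drift identification can be illustrated without MLBox itself. This is a minimal adversarial-validation sketch (not MLBox's exact implementation): label each row by whether it came from the train or the test set, then train a classifier to tell them apart. If the classifier cannot do better than chance (AUC near 0.5), the feature does not drift; a high AUC signals drift. The synthetic data and threshold below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic example: one feature whose distribution shifts between train and test
rng = np.random.RandomState(0)
X_train = rng.normal(0.0, 1.0, size=(500, 1))   # training rows
X_test = rng.normal(2.0, 1.0, size=(500, 1))    # test rows with a shifted mean -> drift

# Label each row by its origin and ask a classifier to separate them
X = np.vstack([X_train, X_test])
y = np.r_[np.zeros(500), np.ones(500)]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

# AUC near 0.5 -> train and test are indistinguishable (no drift);
# AUC near 1.0 -> strong drift, so the feature is a candidate for removal
drifting = auc > 0.6
```

MLBox's `Drift_thresholder` (used later in this tutorial) automates this kind of check per feature and drops the drifting ones.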


- Pre-Processing
- Optimisation
- Prediction
- Reading and cleaning a file
sep = ","
r = Reader(sep)  # initialising an object of the Reader class
paths = ["path of the train csv file", "path of the test csv file"]
target_name = "name of the target variable in the train file"
df = r.train_test_split(paths, target_name)  # reads, cleans and splits the files
space = {'ne__numerical_strategy': {"search": "choice", "space": ['mean', 'median']},
         'ne__categorical_strategy': {"search": "choice", "space": [np.NaN]},
         'ce__strategy': {"search": "choice", "space": ['label_encoding', 'entity_embedding', 'random_projection']},
         'fs__strategy': {"search": "choice", "space": ['l1', 'variance', 'rf_feature_importance']},
         'fs__threshold': {"search": "uniform", "space": [0.01, 0.3]},
         'est__max_depth': {"search": "choice", "space": [3, 5, 7, 9]},
         'est__n_estimators': {"search": "choice", "space": [250, 500, 700, 1000]}}
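Each entry in the space maps a parameter name to a `"search"` strategy and a `"space"` of values: `"choice"` picks from a discrete list, while `"uniform"` draws a float from a `[low, high]` interval, mirroring hyperopt-style search spaces. The sampler below is an illustrative sketch of those semantics, not MLBox code; the `sample` helper is hypothetical.

```python
import random

# Same search-space format as above: "choice" picks from a list,
# "uniform" draws a float from [low, high]
space = {
    "est__max_depth": {"search": "choice", "space": [3, 5, 7, 9]},
    "fs__threshold": {"search": "uniform", "space": [0.01, 0.3]},
}

def sample(space, seed=0):
    """Draw one candidate set of hyper-parameters from the space."""
    rng = random.Random(seed)
    params = {}
    for name, spec in space.items():
        if spec["search"] == "choice":
            params[name] = rng.choice(spec["space"])
        elif spec["search"] == "uniform":
            low, high = spec["space"]
            params[name] = rng.uniform(low, high)
    return params

params = sample(space)
```

MLBox's `Optimiser` repeats draws like this, scores each candidate by cross-validation, and keeps the best one.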
Scoring values for Classification-
"accuracy", "roc_auc", "f1", "log_loss", "precision", "recall"
Scoring values for Regression-
"mean_absolute_error", "mean_squared_error", "median_absolute_error", "r2"
opt = Optimiser(scoring="accuracy", n_folds=5)
# coding: utf-8
# importing the required libraries
import numpy as np  # needed for np.NaN in the hyperparameter space below
from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *
# reading and cleaning the train and test files
df=Reader(sep=",").train_test_split(['/home/nss/Downloads/mlbox_blog/train.csv',
'/home/nss/Downloads/mlbox_blog/test.csv'],'Item_Outlet_Sales')
# removing the drift variables
df=Drift_thresholder().fit_transform(df)
# setting the hyperparameter space
space={'ne__numerical_strategy':{"search":"choice","space":['mean','median']},
'ne__categorical_strategy':{"search":"choice","space":[np.NaN]},
'ce__strategy':{"search":"choice","space":['label_encoding','entity_embedding','random_projection']},
'fs__strategy':{"search":"choice","space":['l1','variance','rf_feature_importance']},
'fs__threshold':{"search":"uniform","space":[0.01, 0.3]},
'est__max_depth':{"search":"choice","space":[3,5,7,9]},
'est__n_estimators':{"search":"choice","space":[250,500,700,1000]}}
# calculating the best hyper-parameter
best=Optimiser(scoring="mean_squared_error",n_folds=5).optimise(space,df,40)
# predicting on the test dataset
Predictor().fit_predict(best,df)
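The `'entity_embedding'` strategy in the space above replaces each category with a dense learned vector rather than an integer label. A minimal sketch of the lookup mechanism follows; in MLBox the vectors are fitted by a small neural network during training (word2vec-style), not drawn at random, and the category names here are hypothetical.

```python
import numpy as np

# Hypothetical category values for illustration
categories = ["Dairy", "Soft Drinks", "Meat"]
cat_index = {c: i for i, c in enumerate(categories)}

# One row per category; in practice these weights are learned during
# training, which lets similar categories end up with nearby vectors
embedding_dim = 2
rng = np.random.RandomState(0)
E = rng.normal(size=(len(categories), embedding_dim))

def encode(values):
    """Replace each category with its dense embedding vector."""
    return E[[cat_index[v] for v in values]]

vecs = encode(["Meat", "Dairy"])
```

Compared with one-hot encoding, this keeps high-cardinality features compact and can capture similarity between categories.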





- Automatic task identification, i.e. Classification or Regression
- Basic Pre-processing while reading the data
- Removal of Drifting variables
- Extremely fast and accurate hyperparameter optimisation.
- A wide variety of Feature Selection Methods.
- Minimal lines of code.
- Feature Engineering via Entity Embeddings
- It is still under active development, so things may break or change at any point in time.
- No support for Unsupervised Learning
- Only basic feature engineering. You still have to create your own features.
- Purely mathematical feature selection. The method may remove variables that make sense from a business perspective.
- Not yet a truly automated machine learning library.