How To Get Started With Kaggle Competitions in Machine Learning
- Cross-validation (CV): Always split the training data into an 80% training set and a 20% validation set. That way, after training on 80% of the data, you can check against the remaining 20% to see whether you have a good model (see the validation sketch after this list). To quote the Kaggle discussion boards, "Always trust your CV more than the leaderboard." The public leaderboard is typically scored on only 50% to 70% of the actual test set, so you cannot judge the quality of your solution from it alone. Sometimes a model that is good overall happens to do badly on the particular slice of data in the public test set.
- Cache your intermediate data: This saves you work next time, because you can focus on a specific step instead of re-running everything from the start. Almost all Python objects can be pickled, but for efficiency, prefer the .save() and .load() functions of the library you are using (see the caching sketch below).
- Use GridSearchCV: This scikit-learn module lets you provide a grid of hyperparameter values; it tries every combination and reports the best-performing set (see the grid-search sketch below). It is a great way to automate tuning, and a finely tuned XGBoost can beat a generic neural network on many problems.
- Use the model appropriate to the problem: Bringing a knife to a gunfight is not a good idea. I have a simple approach: for text data, use XGBoost or a Keras LSTM; for image data, use a pre-trained Keras model (I use Inception most of the time) with some custom bottleneck layers (see the transfer-learning sketch below).
- Combine models: A kitchen knife is not enough for everything; you need a Swiss army knife. Try combining several models to get even more accurate predictions. For example, the Inception and Xception models work great together on image data (see the ensembling sketch below). Combined models take a lot of RAM, which a g2.2xlarge might not provide, so avoid them unless you really want that accuracy boost.
- Feature extraction: Make the work easier for the model by extracting several simpler features from one feature, or by combining several features into one. For example, you can extract the country and area codes from a phone number (see the last sketch below). Models are not very intelligent; they are just algorithms that fit data, so make sure the data is shaped for an optimal fit.
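
First, a minimal sketch of the 80/20 validation split using scikit-learn. The synthetic data from make_classification and the RandomForestClassifier are placeholders for a real competition dataset and model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a competition's training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# 80/20 split: train on 80%, keep 20% aside for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Trust this local score more than the public leaderboard.
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```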
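
Next, a minimal caching sketch with pickle. The cache file name features.pkl and the build_features step are hypothetical stand-ins for your own expensive preprocessing:

```python
import os
import pickle

CACHE_PATH = "features.pkl"  # hypothetical cache file name

def build_features():
    # Stand-in for an expensive preprocessing step.
    return {"mean": 0.5, "std": 0.1}

if os.path.exists(CACHE_PATH):
    # Reuse the cached result instead of recomputing it.
    with open(CACHE_PATH, "rb") as f:
        features = pickle.load(f)
else:
    features = build_features()
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(features, f)
```

For Keras models, the library's own model.save() and load_model() are the more efficient equivalents of this pattern.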
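
The grid-search sketch below pairs GridSearchCV with XGBoost on placeholder data; the parameter grid is deliberately tiny for illustration, and real searches usually cover more values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# A small illustrative grid; every combination below gets tried.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
}

search = GridSearchCV(
    XGBClassifier(),
    param_grid,
    cv=5,                 # 5-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV score:", search.best_score_)
```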
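
Here is one way the transfer-learning setup might look in Keras; NUM_CLASSES and the size of the custom bottleneck layers are assumptions you would adapt to your competition:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

NUM_CLASSES = 10  # assumption: set to your competition's class count

# Load Inception pre-trained on ImageNet, without its classification head.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained weights

# Custom bottleneck layers on top of the frozen base.
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation="relu")(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```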
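
A simple way to combine models is to average their predicted probabilities. The ensembling sketch below does this for ImageNet-pretrained Inception and Xception, which conveniently share a 299x299 input size; the random batch is a placeholder for real images:

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3, Xception
from tensorflow.keras.applications.inception_v3 import preprocess_input as prep_inc
from tensorflow.keras.applications.xception import preprocess_input as prep_xcp

# Note: holding both models in memory is what eats the RAM.
inception = InceptionV3(weights="imagenet")
xception = Xception(weights="imagenet")

def ensemble_predict(images):
    """Average the class probabilities of the two models.

    `images` is a float array of shape (n, 299, 299, 3) in [0, 255].
    """
    p1 = inception.predict(prep_inc(images.copy()))
    p2 = xception.predict(prep_xcp(images.copy()))
    return (p1 + p2) / 2.0

# Placeholder batch of random images.
batch = np.random.uniform(0, 255, size=(2, 299, 299, 3)).astype("float32")
probs = ensemble_predict(batch)
print(probs.shape)  # (2, 1000) -- ImageNet class probabilities
```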
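
Finally, a small pandas sketch of the phone-number example. The number format (+country-area-local) is an assumption for illustration; real data usually needs more robust parsing:

```python
import pandas as pd

# Hypothetical raw data with phone numbers like "+1-212-5550123".
df = pd.DataFrame({"phone": ["+1-212-5550123", "+44-20-5550456"]})

# Split one opaque string column into simpler features a model can fit.
parts = df["phone"].str.split("-", expand=True)
df["country_code"] = parts[0].str.lstrip("+")
df["area_code"] = parts[1]

print(df)
```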