Interviewing for any job is intimidating, but to land the one you want you need to prepare, manage your nerves, learn from your failures, and come back for the next attempt. Every company looks for a different set of talents: some jobs emphasize theoretical knowledge, while others lean more on practical skills. Data science is no exception, and its interviews can be especially unnerving. A strong portfolio is a great way to showcase what you are good at and where your strengths lie, and it is the best way to show employers the skills you have been building.
Here are some data science projects for your portfolio that will showcase your skills and help you land a job.
Data Cleaning
One of the main reasons to build a data cleaning skill set is that most of a data scientist's time on a new project goes into cleaning, and on a team it is often the least popular task. Demonstrating strong data cleaning skills will make you a valuable member of any data science team. If you're working with Python, Pandas is a great library to use, and if you're working with R, you can use the dplyr package. Make sure to showcase the following skills:
1. Importing data
2. Joining multiple datasets
3. Detecting missing values
4. Detecting anomalies
5. Imputing for missing values
6. Data quality assurance
Just find some messy datasets and start cleaning them.
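The cleaning steps above can be sketched with Pandas. The tables, column names, and the anomaly rule here are all hypothetical stand-ins for real imported data:

```python
import numpy as np
import pandas as pd

# 1. Importing data: hypothetical frames standing in for two CSV reads
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 11, 12],
    "amount": [25.0, np.nan, 40.0, 10_000.0],  # one missing, one suspiciously large
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["east", "west", None],
})

# 2. Joining multiple datasets
df = orders.merge(customers, on="customer_id", how="left")

# 3. Detecting missing values
print(df.isna().sum())

# 4. Detecting anomalies (a simple assumed rule: more than 10x the median amount)
median_amount = df["amount"].median()
outliers = df.loc[df["amount"] > 10 * median_amount, "order_id"].tolist()
print(outliers)  # order 4 looks suspicious

# 5. Imputing missing values with the column median
df["amount"] = df["amount"].fillna(median_amount)

# 6. Data quality assurance: assert the invariants you expect to hold
assert df["amount"].notna().all()
assert df["order_id"].is_unique
```

In a real project you would read the data with `pd.read_csv` and pick an anomaly rule suited to the domain; the point is to show each step explicitly.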
Exploratory Data Analysis
This is another key part of data science: the process of generating questions and investigating them with visualizations. Exploratory data analysis (EDA) lets an analyst draw conclusions that drive business impact, such as insights about customer segments or seasonal sales trends. Often you will make interesting discoveries you hadn't initially considered. Useful Python libraries for exploratory analysis are Pandas and Matplotlib; for R users, the ggplot2 package will be useful. An EDA project should show the following skills:
1. Formulating relevant questions for investigation
2. Identifying trends
3. Identifying covariation between variables
4. Communicating results effectively using visualizations (scatter plots, histograms, box-and-whisker plots, etc.)
Interactive Data Visualizations
Interactive data visualizations include tools such as dashboards, which are useful both for data science teams and for more business-oriented end users. Dashboards allow data science teams to collaborate and draw insights together. Even more importantly, they provide an interactive tool for business-oriented customers, who focus on strategic goals rather than technical details. Often the deliverable for a data science project will be in the form of a dashboard. For Python users, the Bokeh and Plotly libraries are great for creating dashboards. For R users, be sure to check out RStudio's Shiny package. Your dashboard project should highlight these important skills:
1. Including metrics relevant to your customer's needs
2. Creating useful features
3. A logical layout (the "F-pattern" for easy scanning)
4. Choosing an appropriate refresh rate
5. Generating reports or other automated actions
Machine Learning
A machine learning project is another important piece of your data science portfolio. Before you run off and build some deep learning project, take a step back for a minute. Rather than building a complex machine learning model, stick with the basics: linear regression and logistic regression are great starting points. These models are easier to interpret and to communicate to upper-level management. I'd also recommend focusing on a project with a clear business impact, such as predicting customer churn, detecting fraud, or flagging loan defaults. These problems are more real-world than predicting flower species.
If you're a Python user, use the Scikit-learn library. For R users, use the Caret package. Your machine learning project should convey the following skills:
1. Explaining why you chose a specific machine learning model
2. Splitting data into training/test sets (k-fold cross validation) to avoid overfitting
3. Selecting the right evaluation metrics (AUC, adj-R^2, confusion matrix, etc.)
4. Feature engineering and selection
5. Hyperparameter tuning
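The workflow above can be sketched with scikit-learn. The data here is synthetic (a stand-in for a customer-churn table), and the parameter choices are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a customer-churn dataset
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=42)

# Hold out a test set to estimate generalization honestly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Logistic regression: simple, interpretable, a strong baseline
model = LogisticRegression(max_iter=1000)

# k-fold cross-validation on the training set only, to avoid leakage
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print("CV AUC:", round(cv_auc.mean(), 3))

# Evaluate with appropriate metrics on the untouched test set
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("Test AUC:", round(test_auc, 3))
print(confusion_matrix(y_test, model.predict(X_test)))
```

Feature engineering would happen before the split, and hyperparameter tuning (for example with `GridSearchCV`) would replace the single `LogisticRegression` fit; the cross-validation structure stays the same.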
Communication
Communication is a crucial aspect of data science. Effectively communicating results is what separates the good data scientists from the great ones. It doesn't matter how fancy your model is: if you can't explain it to teammates or customers, you won't get their buy-in. Slides and notebooks are both great communication tools. Take one of your machine learning projects and put it into slide format, or use a Jupyter Notebook or RMarkdown file for a communication project.
Make sure to understand who your intended audience is. Presenting to executives is very different than presenting to machine learning experts. Make sure to hit on these skills:
1. Know your intended audience
2. Present relevant visualizations
3. Don't crowd your slides with too much information
4. Make sure your presentation flows well
5. Tie results to a business impact (reduced cost, increased revenue)
Make sure to document your projects in Jupyter Notebooks or RMarkdown files. You can then convert them to static web pages and host them for free with GitHub Pages. This is a great way to showcase your portfolio to potential employers.
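For example, a notebook can be converted to standalone HTML with nbconvert and published from a repository's `docs/` folder, one common GitHub Pages setup (the file names here are hypothetical):

```shell
# Convert the notebook to a standalone HTML page
jupyter nbconvert --to html churn_model.ipynb

# Serve it from docs/, which GitHub Pages can be configured to publish
mkdir -p docs
mv churn_model.html docs/index.html
git add docs && git commit -m "Publish portfolio page"
```

RMarkdown users get the same effect by knitting to HTML and committing the output to the published folder.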