Data science is not about making complicated models, it is not about making nice visualizations, and it is not about writing code. Data science is about using data to create as much impact as possible for your company.
Now, the impact can take multiple forms: insights, data products, or product recommendations for a company. To do those things you need tools like complicated models, data visualizations, or code, but essentially, as a data scientist, your job is to solve real company problems using data, and we don't care what kind of tools you use. Now, there are a lot of misconceptions about data science, especially on YouTube, and I think the reason is that there's a huge misalignment between what's popular to talk about and what's needed in the industry.
What is data science?
Before data science, we popularized the term data mining in an article called "From Data Mining to Knowledge Discovery in Databases" in 1996, in which it referred to the overall process of discovering useful information from data. In 2001, William S. Cleveland wanted to bring data mining to another level, and he did that by combining computer science with data mining. Basically, he made statistics a lot more technical, which he believed would expand the possibilities of data mining and produce a powerful force for innovation.
Now you could take advantage of computing power for statistics, and he called this combo data science. This is also around the time when Web 2.0 emerged, where websites were no longer just digital pamphlets but a medium for a shared experience among millions and millions of users. These are websites like MySpace in 2003, Facebook in 2004, and YouTube in 2005. We could now interact with these websites: contribute, post, comment, like, upload, and share, leaving our footprint in the digital landscape we call the Internet and helping create and shape the ecosystem.
Just to support the handling of the data, we needed parallel computing technology like MapReduce, Hadoop, and Spark. So the rise of big data in 2010 sparked the rise of data science to support the needs of businesses to draw insights from their massive unstructured data sets. The Journal of Data Science then described data science as almost everything that has something to do with data: collecting, analyzing, modeling. Yet the most important part is its applications, all sorts of applications, like machine learning.
So in 2010, the new abundance of data made it possible to train machines with a data-driven approach rather than a knowledge-driven approach. All the theoretical papers about recurrent neural networks and support vector machines became feasible: something that can change the way we live and how we experience things in the world.
Deep learning was no longer an academic concept in a thesis paper. It became a tangible, useful class of machine learning that would affect our everyday lives. So machine learning and AI dominated the media, overshadowing every other aspect of data science, like exploratory analysis, experimentation, and the skills we traditionally called business intelligence. So now the general public thinks of data scientists as researchers focused on machine learning and AI, but the industry is hiring data scientists as analysts.
So there's a misalignment there. The reason for the misalignment is that, yes, most of these data scientists can probably work on more technical problems, but big companies like Google, Facebook, and Netflix have so much low-hanging fruit for improving their products that they don't require any advanced machine learning or statistical knowledge to find these impacts in their analysis. Being a good data scientist isn't about how advanced your models are, it's about how much impact you can have with your work. You're not a data cruncher, you're a problem solver, you're a strategist. Companies will give you the most ambiguous and hard problems, and we expect you to guide the company in the right direction.
So now, analytics tells you, using the data, what kind of insights you can get about what is happening to your users. And then there are metrics. These are important because they answer the question of what's going on with your product: these metrics will tell you if you're successful or not. And then there's also testing, of course: experimentation that allows you to know which product versions are the best. So these things are actually really important, but they're not covered much in the media.
For a company, for the industry, AI and deep learning are actually not the highest priority, or at least they're not the thing that yields the most result for the lowest amount of effort. That's why AI and deep learning sit at the top of the hierarchy of needs, while things like testing and analytics are actually way more important for the industry. So that's why we're hiring a lot of data scientists.
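To make the experimentation idea concrete, here is a minimal sketch of how one might compare two product versions on a conversion metric. It assumes a simple two-proportion z-test with a normal approximation; the function name and the counts in the usage lines are hypothetical and only for illustration, not taken from any real product or from a specific company's method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B.

    Returns (z, p_value), where p_value is two-sided and based on a
    normal approximation. All inputs are plain counts.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each version needs at least one user")

    rate_a = conv_a / n_a
    rate_b = conv_b / n_b

    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (everyone or no one converted).
        return 0.0, 1.0

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 users per version, 100 vs. 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between versions is unlikely to be noise, which is the kind of low-effort, high-impact answer the analytics and testing layers are meant to provide. In practice you would also decide the sample size and the metric before running the test.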
This whole thing is data science, so that means you have to do everything. But let's look at medium-sized companies. Now they finally have a lot more resources, so they can separate the data engineers and the data scientists. So usually collection is probably handled by software engineering, and then here you're going to have data engineers. And then it depends: if you're a medium-sized company that does a lot of recommendation models or stuff that requires AI, then the data scientists will do all of this. So as a data scientist, you have to be a lot more technical. That's why they only hire people with PhDs or master's degrees, because they want you to be able to do more complicated things.
Here's how it looks for a large company. Instrumentation, logging, sensors: this is all handled by software engineers. And then here, cleaning and building data pipelines is for data engineers. Now, here between these two things, we have data science analytics.