Any firm looking to hire a young data analytics professional would expect them to be aware of the most basic concepts of data querying. This applies not only to young professionals but also to seasoned professionals looking to diversify their skill set. Without a thorough grasp of the basic concepts that are frequently applied in data analytics projects, it is not possible to last very long in this fast-paced industry. The following are the most commonly used concepts in data analytics projects, which any professional should be able to articulate in an interview to win the interviewer's trust:
1. Sorting - Sorting data sounds basic and may seem to have little application on its own. However, it is important to understand how a particular tool performs this function, as it greatly affects the performance of your scripts. Sorting the data files is also a prerequisite when combining or joining data sets in tools that expect presorted input. If the data is not properly sorted on the key fields of the primary and secondary tables, such a tool can produce incorrect output.
2. Joining Tables - This is a very powerful feature built into any tool capable of querying data sets, such as SQL databases, SAS and the Audit Command Language. It is important for users to understand how the tool processes the data files line by line to create the output of a join, as different tools attempt the same goal in different ways. For instance, in the Audit Command Language the key fields of both the primary and secondary tables can be present in the output table, whereas in SQL the columns of the result depend on how the query is written, so the key may appear only once. Users need to develop the clarity of thought to envision the final output.
3. Identifying Distinct Values - In most data analytics projects, this is a very common query that forms the basis for developing other data points used in the final reports. Analysts should always be mindful of how to extract unique values from raw data tables into new tables. In Audit Command Language scripts, the CLASSIFY command or the SUMMARIZE command provides this information, and the same can be achieved in SQL-based databases by using the keyword DISTINCT.
4. Summarizing Data - This is an all-time favorite and on par with the concept of joins. Summarizing a data set on selected key fields allows users to extract new information from it, even when it contains many unrelated fields. As a matter of fact, most exploratory work begins with a few summarize commands in order to understand the data points properly. For example, summarizing a payroll data set at the employee level gives the number of unique employees and, if desired, the total salary paid to each of them over a period of time. Many more such queries are possible, and they form the basis for designing the scope of an analytics project.
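The four concepts above can be sketched in a few lines of SQL. The example below is only an illustration, not a prescribed method: it uses Python's built-in sqlite3 module with an in-memory database, and the table names (employees, payroll), column names and sample rows are all made up for the purpose. Other tools, such as SAS or the Audit Command Language, express the same ideas with their own commands.

```python
import sqlite3

# Hypothetical sample data: table names, columns and rows are assumptions
# made for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT);
CREATE TABLE payroll (emp_id INTEGER, pay_month TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO payroll VALUES
  (2, '2024-01', 4000), (1, '2024-01', 5000),
  (1, '2024-02', 5000), (2, '2024-02', 4200);
""")

# 1. Sorting: order the payroll rows by employee, then by month.
sorted_rows = conn.execute(
    "SELECT emp_id, pay_month, salary FROM payroll "
    "ORDER BY emp_id, pay_month"
).fetchall()

# 2. Joining: an inner join keeps only employees that have payroll rows,
#    so employee 3 does not appear in the output.
joined = conn.execute(
    "SELECT e.emp_id, e.name, p.pay_month, p.salary "
    "FROM employees e JOIN payroll p ON p.emp_id = e.emp_id "
    "ORDER BY e.emp_id, p.pay_month"
).fetchall()

# The shape of a join's output depends on how the query is written:
# SELECT * with ON returns the key column from both tables, while
# SELECT * with USING returns it once (behaviour shown here for SQLite).
on_cols = [d[0] for d in conn.execute(
    "SELECT * FROM employees e JOIN payroll p ON p.emp_id = e.emp_id"
).description]
using_cols = [d[0] for d in conn.execute(
    "SELECT * FROM employees JOIN payroll USING (emp_id)"
).description]

# 3. Distinct values: the unique employee ids present in payroll.
distinct_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT emp_id FROM payroll ORDER BY emp_id"
)]

# 4. Summarizing: number of pay records and total salary per employee.
totals = conn.execute(
    "SELECT emp_id, COUNT(*), SUM(salary) FROM payroll "
    "GROUP BY emp_id ORDER BY emp_id"
).fetchall()

print(sorted_rows)
print(joined)
print(on_cols, using_cols)
print(distinct_ids)
print(totals)
```

Printing the column lists before looking at any rows is a quick way to check that the output of a join has the shape you envisioned.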
Mastering these concepts prepares any professional to work with a variety of tools. Familiarity with them allows users to carry projects across different tools and thus opens up more opportunities within the industry. It is quite remarkable how many people in the current workforce have not mastered even these basics.