Dark Data - What Is It? & The World Around It

By Vaibhav Jain | Oct 30, 2018

Dark Data is a type of unstructured, untagged and untapped data that is found in data repositories and has not been analyzed or processed. Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).

Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

Organizations gather huge volumes of data that they believe will help improve their products and services. For example, a company may collect data on how users use its products, internal statistics about software development processes, and website visits. However, a large portion of the collected data is never even analyzed.

Why is there so much Dark Data?
Here are a few reasons why there is so much dark data.
1. Lopsided priorities
Take the example of a bank analyzing online applications for credit cards. The credit card marketing team is focused solely on customer details and eligibility, and pays no attention to the data on how the customer arrived at the application page. That unattended data could have provided valuable insights into the usability of the bank's website and application page, but no priority is assigned to this aspect.

2. Disconnect among departments
In large organizations, departments have their own data collection and storage processes, which may not be known to other departments. As a result, data lies unused even when it is relevant to other departments. This is essentially a process issue.

3. Technology and tool constraints
If separate technologies and tools collect data within the same organization, they may be unable to interact with each other because of technological constraints. This prevents the organization from bringing all the data together and creating a cohesive picture. It happens especially in companies that run different IT systems and formats. For example, it may be difficult to integrate the contents of audio files from a call center with click data from websites. Companies in the early stages of a data analytics program commonly face these problems.

How is Dark Data Collected?
Dark data is data that is acquired through various computer network operations but is not used in any way to derive insights or to make decisions. An organisation's ability to collect data can exceed the throughput at which it can analyse that data.

In some cases the organisation may not even be aware that the data is being collected. IBM estimates that roughly 90 percent of the data generated by sensors and analog-to-digital conversions never gets used. In an industrial context, dark data can include information gathered by sensors and telematics.

Why do Organisations Retain Dark Data?
Organizations retain dark data for many reasons, and it is estimated that most companies analyze only 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytics and business intelligence technology to process the information. And because storage is inexpensive, keeping the data is easy.

Categories of Dark Data:

Though the categories of dark data may vary across companies, the following categories of unstructured data usually are considered dark data:
  • Customer Information
  • Log Files
  • Previous Employee Information
  • Raw Survey Data
  • Financial Statements
  • Email Correspondences
  • Account Information
  • Notes or Presentations
  • Old Versions of Relevant Documents

Importance of Dark Data:
Dark data represents a huge opportunity for companies to gain valuable insights which can drive their business. Take a look at the following examples:
  • Server log files can reveal website visitor behavior.
  • Customer call detail records can reveal customer sentiments and feelings.
  • Mobile geo-location data can reveal traffic patterns.
Companies are letting go of opportunities by not tapping into dark data. It is also true that they need better processes, coordination and technologies to appropriately use dark data.
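
As a rough illustration of the server log example, here is a minimal Python sketch that counts requested pages and referrers in web server log lines. It assumes the logs follow the widely used "combined" access log layout; the sample lines, paths, and the summarize_log helper are hypothetical and only serve the illustration. A real log may need a different pattern.

```python
import re
from collections import Counter

# Assumed layout: the "combined" access log format
# (client, identity, user, [time], "request", status, size, "referrer", "agent").
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)


def summarize_log(lines):
    """Count requested paths and referrers; also count lines that do not parse."""
    paths = Counter()
    referrers = Counter()
    skipped = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            skipped += 1
            continue
        paths[match["path"]] += 1
        referrers[match["referrer"]] += 1
    return paths, referrers, skipped


# Hypothetical sample lines, including one that is deliberately malformed.
SAMPLE_LOG = [
    '203.0.113.5 - - [30/Oct/2018:10:00:01 +0000] "GET /cards/apply HTTP/1.1" 200 5120 "https://example.com/home" "Mozilla/5.0"',
    '203.0.113.9 - - [30/Oct/2018:10:00:07 +0000] "GET /cards/apply HTTP/1.1" 200 5120 "https://example.com/offers" "Mozilla/5.0"',
    '203.0.113.5 - - [30/Oct/2018:10:01:12 +0000] "GET /home HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    'not a log line',
]

paths, referrers, skipped = summarize_log(SAMPLE_LOG)
print(paths.most_common(3))
print(referrers.most_common(3))
print("unparsed lines:", skipped)
```

Even a simple count like this shows which pages visitors reach and where they came from, which is the kind of question the bank example above leaves unanswered.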

Useful data may become dark data once it loses its relevance because it was not processed fast enough. This is called "perishable insights" in "live flowing data". For example, if a business knows the geo-location of a customer, it can make an offer based on that location; if the data is not processed immediately, however, it may be irrelevant later. According to IBM, about 60 percent of data loses its value immediately. Not analyzing data right away and letting it go 'dark' can lead to significant losses for an organisation, for example when fraud is not identified fast enough and the issue is addressed only when it is too late.

Better ways to handle dark data:

Whether you view dark data as an opportunity or as a reflection of problems, you cannot deny its importance. The ideal way to handle dark data is to use it well, but that may not be easy, considering the investments needed. Still, there needs to be a start. Some data becomes redundant over time if it goes unused, and it is unlikely that all of the dark data will be valuable. So, you should neither toss out all of the dark data nor consider all of it a goldmine. Here are some ways to get the best out of dark data.

  • Regularly audit and prune the database. In practice this means structuring the old data or assigning it categories, so that you know what kind of data is stored and where. You do not have to dump that data, since storage is inexpensive. If you suddenly need the data later, you can find it quickly because it is well organized.
  • Apply strong encryption standards to the data. This applies both to data sitting on in-house servers and to data in cloud storage. Encryption can prevent many data security issues.
  • Have data retention and safe disposal policies in place. Formulate the policies carefully, identifying which data is due for erasure or destruction. Good retention policies will help you retain valuable data for later use.
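
The audit and retention steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it treats a directory of files as the data store, uses the file extension as the category, and uses the last-modified time to decide whether a file has passed the retention window. The audit_directory helper, the 365-day window, and the file names are all hypothetical. The sketch only reports candidates for review and deletes nothing.

```python
import os
import tempfile
import time
from collections import defaultdict
from pathlib import Path

SECONDS_PER_DAY = 86_400


def audit_directory(root, retention_days, now=None):
    """Group files under root by extension and list those older than the window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    by_category = defaultdict(list)
    past_retention = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        category = path.suffix.lower() or "(no extension)"
        by_category[category].append(path.name)
        # Assumption: last-modified time stands in for the age of the data.
        if path.stat().st_mtime < cutoff:
            past_retention.append(path.name)
    return dict(by_category), past_retention


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    old_log = Path(tmp) / "server-2015.log"
    new_csv = Path(tmp) / "survey-raw.csv"
    old_log.write_text("old entries\n")
    new_csv.write_text("q1,q2\n")
    # Backdate the log file by 400 days so it falls outside the window.
    old_time = time.time() - 400 * SECONDS_PER_DAY
    os.utime(old_log, (old_time, old_time))

    inventory, expired = audit_directory(tmp, retention_days=365)
    print(inventory)
    print("review for disposal:", expired)
```

In a real organization the categories and retention periods would come from the written retention policy, not from file extensions and a single fixed window.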

What will be the Future of Dark Data?
It is generally expected that the more advanced the computing systems built for data analysis become, the higher the value of dark data will be. It has been noted that "data and analytics will be the foundation of the modern industrial revolution". Of course, this includes data that is currently considered "dark data" because there are not enough resources to process it. All the data being collected can be used in the future to maximize productivity and to help organisations meet consumers' demand. Furthermore, many organisations do not realize the value of dark data right now. Healthcare and education organisations, for example, deal with large amounts of data that could create a significant "potential to service students and patients in the manner in which the consumer and financial services pursue their target population".

Source: HOB