satyamkapoor

I work at ValueFirst Digital Media Private Ltd. I am a Product Marketer in the Surbo Team. Surbo is a Chatbot Generator Platform owned by Value First.


Data Protection Platform capable of powering a new generation of AI, Apps & Data Science

By satyamkapoor | Jan 15, 2018

Today, with so much data available, companies need to treat their data as their most valuable asset. Companies that learn to make the most of their data by building, evolving & managing their data supply chains will be the ones leading. These data supply chains need to operate as smoothly as any other distribution network.
However, data supply chains present unique challenges. Getting a data supply chain to work seamlessly is a daunting task, since it has to gather data from many sources, distill it into a useful form, and then deliver the specific subset the business needs. Data is not one-size-fits-all, so the data supply chain needs to be as flexible as your data is diverse.
To build the best data supply chains, companies should recognize an asset they already have in their inventory, one they often overlook. Almost every company has a repository that is woefully underutilized as a source of business insights: backups.

Backups don't just have to sit on a shelf and be pulled in only when other data is lost. In fact, they can drive innovation. How? Well, the whole process of what is now called data protection has become far more sophisticated. In this story we're going to use Commvault as an example of how data protection systems have created a central and comprehensive repository of data that can not only serve as a backup, but can also be the foundation for new ways of using data to create value.

In other words, we will explore how a modern data protection platform can help you build and run a data supply chain that supports new types of apps, AI, and data science.

How data protection has become a comprehensive data platform

In the past, data protection was all about backups. We all remember floppy disks and how no great late 80s tech movie could avoid involving some drama about the state of a backup. But for the enterprise writ large, backups have served as a key form of insurance. The whole backup system existed as a worst-case scenario setup, a way to transfer data to a safe place and then restore it if something went wrong.

But we need to expand how we think about backups to catch up with today's technology. In the modern world, data protection platforms have gone far beyond traditional backups in the following ways.

Creating metadata catalogs. Today, a massive amount of metadata is captured, so companies know much more about where data came from and how it is being used. These catalogs can help companies:

         Analyze data usage

         Understand growth of data

         Track down data

         Observe and monitor data sprawl

         Establish thresholds and institute alerts about capacity limitations

         Use REST APIs to add data to a dynamic index (for example, adding GPS data to an entity such as an asset)
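
To make the catalog idea concrete, here is a minimal sketch of how metadata alone can answer usage and capacity questions. The record layout (`path`, `owner`, `size_bytes`), the sample entries, and the 80% threshold are all assumptions made for illustration; this is not Commvault's API or data model.

```python
from collections import defaultdict

# Hypothetical catalog records; a real platform would supply its own fields.
CATALOG = [
    {"path": "/finance/q1.xlsx", "owner": "finance", "size_bytes": 400},
    {"path": "/finance/q2.xlsx", "owner": "finance", "size_bytes": 500},
    {"path": "/hr/staff.csv", "owner": "hr", "size_bytes": 100},
]


def usage_by_owner(catalog):
    """Sum the stored bytes per owner using only catalog metadata."""
    totals = defaultdict(int)
    for entry in catalog:
        totals[entry["owner"]] += entry["size_bytes"]
    return dict(totals)


def capacity_alerts(catalog, capacity_bytes, threshold=0.8):
    """Return owners whose usage is at or above threshold * capacity_bytes."""
    limit = threshold * capacity_bytes
    usage = usage_by_owner(catalog)
    return sorted(owner for owner, used in usage.items() if used >= limit)
```

Calling `capacity_alerts(CATALOG, capacity_bytes=1000)` would flag the `finance` owner, since its 900 bytes exceed the assumed 80% limit. Note that none of the underlying files need to be read to answer the question.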

Using data crawls. Data protection platforms can also empower companies to crawl their data and create an index of the results, usable by anyone in the business, to find and categorize people, products, locations, and other vital information. Typical uses include:

         Entity identification and extraction

         Harvesting of data related to a particular analysis or AI use

         Identification of data needed for regulatory compliance
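
As a rough illustration of a crawl, the sketch below walks a directory tree and records which files mention each name from a known list. The function name `crawl_for_entities` and the plain-text matching approach are assumptions for this example; a production platform would likely use far richer entity extraction.

```python
import os
import re


def crawl_for_entities(root, entities):
    """Walk root and map each known entity name to the files that mention it."""
    # One case-insensitive, whole-word pattern per entity name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in entities
    }
    hits = {name: [] for name in entities}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the crawl
            for name, pattern in patterns.items():
                if pattern.search(text):
                    hits[name].append(os.path.relpath(path, root))
    return hits
```

The result is a small index, keyed by entity, that could feed a compliance review or an analysis job without anyone searching the files by hand.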

Establishing better search functionality within the data. Data protection platforms can create inverted indexes to make their data more searchable. Commvault's dynamic index creates such indexes to make searches go faster.
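
For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it, so a search becomes a lookup instead of a scan. The following is a generic sketch of the technique, not Commvault's implementation; the function names and the simple tokenizer are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The index is built once, and each later query only touches the entries for its own terms, which is why searches speed up as the data grows.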

Serving as a transformation engine. The data within the platform can help to drive innovation across the business, as its accessibility allows users from data science to development to:

         Work with data masking

         Perform live Dev/Tests on cloud data

         Employ appropriate redaction techniques on data, while still being able to use data while it's live and relevant
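
Masking and redaction can be sketched in a few lines. In this hypothetical example, `redact_emails` removes email addresses from free text, while `mask_value` replaces an identifier with a stable token so that records can still be joined. The regular expression, the `user_` prefix, and the fixed salt are illustrative assumptions; a salted hash like this is not strong anonymization on its own.

```python
import hashlib
import re

# Deliberately simple email pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace every email address in the text with a fixed placeholder."""
    return EMAIL_RE.sub("[REDACTED]", text)


def mask_value(value, salt="demo-salt"):
    """Return a deterministic pseudonym: the same input gives the same token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]
```

Because `mask_value` is deterministic, a test environment can work with realistic, linkable records without exposing the original identifiers.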

Operating as a workflow engine. Once the platform is fully operationalized, companies can create workflows using visual coding and simplified methods to automate and expedite processes. These include standard workflows and processes as well as third-party integrations with platforms such as ticketing systems.

Analyzing use of data over time. Finally, because of the nature of data protection platforms, users can get multiple viewpoints of the same dataset across time to see what has occurred with it. Such temporal analysis offers valuable insights.
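
One simple form of temporal analysis is comparing two point-in-time copies of the same dataset. The sketch below assumes each snapshot has already been loaded as a dictionary of record key to value; the function name `diff_snapshots` and that representation are assumptions for illustration.

```python
def diff_snapshots(older, newer):
    """Compare two {key: value} snapshots of the same dataset."""
    added = sorted(key for key in newer if key not in older)
    removed = sorted(key for key in older if key not in newer)
    changed = sorted(
        key for key in older if key in newer and older[key] != newer[key]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Running the same comparison across a series of backups would show when a record first appeared, when it changed, and when it disappeared.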

What these platforms and data lakes have in common

When we look at the capabilities a data protection platform like Commvault offers, we see that it has many properties that people have been striving to gain from data lake projects, such as:

         All important data kept in a repository with a common metadata layer

         Ensuring data is indexed and searchable

         The ability to run transformation jobs to analyze and distill data, and to use a workflow engine to manage execution of such jobs

         API access to data, supporting processing and retrieval

Granted, there are some key aspects of data lakes missing from data protection platforms, such as programming models for creating and running advanced analytics, and the ability to add new engines, such as SQL engines and machine learning technology that runs on Hadoop.

But when you include data protection platforms as part of your data infrastructure, you gain a tremendously powerful component in a data supply chain. The platforms might not do everything, but they do a lot, and no one data repository can actually provide companies with everything they need.

Putting a data protection platform to work

Now let's imagine how applications, AI, and data science can all be made more powerful with a data protection platform. Here's what these platforms provide.

Understanding what you have. You have a comprehensive view and index of your data. There's no more guessing about what you have and what's missing. This can be helpful, for instance, when you're in an app and want to know everything about a customer, or in a data science context and need context about the data. The platforms provide a metadata repository that aids understanding.

Getting access to all the data. Because they are built to provide data recovery, data protection platforms have all your data. Once you've understood there might be something interesting in a particular dataset, the platform can give you direct access to the data itself and not just the metadata. This is a huge advantage, as you can reach a lot of data that you couldn't access otherwise. It also expedites results: applications, AI, and data scientists don't have to wait around for data to be delivered, because it's readily available.

Extracting nuggets. Data protection platforms break through barriers. We all know that some data is harder to find and mine for value than others. By consolidating all your data in one place, this ornery data becomes more manageable. For instance, if you want to find all the places in your data where a product or customer was mentioned, you can run a crawl through the platform and retrieve relevant data, and use it to feed analysis, apps, or AI.

Looking back in time. As I mentioned earlier, the temporal analysis that companies gain from data protection platforms is invaluable. You can see how data is changing over time, monitor key trends, document and track changes, and analyze this information to make better decisions based on historical data.

Performing metadata analytics. The same temporal analysis can also be used on your metadata. Companies can look back at all metadata and understand the changes and relationships between data sets, as well as who has accessed data and when, to get a better sense of the most vital data to the business.

A backup plan that is anything but
One of the great things about a data protection platform is that it gets created and updated automatically. Companies still have to work on the data to distill it & put it to use, but with the help of such a platform, one starts with an incredibly powerful view of all the important data in the enterprise in one place.
Data protection platforms offer ready access to a vast amount of historical data, which can add an untapped dimension to one's data supply chain. In the end, AI experts, app developers & data scientists who have access to such a data protection platform will outsmart those who don't.

Source: HOB