If Spider-Man had worked in the booming data analytics field, Uncle Ben probably would have told him that with big data comes big responsibility. While that might resonate with many who manage big data, it is less obvious what exactly this responsibility entails.
The security breaches at Google and Facebook that exposed user profiles to unauthorized parties are not the most complex ethical problems companies face. What creates greater challenges is the specific way data is gathered and used: organizations combine vast amounts of information from several sources (some public, some proprietary, some from human activity, some from devices) to gain unique insights about human behavior and decision-making. Sometimes the consequences of such combinations are surprising and unexpected, and they may not even result from decisions by humans but from artificial intelligence (AI) algorithms and automated processes.
So how can we begin to identify the ethical issues in big data and the appropriate ways to deal with them? At a basic level, having access to data about human behavior shifts the power dynamic between the individual and the organization that holds this data, regardless of whether that data was given freely or collected by observation without consent. If data analysis allows a company to predict how people vote, date, or choose what to buy, the company can use this insight to influence those behaviors. This is what makes big data incredibly valuable, but at the same time it makes individuals vulnerable.
From an ethical perspective, such vulnerability creates a duty to act with a higher standard of care, because the vulnerable person has limited ability and power to control the situation. Companies should therefore take additional steps to evaluate potential harms to the individuals whose data they use and consider how those harms can be mitigated, for example by giving people more control over what happens with their information or, at the very least, by informing them about the potential dangers they are exposed to.
When organizations essentially reconstitute a person from various data sources, they might gain more insight into that person's behavior than the person has about themselves. It therefore becomes important to consider the motives behind, and the uses of, such insights, because the potential for manipulation is great. For example, I may not know that I react to certain messages in a specific way. Consequently, I have no way of consciously putting such messages into context to help me maintain control over my choices. Someone else is essentially controlling me, and I lose my autonomy and free will. Reflecting carefully on the motives behind the use of big data is tricky, though. The line is often unclear between helping people meet needs they might not know they had (and might be glad someone identified and met) and getting people to do things they would not otherwise do if they knew that someone was trying to steer their actions. However, once we recognize that a discussion about the motives and purpose of big data use is legitimate and necessary, we can begin to figure out where we want to draw that line.
The previous discussion about the higher level of care that should guide decisions about big data presumes, however, that there is a specific action or decision a person can reasonably take, or that there is some discretion in how data is handled. This is not always the case, because many processes are automated given the sheer volume of data involved. Furthermore, data, once compiled and cleaned, can be a valuable source of income if sold to other users. When it is reused and recombined with additional data by other organizations, the potential for harm may be unclear and beyond the control of the organization that originally collected and used the data. In such circumstances, the ethical responsibility shifts from making specific ethical decisions to establishing ethically appropriate processes and safeguards. In other words, the fact that someone else used part of a data set, or that AI produced insights on its own, does not relieve the party that originally generated the data of moral responsibility.
The ethical processes surrounding automated analysis and derivative use by other parties should be evaluated regularly. Questions to ask include: Are the insights generated by AI accurate and free of bias? Can individuals appeal or request a review of decisions about them that were made by AI (e.g., creditworthiness, employability)? If data is transferred to other parties, what uses are permissible, and is there potential for harm? Is there a way for individuals to track, monitor, or prevent the transfer of information about them? While it is not possible to obtain every person's consent, companies should keep proper records of data transfers.
Unfortunately, organizations currently have little guidance on how to handle big data adequately and ethically. The legal environment in the United States does not address many of the situations described above because there is no generally recognized human right to privacy at the state or federal level. The regulations that do exist address only specific types of information (e.g., health, finances) or information that has been deliberately kept private, excluding information gathered through observation or the use of devices. Given the lack of legal frameworks, it is important that everyone in the business of big data maintains a healthy respect for the big responsibility that such big data creates.