Closing the data trust gap

In the digital economy, data is the fuel that drives many of the key products and services we all rely on. But unless we can count on the integrity and quality of that data, the value it generates is quickly diminished.

Table of contents

1. Data is the new oil
2. Dimensions of data
3. Protecting the life stages of data
4. Root of trust
5. Risk-free sharing
6. Trusting data-driven AI
7. From data trust to value

Imagine an oil company that doesn’t know how many barrels of oil it holds in inventory or the quality of the oil in those barrels. Or even if some of those barrels are leaking.

In an era in which data is celebrated as “the new oil” and executives feel pressure to extract maximum value from it, that is exactly where many companies find themselves. They know they are creating, accessing, capturing, and replicating vast amounts of data. Too often, however, they have little way of guaranteeing its quality and integrity. The result is a data trust gap that undermines decision-making at all levels.

A study by KPMG found that two-thirds of senior executives have either reservations about or active mistrust in their own organizations’ data and analytics.¹ And that’s a picture mirrored by PwC research, which shows data owners have major concerns not just about data theft and leakage (34%) but about the intrinsic quality of data (34%) that they can access – and its integrity (31%).²

Addressing that deficit in data trust has never been more important – or complex. The security, reliability, and quality of data determine the value of the applications and services that drive and enrich business, society, and our personal lives. And because data is now increasingly generated and used by many billions of embedded IoT devices, as well as becoming the raw material for AI and machine learning, establishing trust in data is vitally important.

“Many business models and innovations today would not be feasible without trust in the integrity of the data they work with.”

Michael Tagscherer
Group CTO of G+D

Trust in data is fundamental to many applications, notes Michael Tagscherer, Group CTO of G+D. “Is autonomous driving possible without trust in the sensor data observing the car’s external environment?” he asks. “Can smart energy grids and highly automated factories operate safely without trusted data? Can humans collaborate with robots in an industrial or healthcare setting without having confidence in the data supporting their activity?” The answers are as self-evident as the consequences of a lack of data trust are serious.

The multi-layered dimensions of data

So, how can greater trust be established in this vital fuel of modern economies?

As Tagscherer highlights, “Many business models and innovations today would not be feasible without trust in the integrity of the data they work with.” But there are multiple considerations when seeking to elevate data to a “state of trust.”

Critically, the quality and integrity of data need to be measured and controlled across six properties:

Consistency: the absence of differences when two or more representations of a data set are compared. For example, if a customer requests a change of address with their bank, that needs to be reflected in all the different accounts and products they use – whether a current account, a rewards program, or a life insurance policy.
Accuracy: how accurately does the data describe the world? For example, a sensor – say, in an autonomous vehicle or industrial plant – needs to be able to generate real-time data with dependable accuracy if it is to be trusted.
Completeness: how complete is the data and how much is missing or unobserved? This is not about ensuring that 100% of all possible data is present and available, but about having the complete set of data required for a particular use.
Lineage and traceability: where and how was data originally created? Trust in this case depends on establishing the provenance of data that subsequent actions can depend on.
Timeliness: is data sufficiently current to reflect the different realities its user needs to address? For an online retailer, for example, timeliness means having real-time visibility into inventory and product availability. However, in another context for the same retailer, timeliness may mean using weekly sales summaries when forecasting future demand.
Security: unless data is suitably protected from unauthorized access, its integrity will always be in question. So, establishing and maintaining security is fundamental for data trust.
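
Several of these properties can be checked mechanically. The following is a minimal sketch, not a complete data-quality framework: the function names, the record layout, and the sample values are all assumptions made for illustration. It covers consistency, completeness, and timeliness, using the bank and retailer examples above.

```python
from datetime import datetime, timedelta, timezone


def is_consistent(records, field):
    """Consistency: every representation holds the same value for the field."""
    return len({r.get(field) for r in records}) <= 1


def completeness(record, required_fields):
    """Completeness: share of the fields required for this use that are present."""
    if not required_fields:
        return 1.0
    present = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return present / len(required_fields)


def is_timely(last_updated, max_age, now=None):
    """Timeliness: the data is no older than this particular use tolerates."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Hypothetical data: one customer as seen by three bank products.
customer_views = [
    {"product": "current_account", "address": "1 Main St", "email": "a@example.com"},
    {"product": "rewards", "address": "1 Main St", "email": ""},
    {"product": "life_insurance", "address": "9 Old Rd", "email": "a@example.com"},
]

# The life insurance record still holds the old address.
address_consistent = is_consistent(customer_views, "address")  # False
rewards_completeness = completeness(customer_views[1], ["address", "email"])  # 0.5

# The same six-hour-old figure is too stale for live inventory
# but perfectly current for a weekly sales summary.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=6)
fresh_for_inventory = is_timely(last_updated, timedelta(minutes=5), now)  # False
fresh_for_forecast = is_timely(last_updated, timedelta(days=7), now)  # True
```

Note that completeness and timeliness both take the intended use as a parameter (the required fields, the maximum age), which reflects the point that these properties are judged relative to a purpose rather than in absolute terms.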

Protecting the life stages of data

Data of every kind passes through a life cycle: from its creation, movement, and storage to its enhancement; from its consumption, sharing, and updating to its ultimate termination. Data will often become more valuable as it travels on that journey as a result of its application, enhancement, monetization, and so on. But any data strategy needs to make trust a fundamental characteristic of each part of that journey.

Source: G+D

To tackle those evolving challenges, data experts argue that we need to develop new technologies and approaches that are designed to protect and guarantee the integrity of data and how it can be used across its life cycle.
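
One established building block for protecting integrity across a life cycle is a hash chain, in which every recorded event includes the hash of the event before it. The sketch below is an illustration under stated assumptions, not a description of any particular product: the stage names, the payload fields, and the functions `append_event` and `verify_chain` are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


def _digest(stage, payload, prev_hash):
    """Hash one event together with the hash of its predecessor."""
    body = {"stage": stage, "payload": payload, "prev_hash": prev_hash}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(chain, stage, payload):
    """Append a life-cycle event that is linked to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "stage": stage,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(stage, payload, prev_hash),
    })
    return chain


def verify_chain(chain):
    """Recompute every hash and check that each entry links to the one before."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        expected = _digest(entry["stage"], entry["payload"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical life cycle of a single sensor reading.
log = []
append_event(log, "creation", {"sensor": "s1", "temp_c": 21.5})
append_event(log, "storage", {"location": "archive-a"})
append_event(log, "sharing", {"recipient": "partner-x"})
intact = verify_chain(log)

# Changing an early event invalidates the chain.
log[0]["payload"]["temp_c"] = 99.0
tampered_detected = not verify_chain(log)
```

A chain like this only reveals tampering if the most recent hash is kept somewhere an attacker cannot rewrite it. Otherwise the whole chain could simply be recomputed after a change.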

Nowhere is that clearer today than in the realms of IoT, inter-company collaboration, AI, and privacy.

Data is now generated by many billions of connected devices in factories and smart cities, logistics systems and intelligent vehicles, and many other places besides. According to tech industry analysts at IDC, the “global datasphere” as a whole will grow to 177 zettabytes by 2025, up eightfold since 2018. By 2025, half of all that data will be created by IoT devices.³ That presents a key challenge: to close security gaps in the life cycle of IoT data so that it is not exposed or manipulated, and so that the trust in and value of that data are not destroyed.

Establishing a “root of trust”

As the creation of data on an IoT device or a sensor often happens in an environment away from human observation, it is even more critical here than elsewhere to establish an anchor for that data: a so-called root of trust.

Roots of trust, as defined by the US National Institute of Standards and Technology, are highly reliable hardware, firmware, and software components that perform specific, critical security functions, thus providing a firm foundation from which to build security and trust in an overall digital ecosystem. Since the security of roots of trust must itself be guaranteed, they are often implemented directly in hardware so that malware cannot tamper with the functions they provide.⁴
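
To make the idea concrete, the sketch below shows what a root of trust makes possible: a device attaches a cryptographic tag to each reading, and a receiver can check that the reading has not been altered since. This is a software stand-in only. The key `DEVICE_KEY`, the reading format, and the function names are assumptions for illustration; in a real deployment the key would be held inside the hardware root of trust and would never appear in application code.

```python
import hashlib
import hmac
import json

# Assumption for the demo: in practice this secret stays inside secure hardware.
DEVICE_KEY = b"demo-key-not-for-production"


def _encode(reading):
    """Serialize a reading deterministically so both sides tag the same bytes."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def sign_reading(reading, key=DEVICE_KEY):
    """Device side: attach an HMAC-SHA256 tag to the reading."""
    tag = hmac.new(key, _encode(reading), hashlib.sha256).hexdigest()
    return {"reading": reading, "tag": tag}


def verify_reading(signed, key=DEVICE_KEY):
    """Receiver side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, _encode(signed["reading"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])


# Hypothetical reading from a vehicle sensor.
signed = sign_reading({"sensor": "lidar-front", "distance_m": 12.4})
accepted = verify_reading(signed)

# A reading manipulated in transit no longer matches its tag.
manipulated = {"reading": {"sensor": "lidar-front", "distance_m": 50.0},
               "tag": signed["tag"]}
rejected = not verify_reading(manipulated)
```

An HMAC requires both sides to share the same secret. Where the receiver should be able to verify without being able to sign, an asymmetric signature scheme would be the usual choice instead.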

Establishing such intrinsic trust is critical when data is shared, used, and stored.

Sharing data without risk

Another key area of trust involves collaboration. Business collaborations and joint ventures often require the sharing of data while maintaining confidentiality of each company’s intellectual property (IP). Project collaboration among car manufacturers is a good example: all participants might bring relevant data to the project from each of their perspectives (e.g. on autonomous driving safety or EV efficiency). And while all of them want to access the valuable data insights, they want to do so without disclosing their own IP.

Those kinds of opportunities to share data between organizations can generate attractive economic benefits (always assuming such cross-industry collaboration has no antitrust implications). But many companies are still reluctant to make their data available for fear of inadequate security. So-called confidential computing opens up entirely new possibilities here: trusted execution environments protect data and software in the cloud even while the data is being processed.

All parties involved in such a collaboration can precisely define and verify who is allowed to supply and retrieve what data, enabling trust between the business partners.

The list of use cases is growing fast: medical research institutions can work closely with each other without revealing sensitive patient data, and industrial enterprises operating in the same sector could help each other with data without having to worry about exposing trade secrets.

Trusting data-driven AI

Artificial intelligence has a particular stake in a life cycle for trusted data. To be successful, AI implementations need to work with data from trusted sources; otherwise, the resulting predictions, recommendations, and actions cannot be relied upon. There are many examples of AI-driven decision-making that proved to be based on biased or inaccurate data. As tech analyst company Gartner advises, “Every business leader must understand the characteristics of the data underneath the AI to make judgements on AI quality.”

At the same time, the process of obtaining valuable data cannot infringe on people’s rights or privacy. The company Brighter AI has seen success here by blurring or creating a synthetic replacement for personal identifiers (such as the faces of people captured in a video stream or a car number plate seen on CCTV) in such a way that it anonymizes data and protects privacy. To exploit that potential, the company has secured funding from top VCs, including G+D Ventures, eCAPITAL, Armilar Venture Partners, and Deutsche Bahn Digital Ventures.

For AI applications, there are other considerations. Machine learning often draws on multiple, disparate sources to feed into and train a specific model.

Federated learning – a machine learning technique that trains algorithms across multiple decentralized sources of data – is a powerful approach when organizations are collaborating as partners. Each partner trains the model locally on its own data, and only the machine learning model and the outcomes of the processing are shared, not the data itself. This way, biometric data, IP, or other sensitive information can contribute to machine learning without leaving its owner’s control, which makes the approach particularly useful in joint-development projects.
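
The mechanics can be sketched with federated averaging, one common way of combining locally trained models. The example is deliberately tiny and rests on assumptions made for illustration: the model is a single weight in y = weight * x, and the two parties and their data are invented. What matters is that `train` only ever sees weights and sample counts, never the raw data points.

```python
def local_update(weight, data, lr=0.01, epochs=20):
    """Train y = weight * x on one party's private data; return only the weight."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates):
    """Combine (weight, sample_count) pairs into a sample-weighted average."""
    total = sum(count for _, count in updates)
    return sum(weight * count for weight, count in updates) / total


def train(parties, rounds=10):
    """Coordinator: send out the global weight, collect updates, average them."""
    global_weight = 0.0
    for _ in range(rounds):
        updates = [(local_update(global_weight, data), len(data)) for data in parties]
        global_weight = federated_average(updates)
    return global_weight


# Hypothetical private data sets; both follow y = 3 * x.
party_a = [(1.0, 3.0), (2.0, 6.0)]
party_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]

model_weight = train([party_a, party_b])  # converges toward 3.0
```

Sharing model updates rather than data reduces exposure but does not remove it entirely, since updates can still reveal something about the data behind them. For that reason federated learning is often combined with further protections.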

From data trust to value

Trust is always a fragile component in any situation – but especially in the context of technology. Rapidly advancing technology and the exponential rise in available data need to be matched with new techniques, approaches, and solutions that can ensure that the data powering applications and services can be intrinsically trusted.

As PwC highlights, “Your organization does not just need a data strategy; it needs a data trust strategy. The challenge is to optimize that ability of data to create value while minimizing any capacity to undermine value through an absence of trust.”⁵