Legacy data, powering AI for the future

The average corporation manages 348 terabytes of data, according to IDG. Today, the world generates as much data in 10 minutes as it did in a whole year at the turn of the millennium.

The amount of data available to us — as individuals, companies and societies — is exploding. And the truth is, we’re still trying to work out what to do with a lot of it. And that’s a problem, because what we don’t see any value in, we don’t take care of.

Data that may not seem valuable today is often ignored or, at worst, discarded entirely. I don't necessarily mean customer data or data that is personal to human beings in some other way. Sometimes that too, but there are very clear rules on how long you can retain that kind of data.

Think instead of the metadata and depersonalised data created on the factory floor, by a fleet of networked vehicles or in an e-commerce operation. Some of this data we can mine for value today. But what do we do with it then? And what about the bits that aren't valuable right now?

The tools for tomorrow’s world

The things we can do with data today seem almost like magic compared to what we could do only a few years ago. Data scientists can build a scaled lookalike audience from just a small amount of seed data.
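
To make that concrete, here is a minimal sketch of the lookalike idea in Python, using scikit-learn's nearest-neighbour search on synthetic data. Every feature, size and threshold here is illustrative; real lookalike modelling runs on rich behavioural profiles.

```python
# Minimal lookalike-audience sketch: given a small seed of known-good
# customers, rank a large candidate pool by similarity to the seed.
# All data is synthetic; sizes and the 5% cutoff are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
candidates = rng.normal(size=(100_000, 8))                       # candidate pool, 8 behavioural features
seed = candidates[:200] + rng.normal(scale=0.1, size=(200, 8))   # small seed audience

# For each candidate, distance to the nearest seed member measures
# how "lookalike" that candidate is.
nn = NearestNeighbors(n_neighbors=1).fit(seed)
distances, _ = nn.kneighbors(candidates)

# Keep the closest 5% of the pool as the scaled lookalike audience.
cutoff = np.quantile(distances, 0.05)
lookalikes = np.flatnonzero(distances.ravel() <= cutoff)
print(f"Scaled a seed of {len(seed)} into an audience of {len(lookalikes)}")
```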

Predictive-maintenance specialists can save companies huge sums of money by moving them away from analytic techniques that deal in averages across large populations, such as reliability models built on the Weibull distribution, and towards maintenance regimes optimised for each and every internet-connected component.
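
Here is a rough sketch of that difference, assuming synthetic failure times and made-up Weibull parameters. The fleet-wide fit yields one average life for every component; conditioning on a single unit's own running hours gives a per-component answer (a real regime would also fold in live sensor data).

```python
# Fleet-average Weibull reliability vs. a per-component view.
# Failure times are synthetic; the shape and scale are illustrative.
from scipy.stats import weibull_min

shape, scale = 1.8, 1000.0   # assumed wear-out behaviour, in hours
failures = weibull_min.rvs(shape, scale=scale, size=5000, random_state=1)

# Classic approach: fit one distribution to the whole population and
# schedule maintenance around its average life.
c, loc, lam = weibull_min.fit(failures, floc=0)
print(f"Fleet-average life: {weibull_min.mean(c, loc, lam):.0f} h")

# Per-component view: condition on the hours one specific unit has run.
# S(t) is the survival function, so S(t + delta) / S(t) is the chance
# this unit survives another `delta` hours given its own age.
age, delta = 800.0, 100.0
p_next = weibull_min.sf(age + delta, c, loc, lam) / weibull_min.sf(age, c, loc, lam)
print(f"Unit at {age:.0f} h: P(survives another {delta:.0f} h) = {p_next:.2f}")
```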

But what about the things we're not doing today? Computing power and data analytics are advancing so fast that, in just a few years, we'll very likely be able to extract value from data in ways we can't even imagine today. But will we have the historical data we need to reap those as-yet-unimagined benefits?

To take an example we can already foresee: at the moment, a lot of the AI applications used in enterprises are based on machine learning. A data scientist takes training data for which they already have both the inputs and the desired outputs. The inputs are fed into the model, which is repeatedly tweaked and optimised until its outputs match the desired outputs, and then checked against held-out test data. The AI is then ready to be let loose on real data, and it will process that data in exactly the way it has been trained to.
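
In code, that loop is short. Below is a minimal sketch using scikit-learn on synthetic data; the model and dataset are stand-ins rather than anyone's production pipeline.

```python
# Minimal supervised-learning loop of the kind described above: labelled
# training data in, a fitted model out, accuracy checked on held-out test
# data before the model is let loose on real inputs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The "tweaking and optimising" happens inside fit(); score() is the test check.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2%}")

# From here on, the model processes new data exactly as it was trained to.
predictions = model.predict(X_test[:5])
```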

But there is another type of AI: deep learning. This can spot entirely new patterns in data, ones which can often deliver surprising insights and unlock a lot of value. But to achieve these things it typically needs a lot of data to work with, often many years' worth.
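
As a toy illustration of that unsupervised flavour, the sketch below trains a tiny autoencoder-style network to reconstruct its own input and flags the records it reconstructs worst, i.e. patterns nobody told it to look for. It is a deliberately small stand-in: a real deep-learning system would be far larger and, as noted, far hungrier for historical data.

```python
# Tiny autoencoder-style sketch: compress the data through a bottleneck
# and surface the records that don't fit the dominant pattern.
# All data is synthetic; the "odd" cluster is planted for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
normal = rng.normal(size=(5000, 10))
odd = rng.normal(loc=4.0, size=(50, 10))   # an unlabelled, unexpected cluster
X = np.vstack([normal, odd])

# Train the network to reproduce its own input through a narrow bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=500, random_state=2).fit(X, X)
errors = ((X - ae.predict(X)) ** 2).mean(axis=1)

# High reconstruction error marks records the learned pattern can't explain.
flagged = np.argsort(errors)[-50:]
print(f"{(flagged >= len(normal)).mean():.0%} of flagged records come from the odd cluster")
```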

A company that discarded the data that was no good to its machine-learning algorithms today might find itself unable to benefit from deep learning in the future. Not only would that be a bad idea in itself, it would also send a signal to the (much sought-after) top data talent that the most exciting careers lie elsewhere.

How to maximise the long-term value of your data

So how can you ensure that you extract the maximum value from your data, now and far into the future? First, collect as much data as you can (though, in the case of personal data, be sure your policies are GDPR-compliant).

The future is full of unknowns, so it's unclear which of today's data will be valuable tomorrow. Don't let your company be paralysed trying to develop the perfect technical solution for extracting maximum value from everything you collect. Equally, don't limit what you collect to just the data you can extract value from today. Use the best data analytics available to you now, but also collect and store whatever you legally and ethically can for the future.

Next, make sure that your data is stored in a portable form. It should be available clean and in its original format, in a read-only repository to which all business functions have access. Storing the data locally, in an owned data centre, is often inefficient if you need access to it in the immediate or near term. In the modern connected economy, you may want to move some of that data at short notice to other markets, to take advantage of new opportunities. This works much better if your data sits in a high-availability cloud location, ideally replicated to secure servers around the world.
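
As one concrete (and entirely hypothetical) way to set that up, here is what the pattern can look like on AWS S3 via boto3: a versioned bucket that keeps the original bytes intact while downstream teams read from it. The bucket name, region and object paths are placeholders, and replication and access policies would follow your own setup.

```python
# Hedged sketch of a cloud "raw data" repository: versioned, written once,
# read by everyone. Names and paths below are placeholders.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")
bucket = "example-corp-raw-data-archive"   # hypothetical bucket name

s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Versioning means writes add new versions rather than overwriting, which
# protects the clean, original-format copy from accidental changes.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Land the raw data exactly as produced; downstream teams read, never rewrite.
with open("telemetry.parquet", "rb") as f:   # placeholder local file
    s3.put_object(Bucket=bucket, Key="factory-floor/2024/telemetry.parquet", Body=f)
```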

If, however, you wish to lock your data away for longer periods, your best bet may be to store it privately. That's because of the cost of classifying your data to make sure it is properly protected in the cloud, a cost that can be compounded by egress charges from the public cloud provider if you later want to move the data again at short notice.
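
A back-of-envelope comparison shows why. Every price in this sketch is a hypothetical placeholder; substitute your own provider's storage, egress and hardware figures before drawing conclusions.

```python
# Rough long-term cost comparison: cloud archive (plus one full egress)
# vs. a private archive. All figures are hypothetical placeholders.
TB = 50        # archive size in terabytes
years = 5

cloud_storage_per_tb_month = 4.0   # hypothetical cold-tier price, USD
egress_per_tb = 90.0               # hypothetical fee to move the data out later
private_per_tb_year = 30.0         # hypothetical amortised private-storage cost

cloud_total = TB * cloud_storage_per_tb_month * 12 * years + TB * egress_per_tb
private_total = TB * private_per_tb_year * years

print(f"Cloud (incl. one full egress): ${cloud_total:,.0f}")
print(f"Private archive:               ${private_total:,.0f}")
```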

Most importantly, it pays to work with external experts when designing a data-collection and retention strategy: one that complies with the law and best practice, enables high availability and portability, and sets your company up for present and future success. The right partners can help you reach the highest possible standards in the shortest possible time. Data talent is scarce, and companies that move first, and do it right, will have a competitive advantage.

Value the data you have available to you and take action now to ensure it works well for you in the future. 
