May 19, 2017
Realising the benefits of data lakes
A data lake is a repository for storing vast amounts of raw, streaming or unstructured data. With a data lake, organisations can store data from multiple sources without having to model it first. The promise is great: when executed well, a data lake can feed operational applications and receive real-time updates from them, while giving business users accurate, consistent, actionable insights.
A more complete view
Data lakes can help you build a 360-degree view of your customers. If you can analyse only a customer’s transactional data (payments, orders, sales) and not their preferences, likes and dislikes, as expressed on social media for example, you have only a partial view of what is relevant to that customer. With a data lake you can store and access this kind of unstructured data alongside the transaction data held in your traditional data warehouse, and so build a much more relevant and personalised view of your customers, which can in turn give you a real competitive advantage.
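As a rough illustration of combining the two kinds of data, the sketch below merges transaction records with social-media signals into one profile per customer. The field names (customer_id, amount, topic, sentiment) and the function build_customer_views are hypothetical, chosen only for this example; real sources would need their own parsing and identity matching.

```python
from collections import defaultdict


def _empty_view():
    # Starting profile for a customer we have not seen before.
    return {"total_spend": 0.0, "order_count": 0, "likes": set(), "dislikes": set()}


def build_customer_views(transactions, social_events):
    """Combine transactional and social data into one view per customer.

    Both inputs are lists of dicts with hypothetical field names:
      transactions:  {"customer_id", "amount"}
      social_events: {"customer_id", "topic", "sentiment"}
    """
    views = defaultdict(_empty_view)

    # Structured side: what the customer has bought.
    for tx in transactions:
        view = views[tx["customer_id"]]
        view["total_spend"] += tx["amount"]
        view["order_count"] += 1

    # Unstructured side, already reduced to simple like/dislike signals.
    for event in social_events:
        view = views[event["customer_id"]]
        if event["sentiment"] == "like":
            view["likes"].add(event["topic"])
        elif event["sentiment"] == "dislike":
            view["dislikes"].add(event["topic"])

    return dict(views)
```

A customer who appears only in the social data still gets a profile, with zero orders, which is the kind of signal a warehouse holding transactions alone would miss.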
Building a data lake for success
The idea of a data lake might sound straightforward: just dump all your raw data into one store. To make a data lake really work for your organisation, though, you need a way to make the data cohesive when you query it. You also need to understand the value you intend to derive and the insights you wish to gain; otherwise the lake becomes just another data repository that users cannot easily access or analyse. Fortunately, there have been some great innovations that make it easier to move data into the lake and then model it.
Organisations need to manage data architectures in the same way they manage modern applications, networks, and IT and cyber security: as a living, breathing operation that must run reliably and automatically on a continuous basis. This will mean making changes to processes, tools and technologies, and evaluating the skills of your data and analytics staff.
Nothing can be introduced into an enterprise in isolation. Any new system needs to work well with the technology, skill sets and processes that already exist. As data lakes graduate from the one-off and prototyping phases and become ready for consumption by the enterprise, one of the biggest challenges to adoption and success is retrofitting them into that environment and operationalising them.
A well-oiled process in an enterprise is mostly automated. How continuous integration and continuous deployment, security, versioning and collaboration are handled will define the success of a data lake in an enterprise.
Realise your big data plans with data lakes
A data lake that is implemented incorrectly or only partially will not produce much value. If you plan it carefully, attach a rich set of meta tags to the data you store and make the lake searchable, it will be useful: a repository for data that can help you realise your big data plans and build value within your organisation.
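To make the idea of meta tags and searchability concrete, here is a minimal sketch of a tag catalogue, assuming each object in the lake is identified by a path and described by simple key/value tags. The Catalog class, its methods and the example paths are hypothetical and kept in memory for brevity; a production lake would normally rely on a dedicated metadata or catalogue service.

```python
class Catalog:
    """In-memory index from (tag key, tag value) pairs to object paths."""

    def __init__(self):
        self._tags_by_path = {}   # path -> {tag key: tag value}
        self._paths_by_tag = {}   # (tag key, tag value) -> set of paths

    def register(self, path, **tags):
        """Record an object and its meta tags, replacing any earlier tags."""
        # Drop stale index entries if the path was registered before.
        for item in self._tags_by_path.get(path, {}).items():
            self._paths_by_tag[item].discard(path)
        self._tags_by_path[path] = dict(tags)
        for item in tags.items():
            self._paths_by_tag.setdefault(item, set()).add(path)

    def search(self, **tags):
        """Return the sorted paths whose tags match every key/value given."""
        if not tags:
            return sorted(self._tags_by_path)
        matches = [self._paths_by_tag.get(item, set()) for item in tags.items()]
        return sorted(set.intersection(*matches))


# Hypothetical usage: tag objects on the way in, then find them by tag.
catalog = Catalog()
catalog.register("raw/social/2017-05-18.json", source="social", format="json")
catalog.register("raw/social/2017-05-19.json", source="social", format="json")
catalog.register("raw/orders/2017-05-19.csv", source="orders", format="csv")
social_files = catalog.search(source="social", format="json")
```

Tagging data at the point it enters the lake is what keeps the search cheap later; untagged objects would be stored but effectively invisible to users.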