Panaseer founder and CEO Nik Whitfield is a noted computer scientist and cyber security strategy expert. His company is renowned for its expertise in cyber security analytics, providing the big data analytics tools that help financial services companies build an effective, intelligent defence against the rising tide of cyber-attacks.
He’s been speaking with Dark Reading about some of the avoidable pitfalls of dealing with big data and the Data Lake, while providing tips on getting the most out of it. As a platform for big data analytics tools, Hadoop is versatile and more than capable – an essential element for improving enterprise security. That said, when looking for insights from data in Hadoop, it is very easy to get bogged down.
Hadoop is ideal for technology and security teams, particularly in finance, who are running data lake projects together to build data analytics capabilities. Collaboration between different teams, using the right tools, is an important weapon in developing a cyber security strategy and improving enterprise security.
What does this mean for security teams?
For security teams, this means gaining timely insights from shared data sets, which helps them solve a wide range of problems.
These problems include continuous monitoring of cyber hygiene factors across the IT environment such as asset, vulnerability, configuration and access management.
Other problems relate to the identification of threat actors moving across their networks. Teams attack these problems by correlating logs across large, cumbersome data sets such as those from Web proxies, Active Directory, DNS and NetFlow. Big data analytics of this kind is central to meeting the cyber security challenge faced by financial (and other) institutions today.
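The correlation idea described above can be sketched in a few lines. This is a deliberately simplified, in-memory illustration, not a production pipeline: the record fields and sample values are hypothetical, and real proxy, DNS and NetFlow feeds would first be parsed from their native formats.

```python
from collections import defaultdict

# Hypothetical, simplified log records standing in for parsed feeds.
proxy_logs = [
    {"src_ip": "10.0.0.5", "dest": "evil.example.com", "ts": 100},
    {"src_ip": "10.0.0.9", "dest": "cdn.example.net", "ts": 105},
]
dns_logs = [
    {"src_ip": "10.0.0.5", "query": "evil.example.com", "ts": 98},
    {"src_ip": "10.0.0.7", "query": "mail.example.org", "ts": 99},
]

def correlate_by_ip(*log_sources):
    """Group events from multiple log sources by source IP."""
    by_ip = defaultdict(list)
    for source in log_sources:
        for event in source:
            by_ip[event["src_ip"]].append(event)
    return by_ip

correlated = correlate_by_ip(proxy_logs, dns_logs)
# Hosts that appear in more than one feed are candidates for review.
multi_feed = {ip: events for ip, events in correlated.items() if len(events) > 1}
```

At scale this same join would run on the Hadoop platform rather than in memory, but the logic — keying disparate logs on a common attribute and flagging overlaps — is the same.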
Security teams have realised that whether it is the chief information security officer (CISO) and leadership team, the control managers, security operations or incident response, they are all using big data analytics tools to gain insight from overlapping data sets. The same data that would simplify security operations’ job in monitoring and detecting malicious activity across applications, users and devices would also be useful for enhancing the CISO’s executive communications with risk and audit functions.
Hadoop facilitates the storage and analysis of all this data on one platform, meaning security teams can consolidate the output of all their security solutions and simplify their tech stacks where possible.
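Consolidating the output of many security solutions usually means mapping each tool’s records onto one common schema before they land in the lake. The sketch below is illustrative only: the tool names, field names and sample records are all hypothetical, and real normalisation would be driven by each product’s actual export format.

```python
def normalise_scanner(rec):
    """Map a hypothetical vulnerability-scanner record to a common schema."""
    return {"asset": rec["host"], "tool": "scanner", "finding": rec["cve"]}

def normalise_edr(rec):
    """Map a hypothetical EDR alert to the same common schema."""
    return {"asset": rec["device"], "tool": "edr", "finding": rec["alert"]}

consolidated = (
    [normalise_scanner(r) for r in [{"host": "srv-01", "cve": "CVE-2024-0001"}]]
    + [normalise_edr(r) for r in [{"device": "srv-01", "alert": "ransomware"}]]
)
# One platform, one schema: every tool's output is now queryable together.
assets_with_findings = {r["asset"] for r in consolidated}
```

Once every source speaks the same schema, questions that used to require logging into several consoles become a single query against the lake.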
If you don’t want your data lake to turn into a data swamp, you need to identify the challenges that exist at every stage of a data lake project. The four phases of the process are: building the data lake, ingesting data, performing the analysis, and delivering insight.
The data lake needs to be able to support the ingestion of relevant data sets at the right speed and frequency and enable a range of big data analysis techniques to generate relevant insights. That opens the door to building and running efficient analysis on these data sets. These big data analytics tools need to deliver the appropriate insights that stakeholders need. Finally, these insights need to be presented in a way that is relevant to the concerns, decisions and responsibilities of those stakeholders.
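The four phases above can be sketched as a minimal pipeline skeleton. This is a toy, in-memory illustration of the shape of the process; the function names, the sample asset records and the "unpatched assets" insight are all hypothetical.

```python
def build_data_lake():
    """Phase 1: stand up storage; here, just an in-memory structure."""
    return {"raw": []}

def ingest(lake, records):
    """Phase 2: land relevant records in the raw zone."""
    lake["raw"].extend(records)

def analyse(lake):
    """Phase 3: derive a simple insight, e.g. an unpatched-asset count."""
    unpatched = [r for r in lake["raw"] if not r.get("patched")]
    return {"unpatched_assets": len(unpatched)}

def deliver(insight, audience="CISO"):
    """Phase 4: present the insight in terms relevant to the stakeholder."""
    return f"{audience}: {insight['unpatched_assets']} asset(s) need patching"

lake = build_data_lake()
ingest(lake, [{"asset": "srv-01", "patched": False},
              {"asset": "srv-02", "patched": True}])
report = deliver(analyse(lake))
```

The point of the skeleton is the separation of concerns: each phase can fail independently, which is exactly why the article locates most problems in the first two.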
Most problems are found in how the data lake is built and how data is ingested.
With regard to the data lake itself, it is not easy to build a platform for big data analysis on Hadoop. For a start, Hadoop is not a single technology. Rather, it is a whole boatload of technology components that share an ecosystem. Some components work well with certain types of application but not others – so it is easy to get into a mess building the data lake.
The problem with data ingestion is understanding which data to collect in order to run the analytics that deliver the insights you are looking for. People sometimes grab everything.
Second, ingested data is not always well curated, cleaned or understood. That’s when you end up with a swamp instead of a data lake.
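Basic curation at ingestion time is what keeps the lake from becoming a swamp. A minimal sketch of that idea follows: validate that records carry required fields, normalise obvious inconsistencies, and drop duplicates. The field names and sample records are hypothetical.

```python
REQUIRED_FIELDS = {"asset_id", "source", "timestamp"}

def curate(records):
    """Drop malformed records, normalise identifiers, and de-duplicate."""
    seen, clean = set(), []
    for rec in records:
        if not REQUIRED_FIELDS <= rec.keys():
            continue  # malformed: quarantine rather than ingest blindly
        rec = {**rec, "asset_id": rec["asset_id"].strip().lower()}
        key = (rec["asset_id"], rec["source"], rec["timestamp"])
        if key in seen:
            continue  # duplicate of an already-ingested record
        seen.add(key)
        clean.append(rec)
    return clean

raw = [
    {"asset_id": " SRV-01 ", "source": "scanner", "timestamp": 1},
    {"asset_id": "srv-01", "source": "scanner", "timestamp": 1},  # duplicate
    {"asset_id": "srv-02"},  # malformed: missing fields
]
curated = curate(raw)
```

Even this small amount of gate-keeping — a schema check and a de-duplication key — means downstream analytics can trust what is in the lake.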
The Dark Reading article goes into a little more detail about these issues and gives excellent tips on utilising the big data analytics tools that help you to avoid the pitfalls that cause the creation of data swamps instead of data lakes.