Automatically finding new cybersecurity threats with Open Source Intelligence

December 12, 2018

Monitoring cybersecurity-related posts in social networks and blogs can give security analysts an edge in discovering new threats against ICT infrastructures, according to Alysson Bessani.

According to Lloyd’s, companies lose $400 billion to hackers each year, and, according to Gartner, companies will spend $170 billion on cybersecurity measures in 2020. These losses, and consequent expenditures, come from the growing complexity and ubiquity of new cybersecurity threats.

A substantial fraction of these resources will be spent on research and consultancy about new threat models affecting organisations, and on the development and acquisition of tools to operate under them. For example, the widespread adoption of wireless networking and interconnected devices, together with the internet of things, is a recent development that has had a profound impact on the current cybersecurity threat landscape.

A fundamental aspect of this phenomenon is that by 2020, after these resources are spent, probably even more money will be needed to identify, prepare for and protect against new cyber threats that will start attracting attention in the next decade. Maybe our attention will be focused on attacks against artificial intelligence or machine learning systems that will be fundamental to our society, or on threats affecting a new generation of blockchains that will be supporting a broad spectrum of near-future critical services.

To deal with this recurrent emergence of new cybersecurity threats in a more cost-efficient way, it is paramount to design a new breed of tools capable of finding and consolidating information about new vulnerabilities and attacks against emerging technologies. This information is widely available as Open Source Intelligence (OSINT), a term used initially by the military intelligence community to denote the plethora of information in the news and other open-access sources. On the internet, it takes the form of security feeds, blogs, social networks, and the dark web.

There are two significant challenges in acquiring this information automatically. First, selecting the precious information that can give insight into imminent threats from the information deluge of the internet is like finding a needle in a haystack. Identifying the sources most likely to provide such information is a challenge in itself, and, once they are found, it is still a challenge to filter what is relevant automatically. Second, most of the time this information will not be structured, requiring the use of advanced natural language processing techniques to extract and structure the insights security teams are looking for.

These challenges can be addressed by exploiting the recent advances in machine learning for extracting information from big data. The H2020 DiSIEM project is devising a set of tools and services capable of solving these problems. The objective is to extract Indicators of Compromise (IoCs) from OSINT and feed this information as events to security information and event management (SIEM) systems and threat intelligence tools, allowing externally collected information to be correlated with internal events obtained from the organisation's infrastructure.

More specifically, a component called OSINT Threat Detector collects tweets from cybersecurity-related accounts and tries to generate early alarms about possible threats affecting the monitored IT infrastructure. Twitter was selected as the primary data source for this tool as it is a kind of hub for the cybersecurity community and software vendors to disseminate alerts and engage in discussions about threats, vulnerabilities, and mitigation measures. By inspecting tweets, researchers have already shown that it is possible to discover vulnerabilities days or even weeks before their publication in reputed security feeds such as NIST’s NVD (National Vulnerability Database), and even to find out which vulnerabilities have exploits available.

Besides common data pre-processing and normalisation tasks, the OSINT Threat Detector data processing pipeline (see the figure) uses keywords to narrow the set of tweets coming from the selected accounts. For example, if Windows 2000 is not used in the monitored infrastructure, there is no point in processing tweets about this system. Then, acting on this subset of data, a binary classifier decides which tweets are relevant to the security of the managed infrastructure and discards the others. After this step, clustering analysis is conducted over a time-based sliding window to find related events and compute distributions of keywords over time.
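The keyword filter and the time-based sliding window can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the keyword list, the 24-hour window, the sample tweets and all function and class names are hypothetical, and the binary classifier and the clustering algorithm are left out.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

# Hypothetical keywords describing the monitored infrastructure.
KEYWORDS = {"apache", "openssl", "windows 10"}


def matches_infrastructure(text, keywords=KEYWORDS):
    """Return True if the tweet text mentions any monitored keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


class SlidingWindow:
    """Keep only recent tweets and count keyword occurrences among them."""

    def __init__(self, span=timedelta(hours=24)):
        self.span = span          # assumed window length
        self.items = deque()      # (timestamp, text) pairs, oldest first

    def add(self, timestamp, text):
        """Add a tweet; timestamps are assumed to arrive in order."""
        self.items.append((timestamp, text))
        # Evict tweets older than the window relative to the newest one.
        while self.items and timestamp - self.items[0][0] > self.span:
            self.items.popleft()

    def keyword_counts(self, keywords=KEYWORDS):
        """Count how many tweets in the window mention each keyword."""
        counts = Counter()
        for _, text in self.items:
            lowered = text.lower()
            for keyword in keywords:
                if keyword in lowered:
                    counts[keyword] += 1
        return counts


# Made-up sample tweets, for illustration only.
SAMPLE_TWEETS = [
    (datetime(2018, 12, 1, 9), "New OpenSSL flaw disclosed"),
    (datetime(2018, 12, 1, 10), "Windows 2000 nostalgia thread"),
    (datetime(2018, 12, 2, 8), "Patch for Apache and OpenSSL released"),
    (datetime(2018, 12, 3, 12), "Apache exploit seen in the wild"),
]

window = SlidingWindow()
for created_at, text in SAMPLE_TWEETS:
    if matches_infrastructure(text):
        window.add(created_at, text)

print(dict(window.keyword_counts()))
```

In this sketch the Windows 2000 tweet never enters the window because none of the monitored keywords match it, and older tweets drop out as newer ones arrive. A real deployment would place the classifier between the filter and the window.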

The resulting information allows grouping events that are likely to be related to the same security issue or, more importantly, events that establish a connection between distinct security issues. In the end, the generated clusters are analysed, and IoCs are generated, to be processed either by the SIEM system or a threat intelligence tool like MISP.
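To make the last step more concrete, the sketch below turns one cluster of tweets into a JSON record that could be handed to a downstream tool. It is a minimal illustration under assumptions: the field names, the input format and the `cluster_to_ioc` function are hypothetical, and the record does not follow the MISP event schema or any particular SIEM format.

```python
import json
from collections import Counter


def cluster_to_ioc(cluster_id, tweets, top_n=3):
    """Summarise a cluster of tweets as a simple IoC-like record.

    Each tweet is assumed to be a dict with "created_at" (ISO 8601
    string) and "text" keys.
    """
    terms = Counter()
    for tweet in tweets:
        # Naive tokenisation: lowercase words longer than three characters.
        terms.update(w for w in tweet["text"].lower().split() if len(w) > 3)
    timestamps = [tweet["created_at"] for tweet in tweets]
    return {
        "cluster_id": cluster_id,
        "first_seen": min(timestamps),  # ISO 8601 strings sort by time
        "last_seen": max(timestamps),
        "tweet_count": len(tweets),
        "top_terms": [term for term, _ in terms.most_common(top_n)],
    }


# Made-up cluster, for illustration only.
SAMPLE_CLUSTER = [
    {"created_at": "2018-12-01T09:00:00", "text": "OpenSSL heap overflow reported"},
    {"created_at": "2018-12-01T11:30:00", "text": "Patch released for OpenSSL overflow"},
    {"created_at": "2018-12-02T08:15:00", "text": "OpenSSL advisory published"},
]

record = cluster_to_ioc("cluster-1", SAMPLE_CLUSTER)
print(json.dumps(record, indent=2))
```

Serialising to JSON keeps the record independent of the consumer; mapping it to the format a given SIEM or threat intelligence tool expects would be a separate step.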

This pipeline employs several machine learning algorithms: a supervised binary classifier is used for selecting relevant tweets, an unsupervised online stream clustering algorithm is employed to aggregate related information, and a supervised named-entity recogniser is necessary for extracting structured information from text (e.g., what vulnerability a tweet is mentioning).
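As a simple illustration of what "extracting structured information" means, the sketch below pulls CVE identifiers out of tweet text. Note the substitution: it uses a regular expression rather than a trained named-entity recogniser, so it only finds identifiers written in the standard CVE-YYYY-NNNN form. The function name and the sample text are hypothetical.

```python
import re

# CVE identifiers: "CVE-", a four-digit year, then four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(text):
    """Return the distinct CVE identifiers in text, in order of appearance."""
    found = []
    for match in CVE_PATTERN.findall(text):
        cve_id = match.upper()  # normalise case before deduplicating
        if cve_id not in found:
            found.append(cve_id)
    return found


# Made-up tweet text, for illustration only.
sample = "Exploit for cve-2018-8174 and CVE-2018-8174, also CVE-2017-0144!"
print(extract_cve_ids(sample))
```

A learned recogniser is still needed for entities that have no fixed syntax, such as product names or attack descriptions, which a pattern like this cannot capture.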

Besides the OSINT Threat Detector, the project is also designing a platform for searching and analysing OSINT data on Twitter, in blogs, and even on the dark web, by exploiting the DigitalMR Listening247 platform. Finally, a third OSINT processing component is responsible for enriching the generated IoCs with a threat score and correlating them with the alarms generated by the SIEM system. Together, these tools provide a unique set of cybersecurity OSINT processing solutions that go from alarm generation based on timely microblog posts to correlation with internal alarms from the SIEM and to further analysis of threats in a unified platform accessing a wide variety of OSINT.

For more information about DiSIEM OSINT processing solutions and other SIEM enhancements being proposed in the project, please see our website: http://disiem-project.eu

DiSIEM is supported by the European Commission through the H2020 programme under grant agreement 700692. The project consortium is composed of seven partners: FCiências.ID, City University of London, EDP, Amadeus, DigitalMR, Fraunhofer IAIS, and ATOS.

Please note: this is a commercial profile

Alysson Bessani

Associate Professor

Faculdade de Ciências Universidade de Lisboa

Tel: +351 217500394

anbessani@ciencias.ulisboa.pt