Why knowledge graphs are attuned to the complexity of today’s world and thus indispensable.
The potential for cybersecurity knowledge graphs is evident in their ability to aggregate and represent knowledge about cyber threats, as well as their ability to manage and reason with that knowledge. – Liu K., author on https://www.mdpi.com/
In 2012, I was invited to talk about the Internet of Things at an invitation-only event, run by the GFF (Global Futures Forum) and the Italian “Intelligence community”, taking place in Rome that September. The session was entitled “Transformational Technologies #4: Implications for an Expanding Threat Environment”. In the afternoon, participants – an impressive mix of senior intelligence, police and military, SOCA, CIA, MI6, Homeland Security – broke out in five groups, each tasked with identifying major threats with their unfolding scenarios spanning the coming decade. The groups came back with different scenarios with five topping the list: one was focused on a military conflict, two were about biological disasters produced by DIY bio labs. The revelation came with the focus of the remaining two: a total breakdown of society triggered by the failure of existing institutions to manage the Digital Transformation.
Inside the room, Stuxnet (the name itself did not emerge) was the talk of the day. Outside, most had no idea. Today, the possibility of digitally triggered destruction of critical infrastructure is conjured up in the mainstream news, but that is hardly the whole picture. Societal collapse occurs when trust in key societal institutions suddenly fades after years of slow erosion. This final phase is usually triggered by a single action, a seemingly small act or an insignificant step in a long chain of similar steps that nevertheless lead to the final exposure. At this point, everyone realises that the central institutions no longer function to maintain the common purpose of a society. And one thing is now certain: cybersecurity is no longer only invoked or needed for digital processes, it affects industrial and everyday processes and activities, and thus also trust in everyday life.
The convergence of cloud, data platforms and connected objects was bound from the start to lead to a situation in which every person, object and machine would have a virtual representation – a digital twin – stored either in a cloud (most likely owned by a company) or at the edge (in a device, most likely a smartphone running iOS or Android). What is unexpected is how fast the lack of political agency and technical capabilities in the West is accelerating the end of the model of entrepreneurial government in favor of a new paradigm: Cyber-physical Systems.
Cyber-physical systems generally, according to Liu K., “combine sensor networks with embedded computing to monitor and control the physical environment, with feedback loops that allows this external stimulus to self-activate either communication, control, or computing.”[1] Such systems bring together heterogeneous data, deeply embedded relations between entities, human interaction, and autonomous or semi-autonomous feedback and control loops. Securing them clearly calls for a holistic approach to cybersecurity.
In their paper Recent Progress of Using Knowledge Graph for Cybersecurity [2], Liu et al. searched the literature published between 2004 and 2022 with the following keywords and found 113 relevant publications:
cybersecurity knowledge graph, cybersecurity knowledge representation, cybersecurity ontology, threat intelligence extraction, cybersecurity information extraction, cybersecurity knowledge graph application, graph-based analytics, and association analysis.
A case study on the transformation of data into information [3] gives interesting details about the famous explosion of the Space Shuttle Challenger in 1986. There was a direct connection between the temperature at launch and potential damage to the rocket's O-rings, whose failure caused the disaster.
The temperature was too low and resulted in failure, yet in the original visualization of the data this information was not visible; it was hidden in the figures. In his book “Visual Explanations” (1997), Edward Tufte not only provides a detailed account of the incident but also shows, with redrawn charts, a direct correlation between low temperature and O-ring damage. Presented that way, the correlation would have been immediately visible and would have had a strong impact on the decision to launch that day.
Liu et al. find a growing field of research and practice. They define a cybersecurity knowledge graph as a database used to organize, analyze, and discover relationships between different cybersecurity concepts. Graph-based reasoning provides a flexible framework for organizing, analyzing, and reasoning about cybersecurity data. Here is an example of the kind of data involved:
In: Liu, K.; Wang, F.; Ding, Z.; Liang, S.; Yu, Z.; Zhou, Y. Recent Progress of Using Knowledge Graph for Cybersecurity. Electronics 2022, 11, 2287.
I asked ChatGPT about this and the AI answered:
“Here are some methods that can be used to create a cybersecurity knowledge graph:
1. Data extraction: The first step is to identify relevant sources of data, such as academic papers, reports, and threat intelligence feeds. Then, data extraction techniques like web scraping, natural language processing, and data mining can be used to extract relevant information from these sources.
2. Data normalization: Once the data is extracted, it needs to be normalized to ensure that similar concepts are represented in a consistent manner. This can be achieved by using standardized vocabularies, taxonomies, and ontologies.
3. Entity recognition: Entity recognition is the process of identifying entities (e.g., people, organizations, locations) mentioned in the data. This can be done using machine learning techniques such as named entity recognition.
4. Relationship extraction: Relationship extraction involves identifying relationships between entities in the data. This can be done using techniques such as dependency parsing, semantic role labeling, and knowledge graph embedding.
5. Knowledge graph construction: Once the entities and relationships are identified, they can be used to construct the knowledge graph. This involves representing the concepts and relationships in a graph database or ontology.
6. Graph analysis: The knowledge graph can then be analyzed using graph analysis techniques such as centrality analysis, community detection, and clustering. These techniques can be used to identify key concepts, relationships, and patterns in the data.
7. Visualization: Finally, the knowledge graph can be visualized using tools such as Gephi, Cytoscape, and Neo4j. Visualization can help to reveal insights and facilitate communication of the knowledge contained in the graph.”
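To make these steps less abstract, here is a minimal Python sketch of steps 2 to 6 on a toy scale. It is an illustration under loud assumptions, not how a production system or the surveyed papers do it: the report sentences, the vendor and product names and the CVE-style identifiers are all invented, and simple regular expressions stand in for real named entity recognition and relationship extraction.

```python
import re
from collections import defaultdict

# Hypothetical report snippets; product names and CVE-style identifiers are invented.
REPORTS = [
    "ExampleCorp Router is affected by CVE-0000-0001 and CVE-0000-0002.",
    "Example Corp. Camera is affected by CVE-0000-0001.",
    "ExampleCorp Gateway is affected by CVE-0000-0001.",
]

# Step 2 (normalization): map spelling variants to one canonical vendor name.
ALIASES = {"Example Corp.": "ExampleCorp"}

# Steps 3 and 4 use regular expressions as a stand-in for NER and relation extraction.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")
PRODUCT_PATTERN = re.compile(r"^(.*?) is affected by")


def normalize(text):
    for variant, canonical in ALIASES.items():
        text = text.replace(variant, canonical)
    return text


def extract_triples(sentence):
    """Find a product and the identifiers mentioned with it, then link them."""
    sentence = normalize(sentence)
    match = PRODUCT_PATTERN.search(sentence)
    if match is None:
        return []
    product = match.group(1)
    return [(product, "affected_by", cve) for cve in CVE_PATTERN.findall(sentence)]


def build_graph(reports):
    """Step 5: store the triples as an undirected adjacency map."""
    graph = defaultdict(set)
    for sentence in reports:
        for subject, _relation, obj in extract_triples(sentence):
            graph[subject].add(obj)
            graph[obj].add(subject)
    return graph


def degree_centrality(graph):
    """Step 6: number of neighbours divided by the number of other nodes."""
    others = max(len(graph) - 1, 1)
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


graph = build_graph(REPORTS)
centrality = degree_centrality(graph)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

In this toy data the identifier shared by all three products comes out as the most connected node, which is the kind of "key concept" the graph analysis step is meant to surface.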
Since “neither the number of available people nor the required skills can be increased overnight, companies must increase the development of technologies for modeling experts’ knowledge and experience. The integration of automation, intelligent technology, and attack defense technology has become one of the inevitable trends in the development of cybersecurity technology.” (Liu, 2022) The relatively simple tasks, “such as feature extraction, anomaly detection, and data classification, can no longer restore the full picture of attack behavior. Expert knowledge hidden in cybersecurity data is still a very important breakthrough in solving the above problems.”
“Therefore, the current issue in cybersecurity analysis is not a shortage of available data but rather how to combine non-homogenous information from various sources into a single model in order to better understand the cybersecurity situation as well as provide auxiliary information for decision-making.”
The advantages of knowledge graphs (KGs) are thus threefold:
- KGs extract and integrate existing knowledge from multi-source heterogeneous data
- KGs describe cybersecurity knowledge structurally and relationally, as well as visualize it in a graphical format
- KGs use semantic modeling, query, and reasoning technologies and can thus imitate the thinking process of security specialists in order to derive new knowledge
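The first and third advantages can be sketched in a few lines of Python. This is a simplified illustration, not a method taken from Liu et al.: the two data sources, their field names, the host names and the identifiers are hypothetical, and a single hand-written rule stands in for a real reasoning engine.

```python
# Two hypothetical sources in different shapes; all names and identifiers are invented.
SCANNER_ROWS = [  # flat, tabular export of a vulnerability scanner
    {"host": "plc-01", "finding": "CVE-0000-0001"},
    {"host": "hmi-02", "finding": "CVE-0000-0002"},
]
THREAT_FEED = {  # nested threat intelligence feed
    "remote-code-execution": {"exploits": ["CVE-0000-0001"]},
}


def integrate(scanner_rows, threat_feed):
    """Map both sources onto one set of (subject, relation, object) triples."""
    triples = set()
    for row in scanner_rows:
        triples.add((row["host"], "has_vulnerability", row["finding"]))
    for pattern, details in threat_feed.items():
        for cve in details["exploits"]:
            triples.add((pattern, "exploits", cve))
    return triples


def infer_exposure(triples):
    """Apply one rule an analyst might otherwise apply by hand.

    If a host has a vulnerability and an attack pattern exploits that
    vulnerability, then the host is exposed to the attack pattern.
    """
    derived = set()
    for host, relation_a, vulnerability in triples:
        if relation_a != "has_vulnerability":
            continue
        for pattern, relation_b, target in triples:
            if relation_b == "exploits" and target == vulnerability:
                derived.add((host, "exposed_to", pattern))
    return derived


knowledge = integrate(SCANNER_ROWS, THREAT_FEED)
new_facts = infer_exposure(knowledge)
print(sorted(new_facts))
```

Neither source states that plc-01 is exposed to remote code execution. The fact only appears once both sources sit in the same model and the rule is applied, which is the point of the third advantage.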
“A cybersecurity knowledge graph (CSKG), as a specific knowledge graph (KG) in the security area, is made up of nodes and edges that constitute a large-scale security semantic network, providing an intuitive modeling method for various attacks and defense scenarios in the real security world. The entities or abstract concepts (e.g., vulnerability name, attack pattern, product name, vendor) could be represented by nodes. The attributes or the relationships among entities are represented by edges. The nodes and the edges together form a KG”. – In: Liu, K.; Wang, F.; Ding, Z.; Liang, S.; Yu, Z.; Zhou, Y. Recent Progress of Using Knowledge Graph for Cybersecurity. Electronics 2022, 11, 2287.
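As a rough illustration of this definition, the following Python sketch represents typed nodes and labelled edges and then asks how an attack pattern connects to a vendor. The entities, relation names and the helper functions `neighbours` and `find_path` are hypothetical and chosen only for this example. A real CSKG would live in a graph database or an ontology rather than in a Python list.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # e.g. "vendor", "product", "vulnerability", "attack_pattern"


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str
    target: Node


# Hypothetical entities, invented for this sketch.
vendor = Node("ExampleCorp", "vendor")
product = Node("ExampleRouter", "product")
vuln = Node("CVE-0000-0001", "vulnerability")
attack = Node("credential-theft", "attack_pattern")

EDGES = [
    Edge(vendor, "makes", product),
    Edge(product, "has_vulnerability", vuln),
    Edge(attack, "exploits", vuln),
]


def neighbours(node, edges):
    """Yield (relation, other node) pairs, ignoring edge direction."""
    for edge in edges:
        if edge.source == node:
            yield edge.relation, edge.target
        elif edge.target == node:
            yield edge.relation, edge.source


def find_path(start, goal, edges):
    """Breadth-first search returning the node names on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return [node.name for node in path]
        for _relation, other in neighbours(path[-1], edges):
            if other not in seen:
                seen.add(other)
                queue.append(path + [other])
    return None


print(find_path(attack, vendor, EDGES))
```

The path runs from the attack pattern through the vulnerability and the product to the vendor, a small version of the attack and defense scenarios the quote refers to.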
The above graph shows how rich the KG approach is. It takes in a plethora of vastly different data sources and data formats, and it helps to fish out malicious, fake and synthetic data. Synthetic data is AI-generated data that cannot be used in place of real-world data without very strict guidelines. The approach can also take in data from Operational Technology (OT), which monitors industrial equipment directly, together with malware analysis and vulnerability management, as the basis for attack investigation, threat discovery and security alerts.
One of its key potential outputs is situational awareness. Situational awareness is a 360-degree scan of anything and everything in the immediate environment, including the risk that any product or process in that environment might pose. Our company asvin, which is strongly committed to supply chain security, aims to create a common situational awareness among all actors in the supply chain so that all partners share the same perspective and the same awareness of risk.
—
[1] Lee, E.A. The Past, Present and Future of Cyber-Physical Systems: A Focus on Models. Sensors 2015, 15, 4837-4869.
[2] Liu, K.; Wang, F.; Ding, Z.; Liang, S.; Yu, Z.; Zhou, Y. Recent Progress of Using Knowledge Graph for Cybersecurity. Electronics 2022, 11, 2287. https://doi.org/10.3390/electronics11152287
[3] MSE 6574, Sec. 11: Programming for Analytics (Fall 2019). George Washington University, School of Engineering & Applied Science. https://p4a.seas.gwu.edu/2019-Fall/challengerExplosion.html