ABCs of KMIntroduction to knowledge graphs

Introduction to knowledge graphs (section 3.2): Data graphs – Querying

This article is section 3.2 of part 3 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).

A number of languages have been proposed for querying graphs, including the SPARQL query language for RDF graphs; and Cypher, Gremlin, and G-CORE1 for querying property graphs. Drawing on Hogan and colleagues’ comprehensive tutorial article2, this second section of the data graphs part of the series describes some common primitives that underlie these languages.

Graph Patterns

A (basic) graph pattern is a graph just like the data graph being queried, but that may also contain variables. Terms in graph patterns are thus divided into constants, such as “Arica” or “venue”, and variables, which we prefix with question marks, such as “?event” or “?rel”. A graph pattern is then evaluated against the data graph by generating mappings from the variables of the graph pattern to constants in the data graph such that the image of the graph pattern under the mapping (replacing variables with the assigned constants) is contained within the data graph.

Figure 5 shows a graph pattern looking for the venues of Food Festivals, along with the mappings generated by the graph pattern against the data graph of Figure 1. In some of the mappings, multiple variables are mapped to the same term, which may or may not be desirable, depending on the application. Hence, a number of semantics have been proposed for evaluating graph patterns, among which the most important are: homomorphism-based semantics, which allows multiple variables to be mapped to the same term such that all mappings shown in Figure 5 would be considered results (this semantics is adopted by SPARQL); and isomorphism-based semantics, which requires variables on nodes and/or edges to be mapped to unique terms, thus excluding the latter three mappings of Figure 5 from the results (this semantics is adopted by Cypher for edge variables).

Graph pattern (left) with mappings generated over the graph of Figure 1 (right)
Figure 5. Graph pattern (left) with mappings generated over the graph of Figure 1 (right) (source: Hogan et al. 2021).
Complex graph pattern (Q) with mappings generated (Q(G)) over the graph of Figure 1 (G)
Figure 6. Complex graph pattern (Q) with mappings generated (Q(G)) over the graph of Figure 1 (G) (source: Hogan et al. 2021).

Complex Graph Patterns

A graph pattern transforms an input graph into a table of results (as shown in Figure 5). A complex graph pattern then allows the tabular results of one or more graph patterns to be transformed using the relational algebra, as supported in query languages such as SQL. Figure 6 shows a complex graph pattern looking for food festivals or drinks festivals not held in Santiago, optionally returning their name and start date (where available). Projected variables are denoted in bold. The complex graph pattern combines the tables of mappings for five basic graph patterns.

Complex graph patterns can give rise to duplicate results; for example, if we project only the variable “?ev” in Figure 5, then “EID16” appears (alone) as a result four times. Query languages typically offer two semantics: bag semantics preserves duplicates according to the multiplicity of the underlying mappings, while set semantics removes duplicates from the results.

Navigational Graph Patterns

A path expression r is a regular expression that can be used in a regular path query (x,r,y), where x and y can be variables or constants, to match paths of arbitrary length. Regular path queries can then be evaluated under a number of different semantics. For example, (Arica, bus*, ?city) evaluated against the graph of Figure 1 may match the following paths:

In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher). Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).

Regular path queries can then be used in graph patterns to express navigational graph patterns, as shown in Figure 7, which illustrates a query searching for food festivals in cities reachable (recursively) from Arica by bus or flight. Combining regular paths queries with complex graph patterns gives rise to complex navigational graph patterns, which are supported by SPARQL 1.1.

Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right)
Figure 7. Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right) source: Hogan et al. 2021).

Other Features

Graph query languages may support other features beyond those we have discussed, such as aggregation, complex filters and datatype operators, sub-queries, federated queries, graph updates, entailment regimes, and so on. For more information, refer to the respective query languages.

Next part: (section 3.3): Data graphs – Validation.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

  1. Angles, R., Arenas, M., Barceló, P., Boncz, P., Fletcher, G., Gutierrez, C., … & Voigt, H. (2018, May). G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data (pp. 1421-1432).
  2. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
Rate this post

Bruce Boyes

Bruce Boyes is a knowledge management (KM), environmental management, and education thought leader with more than 40 years of experience. As editor and lead writer of the award-winning RealKM Magazine, he has personally written more than 500 articles and published more than 2,000 articles overall, resulting in more than 2 million reader views. With a demonstrated ability to identify and implement innovative solutions to social and ecological complexity, Bruce has successfully completed more than 40 programs, projects, and initiatives including leading complex major programs. His many other career highlights include: leading the KM community KM and Sustainable Development Goals (SDGs) initiative, using agile approaches to oversee the on time and under budget implementation of an award-winning $77.4 million recovery program for one of Australia's most iconic river systems, leading a knowledge strategy process for Australia’s 56 natural resource management (NRM) regional organisations, pioneering collaborative learning and governance approaches to empower communities to sustainably manage landscapes and catchments in the face of complexity, being one of the first to join a new landmark aviation complexity initiative, initiating and teaching two new knowledge management subjects at Shanxi University in China, and writing numerous notable environmental strategies, reports, and other works. Bruce is currently a PhD candidate in the Knowledge, Technology and Innovation Group at Wageningen University and Research, and holds a Master of Environmental Management with Distinction and a Certificate of Technology (Electronics). As well as his work for RealKM Magazine, Bruce currently also teaches in the Beijing Foreign Studies University (BFSU) Certified High-school Pathway (CHP) program in Baotou, Inner Mongolia, China.

Related Articles

Back to top button