ABCs of KMIntroduction to knowledge graphs

Introduction to knowledge graphs (section 3.2): Data graphs – Querying

This article is section 3.2 of part 3 of the Introduction to knowledge graphs series of articles.

A number of languages have been proposed for querying graphs, including the SPARQL query language for RDF graphs; and Cypher, Gremlin, and G-CORE1 for querying property graphs. Drawing on Hogan and colleagues’ comprehensive tutorial article2, this second section of the data graphs part of the series describes some common primitives that underlie these languages.

Graph Patterns

A (basic) graph pattern is a graph just like the data graph being queried, but that may also contain variables. Terms in graph patterns are thus divided into constants, such as “Arica” or “venue”, and variables, which we prefix with question marks, such as “?event” or “?rel”. A graph pattern is then evaluated against the data graph by generating mappings from the variables of the graph pattern to constants in the data graph such that the image of the graph pattern under the mapping (replacing variables with the assigned constants) is contained within the data graph.

Figure 5 shows a graph pattern looking for the venues of Food Festivals, along with the mappings generated by the graph pattern against the data graph of Figure 1. In some of the mappings, multiple variables are mapped to the same term, which may or may not be desirable, depending on the application. Hence, a number of semantics have been proposed for evaluating graph patterns, among which the most important are: homomorphism-based semantics, which allows multiple variables to be mapped to the same term such that all mappings shown in Figure 5 would be considered results (this semantics is adopted by SPARQL); and isomorphism-based semantics, which requires variables on nodes and/or edges to be mapped to unique terms, thus excluding the latter three mappings of Figure 5 from the results (this semantics is adopted by Cypher for edge variables).

Graph pattern (left) with mappings generated over the graph of Figure 1 (right)
Figure 5. Graph pattern (left) with mappings generated over the graph of Figure 1 (right) (source: Hogan et al. 2021).
Complex graph pattern (Q) with mappings generated (Q(G)) over the graph of Figure 1 (G)
Figure 6. Complex graph pattern (Q) with mappings generated (Q(G)) over the graph of Figure 1 (G) (source: Hogan et al. 2021).

Complex Graph Patterns

A graph pattern transforms an input graph into a table of results (as shown in Figure 5). A complex graph pattern then allows the tabular results of one or more graph patterns to be transformed using the relational algebra, as supported in query languages such as SQL. Figure 6 shows a complex graph pattern looking for food festivals or drinks festivals not held in Santiago, optionally returning their name and start date (where available). Projected variables are denoted in bold. The complex graph pattern combines the tables of mappings for five basic graph patterns.

Complex graph patterns can give rise to duplicate results; for example, if we project only the variable “?ev” in Figure 5, then “EID16” appears (alone) as a result four times. Query languages typically offer two semantics: bag semantics preserves duplicates according to the multiplicity of the underlying mappings, while set semantics removes duplicates from the results.

Navigational Graph Patterns

A path expression r is a regular expression that can be used in a regular path query (x,r,y), where x and y can be variables or constants, to match paths of arbitrary length. Regular path queries can then be evaluated under a number of different semantics. For example, (Arica, bus*, ?city) evaluated against the graph of Figure 1 may match the following paths:

In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher). Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).

Regular path queries can then be used in graph patterns to express navigational graph patterns, as shown in Figure 7, which illustrates a query searching for food festivals in cities reachable (recursively) from Arica by bus or flight. Combining regular paths queries with complex graph patterns gives rise to complex navigational graph patterns, which are supported by SPARQL 1.1.

Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right)
Figure 7. Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right) source: Hogan et al. 2021).

Other Features

Graph query languages may support other features beyond those we have discussed, such as aggregation, complex filters and datatype operators, sub-queries, federated queries, graph updates, entailment regimes, and so on. For more information, refer to the respective query languages.

Next part: (section 3.3): Data graphs – Validation.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

  1. Angles, R., Arenas, M., Barceló, P., Boncz, P., Fletcher, G., Gutierrez, C., … & Voigt, H. (2018, May). G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data (pp. 1421-1432).
  2. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
Rate this post

Bruce Boyes

Bruce Boyes (www.bruceboyes.info) is editor, lead writer, and a director of the award-winning RealKM Magazine (www.realkm.com), and a knowledge management (KM), environmental management, and project management professional. He is a PhD candidate in the Knowledge, Technology and Innovation Group at Wageningen University and Research, and holds a Master of Environmental Management with Distinction. His expertise and experience includes knowledge management (KM), environmental management, project management, stakeholder engagement, teaching and training, communications, research, and writing and editing. With a demonstrated ability to identify and implement innovative solutions to social and ecological complexity, Bruce's many career highlights include establishing RealKM Magazine as an award-winning resource, using agile and knowledge management approaches to oversee an award-winning $77.4 million western Sydney river recovery program, leading a knowledge strategy process for Australia's 56 natural resource management (NRM) regional organisations, pioneering collaborative learning and governance approaches to support the sustainable management of landscapes and catchments, and initiating and teaching two new knowledge management subjects at Shanxi University in China.

Related Articles

Back to top button