Introduction to knowledge graphs (section 3.2): Data graphs – Querying
This article is section 3.2 of part 3 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).
A number of languages have been proposed for querying graphs, including the SPARQL query language for RDF graphs; and Cypher, Gremlin, and G-CORE1 for querying property graphs. Drawing on Hogan and colleagues’ comprehensive tutorial article2, this second section of the data graphs part of the series describes some common primitives that underlie these languages.
Graph Patterns
A (basic) graph pattern is a graph just like the data graph being queried, but that may also contain variables. Terms in graph patterns are thus divided into constants, such as “Arica” or “venue”, and variables, which we prefix with question marks, such as “?event” or “?rel”. A graph pattern is then evaluated against the data graph by generating mappings from the variables of the graph pattern to constants in the data graph such that the image of the graph pattern under the mapping (replacing variables with the assigned constants) is contained within the data graph.
Figure 5 shows a graph pattern looking for the venues of Food Festivals, along with the mappings generated by the graph pattern against the data graph of Figure 1. In some of the mappings, multiple variables are mapped to the same term, which may or may not be desirable, depending on the application. Hence, a number of semantics have been proposed for evaluating graph patterns, among which the most important are: homomorphism-based semantics, which allows multiple variables to be mapped to the same term such that all mappings shown in Figure 5 would be considered results (this semantics is adopted by SPARQL); and isomorphism-based semantics, which requires variables on nodes and/or edges to be mapped to unique terms, thus excluding the latter three mappings of Figure 5 from the results (this semantics is adopted by Cypher for edge variables).
Complex Graph Patterns
A graph pattern transforms an input graph into a table of results (as shown in Figure 5). A complex graph pattern then allows the tabular results of one or more graph patterns to be transformed using the relational algebra, as supported in query languages such as SQL. Figure 6 shows a complex graph pattern looking for food festivals or drinks festivals not held in Santiago, optionally returning their name and start date (where available). Projected variables are denoted in bold. The complex graph pattern combines the tables of mappings for five basic graph patterns.
Complex graph patterns can give rise to duplicate results; for example, if we project only the variable “?ev” in Figure 5, then “EID16” appears (alone) as a result four times. Query languages typically offer two semantics: bag semantics preserves duplicates according to the multiplicity of the underlying mappings, while set semantics removes duplicates from the results.
Navigational Graph Patterns
A path expression r is a regular expression that can be used in a regular path query (x,r,y), where x and y can be variables or constants, to match paths of arbitrary length. Regular path queries can then be evaluated under a number of different semantics. For example, (Arica, bus*, ?city) evaluated against the graph of Figure 1 may match the following paths:
In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher). Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).
Regular path queries can then be used in graph patterns to express navigational graph patterns, as shown in Figure 7, which illustrates a query searching for food festivals in cities reachable (recursively) from Arica by bus or flight. Combining regular paths queries with complex graph patterns gives rise to complex navigational graph patterns, which are supported by SPARQL 1.1.
Other Features
Graph query languages may support other features beyond those we have discussed, such as aggregation, complex filters and datatype operators, sub-queries, federated queries, graph updates, entailment regimes, and so on. For more information, refer to the respective query languages.
Next part: (section 3.3): Data graphs – Validation.
Header image source: Crow Intelligence, CC BY-NC-SA 4.0.
References:
- Angles, R., Arenas, M., Barceló, P., Boncz, P., Fletcher, G., Gutierrez, C., … & Voigt, H. (2018, May). G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data (pp. 1421-1432). ↩
- Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37. ↩