ABCs of KM Introduction to knowledge graphs

Introduction to knowledge graphs (section 3.2): Data graphs – Querying

Bruce Boyes1 Apr 2023

918 3 minutes read

This article is section 3.2 of part 3 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).

A number of languages have been proposed for querying graphs, including the SPARQL query language for RDF graphs; and Cypher, Gremlin, and G-CORE¹ for querying property graphs. Drawing on Hogan and colleagues’ comprehensive tutorial article², this second section of the data graphs part of the series describes some common primitives that underlie these languages.

Graph Patterns

A (basic) graph pattern is a graph just like the data graph being queried, but that may also contain variables. Terms in graph patterns are thus divided into constants, such as “Arica” or “venue”, and variables, which we prefix with question marks, such as “?event” or “?rel”. A graph pattern is then evaluated against the data graph by generating mappings from the variables of the graph pattern to constants in the data graph such that the image of the graph pattern under the mapping (replacing variables with the assigned constants) is contained within the data graph.

Figure 5 shows a graph pattern looking for the venues of Food Festivals, along with the mappings generated by the graph pattern against the data graph of Figure 1. In some of the mappings, multiple variables are mapped to the same term, which may or may not be desirable, depending on the application. Hence, a number of semantics have been proposed for evaluating graph patterns, among which the most important are: homomorphism-based semantics, which allows multiple variables to be mapped to the same term such that all mappings shown in Figure 5 would be considered results (this semantics is adopted by SPARQL); and isomorphism-based semantics, which requires variables on nodes and/or edges to be mapped to unique terms, thus excluding the latter three mappings of Figure 5 from the results (this semantics is adopted by Cypher for edge variables).

Figure 5. Graph pattern (left) with mappings generated over the graph of Figure 1 (right) (source: Hogan et al. 2021).

Figure 6. Complex graph pattern (Q) with mappings generated (Q(G)) over the graph of Figure 1 (G) (source: Hogan et al. 2021).

Complex Graph Patterns

A graph pattern transforms an input graph into a table of results (as shown in Figure 5). A complex graph pattern then allows the tabular results of one or more graph patterns to be transformed using the relational algebra, as supported in query languages such as SQL. Figure 6 shows a complex graph pattern looking for food festivals or drinks festivals not held in Santiago, optionally returning their name and start date (where available). Projected variables are denoted in bold. The complex graph pattern combines the tables of mappings for five basic graph patterns.

Complex graph patterns can give rise to duplicate results; for example, if we project only the variable “?ev” in Figure 5, then “EID16” appears (alone) as a result four times. Query languages typically offer two semantics: bag semantics preserves duplicates according to the multiplicity of the underlying mappings, while set semantics removes duplicates from the results.

Navigational Graph Patterns

A path expression r is a regular expression that can be used in a regular path query (x,r,y), where x and y can be variables or constants, to match paths of arbitrary length. Regular path queries can then be evaluated under a number of different semantics. For example, (Arica, bus*, ?city) evaluated against the graph of Figure 1 may match the following paths:

In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher). Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).

Regular path queries can then be used in graph patterns to express navigational graph patterns, as shown in Figure 7, which illustrates a query searching for food festivals in cities reachable (recursively) from Arica by bus or flight. Combining regular paths queries with complex graph patterns gives rise to complex navigational graph patterns, which are supported by SPARQL 1.1.

Figure 7. Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right) source: Hogan et al. 2021).

Other Features

Graph query languages may support other features beyond those we have discussed, such as aggregation, complex filters and datatype operators, sub-queries, federated queries, graph updates, entailment regimes, and so on. For more information, refer to the respective query languages.

Next part: (section 3.3): Data graphs – Validation.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

Angles, R., Arenas, M., Barceló, P., Boncz, P., Fletcher, G., Gutierrez, C., … & Voigt, H. (2018, May). G-CORE: A core for future graph query languages. In Proceedings of the 2018 International Conference on Management of Data (pp. 1421-1432). ↩
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37. ↩

Rate this post

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.

Graph Patterns

Complex Graph Patterns

Navigational Graph Patterns

Other Features

Bruce Boyes

Related Articles

KMGN Research Community: From Concept Maps to Knowledge Graphs with Barbara Fillip, Ph.D., PMP

Business is waking up to the idea of deep learning

Can robots write? Machine learning produces dazzling results, but some assembly is still required

Madhumita Murgia: “AI can do harm when people don’t have a voice”