
Introduction to knowledge graphs (section 3.3): Data graphs – Validation

This article is section 3.3 of part 3 of the Introduction to knowledge graphs series of articles.

While graphs offer a flexible representation for diverse, incomplete data at large scale, we may wish to validate that our data graph follows a particular structure or is in some sense “complete.” In Figure 1, for example, we may wish to ensure that all events have at least a name, a venue, a start date, and an end date, so that applications using the data – e.g., one notifying users of events – have the minimal information required.

Drawing on Hogan and colleagues’ comprehensive tutorial article [1], this third section of the data graphs part of the series describes one mechanism to facilitate such validation – shapes graphs.

Shapes Graphs

A shape targets a set of nodes in a data graph and specifies constraints on those nodes. The shape’s target can be specified manually, using a query, and so on. A shapes graph is then formed from a set of interrelated shapes. Figure 8 illustrates an example of a shapes graph based on Figure 1, defining constraints on four interrelated shapes. Each shape – denoted with a box such as “Place”, “Event”, and so on – is associated with a set of constraints. Nodes conform to a shape if and only if they satisfy all constraints defined on the shape.

Figure 8. Example shapes graph (source: Hogan et al. 2021).

Another option is to constrain how many nodes conforming to a particular shape a conforming node can relate to with a given edge label. For example, an edge labelled venue from the “Event” shape to the “Venue” shape with a minimum count of one denotes that conforming nodes for “Event” must link to at least one node that conforms to the “Venue” shape with the edge label venue.
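As a minimal sketch of such a counting constraint, the Python below treats the data graph as a list of (source, label, target) triples. The edge list, the node “EID99”, the stand-in `is_venue` check, and the function names are all illustrative assumptions, not part of Figure 1 or Figure 8.

```python
# Toy edge list of (source, label, target) triples. The content is an
# illustrative assumption; "EID99" is a hypothetical node.
EDGES = [
    ("EID15", "venue", "Santa Lucía"),
    ("Santa Lucía", "city", "Santiago"),
    ("EID99", "venue", "Santiago"),
]


def targets(node, label, edges):
    """Return the nodes reachable from `node` over edges labelled `label`."""
    return [t for s, l, t in edges if s == node and l == label]


def is_venue(node, edges):
    """Hypothetical stand-in for the "Venue" shape: has at least one city edge."""
    return len(targets(node, "city", edges)) >= 1


def satisfies_cardinality(node, label, shape_check, edges,
                          min_count=1, max_count=None):
    """Count the `label`-neighbours of `node` that conform to `shape_check`
    and compare that count with the interval [min_count, max_count]."""
    count = sum(1 for t in targets(node, label, edges) if shape_check(t, edges))
    if count < min_count:
        return False
    return max_count is None or count <= max_count


print(satisfies_cardinality("EID15", "venue", is_venue, EDGES))
print(satisfies_cardinality("EID99", "venue", is_venue, EDGES))
```

Under these assumptions “EID15” satisfies the constraint, while “EID99” does not: it has a venue edge, but the node it points to does not conform to the stand-in “Venue” check, so it is not counted.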

Boolean combinations of shapes can be defined using conjunction (and), disjunction (or), and negation (not); for example, we may say that all the values of venue should conform to the shape “Venue” and not “City” (the conjunction of “Venue” with the negation of “City”), making explicit that venues in the data should not be directly given as cities.
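One way to picture Boolean combinations is to model each shape as a function from a node to true or false and to build combinators over such functions. The sketch below assumes that representation; the stand-in shapes (`venue`, `city`), the placeholder values, and the node “N1” are hypothetical and are not taken from Figure 8.

```python
def has_edge(label):
    """Build a shape satisfied by nodes with an outgoing edge labelled `label`."""
    def check(node, edges):
        return any(s == node and l == label for s, l, _ in edges)
    return check


def shape_and(*shapes):
    """Conjunction: the node must conform to every given shape."""
    return lambda node, edges: all(shape(node, edges) for shape in shapes)


def shape_or(*shapes):
    """Disjunction: the node must conform to at least one given shape."""
    return lambda node, edges: any(shape(node, edges) for shape in shapes)


def shape_not(shape):
    """Negation: the node must not conform to the given shape."""
    return lambda node, edges: not shape(node, edges)


# Toy data with placeholder values; "N1" is a hypothetical node that looks
# like both a venue and a city.
EDGES = [
    ("Santa Lucía", "city", "Santiago"),
    ("Santiago", "population", "population-value"),
    ("N1", "city", "Santiago"),
    ("N1", "population", "population-value"),
]

venue = has_edge("city")        # assumed stand-in for the "Venue" shape
city = has_edge("population")   # assumed stand-in for the "City" shape
venue_and_not_city = shape_and(venue, shape_not(city))

for node in ("Santa Lucía", "Santiago", "N1"):
    print(node, venue_and_not_city(node, EDGES))
```

In this toy setting only “Santa Lucía” conforms to the combined shape; “N1” is rejected by the negation because it also conforms to the stand-in “City” check.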

When declaring shapes, the data modeller may not know in advance the entire set of properties that some nodes can have. An open shape allows the node to have additional properties not specified by the shape, while a closed shape does not. For example, if we add to the graph represented in Figure 1 an outgoing edge labelled founder on “Santiago”, then “Santiago” only conforms to the “City” shape if that shape is defined as open (since the shape does not mention founder).
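The difference between open and closed shapes can be sketched as a check on the edge labels a node uses. In the Python below, the set of labels mentioned by the “City” shape and the edge targets are placeholders chosen for illustration; only the founder label comes from the example above.

```python
# Edge labels that the "City" shape is assumed to mention (illustrative only).
CITY_LABELS = {"population"}

# Toy edges with placeholder targets; the founder edge mirrors the example
# in the text, but its target here is a placeholder, not a real value.
EDGES = [
    ("Santiago", "population", "population-value"),
    ("Santiago", "founder", "founder-node"),
]


def extra_labels(node, mentioned, edges):
    """Return the outgoing edge labels of `node` that the shape does not mention."""
    used = {l for s, l, _ in edges if s == node}
    return used - mentioned


def allows_properties(node, mentioned, edges, closed):
    """An open shape tolerates unmentioned labels; a closed shape rejects them.
    This only checks openness, not the shape's other constraints."""
    return not closed or not extra_labels(node, mentioned, edges)


print(extra_labels("Santiago", CITY_LABELS, EDGES))
print(allows_properties("Santiago", CITY_LABELS, EDGES, closed=False))
print(allows_properties("Santiago", CITY_LABELS, EDGES, closed=True))
```

With the founder edge present, the open variant accepts “Santiago” and the closed variant rejects it, which matches the behaviour described above.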

Conformance

A node conforms to a shape if it satisfies all of the constraints of the shape. The conformance of a node to a shape may depend on the conformance of other nodes to other shapes; for example, the node “EID15” conforms to the “Event” shape not only based on its local properties, but also based on conformance of “Santa Lucía” to “Venue” and “Santiago” to “City.”

Conformance dependencies may also be recursive, where the conformance of “Santiago” to “City” requires that it conform to “Place,” which requires that “Viña del Mar” and “Arica” conform to “Place,” and so on. Conversely, “EID16” does not conform to “Event,” as it does not have the start and end properties required by the shapes graph.
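Recursive dependencies raise the question of what to do when a check loops back to a node that is already being evaluated. The sketch below takes one simple approach, assumed here for illustration: a node that is already on the current evaluation path is provisionally treated as conforming. Shapes languages differ in how they treat recursion, so this should not be read as the semantics of any particular one. The edges and the local “at most one lat edge” rule are also assumptions.

```python
# Toy transport edges between places (illustrative assumption).
EDGES = [
    ("Santiago", "flight", "Arica"),
    ("Arica", "flight", "Santiago"),
    ("Santiago", "bus", "Viña del Mar"),
    ("Viña del Mar", "bus", "Santiago"),
]


def out(node, label, edges):
    """Return the targets of edges labelled `label` leaving `node`."""
    return [t for s, l, t in edges if s == node and l == label]


def conforms_place(node, edges, in_progress=frozenset()):
    """Check a hypothetical "Place" shape: at most one lat edge locally, and
    every flight or bus neighbour must itself conform to "Place"."""
    if node in in_progress:
        # Cycle: provisionally assume conformance (a simplifying assumption).
        return True
    if len(out(node, "lat", edges)) > 1:
        return False
    path = in_progress | {node}
    return all(
        conforms_place(neighbour, edges, path)
        for label in ("flight", "bus")
        for neighbour in out(node, label, edges)
    )


print(conforms_place("Santiago", EDGES))

# Give "Arica" two lat edges (placeholder values) so it fails locally.
BAD_EDGES = EDGES + [("Arica", "lat", "lat-1"), ("Arica", "lat", "lat-2")]
print(conforms_place("Santiago", BAD_EDGES))
```

The second call shows how a local failure propagates: once “Arica” violates the assumed local rule, “Santiago” no longer conforms either, because its conformance depends on that of its neighbours.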

A graph is valid with respect to a shapes graph (and its targets) if and only if every node that each shape targets conforms to that shape; for example, if “Event” targets “EID15” and “EID16,” then the graph of Figure 1 will not be valid with respect to the shapes graph of Figure 8 (“EID16” does not conform to “Event”), whereas if “Event” targets “EID15” only, and no other target is defined, then the graph is valid.
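The role of targets in this definition can be sketched as follows. The code assumes a deliberately simplified “Event” shape that only checks for the presence of the name, venue, start, and end labels (it ignores whether the venue itself conforms to “Venue”), and the edge targets are placeholders rather than the values shown in Figure 1.

```python
# Toy edges with placeholder targets: "EID15" has all four properties,
# "EID16" lacks start and end, as described in the text.
EDGES = [
    ("EID15", "name", "name-15"),
    ("EID15", "venue", "venue-15"),
    ("EID15", "start", "start-15"),
    ("EID15", "end", "end-15"),
    ("EID16", "name", "name-16"),
    ("EID16", "venue", "venue-16"),
]

REQUIRED_EVENT_LABELS = {"name", "venue", "start", "end"}


def conforms_event(node, edges):
    """Simplified "Event" shape: every required label must be present."""
    labels = {l for s, l, _ in edges if s == node}
    return REQUIRED_EVENT_LABELS <= labels


SHAPES = {"Event": conforms_event}


def violations(shape_targets, shapes, edges):
    """List the (shape, node) pairs where a targeted node fails its shape."""
    return [
        (shape, node)
        for shape, nodes in shape_targets.items()
        for node in nodes
        if not shapes[shape](node, edges)
    ]


def is_valid(shape_targets, shapes, edges):
    """The graph is valid if and only if no targeted node violates its shape."""
    return not violations(shape_targets, shapes, edges)


print(is_valid({"Event": ["EID15", "EID16"]}, SHAPES, EDGES))
print(is_valid({"Event": ["EID15"]}, SHAPES, EDGES))
```

The same data graph is invalid under the first set of targets and valid under the second, which illustrates that validity is always relative to both the shapes graph and its targets.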

Other Features

Two shapes languages with such features have been proposed for RDF graphs: ShEx (Shape Expressions) and SHACL (Shapes Constraint Language). These languages also support additional features; for example, SHACL supports constraints expressed using graph queries in the SPARQL language. Similar ideas have been proposed for property graphs.

Next part (section 3.4): Data graphs – Context.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

  1. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.

Bruce Boyes

Bruce Boyes (www.bruceboyes.info) is editor, lead writer, and a director of the award-winning RealKM Magazine (www.realkm.com), and a knowledge management (KM), environmental management, and project management professional. He is a PhD candidate in the Knowledge, Technology and Innovation Group at Wageningen University and Research, and holds a Master of Environmental Management with Distinction. His expertise and experience includes knowledge management (KM), environmental management, project management, stakeholder engagement, teaching and training, communications, research, and writing and editing. With a demonstrated ability to identify and implement innovative solutions to social and ecological complexity, Bruce's many career highlights include establishing RealKM Magazine as an award-winning resource, using agile and knowledge management approaches to oversee an award-winning $77.4 million western Sydney river recovery program, leading a knowledge strategy process for Australia's 56 natural resource management (NRM) regional organisations, pioneering collaborative learning and governance approaches to support the sustainable management of landscapes and catchments, and initiating and teaching two new knowledge management subjects at Shanxi University in China.
