ABCs of KM Featured Stories Introduction to knowledge graphs

Introduction to knowledge graphs (section 5.1): Inductive knowledge – Graph analytics

Bruce Boyes12 Jun 2023

827 4 minutes read

This article is section 5.1 of part 5 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).

Graph analytics is the application of analytical algorithms to (often large) graphs. Such algorithms often analyse the topology of the graph, i.e., how nodes and groups thereof are connected. In this section of their comprehensive tutorial article¹, Hogan and colleagues provide a brief overview of some popular graph algorithms applicable to knowledge graphs and then discuss graph processing frameworks on which such algorithms can be implemented.

Graph Algorithms

A wide variety of graph algorithms can be applied for analytical purposes. Hogan and colleagues briefly introduce five categories of algorithms that are often used in practice.

First, centrality analysis aims to identify the most important (“central”) nodes or edges of a graph. Specific node centrality measures include degree, betweenness, closeness, Eigenvector, PageRank, HITS, Katz, among others. Betweenness centrality can also be applied to edges. A node centrality measure would allow, e.g., to predict the busiest transport hubs in Figure 15, while edge centrality would allow us to find the edges on which many shortest routes depend for predicting traffic.

Second, community detection aims to identify sub-graphs that are more densely connected internally than to the rest of the graph (a.k.a. communities). Community detection algorithms include minimum-cut algorithms, label propagation, Louvain modularity, and so on. Community detection applied to Figure 15 may, for example, detect a community to the left (referring to the north of Chile), to the right (referring to the south of Chile), and perhaps also the centre (referring to cities with airports).

Third, connectivity analysis aims to estimate how well-connected and resilient the graph is. Specific techniques include measuring graph density or k-connectivity, detecting strongly connected components and weakly connected components, computing spanning trees or minimum cuts, and so on. In the context of Figure 15, such analysis may tell us that routes to and are the most “brittle,” becoming disconnected from the main hubs if one of two bus routes fail.

Fourth, node similarity aims to find nodes that are similar to other nodes by virtue of how they are connected within their neighbourhood. Node similarity metrics may be computed using structural equivalence, random walks, diffusion kernels, and so on. These methods provide an understanding of what connects nodes and in what ways they are similar. In Figure 15, such analysis may tell us that and are similar nodes, as both have return flights to and return buses to

Fifth, graph summarisation aims to extract high-level structures from a graph, often based on quotient graphs, where nodes in the input graph are merged while preserving the edges between the input nodes. Such methods help to provide an overview for a large-scale graph. Figure 16 provides an example of a quotient graph that summarises the graph of Figure 15, such that if there is an edge in the input graph, then there is an edge in the quotient graph with s ∈ S and o ∈ O. In this case, quotient nodes are defined in terms of outgoing edge-labels, where we may detect that they represent islands, cities, and towns/attractions, respectively, from left to right.

Many such techniques have been proposed and studied for simple graphs or directed graphs without edge labels. An open research challenge in the context of knowledge graphs is to adapt such algorithms for graph models such as del graphs, heterogeneous graphs and property graphs.

Data graph representing transport routes in Chile. — Figure 15. Data graph representing transport routes in Chile (source: Hogan et al. 2021).

Example quotient graph summarising the data graph in Figure 15. — Figure 16. Example quotient graph summarising the data graph in Figure 15 (source: Hogan et al. 2021).

Graph Processing Frameworks

Various graph parallel frameworks have been proposed for large-scale graph processing, including Apache Spark (GraphX), GraphLab², Pregel³, Signal/Collect, Shark⁴, and so on. Computation in these frameworks is iterative, where in each iteration, each node reads messages received through inward edges (and possibly its own previous state), performs a computation, and then sends messages through outward edges using the result.

To illustrate, assume we wish to compute the places that are most (or least) easily reached in the graph of Figure 15. A good way to measure this is using centrality, where we choose PageRank⁵, which computes the probability of a tourist that randomly follows the routes shown in the graph being at a particular place after a given number of “hops.” We can implement PageRank on large graphs using a graph parallel framework. In Figure 17, we provide an example of an iteration of PageRank for a sub-graph of Figure 15. The nodes are initialised with a score of A tourist is assumed to have an equal chance of starting at any point. In the message phase (MSG), each node v passes a score of on each of its outgoing edges, where we denote by d a “damping factor” (typically d = 0.85) used to ensure convergence, by R_i(v) the score of node v in iteration i (the probability of the tourist being at node v after i hops), and by |E(v)| the number of outgoing edges of v. The aggregation phase (AGG) for v then sums all incoming messages along with its share of the damping factor to compute We then proceed to the message phase of the next iteration, continuing until some termination criterion is reached and results are output.

While the given example is for PageRank, this abstraction is general enough to support a wide variety of (though not all) graph algorithms. An algorithm in this framework consists of the functions to compute message values (Msg) and to accumulate messages (Agg). The framework will then take care of distribution, message passing, fault tolerance, and so on.

Example of a graph-parallel iteration of PageRank for a sample sub-graph of Figure 15. — Figure 17. Example of a graph-parallel iteration of PageRank for a sample sub-graph of Figure 15 (source: Hogan et al. 2021).

Next part: (section 5.2): Inductive knowledge – Knowledge graph embeddings.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37. ↩
Wikipedia, CC BY-SA 4.0. ↩
Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010, June). Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (pp. 135-146). ↩
Xin, R. S., Rosen, J., Zaharia, M., Franklin, M. J., Shenker, S., & Stoica, I. (2013, June). Shark: SQL and rich analytics at scale. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of data (pp. 13-24). ↩
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford Infolab. ↩

Rate this post

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.

Graph Algorithms

Graph Processing Frameworks

Bruce Boyes

Related Articles

Introduction to knowledge graphs (section 5.2): Inductive knowledge – Knowledge graph embeddings

Introduction to knowledge graphs (section 5.3): Inductive knowledge – Graph neural networks

Introduction to knowledge graphs (section 3.4): Data graphs – Context

Introduction to knowledge graphs (section 5.4): Inductive knowledge – Symbolic learning