This blog post discusses graph convolutional networks.

In the last article, we discussed how the function $f(X, A)$ that produces latent representations of a graph needs to be permutation equivariant, which can be achieved by applying a permutation invariant local function $\phi$ row-wise. There are multiple permutation invariant functions we can use for $\phi$, but one of the simplest is to take the weighted sum of each feature among the node and its neighbors, which we have been calling convolution.
Graph Convolution
Graph convolution differs from the convolution operation we covered in the context of image processing in that graph convolution does not have a fixed kernel size to slide over. However, it is essentially the same in that it takes the weighted sum of a unit of data and all its neighbors. Using the adjacency matrix $A$, we can compute the weighted sum of neighbors quite conveniently by $AXW$, where $A$ is of size $n \times n$, $X$, the node matrix, is of size $n \times d$, and $W$, the weight matrix, is of size $d \times d'$. Here, $n$ is the number of nodes, $d$ is the node feature dimension, and $d'$ is the number of kernels or the output feature dimension. The resulting sum will be of size $n \times d'$.
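To make the shapes concrete, here is a minimal sketch of this computation in TensorFlow; the 3-node path graph and the dimensions $d = 4$ and $d' = 2$ are arbitrary choices for illustration.

import tensorflow as tf

n, d, d_out = 3, 4, 2                  # number of nodes, input features, output features
A = tf.constant([[0., 1., 0.],         # adjacency matrix of the path graph 0-1-2, shape (n, n)
                 [1., 0., 1.],
                 [0., 1., 0.]])
X = tf.random.normal((n, d))           # node feature matrix, shape (n, d)
W = tf.random.normal((d, d_out))       # shared weight matrix, shape (d, d_out)
H = A @ X @ W                          # weighted sum over each node's neighbors
print(H.shape)                         # (3, 2), i.e. (n, d_out)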
However, $AXW$ is not a proper convolution as it does not include the node itself in computing the weighted sum. To address this, we create a new matrix $\tilde{A} = A + I$ by adding the identity matrix to the adjacency matrix and compute $\tilde{A}XW$, making it a proper graph convolution. Notice that we could have set up $W$ of size $n \times n$, with values depending on node pairs, and computed $(\tilde{A} \odot W)X$. However, this operation is not permutation equivariant, as the permutation of $W$ must correspond to that of $X$ and $\tilde{A}$. Hence, we compute $\tilde{A}XW$ with $W$ of size $d \times d'$, whose values correspond to the input and output features.
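Both the self-loop trick and the equivariance claim can be checked numerically. A small sketch, again with an arbitrary toy graph and an arbitrary node permutation:

import tensorflow as tf

n, d, d_out = 3, 4, 2
A = tf.constant([[0., 1., 0.],
                 [1., 0., 1.],
                 [0., 1., 0.]])
A_tilde = A + tf.eye(n)                # add self-loops so each node is included in its own sum
X = tf.random.normal((n, d))
W = tf.random.normal((d, d_out))

P = tf.constant([[0., 0., 1.],         # permutation matrix reordering the nodes as 2, 0, 1
                 [1., 0., 0.],
                 [0., 1., 0.]])
H = A_tilde @ X @ W                                        # convolution on the original ordering
H_perm = (P @ A_tilde @ tf.transpose(P)) @ (P @ X) @ W     # convolution on the permuted graph
print(tf.reduce_max(tf.abs(P @ H - H_perm)))               # ~0: permuting the inputs permutes the outputs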
Graph Convolutional Network
To allow the model to learn complex and nonlinear relationships through backpropagation, we introduce a non-linear activation function $\sigma$ and compute $\sigma(\tilde{A}XW)$. However, the latent representations from the weighted sum can grow larger and larger, making it difficult for the model to learn appropriate weights. To normalize the weighted sum, we multiply it by the inverse of the degree matrix $\tilde{D}$, which contains the degree of each node in its diagonal entries, finally arriving at a graph convolutional network (GCN): $H = \sigma(\tilde{D}^{-1}\tilde{A}XW)$. (*The degree of a node is the number of edges for an unweighted graph and the sum of edge weights for a weighted graph.)
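A rough sketch of this normalized convolution on the same toy graph, using ReLU as one possible choice of the activation $\sigma$:

import tensorflow as tf

n, d, d_out = 3, 4, 2
A = tf.constant([[0., 1., 0.],
                 [1., 0., 1.],
                 [0., 1., 0.]])
A_tilde = A + tf.eye(n)                # adjacency matrix with self-loops
X = tf.random.normal((n, d))
W = tf.random.normal((d, d_out))

deg = tf.reduce_sum(A_tilde, axis=-1)  # degree of each node, including the self-loop
D_inv = tf.linalg.diag(1.0 / deg)      # inverse degree matrix
H = tf.nn.relu(D_inv @ A_tilde @ X @ W)    # normalized weighted sum followed by the activation
print(H.shape)                         # (3, 2)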
At each layer $l$, GCN performs the computation above to produce a graph embedding $H^{(l)}$ of size $n \times d^{(l)}$. However, in practice, GCN tends to compute $H^{(l+1)} = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)})$, which normalizes the value by $\sqrt{\tilde{d}_i \tilde{d}_j}$, the square root of the product of the degrees of a pair of nodes, instead of the degree of the node of interest, and this empirically tends to perform better. In the context of $f$ and $\phi$, GCN can be expressed as follows.

$$h_i = \phi(x_i, X_{\mathcal{N}_i}) = \sigma\Big(\sum_{j \in \mathcal{N}_i \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}} x_j W\Big)$$
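As a concrete example of the difference between the two normalizations on a single edge: for neighboring nodes $i$ and $j$ with self-loop-inclusive degrees $\tilde{d}_i = 2$ and $\tilde{d}_j = 3$, the symmetric normalization weights the contribution of $x_j$ to $h_i$ by $1/\sqrt{2 \cdot 3} \approx 0.41$, whereas the plain $\tilde{D}^{-1}$ normalization would weight it by $1/\tilde{d}_i = 0.5$, ignoring how well connected the neighbor itself is.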
Because its convolution operation is relatively simple, taking a normalized weighted sum of the node and its neighbors with weights that do not depend on the nodes of interest, GCN tends to perform better on homophilous graphs, where similar nodes are more likely to be connected.
Code Implementation
The following shows the TensorFlow implementation of a GCN layer.
import tensorflow as tf
from tensorflow.keras import layers

class GraphConvolutionLayer(layers.Layer):
    def __init__(self, d_in, d_out):
        super(GraphConvolutionLayer, self).__init__()
        self.d_in = d_in
        self.d_out = d_out
        # Shared weight matrix W of shape (d_in, d_out)
        self.W = self.add_weight(shape=(d_in, d_out),
                                 initializer='glorot_uniform',
                                 trainable=True)

    def call(self, x, A):
        # A is expected to already include self-loops (the caller passes A + I)
        # Build the degree matrix from the adjacency matrix
        D = tf.reduce_sum(A, axis=-1)
        D = tf.linalg.diag(D)
        # Symmetric normalization: D^(-1/2) A D^(-1/2)
        D_inv_sqrt = tf.linalg.inv(tf.sqrt(D))
        A = tf.matmul(tf.matmul(D_inv_sqrt, A), D_inv_sqrt)
        # Aggregate neighbor features, then apply the shared weights and the activation
        x = tf.matmul(A, x)
        x = tf.matmul(x, self.W)
        x = tf.nn.relu(x)
        return x
The implementation assumes that the degree matrix is not provided and instead computes it from the adjacency matrix inside the layer; the adjacency matrix is expected to already include self-loops. The latent representation can be passed to a subsequent classifier or regressor to perform various tasks on graphs. I highly recommend implementing the graph convolution layer in PyTorch for practice.
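As a quick sanity check, here is a hypothetical way to use the layer, assuming the GraphConvolutionLayer defined above is in scope; the two-layer stack, the dimensions, and the dense classification head are arbitrary choices, and the adjacency matrix passed in already includes self-loops as the layer expects.

import tensorflow as tf
from tensorflow.keras import layers

n, d = 3, 4
A = tf.constant([[0., 1., 0.],
                 [1., 0., 1.],
                 [0., 1., 0.]]) + tf.eye(n)    # adjacency matrix with self-loops
X = tf.random.normal((n, d))                   # node features

gcn1 = GraphConvolutionLayer(d_in=4, d_out=8)
gcn2 = GraphConvolutionLayer(d_in=8, d_out=8)
classifier = layers.Dense(2)                   # e.g. a two-class node classification head

H = gcn2(gcn1(X, A), A)                        # stacked graph convolutions
logits = classifier(H)                         # per-node predictions, shape (n, 2)
print(logits.shape)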
Conclusion
In this article, we covered how convolution can be used as the permutation invariant aggregator to compose a permutation equivariant graph convolution layer, how to construct a graph convolutional network by stacking such layers, and how to produce latent representations for downstream tasks. While demonstrating the mathematical operations and code implementation, I also noted that the weights for computing the normalized weighted sum do not depend on the node, making GCN simple and effective on homophilous graphs but not necessarily on other graphs. Thus, in the next article, we will look at more complex models that can work well on non-homophilous graphs as well.
Resources
- Barbero, F. 2023. Graph Convolutional Networks - Oxford Geometric Deep Learning. YouTube.
- Velickovic, P. 2021. Theoretical Foundations of Graph Neural Networks. YouTube.