approve package

Submodules

approve.models module

class approve.models.APPr(K: int, alpha: float, add_self_loops: bool = True, cached: bool = True, normalize: bool = True, **kwargs)[source]

Bases: MessagePassing

The approximate personalized PageRank model for homogeneous graphs, adapted from the APPNP model:

\[ \begin{align}\begin{aligned}\mathbf{X}^{(0)} &= \mathbf{X}\,,\\\mathbf{X}^{(k)} &= (1 - \alpha) \mathbf{\hat{S}} \mathbf{X}^{(k-1)} + \alpha \mathbf{X}^{(0)}\,,\\\mathbf{X}' &= \mathbf{X}^{(K)}\,,\end{aligned}\end{align} \]

where \(\alpha\) is the transport probability, \(\mathbf{\hat{S}}\) is the (column) stochastic matrix \(\mathbf{\hat{S}} := \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1}\), \(\mathbf{\hat{A}}\) is the adjacency matrix with self-loops possibly added, and \(\mathbf{\hat{D}}\) is the diagonal degree matrix with entries \(\mathbf{\hat{D}}_{ii} := \sum_\ell \mathbf{\hat{A}}_{\ell i}\).

Note

The adjacency matrix can include values different from 1 (representing edge weights) via the optional edge_weight tensor.

Parameters:
  • K (int) – The number of iterations \(K\).

  • alpha (float) – The teleport probability \(\alpha\).

  • add_self_loops (bool, optional) – If set to True, the model will add self-loops to the graph. (default: True)

  • cached (bool, optional) – If set to True, the model will cache the computation of \(\mathbf{\hat{A}} \mathbf{\hat{D}}\) on the first execution and use the cached version for further executions. (default: True)

  • normalize (bool, optional) – If set to True, the model will add self-loops to the graph and normalize the edge weights. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Examples

from approve.models import APPr
from torch import tensor

# define homogeneous graph
edge_index = torch.tensor([[1, 2, 2],
                        [0, 0, 1]])

# assign uniform scores to each node
x = torch.full((3, 1), 1 / 3)

# compute PageRank
model = APPr(K=30, alpha=0.5)
model(x, edge_index)
>>> tensor([[0.5333],
            [0.2667],
            [0.2000]])
forward(x: Tensor, edge_index: Tensor, edge_weight: Tensor | None = None) Tensor[source]

Runs the forward pass of the module.

Parameters:
  • x (Tensor) – The node features \(\mathbf{X}\).

  • edge_index (Tensor) – The edge indices.

  • edge_weight (Tensor, optional) – The edge weights. (default: None)

Returns:

The updated node features \(\mathbf{X}'\).

Shape:
  • Input: \((|\mathcal{N}|, F)\), \((2,|\mathcal{E}|)\), and \((|\mathcal{E}|)\), where \(|\mathcal{N}|\) is the number of nodes, \(|\mathcal{E}|\) is the number of edges, and \(F\) is the number of features.

  • Output: \((|\mathcal{N}|, F)\).

static message(x_j, edge_weight)[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, .e.g. x_i and x_j.

reset_parameters() None[source]

Resets all learnable parameters of the module.

class approve.models.HeteroAPPr(K: int, add_self_loops: bool = True, add_special_edges: bool = True, cached: bool = True, normalize: bool = True, check_node_type_conn: bool = True, **kwargs)[source]

Bases: MessagePassing

The approximate personalized PageRank model for heterogeneous graphs which generalizes the APPNP model:

\[ \begin{align}\begin{aligned}\mathbf{X}^{[n](0)} &= \mathbf{X}^{[n]}\,,\\\mathbf{X}^{[n](k+1)} &= (1 - \alpha[n]) \textstyle \sum_{n' \in \mathcal{N}} \sum_{e \in \mathcal{E}[n',n]} \beta[e] \mathbf{\hat{S}}[e] \mathbf{X}^{[n'](k)} + \alpha[n] \mathbf{X}^{[n](0)}\,,\\\mathbf{X}^{\prime[n]} &= \mathbf{X}^{[n](K)}\,,\end{aligned}\end{align} \]

where \(\alpha[n]\) is the transport probability for type \(n\) nodes, \(\mathcal{N}\) is the set of all node types, \(\mathcal{E}[n',n]\) is the set of all edge types \(e: n' \to n\), \(\beta[e]\) is the contribution percentage from type \(e\) edges for updating the features of type \(n\) nodes, \(\mathbf{\hat{S}}[e]\) is the (column) stochastic matrix \(\mathbf{\hat{S}}[e] := \mathbf{\hat{A}}[e]\mathbf{\hat{D}}[e]^{-1}\), \(\mathbf{\hat{A}}[e]\) is the adjacency matrix for type \(e\) edges with self-loops (for homogenenous layers) or special edges (for bipartite layers) possibly added, and \(\mathbf{\hat{D}}[e]\) is the diagonal degree matrix for type \(e\) edges with entries \(\mathbf{\hat{D}}[e]_{ii} := \sum_\ell \mathbf{\hat{A}}[e]_{\ell i}\).

Note

All adjacency matrices can include values other than 1 (representing edge weights) via the optional edge_weight_dict dictionary.

Parameters:
  • K (int) – The number of iterations \(K\).

  • add_self_loops (bool, optional) – If set to True, the model will add self-loops to all homogeneous layers. (default: True)

  • add_special_edges (bool, optional) – If set to True, the model will add special edges to all bipartite layers. (default: True)

  • cached (bool, optional) – If set to True, the model will cache the computation of \(\mathbf{\hat{S}}[e]\) on the first execution and use the cached version for further executions. (default: True)

  • normalize (bool, optional) – If set to True, the model will add self-loops to homogeneous layers and special edges to bipartite layers, and normalize the edge weights. (default: True)

  • check_node_type_conn (bool, optional) – If set to True, the model checks that every node type is the target of at least one edge type. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing

Examples

from approve.models import HeteroAPPr
from torch import tensor
from torch_geometric.data import HeteroData

# define heterogeneous graph
hetero_data = HeteroData()
hetero_data['paper', 'cites', 'paper'].edge_index = \
    torch.tensor([[1, 2, 2],
                  [0, 0, 1]])
hetero_data['venue', 'publishes', 'paper'].edge_index = \
    torch.tensor([[0, 1],
                  [0, 1]])
hetero_data['paper', 'rev_publishes', 'venue'].edge_index = \
    hetero_data['venue', 'publishes', 'paper'].edge_index[[1,0]]

# assign uniform scores to each node
hetero_data['paper'].x = torch.full((3, 1), 1 / 3)
hetero_data['venue'].x = torch.full((2, 1), 1 / 2)

# compute PageRank
model = HeteroAPPr(K=30)
model(hetero_data.x_dict, hetero_data.edge_index_dict)
>>> {'paper': tensor([[0.4605],
             [0.3289],
             [0.2105]]),
     'venue': tensor([[0.4803],
             [0.4145],
             [0.1053]])}
forward(x_dict: Dict[str, Tensor], edge_index_dict: Dict[Tuple[str, str, str], Tensor], edge_weight_dict: Dict[Tuple[str, str, str], Tensor] | None = None, special_dict: Dict[str, int | None] | None = None, alpha_exp_dict: Dict[str, float] | None = None, beta_exp_dict: Dict[Tuple[str, str, str], float] | None = None) Dict[str, Tensor][source]

Runs the forward pass of the module.

Parameters:
  • x_dict (Dict[str, Tensor]) – A dictionary of features for each node type.

  • edge_index_dict (Dict[Tuple[str, str, str], Tensor]) – A dictionary of edge indices for each edge type.

  • edge_weight_dict (Dict[Tuple[str, str, str], Tensor], optional) – A dictionary of edge weights for each edge type. (default: None)

  • special_dict (Dict[str, int] , optional) – A dictionary of special nodes for each node type. (default: None)

  • alpha_exp_dict (Dict[str, float], optional) – A dictionary of exponents used to calculate transport probabilities via utils.gen_alpha_dict for each node type. (default: None)

  • beta_exp_dict (Dict[Tuple[str, str, str], float], optional) – A dictionary of exponents used to calculate contribution percentages via utils.gen_beta_dict for each edge type. (default: None)

Returns:

An updated dictionary of features for each node type.

Raises:

AssertionError – If self.check_node_type_conn is set to True and not every node type is the target of at least one edge type.

static message(x_j, edge_weight)[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, .e.g. x_i and x_j.

reset_parameters() None[source]

Resets all learnable parameters of the module.

approve.typing module

approve.utils module

approve.utils.add_remaining_special_edges(edge_index: Tensor, edge_weight: Tensor | None = None, num_nodes_s: int | None = None, num_nodes_t: int | None = None, special_s: int | None = None, special_t: int | None = None, fill_value: float = 1.0) Tuple[Tensor, Tensor | None, int | None, int | None, int | None, int | None][source]

Adds special edges, from missing source nodes to special_t, and from special_s to missing target nodes, for all nodes missing from edge_index. If special_s (resp. special_t) is not given, then it is set to num_nodes_s (resp. num_nodes_t).

Parameters:
  • edge_index (Tensor) – The edge indices.

  • edge_weight (Tensor, optional) – The edge weights. (default: None)

  • num_nodes_s (int, optional) – The number of source nodes, if known. (default: None)

  • num_nodes_t (int, optional) – The number of target nodes, if known. (default: None)

  • special_s (int, optional) – The special source node, if set. (default: None)

  • special_t (int, optional) – The special target node, if set. (default: None)

  • fill_value (float, optional) – The weight associated with special edges. This is relevant only when edge_weight is not None. (default: 1.)

Returns:

The updated values for edge_weight, edge_weight, num_nodes_s, num_nodes_t, special_s, and special_t.

Shape:
  • Input: \((2,|\mathcal{E}|)\) and \((|\mathcal{E}|)\) where \(|\mathcal{E}|\) is the number of edges.

  • Output: \((2,|\mathcal{E}'|)\) and \((|\mathcal{E}'|)\) where \(|\mathcal{E}'|\) is the new number of edges.

Examples

>>> edge_index = torch.tensor([[0, 0],
... [0, 1]])
>>> add_remaining_special_edges(edge_index, num_nodes_s=2,
... num_nodes_t=3)
(tensor([[0, 0, 1, 2],
         [0, 1, 3, 2]]), None, 3, 4, 2, 3)
approve.utils.bipartite_maybe_num_nodes(edge_index: Tensor, num_nodes_s: int | None = None, num_nodes_t: int | None = None) Tuple[int, int][source]

Calculates the number of source and target nodes in the bipartite graph with edge indices given by edge_index.

Parameters:
  • edge_index (Tensor) – The edge indices.

  • num_nodes_s (int, optional) – The number of source nodes, if known. (default: None)

  • num_nodes_t (int, optional) – The number of target nodes, if known. (default: None)

Returns:

The updated values for num_nodes_s and num_nodes_t.

Raises:
  • AssertionError – If num_nodes_s is not None and not more than the maximum value of edge_index[0].

  • AssertionError – If num_nodes_t is not None and not more than the maximum value of edge_index[1].

Shape:
  • Input: \((2,|\mathcal{E}|)\) where \(|\mathcal{E}|\) is the number of edges.

Examples

>>> edge_index = torch.tensor([[0, 0],
... [0, 1]])
>>> bipartite_maybe_num_nodes(edge_index)
(1, 2)
approve.utils.bipartite_pr_norm(edge_index: Tensor, edge_weight: Tensor | None = None, num_nodes_s: int | None = None, num_nodes_t: int | None = None, add_special_edges: bool = True, special_s: int | None = None, special_t: int | None = None, flow: str = 'source_to_target') Tuple[Tensor, Tensor | None, int, int, int | None, int | None][source]

Calculates the entries of the (column) stochastic matrix \(\mathbf{\hat{S}} := \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1}\) as edge weights where \(\mathbf{\hat{A}}\) is the adjacency matrix with special edges possibly added, and \(\mathbf{\hat{D}}\) is the diagonal degree matrix with entries \(\mathbf{\hat{D}}_{ii} := \sum_\ell \mathbf{\hat{A}}_{\ell i}\).

Parameters:
  • edge_index (Tensor) – The edge indices.

  • edge_weight (Tensor, optional) – The edge weights. (default: None)

  • num_nodes_s (int, optional) – The number of source nodes, if known. (default: None)

  • num_nodes_t (int, optional) – The number of target nodes, if known. (default: None)

  • add_special_edges (bool, optional) – If set to True, the function will add special edges to the bipartite graph. (default: True)

  • special_s (int, optional) – The special source node, if set. (default: None)

  • special_t (int, optional) – The special target node, if set. (default: None)

  • flow (str, optional) – The flow of the graph. (default: "source_to_target")

Returns:

The updated values for edge_index, edge_weight, num_nodes_s, num_nodes_t, special_s, and special_t.

Raises:

AssertionError – If flow is neither "source_to_target" nor "target_to_source".

approve.utils.gen_alpha_dict(node_type_list: List[str], alpha_exp_dict: Dict[str, float] | None = None) Dict[str, float][source]

Calculates the transport probabilities \(\alpha[n]\) for each node type \(n\) in node_type_list via

\[\alpha[n] = \frac{\exp{A[n]}}{1 + \exp{A[n]}}\,.\]

Note

If the exponent \(A[n]\) is not specified in alpha_exp_dict, then it defaults to 0.

Parameters:
  • node_type_list (List[str]) – A list of all node types.

  • alpha_exp_dict (Dict[str, float], optional) – A dictionary of exponents \(A[n]\) used to calculate the transport probabilities \(\alpha[n]\) for each node type.

Returns:

A dictionary of transport probabilities \(\alpha[n]\) for each node type.

approve.utils.gen_beta_dict(edge_type_list: List[Tuple[str, str, str]], beta_exp_dict: Dict[Tuple[str, str, str], float] | None = None) Dict[Tuple[str, str, str], float][source]

Calculates the contribution percentages \(\beta[e]\) for each edge type \(e:n' \to n\) in beta_exp_list via

\[\beta[e] = \frac{\exp{B[e]}}{\sum_{n'' \in \mathcal{N}} \sum_{e'\in\mathcal{E}[n'',n]} \exp{B[e']}}\,,\]

where the sum is over all edge types with type \(n\) nodes as targets.

Note

If the exponent \(B[e]\) is not specified in beta_exp_dict, then it defaults to 0.

Parameters:
  • edge_type_list (List[Tuple[str, str, str]]) – A list of all edge types.

  • beta_exp_dict (Dict[Tuple[str, str, str], float], optional) – A dictionary of exponents \(B[e]\) used to calculate the contribution percentages \(\beta[e]\) for each edge type.

Returns:

A dictionary of contribution percentages \(\beta[e]\) for each edge type.

approve.utils.hetero_pr_norm(edge_index_dict: Dict[Tuple[str, str, str], Tensor], edge_weight_dict: Dict[Tuple[str, str, str], Tensor] | None = None, num_nodes_dict: Dict[str, int] | None = None, add_self_loops: bool = True, add_special_edges: bool = True, special_dict: Dict[str, int | None] | None = None, flow: str = 'source_to_target') Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[str, int], Dict[str, int | None]][source]

Calculates the entries of the (column) stochastic matrix \(\mathbf{\hat{S}}[e] := \mathbf{\hat{A}}[e]\mathbf{\hat{D}}[e]^{-1}\) as edge weights for all edge types \(e\) where \(\mathbf{\hat{A}}[e]\) is the adjacency matrix for type \(e\) edges with self-loops (for homogenenous layers) or special edges (for bipartite layers) possibly added, and \(\mathbf{\hat{D}}[e]\) is the diagonal degree matrix for type \(e\) edges with entries \(\mathbf{\hat{D}}[e]_{ii} := \sum_\ell \mathbf{\hat{A}}[e]_{\ell i}\).

Parameters:
  • edge_index_dict (Dict[Tuple[str, str, str], Tensor]) – A dictionary of edge indices for each edge type.

  • edge_weight_dict (Dict[Tuple[str, str, str], Tensor], optional) – A dictionary of edge weights for each edge type. (default: None)

  • num_nodes_dict (Dict[str, Tensor], optional) – A dictionary for the number of each node type, if known. (default: None)

  • add_self_loops (bool, optional) – If set to True, the function will add self-loops to all homogeneous layers. (default: True)

  • add_special_edges (bool, optional) – If set to True, the function will add special edges to all bipartite layers. (default: True)

  • special_dict (Dict[str], Tensor], optional) – A dictionary of special nodes for each node type, if set. (default: None)

  • flow (str, optional) – The flow of the graph. (default: "source_to_target")

Returns:

The updated values for edge_index_dict, edge_weight_dict, num_nodes_dict, and special_dict.

Raises:

AssertionError – If flow is neither "source_to_target" nor "target_to_source".

approve.utils.missing_indices(index: Tensor, num_nodes: int | None = None) Tensor | None[source]

Finds all integers in the interval \([0,N)\) missing from index. If num_nodes is not None, \(N\) is num_nodes. Otherwsie, \(N\) is the maximum value of index plus one.

Parameters:
  • index (torch.Tensor) – A one-dimensional tensor of non-negative integers representing nodes.

  • num_nodes (int, optional) – The number of nodes, if known. (default: None)

Returns:

A one-dimensional tensor of all missing indices or None if no indices are missing.

Raises:

AssertionError – If num_nodes is not None and not more than the maximum value of index.

Examples

>>> index = torch.tensor([0, 2])
>>> missing_index(index)
tensor([1])
approve.utils.pr_norm(edge_index: Tensor, edge_weight: Tensor | None = None, num_nodes: int | None = None, add_self_loops: bool = True, flow: str = 'source_to_target') Tuple[Tensor, Tensor][source]

Calculates the entries of the (column) stochastic matrix \(\mathbf{\hat{S}} := \mathbf{\hat{A}}\mathbf{\hat{D}}^{-1}\) as edge weights where \(\mathbf{\hat{A}}\) is the adjacency matrix with self-loops possibly added, and \(\mathbf{\hat{D}}\) is the diagonal degree matrix with entries \(\mathbf{\hat{D}}_{ii} := \sum_\ell \mathbf{\hat{A}}_{\ell i}\).

Parameters:
  • edge_index (Tensor) – The edge indices.

  • edge_weight (Tensor, optional) – The edge weights. (default: None)

  • num_nodes (int, optional) – The number of nodes, if known. (default: None)

  • add_self_loops (bool, optional) – If set to True, the function will add self-loops to the graph. (default: True)

  • flow (str, optional) – The flow of the graph. (default: "source_to_target")

Returns:

The updated values for edge_index and edge_weight.

Raises:

AssertionError – If flow is neither "source_to_target" nor "target_to_source".

Shape:
  • Input: \((2,|\mathcal{E}|)\) and \((|\mathcal{E}|)\) where \(|\mathcal{E}|\) is the number of edges.

  • Output: \((2,|\mathcal{E}'|)\) and \((|\mathcal{E}'|)\) where \(|\mathcal{E}'|\) is the new number of edges.

Module contents