API Reference

coreset_sc.gen_sbm(n, k, p, q)

Generate an approximate sample from a Stochastic Block Model (SBM) graph.

Parameters:
  • n (int) – Number of nodes in each cluster.

  • k (int) – Number of clusters.

  • p (float) – Probability of an edge within the same cluster.

  • q (float) – Probability of an edge between different clusters.

Returns:

  • adj_mat (scipy.sparse.csr_matrix, shape = (n*k, n*k)) – The symmetric adjacency matrix of the generated graph with self loops added.

  • labels (numpy.ndarray, shape = (n*k,)) – The ground truth cluster labels.
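To make the model concrete, the following is a minimal plain-Python sketch of an exact SBM sampler. It is not the library's implementation (which is described above as approximate and returns a sparse matrix). The function name `sample_sbm`, the `seed` argument, the neighbour-set representation, and the assumption that clusters occupy contiguous blocks of node indices are all illustrative choices, not part of `coreset_sc`.

```python
import random


def sample_sbm(n, k, p, q, seed=None):
    """Exact SBM sample as (list of neighbour sets, list of labels).

    Hypothetical helper for illustration only; runs in O((n*k)^2) time.
    """
    rng = random.Random(seed)
    num_nodes = n * k
    # Assumption: node i belongs to cluster i // n (contiguous blocks).
    labels = [i // n for i in range(num_nodes)]
    # Every node starts with a self loop, mirroring the documented output.
    adj = [{i} for i in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            # Intra-cluster pairs use p, inter-cluster pairs use q.
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                # Insert both directions so the graph stays symmetric.
                adj[i].add(j)
                adj[j].add(i)
    return adj, labels
```

Because every pair of nodes is visited, this sketch is only practical for small graphs.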

CoresetSpectralClustering

class coreset_sc.CoresetSpectralClustering(*args: Any, **kwargs: Any)

Coreset Spectral Clustering

Parameters:
  • num_clusters (int) – Number of clusters to form.

  • coreset_ratio (float, default=0.01) – Ratio of the coreset size to the original data size. If set to 1.0, coreset clustering will be skipped and the full graph will be clustered directly.

  • k_over_sampling_factor (float, default=2.0) – The factor to oversample the number of clusters for the coreset seeding stage. Higher values will increase the “resolution” of the sampling distribution, but take longer to compute.

  • shift (float, default=0.0) – The shift to add to the implicit kernel matrix of the form K’ = K + shift*D^{-1}. This is useful for graphs with large edge weights relative to degree, which can cause the kernel matrix to be indefinite.

  • kmeans_alg (sklearn.cluster.KMeans, default=None) – The KMeans algorithm to use for clustering the coreset embeddings. If None, a default KMeans algorithm will be used.

  • full_labels (bool, default=True) – Whether to return the full labels of the graph after fitting. If False, only the coreset labels will be returned.

  • ignore_warnings (bool, default=False) – Whether to ignore warnings about the implicit kernel matrix being indefinite. Distances that do become negative will be clipped to zero.
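The effect of shift can be illustrated on a toy example. The sketch below applies K' = K + shift*D^{-1} to a hand-picked 2x2 symmetric matrix and checks positive semidefiniteness before and after. The matrix, the degrees, and the helper names `shifted_kernel` and `is_psd_2x2` are hypothetical; how the library builds its implicit kernel matrix is not specified in this reference.

```python
def shifted_kernel(K, degrees, shift):
    """Return K' = K + shift * D^{-1}, where D = diag(degrees)."""
    size = len(K)
    return [
        [K[i][j] + (shift / degrees[i] if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]


def is_psd_2x2(M):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= 0 and det >= 0


# Toy values chosen for illustration: K has eigenvalues 3 and -1 (indefinite).
K = [[1.0, 2.0], [2.0, 1.0]]
degrees = [2.0, 2.0]

K_shifted = shifted_kernel(K, degrees, shift=2.0)
print(is_psd_2x2(K))          # False
print(is_psd_2x2(K_shifted))  # True
```

Only the diagonal changes: each entry K[i][i] grows by shift / degrees[i], which raises the eigenvalues of the matrix.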

compute_conductances()

Compute the conductance of the labelled graph after fitting.

Returns:

conductances – The conductance of each cluster.

Return type:

numpy.ndarray, shape = (num_clusters,)
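For reference, a plain-Python sketch of per-cluster conductance under one common definition, cut(S, V \ S) / min(vol(S), vol(V \ S)), on an unweighted graph stored as neighbour sets. The exact definition used by the library (for example, how self loops or edge weights enter the volume) is not stated here, so treat this as an assumption. The helper name `cluster_conductances` is illustrative.

```python
def cluster_conductances(adj, labels, num_clusters):
    """Conductance of each cluster for an unweighted graph.

    adj[u] is the set of neighbours of node u; a self loop counts once
    towards the degree of u (an assumption of this sketch).
    """
    degrees = [len(neighbours) for neighbours in adj]
    total_volume = sum(degrees)
    volume = [0] * num_clusters
    cut = [0] * num_clusters
    for u, neighbours in enumerate(adj):
        volume[labels[u]] += degrees[u]
        for v in neighbours:
            if labels[v] != labels[u]:
                # Edge (u, v) leaves the cluster of u.
                cut[labels[u]] += 1
    result = []
    for c in range(num_clusters):
        denom = min(volume[c], total_volume - volume[c])
        result.append(cut[c] / denom if denom > 0 else 0.0)
    return result


# Two triangles joined by the single edge 2-3, with self loops on every node.
adj = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2, 3}, {2, 3, 4, 5}, {3, 4, 5}, {3, 4, 5}]
labels = [0, 0, 0, 1, 1, 1]
print(cluster_conductances(adj, labels, 2))  # [0.1, 0.1]
```

Each cluster has volume 10 and one outgoing edge, giving a conductance of 0.1. Lower values indicate better-separated clusters.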

fit(adjacency_matrix, y=None)

Fit the coreset clustering algorithm on the sparse adjacency matrix.

Parameters:
  • adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted estimator.

Return type:

object

fit_predict(adjacency_matrix, y=None)

Fit the coreset clustering algorithm on the sparse adjacency matrix and return the cluster assignments.

Parameters:
  • adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

labels – Cluster assignments.

Return type:

numpy.ndarray, shape = (n_samples,)
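A possible end-to-end usage sketch, combining only the calls documented on this page. The wrapper function `cluster_sbm_demo` and its default parameter values are illustrative choices, not recommendations from the library.

```python
def cluster_sbm_demo(n=1000, k=10, p=0.5, q=0.01, coreset_ratio=0.1):
    """Generate an SBM graph, cluster it, and report cluster conductances."""
    # Imported here so that defining the function does not require
    # coreset_sc to be installed; calling it does.
    import coreset_sc

    # gen_sbm returns an adjacency matrix that already contains self loops,
    # as required by fit_predict.
    adj_mat, true_labels = coreset_sc.gen_sbm(n, k, p, q)

    csc = coreset_sc.CoresetSpectralClustering(
        num_clusters=k, coreset_ratio=coreset_ratio
    )
    pred_labels = csc.fit_predict(adj_mat)

    # One conductance value per cluster, computed from the fitted labelling.
    conductances = csc.compute_conductances()
    return true_labels, pred_labels, conductances
```

The predicted labels are cluster assignments, so comparing them with the ground truth calls for a permutation-invariant measure such as the adjusted Rand index.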

get_coreset_graph(adjacency_matrix, y=None)

Extract a coreset graph from the adjacency matrix.

Parameters:
  • adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted estimator.

Return type:

object

label_full_graph()

Label the full graph using the coreset labels. Skip this if the coreset ratio is 1.0.

Returns:

labels – Cluster assignments.

Return type:

numpy.ndarray, shape = (n_samples,)

set_coreset_graph_labels(labels)

Allow the user to set the coreset graph labels manually. Must have constructed self.coreset_graph with get_coreset_graph() first.

Parameters:
  • labels (numpy.ndarray, shape = (num_coreset_nodes,)) – Cluster assignments for the coreset graph.
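The manual workflow implied by get_coreset_graph(), set_coreset_graph_labels() and label_full_graph() might be sketched as follows. The wrapper name and the `cluster_fn` callback are hypothetical, and the format of self.coreset_graph is not specified in this reference, so the callback must handle whatever object the library stores there.

```python
def label_with_custom_coreset_clustering(adjacency_matrix, num_clusters, cluster_fn):
    """Cluster the coreset graph with a user-supplied function, then
    propagate the labels to the full graph.

    cluster_fn is a hypothetical callable taking (coreset_graph, num_clusters)
    and returning one label per coreset node.
    """
    # Imported here so that defining the function does not require
    # coreset_sc to be installed; calling it does.
    from coreset_sc import CoresetSpectralClustering

    csc = CoresetSpectralClustering(num_clusters=num_clusters)

    # Step 1: build the coreset graph (stored on the estimator).
    csc.get_coreset_graph(adjacency_matrix)

    # Step 2: label the coreset nodes with the custom clustering routine.
    coreset_labels = cluster_fn(csc.coreset_graph, num_clusters)
    csc.set_coreset_graph_labels(coreset_labels)

    # Step 3: extend the coreset labels to every node of the full graph.
    return csc.label_full_graph()
```

This replaces the clustering step that fit() would otherwise perform on the coreset, while reusing the library's sampling and label-propagation stages.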