API Reference

coreset_sc.gen_sbm(n, k, p, q)

Generate an approximate sample from a Stochastic Block Model (SBM) graph.

Parameters:

Returns:

adj_mat (scipy.sparse.csr_matrix, shape = (n*k, n*k)) – The symmetric adjacency matrix of the generated graph with self loops added.
labels (numpy.ndarray, shape = (n*k,)) – The ground truth cluster labels.
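To make the shape and structure of the return values concrete, here is a minimal dense sketch of an SBM sampler. It is not the library's implementation: it samples exactly rather than approximately, and it returns a dense array instead of a scipy.sparse.csr_matrix. The helper name sample_sbm_dense and the seed argument are hypothetical. The sketch assumes that n is the number of nodes per cluster and k the number of clusters (inferred from the n*k output shape), that p and q are the intra-cluster and inter-cluster edge probabilities, and that self loops have weight 1.

```python
import numpy as np


def sample_sbm_dense(n, k, p, q, seed=0):
    """Illustrative exact SBM sampler (hypothetical helper, not coreset_sc.gen_sbm)."""
    rng = np.random.default_rng(seed)
    # Assumed layout: k clusters with n nodes each, labelled 0..k-1.
    labels = np.repeat(np.arange(k), n)
    same_cluster = labels[:, None] == labels[None, :]
    # Assumed meaning: p inside a cluster, q between clusters.
    probs = np.where(same_cluster, p, q)
    # Sample the strict upper triangle, then mirror it so the matrix is symmetric.
    upper = np.triu(rng.random((n * k, n * k)) < probs, 1)
    adj = (upper | upper.T).astype(float)
    # Add a self loop to every node (unit weight is an assumption).
    np.fill_diagonal(adj, 1.0)
    return adj, labels


adj, labels = sample_sbm_dense(n=4, k=2, p=0.9, q=0.1)
print(adj.shape, labels.shape)
```

This dense version needs memory quadratic in n*k, so it is only suitable for small illustrative graphs.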

CoresetSpectralClustering

class coreset_sc.CoresetSpectralClustering(*args: Any, **kwargs: Any)

Coreset Spectral Clustering

Parameters:

num_clusters (int) – Number of clusters to form.
coreset_ratio (float, default=0.01) – Ratio of the coreset size to the original data size. If set to 1.0, coreset clustering will be skipped and the full graph will be clustered directly.
k_over_sampling_factor (float, default=2.0) – The factor to oversample the number of clusters for the coreset seeding stage. Higher values will increase the “resolution” of the sampling distribution, but take longer to compute.
shift (float, default=0.0) – The shift to add to the implicit kernel matrix of the form K’ = K + shift*D^{-1}. This is useful for graphs with large edge weights relative to degree, which can cause the kernel matrix to be indefinite.
kmeans_alg (sklearn.cluster.KMeans, default=None) – The KMeans algorithm to use for clustering the coreset embeddings. If None, a default KMeans algorithm will be used.
full_labels (bool, default=True) – Whether to return the full labels of the graph after fitting. If False, only the coreset labels will be returned.
ignore_warnings (bool, default=False) – Whether to ignore warnings about the implicit kernel matrix being indefinite. Distances that do become negative will be clipped to zero.
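The effect of the shift parameter can be illustrated on a tiny dense example. The reference does not define K or D, so the sketch below assumes K = D^{-1} A D^{-1}, where A is the adjacency matrix and D is the diagonal matrix of weighted degrees. The helper shifted_kernel is hypothetical and forms the kernel explicitly, whereas the estimator is described as working with it implicitly.

```python
import numpy as np


def shifted_kernel(adj, shift):
    """Return K' = K + shift * D^{-1}, assuming K = D^{-1} A D^{-1} (an assumption)."""
    deg = adj.sum(axis=1)
    d_inv = np.diag(1.0 / deg)
    kernel = d_inv @ adj @ d_inv
    return kernel + shift * d_inv


# Two nodes whose connecting edge is heavy compared with their self loops.
adj = np.array([[1.0, 3.0],
                [3.0, 1.0]])

min_eig_unshifted = np.linalg.eigvalsh(shifted_kernel(adj, 0.0)).min()
min_eig_shifted = np.linalg.eigvalsh(shifted_kernel(adj, 0.5)).min()
print(min_eig_unshifted, min_eig_shifted)
```

Under this assumed form of K, the unshifted kernel has a negative eigenvalue, and a shift of 0.5 raises the smallest eigenvalue to zero (up to rounding). A shift of 1.0 is always sufficient for non-negative symmetric weights, because the eigenvalues of D^{-1/2} A D^{-1/2} lie in [-1, 1].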

compute_conductances()

Compute the conductance of the labelled graph after fitting.
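For readers unfamiliar with the quantity, the following sketch computes per-cluster conductance on a small dense graph. It uses one common definition, the weight of edges leaving the cluster divided by the smaller of the cluster's volume and its complement's volume. Whether compute_conductances() uses exactly this definition is an assumption, and the helper cluster_conductances is hypothetical.

```python
import numpy as np


def cluster_conductances(adj, labels):
    """Conductance of each cluster: cut(S, rest) / min(vol(S), vol(rest))."""
    deg = adj.sum(axis=1)
    total_vol = deg.sum()
    result = {}
    for c in np.unique(labels):
        mask = labels == c
        vol = deg[mask].sum()
        # Total weight of edges from the cluster to every other node.
        cut = adj[mask][:, ~mask].sum()
        result[int(c)] = float(cut / min(vol, total_vol - vol))
    return result


# Path graph on four nodes with unit self loops, split into two halves.
adj = np.array([[1.0, 1.0, 0.0, 0.0],
                [1.0, 1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0, 1.0],
                [0.0, 0.0, 1.0, 1.0]])
labels = np.array([0, 0, 1, 1])
conductances = cluster_conductances(adj, labels)
print(conductances)
```

Each half has volume 5 and a single crossing edge, so both conductances are 1/5. Lower values indicate better-separated clusters. The sketch assumes at least two clusters, because a single cluster covering the whole graph would have an empty complement.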

fit(adjacency_matrix, y=None)

Fit the coreset clustering algorithm on the sparse adjacency matrix.

Parameters:

adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted estimator.

Return type:

object
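Because adjacency_matrix must contain a self loop for every node, it can help to normalise the input before calling fit. The sketch below adds a self loop to each node that lacks one. The helper ensure_self_loops is hypothetical and not part of coreset_sc, and the unit default weight is an assumption, since the reference does not state what weight a self loop should carry.

```python
import numpy as np
import scipy.sparse as sp


def ensure_self_loops(adj, weight=1.0):
    """Return a CSR copy of adj in which every node has a nonzero diagonal entry."""
    adj = sp.csr_matrix(adj)
    # Nodes whose diagonal entry is zero have no self loop yet.
    missing = adj.diagonal() == 0
    # Add `weight` only where a self loop is missing; existing loops are kept.
    return (adj + sp.diags(np.where(missing, weight, 0.0))).tocsr()


adj = sp.csr_matrix(np.array([[0.0, 2.0, 0.0],
                              [2.0, 5.0, 1.0],
                              [0.0, 1.0, 0.0]]))
adj_with_loops = ensure_self_loops(adj)
print(adj_with_loops.diagonal())
```

The result keeps the original off-diagonal weights and the existing self loop on the middle node, and could then be passed as adjacency_matrix.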

fit_predict(adjacency_matrix, y=None)

Fit the coreset clustering algorithm on the sparse adjacency matrix and return the cluster assignments.

Parameters:

adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.

Returns:

labels – Cluster assignments.

Return type:

numpy.ndarray, shape = (n_samples,)

get_coreset_graph(adjacency_matrix, y=None)

Extract a coreset graph from the adjacency matrix.

Parameters:

adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted estimator.

Return type:

object

label_full_graph()

Label the full graph using the coreset labels. This step is unnecessary when coreset_ratio is 1.0, because the full graph is then clustered directly.

set_coreset_graph_labels(labels)

Allow the user to set the coreset graph labels manually. The coreset graph (self.coreset_graph) must first be constructed with get_coreset_graph().

Parameters:

labels (numpy.ndarray, shape = (num_coreset_nodes,)) – Cluster assignments for the coreset graph.