API Reference
- coreset_sc.gen_sbm(n, k, p, q)
Generate an approximate sample from a Stochastic Block Model (SBM) graph.
- Parameters:
n (int) – Number of nodes in each cluster.
k (int) – Number of clusters.
p (float) – Probability of an edge within the same cluster.
q (float) – Probability of an edge between different clusters.
- Returns:
adj_mat (scipy.sparse.csr_matrix, shape = (n*k, n*k)) – The symmetric adjacency matrix of the generated graph with self loops added.
labels (numpy.ndarray, shape = (n*k,)) – The ground-truth cluster label of each node.
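The following minimal sketch shows the shape of the return values. It samples an SBM exactly with dense NumPy arrays and returns the same kind of values as gen_sbm. It is not the library's approximate sampler. The function name sample_sbm_dense, its seed argument, and the self-loop weight of 1.0 are assumptions made for this illustration. Dense sampling costs O((n*k)^2) memory, so the sketch only suits small graphs.

```python
import numpy as np
import scipy.sparse as sp


def sample_sbm_dense(n, k, p, q, seed=None):
    """Hypothetical exact SBM sampler (illustration only, not coreset_sc.gen_sbm).

    Returns a symmetric CSR adjacency matrix of shape (n*k, n*k) with a
    self loop of weight 1.0 on every node (assumed weight), plus labels.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n)  # n consecutive nodes per cluster
    same_cluster = labels[:, None] == labels[None, :]
    probs = np.where(same_cluster, p, q)  # edge probability for each pair
    # Draw each pair i < j independently, then mirror to keep the graph undirected.
    upper = np.triu(rng.random(probs.shape) < probs, 1)
    adj = (upper | upper.T).astype(float)
    np.fill_diagonal(adj, 1.0)  # add self loops (assumed weight 1.0)
    return sp.csr_matrix(adj), labels


adj_mat, labels = sample_sbm_dense(n=50, k=3, p=0.5, q=0.01, seed=0)
print(adj_mat.shape, labels.shape)  # (150, 150) (150,)
```

When p = 1 and q = 0, the sketch returns a block-diagonal matrix of ones. This makes a quick sanity check on the output layout.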
CoresetSpectralClustering
- class coreset_sc.CoresetSpectralClustering(*args: Any, **kwargs: Any)
Coreset spectral clustering of a graph given by its sparse adjacency matrix.
- Parameters:
num_clusters (int) – Number of clusters to form.
coreset_ratio (float, default=0.01) – Ratio of the coreset size to the original data size. If set to 1.0, coreset clustering will be skipped and the full graph will be clustered directly.
k_over_sampling_factor (float, default=2.0) – The factor by which the number of clusters is oversampled in the coreset seeding stage. Higher values increase the "resolution" of the sampling distribution but take longer to compute.
shift (float, default=0.0) – The shift added to the implicit kernel matrix, giving K' = K + shift*D^{-1}, where D is the diagonal degree matrix. This is useful for graphs with large edge weights relative to node degree, which can make the kernel matrix indefinite.
kmeans_alg (sklearn.cluster.KMeans, default=None) – The KMeans algorithm to use for clustering the coreset embeddings. If None, a default KMeans algorithm will be used.
full_labels (bool, default=True) – Whether to return the full labels of the graph after fitting. If False, only the coreset labels will be returned.
ignore_warnings (bool, default=False) – Whether to ignore warnings about the implicit kernel matrix being indefinite. Distances that become negative are clipped to zero.
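You can check the effect of shift numerically on a small graph. The sketch below assumes the unshifted kernel is K = D^{-1} A D^{-1}, where A is the adjacency matrix and D its diagonal degree matrix. This is the standard kernel in the kernel k-means view of normalized cut. The documentation above does not state which K the library uses, and the helper name shifted_kernel is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def shifted_kernel(adj, shift):
    """Hypothetical helper: dense K' = D^{-1} A D^{-1} + shift * D^{-1}.

    Assumes K = D^{-1} A D^{-1}; intended for small graphs only.
    """
    adj = sp.csr_matrix(adj, dtype=float)
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    d_inv = sp.diags(1.0 / degrees)
    return (d_inv @ adj @ d_inv + shift * d_inv).toarray()


# Two nodes joined by a heavy edge (weight 5) with light self loops (weight 1).
adj = sp.csr_matrix(np.array([[1.0, 5.0], [5.0, 1.0]]))
for shift in (0.0, 1.0):
    smallest = np.linalg.eigvalsh(shifted_kernel(adj, shift)).min()
    print(shift, round(smallest, 4))
```

Under the assumed K, this graph has a smallest eigenvalue of -1/9 at shift = 0, so the kernel is indefinite. At shift = 1.0 the smallest eigenvalue is 1/18. Any shift of at least 2/3 makes K' positive semidefinite here.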
- compute_conductances()
Compute the conductance of the labelled graph after fitting.
- Returns:
conductances – The conductance of each cluster.
- Return type:
numpy.ndarray, shape = (num_clusters,)
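The following minimal sketch computes per-cluster conductance for reference. It assumes the common definition phi(S) = w(S, V \ S) / vol(S). Here w(S, V \ S) is the total weight of edges leaving S, and vol(S) is the sum of degrees in S. Self loops count towards vol(S) but not towards the cut. Some texts divide by min(vol(S), vol(V \ S)) instead, and this section does not say which convention compute_conductances uses. The name cluster_conductances is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def cluster_conductances(adj, labels, num_clusters):
    """Hypothetical helper: conductance cut(S) / vol(S) for each cluster S."""
    adj = sp.csr_matrix(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = np.asarray(adj.sum(axis=1)).ravel()  # node degrees, self loops included
    conductances = np.zeros(num_clusters)
    for c in range(num_clusters):
        members = np.flatnonzero(labels == c)
        volume = degrees[members].sum()
        internal = adj[members][:, members].sum()  # weight with both ends in S
        cut = volume - internal  # weight of edges leaving S
        # An empty cluster is reported as 0.0 (a choice made in this sketch).
        conductances[c] = cut / volume if volume > 0 else 0.0
    return conductances


# Two triangles joined by one bridge edge, each node with a self loop.
rows, cols = zip(*[(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
upper = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))
adj = (upper + upper.T + sp.identity(6)).tocsr()
print(cluster_conductances(adj, [0, 0, 0, 1, 1, 1], 2))  # [0.1 0.1]
```

In this example, each triangle has volume 10, and only the bridge edge (weight 1) leaves it. This gives a conductance of 0.1 per cluster.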
- fit(adjacency_matrix, y=None)
Fit the coreset clustering algorithm on the sparse adjacency matrix.
- Parameters:
adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns:
self – Fitted estimator.
- Return type:
object
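fit requires a self loop on every node, as do fit_predict and get_coreset_graph, so you need to patch a graph that lacks them before fitting. The following minimal sketch does this, assuming a weight of 1.0 is acceptable for any missing self loop. The documentation does not prescribe a weight, and ensure_self_loops is a hypothetical helper, not part of coreset_sc.

```python
import numpy as np
import scipy.sparse as sp


def ensure_self_loops(adj, weight=1.0):
    """Hypothetical helper: return a CSR copy of adj with a self loop on every node.

    Existing non-zero diagonal entries are kept; zero ones are set to `weight`.
    """
    adj = sp.csr_matrix(adj, dtype=float)
    missing = (adj.diagonal() == 0).astype(float) * weight
    return (adj + sp.diags(missing)).tocsr()


# A 3-node path graph with a self loop only on node 1.
adj = sp.csr_matrix(np.array([[0, 1, 0], [1, 2, 1], [0, 1, 0]]))
print(ensure_self_loops(adj).diagonal())  # [1. 2. 1.]
```

Existing self loops are left unchanged. This preserves the weights already present in the graph.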
- fit_predict(adjacency_matrix, y=None)
Fit the coreset clustering algorithm on the sparse adjacency matrix and return the cluster assignments.
- Parameters:
adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns:
labels – Cluster assignments.
- Return type:
numpy.ndarray, shape = (n_samples,)
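Cluster assignments from a k-means-based method are only defined up to a permutation of cluster indices. Comparing them with ground-truth labels (for example, those returned by gen_sbm) therefore needs a permutation-invariant score. The following minimal sketch matches predicted clusters to true clusters with the Hungarian algorithm (scipy.optimize.linear_sum_assignment) and reports the fraction of nodes assigned correctly. matched_accuracy is a hypothetical helper, and sklearn.metrics.adjusted_rand_score is a common alternative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_accuracy(true_labels, pred_labels):
    """Hypothetical helper: accuracy after the best one-to-one relabelling of clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    k = int(max(true_labels.max(), pred_labels.max())) + 1
    # confusion[t, p] counts nodes with true label t and predicted label p
    confusion = np.zeros((k, k), dtype=int)
    np.add.at(confusion, (true_labels, pred_labels), 1)
    # Pick the cluster matching that maximises the number of agreeing nodes.
    rows, cols = linear_sum_assignment(confusion, maximize=True)
    return confusion[rows, cols].sum() / true_labels.size


# Predicted indices are permuted, and one node is misassigned.
print(matched_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 0]))  # 5 of 6 nodes match
```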
- get_coreset_graph(adjacency_matrix, y=None)
Extract a coreset graph from the adjacency matrix.
- Parameters:
adjacency_matrix (scipy.sparse.csr_matrix, shape = (n_samples, n_samples)) – The adjacency matrix of the graph. This must contain self loops for each node.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns:
self – Fitted estimator.
- Return type:
object