using scipy connected_components correctly
Scipy provides various sparse matrix representations to work with sparse data with high dimentionality.
which is incorrect because now graph contains contains more values than previous graph
Some times we need to find clusters in sparse matrix based on similar trributes. we can model this using record to record sparse matrix with node i,j contains non zero similarity score.
D1,D2,D3,D4,D5
graph = [
D1... [ 5, 4 , 4, 0 , 0 ],
D2... [ 4, 5 , 3 , 0 , 0 ],
D3... [ 4, 3, 5, 0, 0],
D4... [0, 0 , 0, 5, 1],
D5... [0, 0, 0, 1, 5]
... ]
Here we have
similarity(D1,D2)=4
similarity(D1,D3)=4
similarity(D2,D3)=3
...
...
from scipy.sparse import csr_matrix
graph=[[ 5, 4 , 4, 0 , 0 ],
[ 4, 5 , 3 , 0 , 0 ],
[ 4, 3, 5, 0, 0],
[0, 0 , 0, 5, 1],
[0, 0, 0, 1, 5]]
graph=csr_matrix(graph)
print(repr(graph))
out:
<5x5 sparse matrix of type '<class 'numpy.int64'>
'with 13 stored elements in Compressed Sparse Row format>
suppose now i have a requirement to take only similarity higher than a threshold we can do
graph[graph<threshold]=0
connected_components(graph)
out:
(1, array([0, 0, 0, 0, 0], dtype=int32))
which is incorrect because now graph contains contains more values than previous graph
print(graph.toarray())
out:
array([[5, 4, 4, 0, 0],
[4, 5, 3, 0, 0],
[4, 3, 5, 0, 0],
[0, 0, 0, 5, 0],
[0, 0, 0, 0, 5]])
print(repr(graph))
out:
<5x5 class="" matrix="" numpy.int64="" of="" sparse="" type="">'
with 25 stored elements in Compressed Sparse Row format>
5x5>
here all zeros we got into csr matrix because we did graph[graph<threshold]=0 and as per documentation of csgraph. it considers zero weight edges as valid edges.To get correct graph we need to remove all zeroes from the csr_matrix using eliminate_zeros().
graph.eliminate_zeros() print(repr(graph)) out: <5x5 matrix="" numpy.int64="" of="" sparse="" type="">'with 11 stored elements in Compressed Sparse Row format> 5x5> connected_components(graph) out: (3, array([0, 0, 0, 1, 2], dtype=int32))
Post a Comment