### Paradoxe: Clustering and Compact graphs

Hello,
In the last lecture, it didn't seem intuitive to me why it is a paradoxe that random graphs have both clustering and compact properties. Transitivity to me is a property that tends to make the graph more connected, and my very basic intuition was that a more connected graph has shorter diameter (because more possibilities to reach each other).
What is the actual accurate intuition we should have had here ?

Thanks a lot.

Top comment

On the first question: it's of course true that adding edges can only reduce distances. The point was that if one wanted to design a compact graph subject to a total budget of edges, then one would avoid short cycles, such as triangles. From the point of view of some arbitrary starting node, we want the # of nodes reachable within a graph distance (hops) to grow as quickly as possible, and short cycles would be "wasted edges" in this sense.

The G(n,p) is in fact very close to this ideal, and subject to the total # of edges (or equivalently, average degree), it is essentially as compact as can be. One says that "it locally looks like a tree": when starting at any node and expanding out to distance 1,2,3,..., almost all edges lead to new nodes (until we get close to the end).

In real networks, we would encounter many edges that connect back to already found nodes because of short cycles. This slows down this "expansion" process, compared to the G(n,p). The point of the Small World model is to show that it doesn't have to slow down too badly.

To add on Tom's question, I was confused by the clustering and diameter graph in the small world model.

Why is it that the clustering coefficient is at its highest when the graph is the original n-cycle ?

It seems to me that the clustering coefficient should be zero according to the definition because there are no closed triples or triangles.

Top comment

On the first question: it's of course true that adding edges can only reduce distances. The point was that if one wanted to design a compact graph subject to a total budget of edges, then one would avoid short cycles, such as triangles. From the point of view of some arbitrary starting node, we want the # of nodes reachable within a graph distance (hops) to grow as quickly as possible, and short cycles would be "wasted edges" in this sense.

The G(n,p) is in fact very close to this ideal, and subject to the total # of edges (or equivalently, average degree), it is essentially as compact as can be. One says that "it locally looks like a tree": when starting at any node and expanding out to distance 1,2,3,..., almost all edges lead to new nodes (until we get close to the end).

In real networks, we would encounter many edges that connect back to already found nodes because of short cycles. This slows down this "expansion" process, compared to the G(n,p). The point of the Small World model is to show that it doesn't have to slow down too badly.

On the second question: note that every node does not only have two edges to its immediate neighbors on the cycle, but to every neighbor within distance $k$. This in fact gives rise to many triangles (you will compute the exact clustering coefficient in this model in a homework problem).

@matthias9 said:
On the second question: note that every node does not only have two edges to its immediate neighbors on the cycle, but to every neighbor within distance $k$. This in fact gives rise to many triangles (you will compute the exact clustering coefficient in this model in a homework problem).

Just about this answer, I think it is just not clear to me either clustering means transitivity of neighbourhood or transitivity of connectivity, namely, is it that "two nodes with common neighbours are probably neighbours themselves" or "two nodes with common neighbours are probably connected" ? Also, it seems quite obvious that nodes with common neighbour are connected, would it then mean that they are connected through other nodes ?
Thanks

It's the former: high clustering means that for "many" pairs of nodes that have a common neighbor, they are actually neighbors as well, thereby forming a triangle. And with "many", we mean relative to the G(n,p) reference model, where edges are generated independently.

You are correct that two nodes with a common neighbor are, of course, always connected (ie, in the same connected component).