Clusters and homophily

Informally speaking, a cluster is a group of nodes in a graph that are more connected to each other than they are to the rest of the graph. For example, the red and yellow regions below are clusters:

clusters

Clusters provide a good big-picture view of the Web, much like a table of contents provides a big-picture view of a book. Unlike a book and its table of contents, however, a cluster involves many authors (Web builders) acting independently. Also, a Web builder may be unaware of the clusters he is forming.

The sociological force behind Web clusters is homophily, or "birds of a feather" (which "flock together"). Here is one important way that homophily impacts the Web:

  1. When a Web builder links from his site to another site, he usually does so because he perceives something important shared by his site and the other site.
  2. Therefore hyperlinks represent more than channels of navigation; they also represent shared likes and dislikes.
  3. The navigational meaning of a hyperlink is clear and explicit, but the shared likes and dislikes represented by a hyperlink are often unstated and implicit.
  4. Therefore clusters of mutually interlinked Web pages represent self-organized groups with specific shared likes and dislikes, which are usually unstated and implicit.

Motivated by the above, cluster-based search engines find interlinked groups of Web pages and then deduce the specific "meaning" of each cluster by analyzing the content of the interlinked pages. Popular examples of cluster-based search engines include Clusty, Grokker, and iBoogie.

 
Joomla Templates by Joomlashack