HN
Today

Wikipedia as a Graph

Wikigrapher models Wikipedia's entire dataset as a massive graph, meticulously mapping pages, redirects, and categories as distinct nodes. This technical feat provides a powerful, programmatic lens into the intricate web of human knowledge, allowing for novel data exploration. It resonates with the HN crowd due to its ingenious approach to handling colossal information structures and its implications for advanced data analytics.

8
Score
0
Comments
#2
Highest Rank
7h
on Front Page
First Seen
Aug 29, 4:00 PM
Last Seen
Aug 29, 10:00 PM
Rank Over Time
53245810

The Lowdown

The wikigrapher project, developed by Hamza BAAZIZ, is an ambitious undertaking to represent the colossal structure of Wikipedia as a comprehensive graph database. It transforms the world's largest encyclopedia into a queryable network, where every article, redirect, and category becomes a node, and their relationships form the edges. This approach offers a powerful new paradigm for understanding the intrinsic connections within Wikipedia's vast information ecosystem.

Key aspects of the wikigrapher dataset, based on an August 20, 2025 English Wikipedia dump, include:

  • Nodes: The graph contains 7,043,985 unique pages, 11,563,911 redirects, 2,550,366 categories, and 27,427 orphan pages (those with no incoming links).
  • Relations: It meticulously tracks 691,392,572 'link_to' relationships between pages, 11,535,980 'redirect_to' connections, and 44,331,587 'belong_to' associations, likely linking pages to categories.
  • Data Source: The project leverages publicly available Wikimedia dumps, specifically noting the English Wikipedia dump from a future date (August 20, 2025), indicating ongoing or forward-looking development.

By converting Wikipedia into a structured graph, wikigrapher opens up new avenues for research, data mining, and knowledge discovery, enabling complex queries and network analyses that are challenging with traditional relational databases or simple text processing.