Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12

The Simple English Wikipedia Link Graph with Clickstream Transitions is a gzipped GML file representing the hyperlink graph of the Simple English Wikipedia. It was prepared using the "pagelinks" and "page" SQL dumps for 2019-01-01 and extended with an edge property called "transitions" based on the Clickstream dump for the English Wikipedia from 2018-12. It was designed to be used as a ground truth to evaluate node ranking metrics, like PageRank, but it can be useful for Network Science in general, or for Machine Learning and Information Retrieval to compute features over a medium-sized, complete Wikipedia link graph.
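
As an illustration of how the "transitions" edge property might be turned into a click-based reference ranking, the following minimal sketch streams the gzipped GML and sums the transitions arriving at each node. It uses only the Python standard library. It assumes that every edge block stores `source`, `target` and `transitions` as scalar keys, one per line, and that an edge without a usable `transitions` value counts as zero; neither assumption is confirmed by the dataset description. The helper names (`iter_edges`, `transition_in_strength`, `top_targets`) are hypothetical, and summing incoming transitions is only one possible ground truth, not necessarily the one the authors used.

```python
import gzip
from collections import Counter


def iter_edges(lines):
    """Yield (source, target, transitions) for each edge block in GML text lines.

    Accepts both "edge [" on one line and "edge" / "[" on separate lines.
    Assumption: an edge with no usable "transitions" value counts as 0.0.
    """
    stack = []      # names of the blocks that are currently open
    pending = None  # block keyword seen on its own line, waiting for "["
    edge = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if line == "[" or rest == "[":
            # A block opens, named either by this line or by the previous one.
            name = pending if line == "[" else key
            stack.append(name)
            pending = None
            if name == "edge":
                edge = {}
        elif line == "]":
            # A block closes; emit the edge if that is what just ended.
            if stack and stack.pop() == "edge":
                weight = float(edge.get("transitions", 0))
                if weight != weight:  # NaN is the only value unequal to itself
                    weight = 0.0
                yield int(edge["source"]), int(edge["target"]), weight
        elif not rest:
            pending = key
        elif stack and stack[-1] == "edge" and key in ("source", "target", "transitions"):
            edge[key] = rest.strip('"')


def transition_in_strength(edges):
    """Sum the transitions arriving at each target node."""
    totals = Counter()
    for _source, target, transitions in edges:
        totals[target] += transitions
    return totals


def top_targets(path, n=10):
    """Return the n node ids with the most incoming transitions in a gzipped GML file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return transition_in_strength(iter_edges(handle)).most_common(n)
```

Calling `top_targets("graph.gml.gz")`, where the file name is a placeholder for the downloaded resource, would return node ids rather than page titles; mapping ids to titles requires reading the node blocks as well. The resulting ranking could then be compared against PageRank scores computed on the same graph, for example with a rank correlation coefficient.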

Additional Info

Field Value
Source " "
Author José Devezas
Last Updated October 1, 2019, 13:27 (UTC)
Created March 6, 2019, 10:23 (UTC)
Cite As DEVEZAS, José, NUNES, Sérgio. Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12 [dataset]. 06 mar 2019. INESC TEC research data repository. DOI:
dc.License GNU Free Documentation License (GFDL) + CC BY-SA 3.0 (
dc.Contributor Sérgio Nunes
dc.Coverage.Temporal 2018-12
dc.Date 2019-02-01T12:04
dc.Format Graph Modeling Language (GML); GZip
dc.Format.Extent Total: 166 MB; GML: 35 MB (compressed); 897,577 nodes; 6,986,460 edges; RData: 131 MB (igraph 1.2.2; 'g' variable)
dc.Language EN
dc.Publisher INESC TEC
dc.Type Simple English Wikipedia Link Graph for network analysis.
ddi.Software R; Gephi; igraph; NetworkX