The 10 most cited sources on Wikipedia

By Miriam Redi, Jake Orlowitz, Dario Taraborelli & Ben Vershbow, all of the Wikimedia Foundation.

Wikipedia has recently published a dataset of every citation referencing an identifier across all 297 Wikipedia language editions. The dataset breaks down the sources cited in each language by identifier: a PMID or PMCID (for articles in the biomedical literature), a DOI (for scholarly papers), an ISBN (for book editions), or an arXiv ID (for preprints).

What are the most cited sources?

Unsurprisingly, Wikipedians love reference works. The top 10 sources by citation count across all Wikipedia language editions are reference books or scientific articles describing large collections. Many of these publications have been cited across large series of articles by Wikipedians using powerful bots and automated tools.

  1. Updated world map of the Köppen-Geiger climate classification: 2,830,341 citations
  2. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragment Methods: An Analysis of AlogP and CLogP Methods: 21,350 citations
  3. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC): 20,247 citations
  4. The de Vaucouleurs Atlas of Galaxies: 19,068 citations [ISBN: 9780521820486]
  5. The Complete New General Catalogue and Index Catalogues of Nebulae and Star Clusters by J. L. E. Dreyer: 19,060 citations [ISBN: 9780933346512]
  6. Galaxies and How to Observe Them: 19,058 citations [ISBN: 9781852337520]
  7. A Concise History of Romania: 15,597 citations [ISBN: 9780521872386]
  8. Catalog of Fishes (California Academy of Sciences): 11,980 citations [ISBN: 0940228475]
  9. Dictionary of Minor Planet Names: 10,651 citations [ISBN: 9783540002383]
  10. National and religious composition of the population of Croatia, 1880-1991: By settlements: 8,230 citations [ISBN: 9789536667079]


What types of sources are cited the most by language?


On average, the majority of publications cited by identifier across Wikipedia language editions are books. German Wikipedia, one of the top 5 language editions by number of articles, relies primarily on information sourced to book editions, with 87% of its citations in the ISBN category. By contrast, English Wikipedia draws equally on scholarly publications and books, while Arabic Wikipedia cites more scholarly publications than books.

Preprint repositories such as arXiv represent a minority of publications, with less than 2% of citations in each language, and they are most prominently cited in Arabic Wikipedia. At least 5% of publications in Arabic and English Wikipedia are open-access biomedical publications from PubMed Central.
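Percentages like the ones above come down to counting citations per identifier type within each language and dividing by that language's total. The following is a minimal sketch of that arithmetic, not the authors' method: the record layout (language, identifier type, identifier), the function name `identifier_shares`, and the sample rows are all assumptions made up for illustration and are not taken from the published dataset.

```python
from collections import Counter, defaultdict


def identifier_shares(records):
    """Return {language: {id_type: share}} for (language, id_type, identifier) tuples.

    The tuple layout is a hypothetical one chosen for this sketch.
    """
    counts = defaultdict(Counter)
    for language, id_type, _identifier in records:
        counts[language][id_type] += 1

    shares = {}
    for language, by_type in counts.items():
        total = sum(by_type.values())
        # Share of this language's citations that fall in each identifier type.
        shares[language] = {t: n / total for t, n in by_type.items()}
    return shares


# Made-up toy records, NOT real figures from the dataset.
sample = [
    ("de", "isbn", "book-a"),
    ("de", "isbn", "book-b"),
    ("de", "isbn", "book-c"),
    ("de", "doi", "paper-a"),
    ("en", "isbn", "book-a"),
    ("en", "doi", "paper-a"),
]

shares = identifier_shares(sample)
print(shares["de"]["isbn"])  # 0.75 for the toy data above
```

Each citation is counted once per occurrence, so a source cited in many articles contributes many times to its language's total.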

How fast are citations growing by language?


If we look at the percentage of total citations added over time, we note that some languages, such as Arabic and Spanish, are on a steady growth trajectory as of early 2018, while the general trend (black line) is flattening. Since the number of articles across all languages continues to grow, this suggests that in some languages the rate at which citations are added is slowing down.

How often are sources cited and reused across articles and languages?

There are 4.5 million unique sources in the dataset. While every source is cited 3.5 times on average, the vast majority of sources in this dataset are used fewer than 500 times across wikis. Only nine "super publications" are used more than 10,000 times.
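The reuse figures above are simple aggregates: the number of distinct identifiers, the mean number of citations per identifier, and the set of identifiers cited more often than some threshold. A minimal sketch of that calculation follows, assuming the input is a flat list with one identifier per citation. The function name `reuse_summary`, the threshold value, and the sample identifiers are hypothetical and do not come from the dataset.

```python
from collections import Counter


def reuse_summary(citations, threshold):
    """Summarise reuse for an iterable of source identifiers (one per citation).

    Returns (unique_sources, mean_citations_per_source, heavily_cited_sources).
    """
    per_source = Counter(citations)
    unique = len(per_source)
    if unique == 0:
        # Nothing to average over; avoid dividing by zero.
        return 0, 0.0, []

    mean = sum(per_source.values()) / unique
    # Sources cited strictly more often than the threshold.
    heavy = sorted(s for s, n in per_source.items() if n > threshold)
    return unique, mean, heavy


# Made-up toy input, NOT real identifiers or counts from the dataset.
sample = ["src-a"] * 5 + ["src-b"] * 2 + ["src-c"]

unique, mean, heavy = reuse_summary(sample, threshold=4)
print(unique, round(mean, 2), heavy)
```

A mean can sit far from the typical case when a handful of sources are cited very heavily, which is why the article reports both the average and the count of "super publications".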


Credit: Wikimedia Foundation (several edits were made to the original article).

Thumbnail Image: Alireza Attari
