The map below displays the total number of Wikipedia articles tagged to each country. The country with the most articles is the United States (almost 90,000 articles). Anguilla has the fewest geotagged articles (4), and indeed most small island nations and city states have fewer than 100 articles. However, it is not just microstates that are characterised by extremely low levels of wiki representation. Almost all of Africa is poorly represented in Wikipedia. Remarkably, there are more Wikipedia articles written about Antarctica than about all but one of the fifty-three countries in Africa (or, perhaps even more amazingly, there are more Wikipedia articles written about the fictional places of Middle Earth and Discworld than about many countries in Africa, the Americas and Asia).
When examining the data normalised by area, an entirely different pattern is evident. Central and Western Europe, Japan and Israel have the most articles per landmass, while large countries like Russia and Canada have low ratios of Wikipedia articles per area.
Finally, the data were also mapped against population. Here countries with small populations and large landmasses rise to the top of the rankings. Canada, Australia and Greenland all have extremely high numbers of articles per 100,000 people. Smaller nations with many noteworthy features or geotaggable events also appear high in the rankings (e.g. Pitcairn or Iceland).
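For anyone wanting to reproduce the two normalisations described above, here is a minimal sketch in Python. It is not the code used to make these maps: the record layout, the function names, the choice of 1,000 km² as the area unit and the two example rows are all assumptions made purely for illustration, and the numbers are placeholders rather than real counts.

```python
from dataclasses import dataclass


@dataclass
class CountryCount:
    """One row of a hypothetical input table (layout assumed for illustration)."""
    name: str
    articles: int      # number of geotagged articles
    area_km2: float    # land area in square kilometres
    population: int


def articles_per_area(c: CountryCount, unit_km2: float = 1000.0) -> float:
    """Articles per `unit_km2` square kilometres of land."""
    return c.articles * unit_km2 / c.area_km2


def articles_per_capita(c: CountryCount, per_people: int = 100_000) -> float:
    """Articles per `per_people` inhabitants."""
    return c.articles * per_people / c.population


# Placeholder rows with invented values, NOT real data.
rows = [
    CountryCount("Large, sparsely populated", 10_000, 5_000_000.0, 20_000_000),
    CountryCount("Small, densely populated", 10_000, 50_000.0, 50_000_000),
]

# Rank the same rows under each normalisation.
by_area = sorted(rows, key=articles_per_area, reverse=True)
by_capita = sorted(rows, key=articles_per_capita, reverse=True)

for c in rows:
    print(f"{c.name}: {articles_per_area(c):.1f} per 1,000 km2, "
          f"{articles_per_capita(c):.1f} per 100,000 people")
```

Both placeholder rows have the same raw article count, yet the small, dense one ranks first by area and the large, sparse one ranks first by population, which is the kind of reversal the maps show.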
As I've previously argued, Wikipedia is an important component of the palimpsests of place. In other words, presences and absences play a fundamental role in shaping how we interpret and interact with the world. The fact that the geographies of Wikipedia content are so uneven therefore leads to worrying conclusions. As we increasingly rely on peer-produced information, large parts of the world remain 'terra incognita' (in a similar manner to the ways in which many of those same places were represented on European maps before the 19th Century). However, it is conceivable that it will only be a matter of time until the empty spaces on the Wikipedia map are filled in by Wikipedians in Zambia, Indonesia, and much of the rest of the world.
These data certainly warrant a closer look, and I'll aim to get more maps (examining the distribution of content in specific languages, and looking in more detail at specific regions) uploaded soon.
28 comments:
Interesting work, though after editing for several years I can't say I'm too surprised by the systematic bias. I look forward to seeing more.
This is really awesome work, Mark! I've been thinking about doing something similar with KML content...
Hi Matt. Thanks for the comment. I've actually also been working on mapping out placemarks over on the floatingsheep blog (http://www.floatingsheep.org/2009/06/global-placemark-intensity.html). Happy to talk about this more if you'd like.
Excellent. Thanks for the link, Mark... More soon.
Hi Matt
This is interesting stuff. Just this weekend I discovered the Earthscan Atlas series: http://www.earthscan.co.uk/?tabid=37&st=basic&se=atlas
Sometime back I emailed one of the Wikinomics authors about the possibility of a three dimensional representation of Wikipedia (or any internet system, the WWW itself for that matter).
I had in mind something along the lines of star systems (or neural networks) with the most active sites as the biggest brightest nodes (or whatever you wish to call them).
Is there any such thing? If not, what kind of obstacles are we talking about? Any idea?
(I forget how I got the imaginative display name)
Nick
Hi Nick. Thanks for the link. Are you talking about something along the lines of these maps: http://en.wikipedia.org/wiki/File:Internet_map_1024.jpg
Yip!!
Was that image always at the top of this blog?
Anyway, that looks a lot like what I visualized, but live, active.
Oh dear... I'm all confused.. I meant to respond to Mark, not Matt (not that I have anything against Matt ;)
I find the direct comparison with the density map of GeoNames entries quite visually compelling: http://en.wikipedia.org/wiki/Wikipedia:Systemic_bias#The_bias
Could you plot the data vs population density?
Interesting idea. I'll have a look at that next week.
Why does Burkina Faso have so many?
I'd really like to see this data plotted against number of internet users per country. I don't see the results as a huge surprise when compared to, say, the Internet penetration map on Commons. The discrepancies with Internet penetration are more interesting.
I'd expect to see a general effect of concentration or dispersion at each extreme: a certain critical mass per country will probably result in more geotags per user, and under that mass we'll probably see very little geotagging. Above a certain mass, there will probably be a drop-off in growth, probably limited by total country area and/or population.
It'd be nice to see insets of the countries too small to show up on the map.
(Redo) Backtrack here.
My understanding is that you only used geotags from the english-speaking Wikipedia. Is that right?
If so, it may be interesting to see whether all wikis have the same patterns of geotagging.
@Kento: I discuss the reason for Burkina Faso having so many tags here: http://www.guardian.co.uk/technology/2009/dec/02/wikipedia-known-unknowns-geotagging-knowledge
@nihiltres: This is actually something I've been working on. Hope to have some concrete results too.
@Adam, I'll also try to get to this soon.
@Popo. No this isn't just the English Wikipedia. The data include all geotags in any language.
Ooops. I just realized it's written in big nice letters at the bottom of each pic :-b
These maps have a note "Metadata and more maps available at geospace.co.uk". Where are they exactly? I can't find them...
Well, if you compare with the map of population density you will see the same maxima in Europe and the US. Only China is a surprise to me, but I remember that Wikipedia was blocked in China for years (it may still be). It is logical that people talk and write about their environment, their country, their history, since it is what they know best.
This blog post is almost blaming Wikipedia for being focused on a few countries. Since Wikipedia is written by the people for the people, it is logical that the areas of maximal density get the best coverage. The solution would be to promote higher education and internet access in poorer countries, but this is beyond the objective of the free encyclopedia. If I want to know more about a country in Africa, I will read articles available in the press or will visit my library, but I will not blame people from the US for not writing anything about Africa in Wikipedia.
You should take into account the percentage of English speaking people in the countries represented, as well. That is if you are only counting the English Wikipedia. The figures might be different if you take into account other language Wikipedias.
@mikk: I'm afraid I haven't had the time to upload more maps or data just yet. I'll try to get to it later this week though.
@Franck: The point of this blog is not to blame Wikipedia or its editors. My point with these maps is to highlight some of the gaps in knowledge that we can: (1) work on filling in; and (2) be aware of when using Wikipedia as a resource.
@Anonymous: These maps show the results from all languages. Not just English.
Your data is interesting but was not exactly surprising to me. I think many countries in Africa, or other poorer nations, do not have easy access to technology or the awareness to tell their story on Wikipedia. Thus they are underrepresented in the number of articles. Also, they do not get many visitors who would be compelled to write an entry for them.
Apart from that I have 2 suggestions:
1. Can you plot the country of origin of the authors of the entries (by geoconverting their IPs) on a heat map? I suspect that this would closely resemble your first map, or maybe not?
2. Can you then plot the geographical distribution of authors for entries corresponding to a particular region? This data should be interesting. One would suspect that the authors editing articles about a region would be from that region.
This is quite interesting, but I wonder how this compares to established reference works such as Britannica or World Book. Would they map the same?
You need to think about it. Despite the emails, the overwhelming evidence showing global warming is happening hasn't changed.
"The e-mails do nothing to undermine the very strong scientific consensus . . . that tells us the Earth is warming, that warming is largely a result of human activity," Jane Lubchenco, who heads the National Oceanic and Atmospheric Administration, told a House committee. She said that the e-mails don't cover data from NOAA and NASA, whose independent climate records show dramatic warming.
It's really a reflection of internet penetration - countries with cheaper / faster internet connections have more wikipedia articles.