OSM Tileserver/Archive
This page contains historical information. It is probably no longer true.
This page is for the project to implement an OpenStreetMap TileServer hosted in WMF's production infrastructure which serves the same basic data as OSM themselves (synced from them). This is orthogonal to (or in some other sense, a prerequisite for) other OSM efforts going on in Labs for related overlays and databases. Corrections to the information below are welcome, this is all to the best of my knowledge at this time...
2013-10-07 - Still working on deployment/puppetization-related things
Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=33980
There seem to be multiple drivers for this project. The main one from WMF Engineering's perspective is that we want a reliable and robust tileserver to serve tiles for use in our mobile apps. Also, once this tileserver is available, it can be used by the Toolforge community as well.
The obvious alternative is using OSM directly from our pages/mobile. The principal rationales I've heard for this solution over sending the traffic straight to OSM seem to be:
The primary goal for this project is to implement a Wikipedia-scale OSM tileserver. The data will be the normal public OSM dataset available at e.g. http://planet.openstreetmap.org/ . We'll additionally support HTTPS, adhere to our privacy policies, and scale the solution out to meet demand. There will be a single set of standard tiles in a standard image format. Once we have that running in production, we can start gathering more data on scaling, user activity, community needs, etc., and develop further plans, some of which may entail significant upgrades to the architecture as warranted by need and allowed by the costs involved.
Specifics, where known or relevant:
Timeline / Plan
Basic investigative work is mostly complete. The steps forward from here (with rough timeline guesses):
Software discussion
The software stack looks something like:
Varnish -> Apache -> mod_tile -> (renderd || Tirex) (-> FS-cache || -> Mapnik -> PostGIS)
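As a concrete illustration of the mod_tile → renderd leg of the stack above, a minimal renderd.conf might look like the following. This is an illustrative sketch assuming the stock Ubuntu package layout (paths, thread count, and stylesheet location are assumptions, not the actual production configuration):

```ini
; /etc/renderd.conf -- illustrative sketch, not the production config
[renderd]
num_threads=4
tile_dir=/var/lib/mod_tile        ; FS cache written by renderd, served by mod_tile
stats_file=/var/run/renderd/renderd.stats

[mapnik]
plugins_dir=/usr/lib/mapnik/input
font_dir=/usr/share/fonts/truetype
font_dir_recurse=true

[default]
URI=/osm_tiles/
XML=/etc/mapnik-osm-data/osm.xml  ; Mapnik stylesheet backed by the PostGIS "gis" db
HOST=localhost
```

On the Apache side, mod_tile is pointed at the same file with `LoadTileConfigFile /etc/renderd.conf`, so the two daemons agree on the tile directory and style definitions.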
- If you use renderd rather than Tirex, it has better support for scaling out already built in, if I am not mistaken (about Tirex). E.g. renderd has built-in support for using ceph/rados as tile storage and for properly load-balancing rendering requests over multiple servers. apmon (talk) 18:49, 8 August 2013 (UTC)
- Updated that above (the Tirex choice was arbitrary at the time). Been discussing the FS cache with Faidon a bit, and I'm not (yet) convinced we need shared storage for that. I think we could do a consistent-hash map at the varnish layer that splits the load geographically and do local FS caches per backend machine that have some locality of reference (only end up rendering a certain geographic subset at the deep zoom levels, mostly).

- A shared FS cache is not necessary, and leaving it out likely simplifies things. Indeed, osm.org has just gone that route by operating two entirely independent rendering servers. The main advantage of a shared FS cache is probably load balancing of the rendering requests: you can have a single "master queue" which then optimally distributes rendering requests among the available rendering servers. This way you don't need to worry about what load patterns you have and whether your hash function distributes them properly, which may even change throughout the day. However, if you have a fine enough interleaving this might not be a big issue. While designing the hash function, do remember that things get rendered in "metatiles", i.e. 8x8 tile blocks. The second advantage is that you can independently scale I/O load for tile serving and CPU load for rendering. But if your varnish caching storage in front of the tileservers is big enough (i.e. has a very high hit rate), that again might be less of an issue. If you don't go the shared-cache route, then Tirex might not be a bad choice: through its plug-in system it has greater flexibility. With it, it is easier to do things like geojson/vector tiles (like http://a.www.toolserver.org/tiles/vtile/0/0/0.js ), which can be useful for client-side rendering (e.g. http://kothic.org/js/ ) or clickable features. apmon (talk) 02:22, 15 August 2013 (UTC)
Related Links
Future Work
Community requirements and general wishlist items that won't be in the initial deployment, but are a consideration for the future:
Extended State of Things - 2013-11-27
Larger issues with the scope and constraints of this project
Hardware is already bought and isn't right:
Really, direct PostgreSQL queries for rendering aren't going to scale well
No matter what avenue we go down here, managing a tile server infrastructure is going to be a large ongoing megaproject.
Basic technical notes from setting up a simple install
Setup for db/render nodes on WMF Ubuntu Precise (also cf. MaxSem's earlier single-node puppet commit: https://gerrit.wikimedia.org/r/#/c/36222/ , which is still further along than any of this and apparently didn't work out either):
Precise has some packages available that work, and others have to come from Kai's PPA ultimately ( https://launchpad.net/~kakrueger/+archive/openstreetmap )
We can rebuild the binaries for 3 base packages from the PPA to import into our repo on brewster (to avoid direct PPA dependency, control local versions, avoid needing proxy to install, etc):
Changes: default to false on dl-coastlines in the .template file, remove suggest/recommend for libapache2-mod-tile and osm2pgsql
osm2pgsql - no changes
libapache2-mod-tile - remove suggest/recommend for osm2pgsql and postgis-db-setup
(note, all the changes above could also be done through apt options and dselect, but since we're rebuilding anyway it just makes life easier to change the defaults)
Render nodes:
apt installs: libapache2-mod-tile, openstreetmap-mapnik-stylesheet-data, and mapnik-utils (from Ubuntu) + all their dependencies
use the HTTP proxy on brewster (http_proxy="http://brewster.wikimedia.org:8080") for coastline downloading after package install
Database nodes:
set up RAID via BIOS as RAID10 - gives us ~560GB usable.
After basic OS stuff, bulk of space as XFS filesystem on /db
apt installs: mapnik-utils, osm2pgsql, openstreetmap-postgis-db-setup (which will also pull in postgis, etc)
relocate postgres data into /db (shut down, mv, symlink of /var/lib/postgresql/9.1/main to /db/main)
Postgres DB config (current params differing from default):
shared_buffers = 256MB
checkpoint_segments = 20
maintenance_work_mem = 512MB
autovacuum = off
kernel shmmax 536870912 (currently /etc/sysctl.d/99-temp-postgres.conf)
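The shmmax value above is 512MB expressed in bytes, comfortably above the 256MB shared_buffers setting. A quick sanity check, plus the sysctl.d fragment it lives in:

```shell
#!/bin/sh
# Sanity-check that the kernel.shmmax value above (536870912) is exactly
# 512MB in bytes, matching maintenance_work_mem and exceeding shared_buffers.
shmmax=$((512 * 1024 * 1024))
echo "$shmmax"   # prints 536870912
# It is applied at boot via a sysctl.d fragment, e.g.:
#   echo 'kernel.shmmax = 536870912' > /etc/sysctl.d/99-temp-postgres.conf
#   sysctl -p /etc/sysctl.d/99-temp-postgres.conf
```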
pull down the latest planet data from http://planet.openstreetmap.org/planet/ via wget, using the env var http_proxy="http://brewster.wikimedia.org:8080"
import into postgres via a command like:
su www-data -c 'osm2pgsql --create -s -d gis -C 32768 --cache-strategy=dense --unlogged --number-processes=8 /db/planet-131106.osm.bz2'
(The last import took approximately 20 hours, but that was with --number-processes=4; 8 may help, and other tuning is no doubt possible.)
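The `-C 32768` above pins the osm2pgsql node cache at 32GB. A hedged sketch of deriving that number from the machine's RAM instead of hard-coding it (the two-thirds ratio is an assumption for illustration, not a tested production value):

```shell
#!/bin/sh
# Sketch: derive an osm2pgsql node-cache size (-C, in MB) as roughly
# two-thirds of physical RAM, instead of hard-coding 32768.
cache_mb=$(awk '/MemTotal/ { printf "%d", ($2 / 1024) * 2 / 3 }' /proc/meminfo)
echo "using -C ${cache_mb}"
# Then the import itself, roughly as above:
#   su www-data -c "osm2pgsql --create -s -d gis -C ${cache_mb} \
#       --cache-strategy=dense --unlogged --number-processes=8 \
#       /db/planet-131106.osm.bz2"
```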
osmosis setup for timely data sync:
apt-get install osmosis
mkdir /db/osmosis
cd /db/osmosis
osmosis --rrii (initializes)
create state.txt to match planet import based on stamp + http://osm.personalwerk.de/replicate-sequences/?Y=2013&m=11&d=05&H=19&i=10&s=15&stream=minute#
osmosis --rri workingDirectory=/db/osmosis/ --sc --wpc user=www-data database=gis
osmosis is Java-based and needs the proxy via env var: JAVACMD_OPTIONS="-Dhttp.proxyHost=brewster.wikimedia.org -Dhttp.proxyPort=8080"
currently, crashes (probably a simple fix? this is where I last left off) with: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [SELECT version FROM schema_info]; nested exception is org.postgresql.util.PSQLException: ERROR: relation "schema_info" does not exist
Ok, Brandon, thanks for the report. I'm going to try and be as polite as I can here, forgive me if I fail. We have a running setup on the toolserver. It is in production on WMF sites. You may not be familiar with the extent of current mapping tool deployment. You may want to use the holidays to catch up on the OSM gadget and the meta:WikiMiniAtlas. I have a hard time believing that you or the WMF cannot do better than the toolserver. That admission would be nothing short of an embarrassment.

I as a mapping data user (WMA) frankly feel f%^&&%ed over in the a&&. There is no other way to put it. If the toolserver gets shut down because of labs, the way I see it is that the WMF has a duty to provide adequate replacement. TS is going down in about 7 months. This is not much time, especially in hindsight if we look at what was accomplished on labs for the maps projects in the last 7 months ("nothing" would be hyperbole, but it is surprisingly little). I'm not posting here to blame anyone, I'm posting with a plea for an attitude change!

It is very frustrating to read above a statement that essentially says "this won't scale up, let's forget about it". This is not in the spirit of us toolserver hackers. We have spent a considerable amount of our free unpaid time to build tools that we are now about to see going down the toilet. Please shift down a gear and forget about your scalability ideas for a month or two, and get us a system that will at least allow us to move our stuff over before it is dead! F&^%$ high performance for now. We need the bare minimum to migrate and have anything. The alternative is that we will have no mapping tools at all! Is that not clear?

There is a saying in German that goes "perfection" is the enemy of "good enough". That's what I'm seeing here.
Brandon, I know that you jumped on this train after it was already running (actually after it was already promised to have reached its destination) so please don't take this as a completely personal criticism (just take an appropriate amount of responsibility ;-). This is a very frustrating situation for me as a volunteer tool developer. --Dschwen (talk) 17:37, 28 November 2013 (UTC)
Let's try and keep this as much on the technical level as possible. I too think that the worries are a bit overblown, and looking at the various other tile server setups and how the current "in production" system on the toolserver works, which is currently deployed on a good fraction of the wikipedia sites, it is going to be much less of an issue than you think. However, we need to find a way forward that can demonstrate that the scaling issues indeed aren't going to be that much of an issue, or, if Brandon turns out to be correct, to get some solid empirical evidence of exactly where the scaling bottlenecks are going to be and how they manifest themselves. Without that solid data, it will be hard for me or other developers of the tool stack to improve the software stack where it actually matters. I will try and write a more detailed point-by-point response to the issues you have raised in the next couple of days in order to try and find a way forward. But first of all, happy Thanksgiving to all (in the US). apmon (talk) 18:10, 28 November 2013 (UTC)
Last edited on 15 May 2020, at 18:55
Content is available under CC BY-SA 3.0 unless otherwise noted.