
cloudvps: maps project trusty deprecation
Closed, ResolvedPublic

Description

Ubuntu Trusty has not been available for new instances in Cloud VPS since Nov 2017. However, Trusty reaches EOL in 2019, and we need to move to Debian Stretch before that date.

All instances in the maps project need to be upgraded as soon as possible.

The list of affected VMs is:

  • maps-tiles2.maps.eqiad.wmflabs
  • maps-tiles3.maps.eqiad.wmflabs
  • maps-warper2.maps.eqiad.wmflabs
  • maps-wma1.maps.eqiad.wmflabs

Listed administrators are:

More info in the OpenStack browser: https://tools.wmflabs.org/openstack-browser/project/maps

TODO:

  • figure out the current configuration for the servers
  • create new Debian 9 (Stretch) VMs for each and configure puppet client
    • Created maps-tiles1 instance and a maps-puppetmaster instance to experiment with a puppet config
    • maps-tiles1 is mapped to maps.wmflabs.org (previously unused)
    • set up puppet
  • create new manifests for new versions of packages and configurations in puppet and deploy
  • transfer any data that needs to be transferred for proper tile server operation (likely mostly via NFS; see the rsync sketch after this list)
  • test that all functionality is transferred and that everything works ok
  • shut down old servers, turn on new ones
  • delete old servers
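
For the data-transfer step, something along these lines would do it over ssh; the source host and the mod_tile cache path here are assumptions for illustration, not confirmed paths:

# hypothetical sketch: pre-seed the new instance's tile cache from an old one
$ rsync -avz maps-tiles2.maps.eqiad.wmflabs:/var/lib/mod_tile/ /var/lib/mod_tile/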

Event Timeline


It seems the overpass-wiki instance is not in use. It was created by Jotpe in 2015. I've sent out an email via his Wikimedia wiki account. I suggest we delete it if there is no response before the 18th.

And I've sent out an email on maps-l and wikitech-l.

I'm willing to attempt the OS conversion on the VMs, but at this point I don't have shell access to the tools project (apparently infrastructure access is not needed; sorry, I'm new...) and I also need admin access to the maps project. I don't know how to go about getting this, but participating in this discussion was suggested as a way, so I'm just leaving this comment here for now.

I'm working on redoing the maps-wma1 instance as maps-wma. This involves a region change, and as a consequence the /mnt/nfs/labstore1003-maps directory, which contains my home directory on the old instance, is empty on the new instance. The same goes for the project directory. Will I have to copy everything over? Why is there no home dir?

Why is there no home dir?

For murky historical reasons, mounting NFS on a new VM in that project requires a puppet patch. I'll make one for maps-wma now. Are there others that need the same?

Change 479764 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] nfs: add another VM to the Maps nfs mount

https://gerrit.wikimedia.org/r/479764

Change 479764 merged by Andrew Bogott:
[operations/puppet@production] nfs: add another VM to the Maps nfs mount

https://gerrit.wikimedia.org/r/479764

@dschwen, if you reboot maps-wma your nfs mounts should be more like what you'd expect now.

maps-warper2 has been migrated to maps-warper3 and the web proxy (warper.wmflabs.org) has been switched too, with everything seeming to work okay, but I'd like to wait at least a day before we turn off the old instance, just in case...

@Chippyy can you please document the setup somewhere for the future? That would be super helpful.

@Sasheto hi, I haven't forgotten about your request... Please don't take this the wrong way, but lately there has been quite a LOT of malicious activity, and your account is brand new... And I can't find any history of your activity with the projects... At this time that makes me uncomfortable giving you access to this set of servers, as one of them is actually rather critical and could allow a person to do a lot of harm to Wikimedians. I'm considering how we should approach this.

@TheDJ Thanks for letting me know and no, I'm not taking it personally. You don't find any history because I'm super new and don't have any contributions yet. Let me know if you have something in mind that I could do to gain the trust of this community prior to doing more critical tasks.

@Chippyy can you please document the setup somewhere for the future? That would be super helpful.

Created T212166 to track this.

Current progress:

  • Created maps-tiles1 with Debian Stretch
  • Created maps-puppetmaster to be able to experiment with a puppet manifest for the new host
  • Currently stuck on installing a standalone puppetmaster, as it insists on creating /home/gitpuppet, which fails (ACL?). Now fixed
  • Figured out more of the dependencies and setup of the old server. Added to the documentation
  • Pondering how to tackle installing mod_tile in the future, as there is no package for it. Can compile from source (see the build sketch after this list), but..
  • make install for mod_tile doesn't work for now, as it fails on libiniparser. Fixed by installing from /tmp?
  • Wondering what the difference is between the puppet httpd and apache2 modules; they seem to be used interchangeably..
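
For the mod_tile source build mentioned above, the usual autotools sequence is roughly the following (a sketch; the upstream build system may have changed since):

$ git clone https://github.com/openstreetmap/mod_tile.git
$ cd mod_tile
$ ./autogen.sh && ./configure
$ make
$ sudo make install            # installs renderd and the render_* tools
$ sudo make install-mod_tile   # installs the Apache module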

I have shut down the maps-warper2 instance.

(I'd like to keep it around for a week or so before deletion though, just in case)

Hi! Thanks for your work on this.

FYI, since the deadline has already passed, we agreed on shutting down the remaining Trusty instances on 2019-01-18. More info at https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation#Cloud_VPS_projects
It would be great if you folks could have the migration done by then.

maps-tiles1 now has access to the pgsql server.

When running the renderer I see:

SELECT ST_SRID("way") AS srid FROM planet_osm_polygon WHERE "way" IS NOT NULL LIMIT 1;
ERROR: permission denied for relation planet_osm_polygon

but the old tile servers seem to have that problem as well.
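
If that ever needs fixing, a grant along these lines should clear it (the database name gis and the role name osm are placeholders for illustration, not the actual maps setup):

$ sudo -u postgres psql gis -c 'GRANT SELECT ON planet_osm_polygon TO osm;'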

Now I need a way to verify that the new renderer is creating proper tiles. I hope to be able to get to that this weekend.

maps-tiles2, maps-tiles3 and maps-wma1 should be shut off today... It looks like maps-wma1 is still critical; what about the other two? Should this be extended, given that there is work actively being done on it?

Please, can someone provide an estimate of the time required for the maps-wma1 instance? https://tools.wmflabs.org/openstack-browser/server/maps-wma1.maps.eqiad.wmflabs

It's now over a week since more information was requested.

I've had 4 hours since Christmas that I was able to spend on this ticket; all 4 were spent on that postgres issue (the patch for which was open for 3 weeks, btw... just sayin').
I still haven't had time to verify whether tiles1 is actually working now. (I suspect it's not, actually.)

For maps-wma1, there is a new maps-wma, but I'm not sure if @dschwen has worked on it since T204506#4824815.

Spent a couple of hours on this. I now have tile rendering working again on the old tiles3 instance as well as on the new tiles1 server.

The new server is generating into a separate directory for now, as I wanted to confirm operation. I will work on consolidating the configurations in the next few days and attempt to verify operation of the new server. Then I should be able to let go of the old tiles instances.
http://tiles.wmflabs.org/osm/slippymap.html now shows tiles from the new instance. I'll clean that directory every day, as I don't want us to double the /data/project usage.

FYI, pretty sure that tile updates were not running for many many months, as @Bstorm suspected in T215560: Can't read from OSM replicas when connecting from Stretch bastion

For maps-wma1, there is a new maps-wma, but I'm not sure if @dschwen has worked on it since T204506#4824815.

Yeah, I have not. The home directory has not magically appeared. Do I have to rebuild the VM? I'm confused.

@dschwen the instance needs to be rebooted for it to appear.

Ok, will do that later. No access from work.

@dschwen I triggered a reboot on it and can confirm the homedirs are mounted on that instance.

Hmm, osm tile rendering seems extremely slow...

/usr/bin/render_speedtest 
Rendering client
Starting 1 rendering threads
Initial startup costs
Rendered 1 tiles in 0.00 seconds (62500.00 tiles/s)

Zoom(0) Now rendering 1 tiles
Rendered 1 tiles in 656.16 seconds (0.00 tiles/s)

Zoom(1) Now rendering 2 tiles
Rendered 2 tiles in 921.96 seconds (0.00 tiles/s)

This is on both the old and new tile servers.
Does anyone know if that is normal? 15 minutes for 2 tiles at zoom level 1?

11 minutes for z0 is good (Kartotherian does this in ~13).

Hmm, I think I just figured out that none of the cached tiles have been expired since at least early 2015... learning so much...

I have now set up munin and have a grip on how the clustering of renderd works. The next step is to make tiles1 the new master for renderd and HTTP traffic! I hope to get to that tomorrow evening.
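
For anyone following along: renderd clustering is configured in /etc/renderd.conf, where the master lists extra [renderdNN] sections pointing at slave daemons. A minimal sketch, with all hostnames, paths and values being assumptions rather than the actual maps config:

[renderd]
socketname=/var/run/renderd/renderd.sock
num_threads=4
tile_dir=/var/lib/mod_tile

; a remote slave the master can hand render requests to
[renderd01]
iphostname=maps-tiles2.maps.eqiad.wmflabs
ipport=7654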

Tile traffic and tile rendering are now primarily on the new server. I intend to verify this for a few days before shutting down the old servers completely.

After T218145: maps: take back root owned files/dirs from root_squash protected nfs, old tiles can suddenly be regenerated again, causing an explosion of tile render events, which in turn causes most render jobs to be dropped. I have temporarily disabled serving hikebike, osm-no-labels and osm-bw, to allow the server to slowly catch up a bit on rendering old tiles.

TIL that if you append /status to a tile URL, you can find out when the tile was last generated, and adding /dirty causes it to be put on the render queue.
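
In practice that looks like this (the tile coordinates are arbitrary examples):

$ curl https://tiles.wmflabs.org/osm/10/545/361.png/status   # reports when the tile was last rendered
$ curl https://tiles.wmflabs.org/osm/10/545/361.png/dirty    # queues the tile for re-rendering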

I still seem to have some issues in other spots.
I also still need to take care of adding config files to redirect renderd logging to /var/log/renderd.log, plus a logrotate rule for that; a sketch of the latter is below.
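
The logrotate part is just a drop-in file; a minimal sketch (the rotation cadence here is a guess, not a decision):

$ cat /etc/logrotate.d/renderd
/var/log/renderd.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}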

A user on one of the subtasks wrote:

Hello,
JOSM maintainer here.
For a few days now we've been unable to access two imagery layers hosted on tiles.wmflabs.org: 'https://tiles.wmflabs.org/osm-no-labels/{zoom}/{x}/{y}.png' and 'https://tiles.wmflabs.org/bw-mapnik/{zoom}/{x}/{y}.png'.
Is it due to this change? Is the service still provided?

Hi @Don-vip

Yes, as you may note from this ticket, this service has mostly been running unmaintained since 2015 (many tiles had not been refreshed since that date either...). As we are going through several large changes in our cloud platform, these services needed migration. Over the past 3 months, as a volunteer, I've taken on the job of getting familiar with this service and starting to rebuild it. This might lead to reduced availability of maps in the short term as I deal with several large problems in the little time I have available to work on this.

It is my intent that hikebike, osm-bw and osm-no-labels are retained in the long run.

@TheDJ: Thanks for working on this. To clarify the status of this task, are the tiles2/tiles3/wma1 instances still serving traffic, running important background processes, or otherwise in use?

Tiles2 and tiles3 are mostly standing by doing nothing (I've incidentally used them to boost bulk rendering capacity, as I've been rendering 4 years' worth of never-updated tiles).

Wma1 is still fully and wholly doing all WikiMiniAtlas work. I have not done any work on it.

So, just to confirm my understanding of this... maps-tiles2 and maps-tiles3 can be shut down whenever, and maps-wma1 still needs a new person to work on it?

@dschwen can you please catch us up on your plan for maps-wma1.maps.eqiad.wmflabs? Have you already built a replacement for it in the new region?

@dschwen, what's the latest regarding maps-wma1?

@TheDJ I'm shutting down maps-tiles2 and maps-tiles3 right now. Will delete in a few days if there are no ill effects.

maps-wma1 can be deleted. There is a new maps-wma instance. I still have to switch over some web proxies, but I'll do that today. I ported my tile render code to mapnik 3, but still have to do some minor things like converting my upstart scripts to systemd. This will only affect rendering of new tiles; most of the world is already cached by now.

Mentioned in SAL (#wikimedia-cloud) [2019-04-01T14:51:46Z] <andrewbogott> shutting down maps-wma1 as per T204506

:-/ I need to delete and re-add proxies to point to the new instance. I have proxy URLs like 1.wma.wmflabs.org and label.wma.wmflabs.org, but the new Horizon interface won't let me add a two-level subdomain like label.wma (it tells me to specify a name without dots in it).

Yep, the only proxies supported are things like <proxyname>.wmflabs.org. You might be able to set up CNAMEs pointing to the existing proxies, but I'm not sure how that would work with HTTPS.

:-/ I need to delete and re-add proxies to point to the new instance. I have proxy URLs like 1.wma.wmflabs.org and label.wma.wmflabs.org, but the new Horizon interface won't let me add a two-level subdomain like label.wma (it tells me to specify a name without dots in it).

We have scripts that let us create these legacy proxy entries on the backend. Support for them has been removed from the normal user-facing UI because of incompatibility with our TLS certificates. Make a list here of the (proxy name, backing instance, backing port) triples you need and I can create them. I would also recommend you make "normal" proxies, more like "wma-1.wmflabs.org", and migrate your code to use them as soon as you can, so that TLS will be possible.

Ugh, this is borked. I cannot delete the proxy entries with the . in the name. I get "You have selected: . Please confirm your selection. This action cannot be undone." and after pressing Delete the entry is still there.

@bd808 Horizon does not let you use a dash either!

Ugh, this is borked. I cannot delete the proxy entries with the . in the name. I get "You have selected: . Please confirm your selection. This action cannot be undone." and after pressing Delete the entry is still there.

Yes, this is expected as well. From the point of view of the Horizon app, these legacy proxies are data corruption. I had to make some command-line tools to manage them when we moved the proxies for the tiles hosts in this project as well.

And the dash? (Anyhow, just delete all wma proxies containing a dot at your convenience, please.)

@bd808 Horizon does not let you use a dash either!

Created T219789 for this. Bryan's script should be able to work around that in the meantime.

btw @dschwen, while you're doing this I encourage you to create standardized URLs to replace your old extra.dot.wmflabs.org proxy names, so you can move clients over to using the new TLS-supporting URLs. It's generally bad to use the unencrypted services, and I can't guarantee any kind of long-term support for them.

@Andrew, yes, I am planning to update my code to use non-dot subdomains, but I'm waiting until the dash issue is resolved.

I didn't mention it in this ticket yet, so for completeness' sake: last week I started redirecting [abc].tiles.wmflabs.org to https://tiles.wmflabs.org for the exact same reason.
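
For reference, that kind of redirect is a one-liner per vhost, assuming the tile server uses Apache with mod_alias (a sketch, not the actual config):

<VirtualHost *:80>
    # catch the legacy per-letter hostnames and send everything to the canonical TLS host
    ServerName a.tiles.wmflabs.org
    ServerAlias b.tiles.wmflabs.org c.tiles.wmflabs.org
    Redirect permanent / https://tiles.wmflabs.org/
</VirtualHost>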

Attempting to fix the proxies for @dschwen:

$ sudo /usr/local/sbin/wmcs-webproxy --project=maps list
domain                                           backend
================================================ ========================
0.wma.wmflabs.org                                http://10.68.16.70:80
1.wma.wmflabs.org                                http://10.68.16.70:80
2.wma.wmflabs.org                                http://10.68.16.70:80
3.wma.wmflabs.org                                http://10.68.16.70:80
4.wma.wmflabs.org                                http://10.68.16.70:80
5.wma.wmflabs.org                                http://10.68.16.70:80
6.wma.wmflabs.org                                http://10.68.16.70:80
7.wma.wmflabs.org                                http://10.68.16.70:80
a.tiles.wmflabs.org                              http://172.16.5.154:80
b.tiles.wmflabs.org                              http://172.16.5.154:80
c.tiles.wmflabs.org                              http://172.16.5.154:80
label.wma.wmflabs.org                            http://10.68.16.70:80
maps.wmflabs.org                                 http://172.16.5.154:80
oldwma.wmflabs.org                               http://10.68.16.70:80
tiles.wmflabs.org                                http://172.16.5.154:80
warper.wmflabs.org                               http://172.16.0.158:80
wma.wmflabs.org                                  http://172.16.1.144:80
wma0.wmflabs.org                                 http://172.16.1.144:80
wma1.wmflabs.org                                 http://172.16.1.144:80
wma2.wmflabs.org                                 http://172.16.1.144:80
wma3.wmflabs.org                                 http://172.16.1.144:80
wma4.wmflabs.org                                 http://172.16.1.144:80
wma5.wmflabs.org                                 http://172.16.1.144:80
wma6.wmflabs.org                                 http://172.16.1.144:80
wma7.wmflabs.org                                 http://172.16.1.144:80
$ for i in $(seq 0 7); do
  sudo /usr/local/sbin/wmcs-webproxy --project=maps delete ${i}.wma
done
$ sudo /usr/local/sbin/wmcs-webproxy --project=maps delete label.wma

# At this point I had to use Horizon to manually delete dangling DNS records in the wma.wmflabs.org zone, and ultimately the empty zone itself.
# Once that was done I was able to re-create the proxy entries pointing to the new backend:

$ for i in $(seq 0 7); do sudo /usr/local/sbin/wmcs-webproxy --project=maps add ${i}.wma http://172.16.1.144:80; done
2019-04-01T23:35:29Z mwopenstackclients.DnsManager WARNING : Creating 0.wma.wmflabs.org.
2019-04-01T23:35:32Z mwopenstackclients.DnsManager WARNING : Creating 1.wma.wmflabs.org.
2019-04-01T23:35:34Z mwopenstackclients.DnsManager WARNING : Creating 2.wma.wmflabs.org.
2019-04-01T23:35:37Z mwopenstackclients.DnsManager WARNING : Creating 3.wma.wmflabs.org.
2019-04-01T23:35:40Z mwopenstackclients.DnsManager WARNING : Creating 4.wma.wmflabs.org.
2019-04-01T23:35:42Z mwopenstackclients.DnsManager WARNING : Creating 5.wma.wmflabs.org.
2019-04-01T23:35:45Z mwopenstackclients.DnsManager WARNING : Creating 6.wma.wmflabs.org.
2019-04-01T23:35:48Z mwopenstackclients.DnsManager WARNING : Creating 7.wma.wmflabs.org.
$ sudo /usr/local/sbin/wmcs-webproxy --project=maps add wma http://172.16.1.144:80
2019-04-01T23:36:03Z mwopenstackclients.DnsManager WARNING : Creating wma.wmflabs.org.
$ sudo /usr/local/sbin/wmcs-webproxy --project=maps add lable.wma http://172.16.1.144:80
2019-04-01T23:41:06Z mwopenstackclients.DnsManager WARNING : Creating lable.wma.wmflabs.org.
$ sudo /usr/local/sbin/wmcs-webproxy --project=maps list
domain                                           backend
================================================ ========================
0.wma.wmflabs.org                                http://172.16.1.144:80
1.wma.wmflabs.org                                http://172.16.1.144:80
2.wma.wmflabs.org                                http://172.16.1.144:80
3.wma.wmflabs.org                                http://172.16.1.144:80
4.wma.wmflabs.org                                http://172.16.1.144:80
5.wma.wmflabs.org                                http://172.16.1.144:80
6.wma.wmflabs.org                                http://172.16.1.144:80
7.wma.wmflabs.org                                http://172.16.1.144:80
a.tiles.wmflabs.org                              http://172.16.5.154:80
b.tiles.wmflabs.org                              http://172.16.5.154:80
c.tiles.wmflabs.org                              http://172.16.5.154:80
lable.wma.wmflabs.org                            http://172.16.1.144:80
maps.wmflabs.org                                 http://172.16.5.154:80
oldwma.wmflabs.org                               http://10.68.16.70:80
tiles.wmflabs.org                                http://172.16.5.154:80
warper.wmflabs.org                               http://172.16.0.158:80
wma.wmflabs.org                                  http://172.16.1.144:80
wma0.wmflabs.org                                 http://172.16.1.144:80
wma1.wmflabs.org                                 http://172.16.1.144:80
wma2.wmflabs.org                                 http://172.16.1.144:80
wma3.wmflabs.org                                 http://172.16.1.144:80
wma4.wmflabs.org                                 http://172.16.1.144:80
wma5.wmflabs.org                                 http://172.16.1.144:80
wma6.wmflabs.org                                 http://172.16.1.144:80
wma7.wmflabs.org                                 http://172.16.1.144:80

I deleted the typo'ed lable.wma.wmflabs.org and created label.wma.wmflabs.org with the same target.

Andrew claimed this task.

The last of the shutdown VMs have been deleted, so this is now done!