Help:Toolforge/Web
Overview
Every Toolforge tool can run a dedicated <toolname>.toolforge.org website. Toolforge provides the webservice command, which starts and stops the web server for each tool. Toolforge supports websites written in several programming languages, including PHP, Python, Node.js, Java, and Ruby. Toolforge also provides support services that help you protect your website’s visitors from tracking by third-party services.
The webservice command uses convention over configuration for some aspects of how the website is deployed. You’ll find details for different programming languages below.
Using the webservice command
You can use the webservice command to start, stop, restart, and check the status of a webserver.
webservice command example
$ ssh login.toolforge.org
$ become my_cool_tool
$ webservice start
Use webservice --help to get a full list of arguments.
Without any additional arguments or configuration files, webservice start will currently start a PHP 7.3 Kubernetes container serving content from your tool's $HOME/public_html directory using lighttpd as the web server software.
Webservice templates
The webservice command has the concept of a "template" file which can be used to store arguments (and eventually other structured content) for starting a webservice. The code looks for a --template=... command line argument and falls back to looking for a $HOME/service.template file. Most tools are expected to use $HOME/service.template, but there may also be interesting uses for multiple templates in a single tool.
A webservice template file is a YAML document. It can contain these settings:
By saving the desired startup state in a file, you can go back to using plain webservice stop and webservice start commands without repeating the arguments each time.
Choosing a backend
Toolforge provides two different execution environments for web servers: Kubernetes and Grid Engine.
The Kubernetes backend provides more modern software versions and is the default backend. The Grid Engine backend is used primarily by legacy tools which were developed before Kubernetes was available. Toolforge administrators recommend that you try using Kubernetes first for new tools and only use the Grid Engine backend if there is a technical limitation that prevents your tool from running inside Kubernetes.
Common features
Both the Kubernetes and Grid Engine backends share common infrastructure services for serving web sites. Toolforge has an Nginx server configured as a proxy server which handles all inbound requests to your tool's web server. This proxy server takes care of providing TLS termination and then reverse proxies the inbound request to your tool's web service. Web servers running on Kubernetes have a second Nginx proxy server running as the "Ingress" component inside the Kubernetes cluster. See Portal:Toolforge/Admin/Kubernetes/Networking and ingress for detailed information about the network and web request routing used by the Toolforge Kubernetes cluster.
Toolforge also includes a 404 handler service which will respond to HTTP requests for tools which do not exist and tools which are not currently running a web service. This service is implemented as the fourohfour tool which runs on the Kubernetes backend.
Kubernetes
Kubernetes (k8s) is a platform for running containers. Kubernetes web servers have access to newer versions of most software than the Grid Engine provides. K8s also provides a more robust system for restarting tools automatically following an application crash.
Maintainer visible differences from Grid Engine based Web services
  1. Each process runs inside a Docker container, orchestrated by Kubernetes.
    • Provides better resource isolation (one tool cannot take down other tools by consuming all RAM or CPU)
    • Better health checking (monitoring built into Kubernetes, not a hack we wrote)
    • Less complex proxy setup, leading to fewer proxy related outages / issues
  2. Containers available based on newer Debian versions (Buster)
    Newer software versions than those available with Debian Stretch
  3. It is not possible to interact with the Grid Engine from Kubernetes (no jsub...)
  4. Kubernetes backend has specific webservice options:
    -m MEMORY, --mem MEMORY             Set higher Kubernetes memory limit
    -c CPU, --cpu CPU                   Set a higher Kubernetes cpu limit
    -r REPLICAS, --replicas REPLICAS    Set the number of pod replicas to use
Grid Engine
The Grid Engine backend runs your web server as a job on a Debian Stretch grid exec node. This is similar to the way that jsub runs any grid job you submit, but there is a separate exec queue on the grid for running jobs started by webservice.
Switching between Kubernetes and Grid Engine
From Kubernetes to Grid Engine
$ webservice --backend=kubernetes stop
$ webservice --backend=gridengine start
From Grid Engine to Kubernetes
$ webservice --backend=gridengine stop
$ webservice --backend=kubernetes <type> start
Default web server (lighttpd + PHP)
See: Help:Toolforge/Web/Lighttpd
PHP
See: Help:Toolforge/Web/PHP
Python
See: Help:Toolforge/Web/Python
Node.js web services
See: Help:Toolforge/Web/Node.js
Java
See: Help:Toolforge/Web/Java
Other / generic web servers
You can run other web servers that are not directly supported. This can be accomplished using the generic webservice type on the Grid Engine backend or a runtime specific type on the Kubernetes backend.
To start a webserver that is launched by a script at /data/project/toolname/code/server.bash, you would launch it with:
$ webservice --backend=gridengine generic start /data/project/toolname/code/server.bash
Your script will be passed an HTTP port to bind to in an environment variable named PORT. This is the port that the Nginx proxy will forward requests for https://YOUR_TOOL.toolforge.org/ to. When using the Kubernetes backend, PORT will always be 8000. When using the Grid Engine backend, PORT will change each time the webservice start or webservice restart command is run.
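As a minimal sketch of a server that follows this convention, the Python program below reads PORT from the environment and binds to it. The fallback value of 8000, the handler, and the response body are illustrative assumptions rather than anything Toolforge provides. A real generic webservice would still be launched through a script such as the server.bash mentioned above.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def port_from_env(environ=os.environ, default=8000):
    """Return the port the proxy forwards to, read from the PORT variable."""
    # The default is an assumption that only matters for local testing.
    return int(environ.get("PORT", default))


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed text body."""

    def do_GET(self):
        body = b"Hello from my tool\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(environ=os.environ):
    # Bind on all interfaces so the proxy can reach the process.
    return ThreadingHTTPServer(("0.0.0.0", port_from_env(environ)), HelloHandler)


def main():
    # Blocks until the process is stopped.
    make_server().serve_forever()
```

The launch script would end by calling main(), for example with python3 -c "import server; server.main()" if the file is saved as server.py (a hypothetical name). Because the Grid Engine backend picks a new port on every start or restart, the port is read when the server starts rather than being hard-coded.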
Common tasks and guides
Hosting large files
Toolforge storage uses NFS, which has limited storage and network bandwidth. If your tool requires a static file larger than 1 GB (for example, a container image or tarball), please store that file in the 'Download' project rather than in your tool's home directory.
The Download project hosts https://download.wmcloud.org, a public read-only web server for large file storage. To have a file added, create a Phabricator ticket or contact WMCS staff directly.
Serving static files
Files placed in a tool's $HOME/www/static directory are available directly from the URL tools-static.wmflabs.org/toolname. This does not require any action on the tool's part: putting the files in the appropriate folder (and making the directory readable) should 'just work'.
You can use this to serve static assets (CSS, HTML, JS, etc) or to host simple websites that don't require a server-side component.
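The publishing step can be sketched as a small Python helper that copies a file into $HOME/www/static, makes the directories and the file world-readable, and returns the URL where the file should appear. The function name publish_static and the permission modes are assumptions made for illustration. The URL layout follows the tools-static.wmflabs.org/toolname pattern described above.

```python
import os
import shutil
from pathlib import Path


def publish_static(src, tool_name, home=None):
    """Copy src into <home>/www/static and return its expected public URL."""
    home = Path(home if home is not None else os.environ["HOME"])
    static_dir = home / "www" / "static"
    static_dir.mkdir(parents=True, exist_ok=True)

    # Make both directory levels readable and traversable by other users
    # (assumed modes; adjust to your own needs).
    for directory in (static_dir.parent, static_dir):
        directory.chmod(0o755)

    dest = static_dir / Path(src).name
    shutil.copyfile(src, dest)
    dest.chmod(0o644)  # world-readable file

    return f"https://tools-static.wmflabs.org/{tool_name}/{dest.name}"
```

For example, publish_static("style.css", "my_cool_tool") run as the tool account would copy the file and return https://tools-static.wmflabs.org/my_cool_tool/style.css.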
Load external assets using our CDN services
To preserve the privacy of our users, avoid embedding assets (images, CSS, JavaScript) from servers outside of Wikimedia Foundation control.
Libraries (Browse libraries)
Toolforge provides an anonymizing reverse proxy to cdnjs.
Fonts (Search fonts)
Toolforge provides an anonymizing reverse proxy to Google Fonts.
Maps (Documentation)
Wikimedia provides maps servers with data from OpenStreetMap.
Runtime memory limits
Requesting additional tool memory
Kubernetes web servers start with a default limit on both runtime memory and cpu power. These limits vary slightly based on which runtime language (PHP, Python, Java, etc) you are using. The --cpu and --mem command line arguments can be used to increase these defaults up to the quota limit for your tool's Kubernetes namespace. See Kubernetes#Quotas and Resources for instructions on requesting an increased quota for your tool.
For Grid Engine webservices, request more tool memory by opening a Phabricator task.
Notify the #wikimedia-cloud IRC channel that you have filed a request.
A Cloud Services administrator will review your request and can create a /data/project/.system/config/$TOOLNAME.web-memlimit configuration file that will adjust the limit.
Response buffering
An Nginx proxy sits between your webservice and the user. By default this proxy buffers the response sent from your server. For some use cases, including streaming large quantities of data to the browser, this can be undesirable. Buffering can be disabled on a per-request basis by sending an X-Accel-Buffering: no header in your response.[1]
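To illustrate, here is a minimal sketch of a WSGI application that streams its response in chunks and sends the X-Accel-Buffering: no header. The application name, the chunk contents, and the short delay are placeholders. Any framework that lets you set response headers and return an iterable body should work the same way.

```python
import time


def streaming_app(environ, start_response):
    """WSGI app that streams chunks and asks the Nginx proxy not to buffer."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Per-response opt-out of proxy buffering.
        ("X-Accel-Buffering", "no"),
    ]
    start_response("200 OK", headers)

    def chunks():
        for i in range(3):
            yield f"chunk {i}\n".encode("utf-8")
            time.sleep(0.1)  # stand-in for slow work between chunks

    return chunks()
```

The header applies only to the response that carries it, so other endpoints of the same tool keep the default buffered behavior.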
/favicon.ico
Tracked in Phabricator: Task T251628 (resolved)
A default image will be served by the shared proxy layer if your webservice returns a 404 Not Found response when asked for /favicon.ico. This default icon is the same as the one found at https://tools-static.wmflabs.org/toolforge/favicons/favicon.ico.
/robots.txt
Tracked in Phabricator: Task T251628 (resolved)
A default response will be served by the shared proxy layer if your webservice returns a 404 Not Found response when asked for /robots.txt. The default robots.txt response denies access to all compliant web crawlers. We decided that this "fail closed" approach would be safer than a "fail open" telling all crawlers to crawl all tools.
Any tool that does wish to be indexed by search engines and other crawlers can serve their own /robots.txt content. Please see https://www.robotstxt.org/ for more information on /robots.txt in general.
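A tool that wants to be crawled could serve its own file along these lines. This is a sketch of a WSGI application, assuming a policy that allows all crawlers everywhere. The name robots_app and the policy text are illustrative, and a real tool would route its other paths to its own handlers.

```python
# An empty Disallow line permits crawling of every path.
ROBOTS_TXT = b"User-agent: *\nDisallow:\n"


def robots_app(environ, start_response):
    """WSGI app that answers /robots.txt and returns 404 for everything else."""
    if environ.get("PATH_INFO") == "/robots.txt":
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(ROBOTS_TXT))),
        ])
        return [ROBOTS_TXT]

    body = b"Not found\n"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because the tool now answers /robots.txt with a 200 response, the proxy's default deny-all response is no longer used for that tool.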
Communication and support
We communicate and provide support through several primary channels. Please reach out with questions and to join the conversation.
Communicate with us
Channel | Connect | Best for
Phabricator Workboard | #Cloud-Services | Task tracking and bug reporting
IRC Channel | #wikimedia-cloud (connect, Telegram bridge, mattermost bridge) | General discussion and support
Mailing List | cloud@ | Information about ongoing initiatives, general discussion and support
Announcement emails | cloud-announce@ | Information about critical changes (all messages mirrored to cloud@)
News wiki page | News | Information about major near-term plans
Cloud Services Blog | Clouds & Unicorns | Learning more details about some of our work
Wikimedia Technical Blog | techblog.wikimedia.org | News and stories from the Wikimedia technical movement
References
[1] https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/
Last edited on 7 June 2021, at 18:31
Content is available under CC BY-SA 3.0 unless otherwise noted.