Shellbox

Shellbox safely sandboxes unsafe command execution.

Shellbox is a library for remote command execution, and a server for secure command execution. It was primarily implemented to sandbox lilypond (used by the Score extension) and provide a way for MediaWiki to utilize external binaries without needing them to be in the same container. Shellbox relies on Kubernetes (and Linux containers/namespaces) to provide isolation and resource limits for external commands.

Documentation for integrating Shellbox with MediaWiki is available at mw:Shellbox; operational aspects are covered here on Wikitech.

Architecture

Architecture overview of Shellbox

Requests come into an Apache httpd container, which contains the Shellbox secret key as a configmap. The request is passed on to a php-fpm container, which contains the Shellbox code and the necessary binaries. Once the request is authenticated, Shellbox executes the command as the www-data user and sends the response back.

MediaWiki talks to Shellbox over a local envoyproxy.

Shellboxes

We currently have five Shellboxes in active use:

Monitoring

Shellbox provides a /healthz endpoint that can be used to quickly check if the service is up, e.g.:

user@host$ curl https://shellbox.discovery.wmnet:4008/healthz
{
    "__": "Shellbox running",
    "pid": 10782
}

All other requests are harder to externally construct since they need to be signed with the Shellbox secret key.
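For scripted checks, the /healthz response can be matched against the "Shellbox running" marker shown above. The following is a minimal sketch, not an official tool: the function name shellbox_healthz_ok is hypothetical, and it assumes the response body keeps the shape of the example output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shellbox): succeeds only if the given
# /healthz response body reports a running Shellbox.
shellbox_healthz_ok() {
  # $1: response body returned by the /healthz endpoint
  printf '%s' "$1" | grep -q '"__": *"Shellbox running"'
}

# Example usage, with the URL from the curl example above:
#   body=$(curl -s https://shellbox.discovery.wmnet:4008/healthz)
#   shellbox_healthz_ok "$body" && echo "up" || echo "down"
```

Matching on the "__" field rather than on the HTTP status alone also catches the case where something other than Shellbox answers on that port.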

Bugs should be reported/tracked in #Shellbox on Phabricator.

Logs

All logs from httpd and php-fpm should end up in logstash. You can filter for a specific Shellbox deployment with kubernetes.namespace_name:"shellbox-constraints". The actual log text is in the log field (not message, as it is for MediaWiki). httpd access log entries with an HTTP 200 status are dropped because of their volume and the low likelihood that they would be useful.

All Shellbox invocations should still be logged under MediaWiki's exec log channel too.

Deploying a new version

When you merge a new patch to shellbox and want to deploy it to production, the procedure is simple:

  • Change the value of shellbox.version in helmfile.d/services/shellbox/global.yaml in the deployment-charts repository
  • Roll the new version out to each Shellbox deployment, following the general deployment guidelines. Once you have tested how they perform on staging, a quick way to cycle through them is:
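    cd /srv/deployment-charts/helmfile.d/services
    # Change the DC according to your needs
    DC=codfw
    # Apply the change to every shellbox* deployment in turn
    for deployment in shellbox*; do
      echo "#### Doing $deployment"
      sleep 5
      pushd "$deployment"
      # -i asks for confirmation before applying; --context 5 trims the diff output
      helmfile -e "$DC" -i apply --context 5
      popd
    done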
    

Smoke test

Quick verification that the containers are at least running:

cd /srv/deployment-charts/helmfile.d/services
# Change the DC according to your needs
DC=staging
for deployment in shellbox*; do
  echo "#### Checking $deployment in $DC"
  kube_env $deployment $DC
  curl https://staging.svc.eqiad.wmnet:$(kubectl get service shellbox-main-tls-service -o jsonpath='{.spec.ports[0].nodePort}')
done

Each curl call is expected to return a JSON error payload, because no action was encoded in the URL:

{
    "__": "Shellbox server error",
    "class": "Shellbox\\ShellboxError",
    "message": "No action was specified",
    "log": [
        {
            "level": 400,
            "message": "Exception of class Shellbox\\ShellboxError: No action was specified",
            "context": {
                "trace": "#0 /srv/app/src/Server.php(72): Shellbox\\Server->guardedExecute('/srv/app/config...')\n#1 /srv/app/src/Server.php(61): Shellbox\\Server->execute('/srv/app/config...')\n#2 /srv/app/index.php(3): Shellbox\\Server::main('/srv/app/config...')\n#3 {main}"
            }
        }
    ]
}
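To turn the smoke test into a pass/fail check, the response body can be compared against the expected error shown above. This is a sketch under stated assumptions: the function name is_expected_smoke_error is hypothetical, and it only looks for the "Shellbox server error" marker and the "No action was specified" message from the sample payload.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shellbox): succeeds only if the body is the
# expected "No action was specified" error returned for a request without an action.
is_expected_smoke_error() {
  # $1: response body returned by the smoke-test curl call
  printf '%s' "$1" | grep -q '"__": *"Shellbox server error"' &&
    printf '%s' "$1" | grep -q '"message": *"No action was specified"'
}

# Example usage inside the smoke-test loop (assumes $url holds the service URL):
#   body=$(curl -s "$url")
#   is_expected_smoke_error "$body" || echo "unexpected response from $deployment"
```

Any other response, such as a connection error or a different exception, makes the function fail, which is the signal to look at the deployment more closely.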

Source code