If you are using virtual envs:
```bash
# only python3 is supported
python3 -m venv /path/to/your/new/venv
source /path/to/your/new/venv/bin/activate
pip install -r requirements.txt
```
To be able to retrieve a list of hosts from Netbox directly, you will have to add a config file at `~/.config/netbox/config.json` with the contents:
{ "netbox_url": "https://netbox.wikimedia.org/api", "api_token": "your-api-token" }
To create an API token, you can go to https://netbox.wikimedia.org/user/api-tokens/
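As a sketch of how those settings can be used, the snippet below loads the config file and lists hosts from the Netbox API; the `dcim/devices/` endpoint is standard Netbox, but the filter parameters and the way this suite's scripts actually consume the file are assumptions.

```python
# Hypothetical sketch: load the config above and list active hosts from
# Netbox. The endpoint is Netbox's standard device list; the filter values
# are illustrative and not necessarily what the suite's scripts use.
import json
from pathlib import Path

import requests

config = json.loads(
    (Path.home() / ".config" / "netbox" / "config.json").read_text()
)

response = requests.get(
    f"{config['netbox_url']}/dcim/devices/",
    headers={"Authorization": f"Token {config['api_token']}"},
    params={"status": "active"},
)
response.raise_for_status()
print([device["name"] for device in response.json()["results"]])
```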
The goal of these scripts is to be able to assess whether a change in the stack (new kernel, new Ceph version, etc.) has had any effect on the performance of the cluster. With that in mind, the idea is to take a snapshot of all the stack levels of the deployment before and after the change, and build a comparison for each of them.
You can also compare different levels of the stack at the same moment, though that's currently out of scope: only 2 result sets can be compared at a time.
The chosen stack levels are:
There's a python script, `scripts/run_on_env.py`, that will randomly pick one host from each stack level, run the tests, and gather the results under the parent directory (under `results/<SITE_NAME>/`).
The results will be ready to use to generate a full stack comparison report.
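For intuition, here's a minimal sketch of the "one random host per stack level" idea; the function shape and the host names are made up for illustration, and `run_on_env.py`'s real implementation may differ (only `vm_disk` and `rbd_from_osd` are known level names, taken from the result paths further down).

```python
# Illustrative only: how "pick one random host per stack level" could look.
# The interface and host names are assumptions, not run_on_env.py's code.
import random


def pick_one_host_per_level(hosts_by_level: dict[str, list[str]]) -> dict[str, str]:
    """Return a randomly chosen host for each stack level."""
    return {level: random.choice(hosts) for level, hosts in hosts_by_level.items()}


print(
    pick_one_host_per_level(
        {
            "vm_disk": ["test-vm-1", "test-vm-2"],
            "rbd_from_osd": ["osd-host-1", "osd-host-2"],
        }
    )
)
```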
There's a helper script, `scripts/execute_remote_test.sh`, that will run the given stack layer's tests on the given host and copy the results locally (into the results directory).
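An invocation might look like the one below, but the argument order and the hostname here are guesses; check the script itself for its actual usage.

```bash
# Illustrative invocation only: the actual argument names and order are
# whatever execute_remote_test.sh documents; the hostname is made up.
./scripts/execute_remote_test.sh vm_disk some-test-host.codfw.wmnet
```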
Once you have at least two different result sets (the directories under ./results), you can generate a comparative report by running the script `scripts/generate_reports.py`, which will build a self-contained HTML page with interactive graphs and some stats highlighting the results that are "better".
Example:
```bash
./scripts/generate_reports.py \
    --verbose env-report \
    --outfile-prefix reports/full_stack/2021-03-18_vs_2021-03-19 \
    --before-data-dir results/codfw/full_stack/2021-03-18_18-11-16 \
    --after-data-dir results/codfw/full_stack/2021-03-19_15-41-05/ \
    --description "No changes in between, just two plain runs."
```
If you want to compare just two runs for a single stack level, or even any two host runs, you can run `generate_reports.py level-report`
and it will use only those two host runs. For example:
```bash
./scripts/generate_reports.py \
    level-report \
    --outfile-prefix $(date +%Y-%m-%d)_vm_against_osd \
    -d results/full_stack/codfw/2021-03-19_15-41-05/vm_disk/ \
    -n VM \
    -D results/full_stack/codfw/2021-03-19_15-41-05/rbd_from_osd/ \
    -N OSD
```
That will open a new browser window for you to preview the report. If everything looks OK, then to commit the report and make it available, gzip it:
```bash
gzip *vm_against_osd.html
```
and move it to the reports directory:
```bash
mv *vm_against_osd.html.gz reports/full_stack/
```
Now you can commit the report and the results and send a patch.
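Assuming this repo follows the common Wikimedia Gerrit workflow (an assumption, check the repo's contribution docs), that last step might look like:

```bash
# One possible sequence, assuming a Gerrit/git-review workflow (an
# assumption about this repo; adapt to its actual contribution process).
git add reports/full_stack/ results/
git commit -m "Add full stack report and results"
git review
```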
For the VM disk tests, a dedicated VM is used because OpenStack enforces a rate limit on a per-VM basis; if we were to run the tests on an existing VM, they would affect any other service running on it. A dedicated VM also allows for more reproducible tests thanks to the VM-level isolation (though they might still be affected by noisy neighbours at the hypervisor, network and Ceph cluster levels).
Essentially, it is the number of 'queued' IO requests fio keeps in flight at any time; it's a maximum that the OS might reduce.
Usually a lower IO depth means that each operation is flushed from the queue faster, giving lower latencies but lower IOPS, while a higher IO depth gives the opposite result: higher latencies (as operations might now be batched) but higher IOPS (as batching is more efficient in terms of overall total IOPS).
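To make the trade-off concrete, here's a small fio job file contrasting the two regimes; every value in it is an arbitrary example choice, not one of this suite's actual test parameters.

```ini
# Hypothetical fio job illustrating the iodepth trade-off; all values are
# example choices, not the suite's actual test parameters.
[global]
# Async engine: required for iodepth > 1 to actually queue requests.
ioengine=libaio
# Bypass the page cache so queue depth effects are visible.
direct=1
rw=randread
bs=4k
filename=/tmp/fio-iodepth-test.bin
size=1G
runtime=60
time_based=1

# One in-flight request: lowest latency, but lowest IOPS.
[shallow-queue]
iodepth=1

# Many in-flight requests: higher per-request latency, higher total IOPS.
[deep-queue]
# stonewall waits for the previous job to finish before starting this one.
stonewall
iodepth=32
```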
Related docs: