
PyYAML fails to install in python3 venv on Stretch grid host
Closed, Resolved (Public)

Description

Found while attempting to migrate a Python 3 tool from the Trusty job grid to the new Stretch job grid. In addition to failing, the compilation process hangs for a very long time while attempting the build.

$ time ./venv/bin/pip install -v -r src/keystone_browser/requirements.txt 2>&1 | tee ~/pip.log
... (lots of pip doing pip stuff) ...

    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-3.5.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/mnt/nfs/labstore-secondary-tools-project/trusty-deprecation/www/python/venv/include -I/usr/include/python3.5m -c ext/_yaml.c -o build/temp.linux-x86_64-3.5/ext/_yaml.o

... (lots of -Wall warnings) ...

    x86_64-linux-gnu-gcc: internal compiler error: Killed (program cc1)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 4
    Running setup.py install for PyYAML: finished with status 'error'
...
pip._internal.exceptions.InstallationError: Command "/mnt/nfs/labstore-secondary-tools-project/trusty-deprecation/www/python/venv/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-ldj7z_g9/PyYAML/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-awh2o4lw/install-record.txt --single-version-externally-managed --compile --install-headers /mnt/nfs/labstore-secondary-tools-project/trusty-deprecation/www/python/venv/include/site/python3.5/PyYAML" failed with error code 1 in /tmp/pip-install-ldj7z_g9/PyYAML/

real    186m53.479s
user    0m24.276s
sys     17m28.272s
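
The three `time` figures above can be turned into a rough CPU share. The sketch below does only that arithmetic; `parse_time` is a hypothetical helper written for this note, not part of pip or any tool used in this task. The result comes out at a little under 10%, which suggests the build spent most of its three hours waiting rather than computing.

```python
import re

def parse_time(text):
    """Convert a bash `time` value such as '186m53.479s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError("unexpected time format: " + text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Values copied from the `time` output above.
real = parse_time("186m53.479s")
user = parse_time("0m24.276s")
sys_time = parse_time("17m28.272s")

# Fraction of wall-clock time the process actually spent on the CPU.
cpu_share = (user + sys_time) / real
print("CPU share of wall-clock time: {:.1%}".format(cpu_share))
```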

Event Timeline

As a point of comparison, I installed the same requirements.txt into a Python 3.4.4 venv created on a Trusty bastion with no problems. It looks to me like PyYAML was installed there as a wheel (precompiled). This may or may not be a clue to the Stretch/Python 3.5.3 problem.

bd808 triaged this task as High priority.

This is going to be a blocker for moving a lot of Python tools to the new grid; Jouncebot is one example.

I tried to reproduce this failure on another Stretch host and did not hit the same issue:

$ ssh tools-sgeexec-0902.tools.eqiad.wmflabs
$ cd /srv
$ sudo mkdir bd808-T215434
$ sudo chown bd808 bd808-T215434
$ virtualenv -p python3 venv-T215434
$ venv-T215434/bin/pip install -U pip
Requirement already up-to-date: pip in ./venv-T215434/lib/python3.5/site-packages (19.0.1)
tools-sgeexec-0902.tools:/srv/bd808-T215434
bd808$ venv-T215434/bin/pip install PyYAML
Collecting PyYAML
  Downloading https://files.pythonhosted.org/packages/9e/a3/1d13970c3f36777c583f136c136f804d70f500168edc1edea6daa7200769/PyYAML-3.13.tar.gz (270kB)
    100% |████████████████████████████████| 276kB 2.1MB/s
Building wheels for collected packages: PyYAML
  Building wheel for PyYAML (setup.py) ... done
  Stored in directory: /home/bd808/.cache/pip/wheels/ad/da/0c/74eb680767247273e2cf2723482cb9c924fe70af57c334513f
Successfully built PyYAML
Installing collected packages: PyYAML
Successfully installed PyYAML-3.13
$ venv-T215434/bin/python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> dir(yaml)
['AliasEvent', 'AliasToken', 'AnchorToken', 'BaseDumper', 'BaseLoader', 'BlockEndToken', 'BlockEntryToken', 'BlockMappingStartToken', 'BlockSequenceStartToken', 'CBaseDumper', 'CBaseLoader', 'CDumper', 'CLoader', 'CSafeDumper', 'CSafeLoader', 'CollectionEndEvent', 'CollectionNode', 'CollectionStartEvent', 'DirectiveToken', 'DocumentEndEvent', 'DocumentEndToken', 'DocumentStartEvent', 'DocumentStartToken', 'Dumper', 'Event', 'FlowEntryToken', 'FlowMappingEndToken', 'FlowMappingStartToken', 'FlowSequenceEndToken', 'FlowSequenceStartToken', 'KeyToken', 'Loader', 'MappingEndEvent', 'MappingNode', 'MappingStartEvent', 'Mark', 'MarkedYAMLError', 'Node', 'NodeEvent', 'SafeDumper', 'SafeLoader', 'ScalarEvent', 'ScalarNode', 'ScalarToken', 'SequenceEndEvent', 'SequenceNode', 'SequenceStartEvent', 'StreamEndEvent', 'StreamEndToken', 'StreamStartEvent', 'StreamStartToken', 'TagToken', 'Token', 'ValueToken', 'YAMLError', 'YAMLObject', 'YAMLObjectMetaclass', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '__with_libyaml__', 'add_constructor', 'add_implicit_resolver', 'add_multi_constructor', 'add_multi_representer', 'add_path_resolver', 'add_representer', 'compose', 'compose_all', 'composer', 'constructor', 'cyaml', 'dump', 'dump_all', 'dumper', 'emit', 'emitter', 'error', 'events', 'io', 'load', 'load_all', 'loader', 'nodes', 'parse', 'parser', 'reader', 'representer', 'resolver', 'safe_dump', 'safe_dump_all', 'safe_load', 'safe_load_all', 'scan', 'scanner', 'serialize', 'serialize_all', 'serializer', 'tokens']
>>>

They are using the same version of libyaml-dev (just checked that).

libc6 is also the same package version.

OK, I've got it. The Stretch bastions are using extremely tight per-user limits on RAM, per T215401 and related tickets. For instance:

bstorm@tools-sgebastion-06:~$ stress --vm 1 --vm-bytes 512M --vm-hang 100
stress: info: [13865] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [13865] (415) <-- worker 13866 got signal 9
stress: WARN: [13865] (417) now reaping child worker processes
stress: FAIL: [13865] (451) failed run completed in 1s

That does not fail on a root login (sudo doesn't count; systemd still knows who you really are). In fact, if you start a compile while logged in as a regular user and then try to log in again, systemd/cgroups will not even let you open a shell. My root login is happy to keep running that stress test at 512M. The systemd slice cgroup applies a 150M hard RAM limit to all of a user's processes. Any compile will easily exceed that number, which is why gcc fails with "Killed".
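
Both the gcc "Killed (program cc1)" message and the stress "got signal 9" line point at the same thing: the child process received SIGKILL. A minimal sketch of how that shows up when a build is driven from Python, assuming Linux; `describe_exit` is a hypothetical helper, and the self-killing child is only a stand-in for a compiler that hits a memory limit.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Explain a subprocess return code; a negative value means 'killed by that signal'."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)

if __name__ == "__main__":
    # Stand-in for a compiler killed by the kernel: the child sends itself SIGKILL.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit(child.returncode))
```

A wrapper such as gcc or setup.py reports its own non-zero exit status (4 and 1 in the log above), so the signal is only visible on the process that was actually killed.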

Root's version of the same process as above is still chugging. Root can compile whatever it wants.

root@tools-sgebastion-06:~# stress --vm 1 --vm-bytes 512M --vm-hang 100
stress: info: [13852] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
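
To get a rough idea of whether a given command would fit under a limit like the 150M one described above, its peak memory use can be measured on a host without that limit. This is a sketch under assumptions: it relies on Linux reporting `ru_maxrss` in kibibytes, it measures resident set size only (cgroup accounting can count more than that), and `peak_child_rss_mib` and `exceeds_limit` are hypothetical helper names.

```python
import resource
import subprocess
import sys

LIMIT_MIB = 150  # per-user hard limit quoted in this task

def peak_child_rss_mib(command):
    """Run `command`, then return the largest peak RSS among waited-for children, in MiB."""
    subprocess.run(command, check=False)
    # On Linux ru_maxrss is reported in kibibytes.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return peak_kib / 1024

def exceeds_limit(peak_mib, limit_mib=LIMIT_MIB):
    """True if the measured peak is above the limit."""
    return peak_mib > limit_mib

if __name__ == "__main__":
    # Stand-in workload: a child that allocates and fills roughly 200 MiB.
    demo = [sys.executable, "-c", "data = b'x' * (200 * 1024 * 1024)"]
    peak = peak_child_rss_mib(demo)
    print("peak RSS: {:.0f} MiB, over limit: {}".format(peak, exceeds_limit(peak)))
```

Replacing the stand-in command with the real build command gives a first estimate of how much headroom the limit needs.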

This needs to be changed in modules/profile/files/toolforge/bastion-user-resource-control.conf in puppet.

Change 489127 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] toolforge: bastion cgroup limits need a big boost in resources

https://gerrit.wikimedia.org/r/489127

Change 489127 merged by Bstorm:
[operations/puppet@production] toolforge: bastion cgroup limits need a big boost in resources

https://gerrit.wikimedia.org/r/489127

(newvenv) bstorm@tools-sgebastion-06:~$ pip install pyyaml
Collecting pyyaml
  Using cached https://files.pythonhosted.org/packages/9e/a3/1d13970c3f36777c583f136c136f804d70f500168edc1edea6daa7200769/PyYAML-3.13.tar.gz
Building wheels for collected packages: pyyaml
  Building wheel for pyyaml (setup.py) ... done
  Stored in directory: /home/bstorm/.cache/pip/wheels/ad/da/0c/74eb680767247273e2cf2723482cb9c924fe70af57c334513f
Successfully built pyyaml
Installing collected packages: pyyaml
Successfully installed pyyaml-3.13

We just need to remove the "known issue" from wikitech now :)

bd808 reassigned this task from bd808 to Bstorm.

Works for me now too. I have updated the on-wiki documentation to mark this issue as resolved and to explain the core problem that caused it.

Thanks for tracking this down @Bstorm!