Cannot install Pants > 2.19 if home directory in /etc/passwd is a symlink. #21321

Closed
originalrkk opened this issue Aug 19, 2024 · 15 comments

@originalrkk

Describe the bug
The pants CLI works fine in all of our environments at version 2.19. When we change the version to 2.20, 2.21, or 2.22, we see one of two errors. It's not clear why some environments trigger one error and some the other, given that they all run the same Ubuntu release (perhaps quirks of the user setup):

$ pants --keep-sandboxes=on_failure --changed-since=origin/main lint
Bootstrapping Pants 2.20.0
Installing pantsbuild.pants==2.20.0 into a virtual environment at /home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.20.0
Failed to create Pants virtual environment.
Error: Command '['/bulk_data/home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/pex_root/venvs/591<REDACTED>/561<REDACTED>/bin/python', '/tmp/tmpp276ax9e.pex', 'venv', '--prompt', 'Pants 2.20.0', '--compile', '--pip', '--collisions-ok', '--no-emit-warnings', '--disable-cache', '/home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.20.0']' returned non-zero exit status 1., output:
-----
b'Traceback (most recent call last):\n  File "/bulk_data/home/<REDACTED>/.cache/nce/679<REDACTED>/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File "/bulk_data/home/<REDACTED>/.cache/nce/679<REDACTED>/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code\n    exec(code, run_globals)\n  File "/home/<REDACTED>/.pex/unzipped_pexes/9c1<REDACTED>/__main__.py", line 105, in <module>\n    from pex.pex_bootstrapper import bootstrap_pex\nModuleNotFoundError: No module named \'pex\'\n'
-----

Or

Bootstrapping Pants 2.22.0
Installing pantsbuild.pants==2.22.0 into a virtual environment at /home/<REDACTED>/.cache/nce/60b<REDACTED>/bindings/venvs/2.22.0
Failed to fetch https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex: [22] HTTP response code said error (The requested URL returned error: 404)
Wasn't able to fetch the Pants PEX at https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex.

Check to see if the URL is reachable (i.e. GitHub isn't down) and if pants.2.22.0-cp39-linux_x86_64.pex asset exists within the release. If the asset doesn't exist it may be that this platform isn't yet supported. If that's the case, please reach out on Slack: https://www.pantsbuild.org/docs/getting-help#slack or file an issue on GitHub: https://github.com/pantsbuild/pants/issues/new/choose.

Exception:

Command '['/home/<REDACTED>/.cache/nce/226<REDACTED>/ptex-linux-x86_64', 'https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex']' returned non-zero exit status 1.
Error: Failed to establish atomic directory /home/<REDACTED>/.cache/nce/60b<REDACTED>/locks/install-ab9<REDACTED>. Population of work directory failed: Boot binding command failed: exit status: 1

Isolates your Pants from the elements.

Please select from the following boot commands:

<default> (when SCIE_BOOT is not set in the environment)  Detects the current Pants installation and launches it.
bootstrap-tools                                           Introspection tools for the Pants bootstrap process.
update                                                    Update scie-pants.

You can select a boot command by setting the SCIE_BOOT environment variable.

Pants version
2.19 - 2.22

OS
Linux:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"

Additional info
We've tried a bit of flailing to fix this:

  • Attempting different Pants versions.
  • ./get-pants.sh
  • SCIE_BOOT=update pants (yields No new releases of scie-pants were found.)

We tried to install Pex directly into the virtual environment using pip, but got

raise MetadataError(\npex.dist_metadata.MetadataError: Failed to determine project name and version for distribution at /bulk_data/home/<REDACTED>/.pex/unzipped_pexes/ba7<REDACTED>/.deps/PyYAML-6.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.\n'

FWIW, https://github.com/pantsbuild/pants/releases/download/release_2.22.0/pants.2.22.0-cp39-linux_x86_64.pex is not an accessible address as far as I can tell.

@cburroughs
Contributor

The second error is because 2.22 has not been released yet, that is https://github.com/pantsbuild/pants/releases/tag/release_2.22.0 will also 404. (There have been several RCs and we hope it is really close, but it is not out yet!)

I am unsure about the first error. When you say "all of our environments", do you mean that it happens in a variety of environments (every developer's workstation) or in a standardized environment (that is, one with your /bulk_data mount)?

I hesitate to recommend this, but is deleting the cache (.cache/nce) among the things you have tried?

@originalrkk
Author

Ah, yes, that might have been a miscommunication. Forget about 2.22...

The main thing is: Tried tearing down .cache/nce, actually all of .cache and .pex and /tmp/*pants*, the whole box, but no go. 2.19.0 installs fine, 2.20 and 2.21 fail, always with the same ModuleNotFoundError.

Just to be sure it wasn't some quirk of config in our repo, we even cleared everything out including any Pants binary and did the following:

curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
mkdir test-pants
cd test-pants
echo '[GLOBAL]' > pants.toml
echo 'pants_version = "2.21.0"' >> pants.toml
pants repl .

And still see the same error on these boxes.

Any ideas how we could get this to spit out some more useful logs about what's happening perhaps?

@cburroughs
Contributor

Can you confirm what version of the scie-pants bootloader this box has?

$ PANTS_BOOTSTRAP_VERSION=report pants

> Any ideas how we could get this to spit out some more useful logs about what's happening perhaps?

I am not sure they would have much more than the earlier output, but you can try what is in find ~/.cache/nce/ -iname '*log' (ex: install.log)

@originalrkk
Author

This is what we see for the bootloader:

$ PANTS_BOOTSTRAP_VERSION=report pants
0.12.0

Only an install.log (no configure.log or pants-install.log on this box like we see on a healthy system), and it just reiterates what's in the dump.

@originalrkk
Author

Okay, a bit more context: We were able to reproduce this on a completely clean Ubuntu image on AWS with a new Pants repo: ami-0d486650b94f4c69b. Not sure whether that's pointing to some networking configuration around it (perhaps it's failing to fetch something silently?).

@originalrkk originalrkk changed the title from "Cannot upgrade Pants from 2.19 due to ModuleNotFoundError for pex." to "Cannot install Pants > 2.19 if home directory in /etc/passwd is a symlink." on Sep 5, 2024
@originalrkk
Author

originalrkk commented Sep 5, 2024

We found the source of the issue. The Pants installer isn't correctly handling a symlinked home directory. In particular, the home directory listed in /etc/passwd was the symlink /home/<user> rather than the physical path /bulk_data/home/<user>.
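
For anyone who wants to check their own box for this condition, here is a minimal Python sketch. It assumes a POSIX system; the helper names `home_symlink_report` and `current_user_home_report` are made up for illustration and are not part of Pants or scie-pants.

```python
import os
import pwd


def home_symlink_report(passwd_home: str) -> dict:
    """Compare a home path as written in the passwd database with its resolved path."""
    physical = os.path.realpath(passwd_home)  # follows every symlink in the path
    return {
        "passwd_home": passwd_home,
        "physical_home": physical,
        # True when the passwd entry goes through at least one symlink
        "differs": os.path.normpath(passwd_home) != physical,
    }


def current_user_home_report() -> dict:
    """Look up the current user's passwd entry and report on its home directory."""
    entry = pwd.getpwuid(os.getuid())
    return home_symlink_report(entry.pw_dir)
```

Calling `current_user_home_report()` as the affected user and seeing `differs` set to `True` would suggest the same setup as described above.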

@cburroughs
Contributor

That looks frustratingly deep to debug; glad you found it!

@WTPOptAxe

WTPOptAxe commented Dec 12, 2024

I have a similar observation / problem, but not with a symlinked home directory.

In my case, it's when ~/.cache is symlinked to a directory named .cache more than 3 levels deep in /tmp. This is the case in a number of 3rd party SDLC images such as renovate/renovate:39.62 but I can replicate this on ubuntu:22.04

  • fails - /home/ubuntu/.cache is a symlink to /tmp/test/cache/.cache
  • fails - /home/ubuntu/.cache is a symlink to /tmp/test/something/.cache
  • succeeds - /home/ubuntu/.cache is a symlink to /tmp/test/.cache
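
The three cases above line up with a simple depth comparison between the symlink's own path and its target, which matches the explanation given further down this thread. A rough Python sketch of that comparison follows. The helper names are hypothetical, and this is only a heuristic for absolute POSIX paths, not something Pants or Pex runs.

```python
from pathlib import PurePosixPath


def path_depth(path: str) -> int:
    """Number of components below the filesystem root for an absolute POSIX path."""
    return len(PurePosixPath(path).parts) - 1  # drop the leading "/" component


def same_depth(symlink_path: str, target_path: str) -> bool:
    """True when the symlink and its target sit at the same depth below the root."""
    return path_depth(symlink_path) == path_depth(target_path)
```

With the paths from this comment, `same_depth("/home/ubuntu/.cache", "/tmp/test/.cache")` is true, while both `/tmp/test/cache/.cache` and `/tmp/test/something/.cache` come out false.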

In my case, 2.22.0 works correctly, but 2.23.0 fails (see below). No changes are required to make 2.22.0 work correctly.

I'm creating the test environment with docker run --rm --name foo --volume ./myrepo:/tmp/myrepo --entrypoint /bin/sleep ubuntu:24.04 9000

nested directory at /tmp/test/cache/.cache - fails

This set of steps fails with the failure message below for 2.23.0 and works for 2.22.0

Get shell in container with docker exec -it -u root foo /bin/bash and then

apt update && apt install -y curl
mkdir -p /tmp/test/cache/.cache
chmod 777 /tmp/test /tmp/test/cache /tmp/test/cache/.cache
if [ -L /home/ubuntu/.cache ]; then rm /home/ubuntu/.cache; fi
ln -s /tmp/test/cache/.cache /home/ubuntu/.cache
ls -ld /tmp/test /tmp/test/cache /tmp/test/cache/.cache /home/ubuntu/.cache
    lrwxrwxrwx 1 root root   22 Dec 12 14:41 /home/ubuntu/.cache -> /tmp/test/cache/.cache
    drwxrwxrwx 3 root root 4096 Dec 12 14:40 /tmp/test
    drwxrwxrwx 3 root root 4096 Dec 12 14:40 /tmp/test/cache
    drwxrwxrwx 2 root root 4096 Dec 12 14:40 /tmp/test/cache/.cache
su - ubuntu
cd /tmp/myrepo
curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
/home/ubuntu/.local/bin/pants version

less nested .cache directory inside /tmp/test - works

Before running this test, terminate the container and restart it to return filesystem to default.

Get shell in container with docker exec -it -u root foo /bin/bash and then

apt update && apt install -y curl
mkdir -p /tmp/test/.cache
chmod 777 /tmp/test /tmp/test/.cache
if [ -L /home/ubuntu/.cache ]; then rm /home/ubuntu/.cache; fi
ln -s /tmp/test/.cache /home/ubuntu/.cache
ls -ld /tmp/test /tmp/test/.cache /home/ubuntu/.cache
    lrwxrwxrwx 1 root root   16 Dec 12 14:56 /home/ubuntu/.cache -> /tmp/test/.cache
    drwxrwxrwx 3 root root 4096 Dec 12 14:56 /tmp/test
    drwxrwxrwx 2 root root 4096 Dec 12 14:56 /tmp/test/.cache
su - ubuntu
cd /tmp/myrepo
curl --proto '=https' --tlsv1.2 -fsSL https://static.pantsbuild.org/setup/get-pants.sh | bash
/home/ubuntu/.local/bin/pants version

2.23.0 failure message

Bootstrapping Pants 2.23.0
Installing pantsbuild.pants==2.23.0 into a virtual environment at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
Failed to create Pants virtual environment.
Error: Command '['/tmp/containerbase/cache/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/pex_root/venvs/e9325278eb97b235cc28d540e6599e7f5e69fa25/ef0210ddc65deea0460a3aa02dbb08eab37714fc/bin/python', '/tmp/tmpsaa54k0r.pex', 'venv', '--prompt', 'Pants 2.23.0', '--compile', '--pip', '--collisions-ok', '--no-emit-warnings', '--disable-cache', '/home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0']' returned non-zero exit status 1., output:
-----
b'Traceback (most recent call last):\n  File "/tmp/containerbase/cache/.cache/nce/7d19e1ecd6e582423f7c74a0c67491eaa982ce9d5c5f35f0e4289f83127abcb8/cpython-3.9.18+20240107-aarch64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File "/tmp/containerbase/cache/.cache/nce/7d19e1ecd6e582423f7c74a0c67491eaa982ce9d5c5f35f0e4289f83127abcb8/cpython-3.9.18+20240107-aarch64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code\n    exec(code, run_globals)\n  File "/home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152/__main__.py", line 227, in <module>\n    result, should_exit, is_globals = boot(\n  File "/home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152/__main__.py", line 216, in boot\n    from pex.globals import Globals\nModuleNotFoundError: No module named \'pex\'\n'
-----

Error: Failed to establish atomic directory /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/locks/install-9a7e34655c6fec617f37def0a028aa5075179a3c657b2dec3e986db78c2a89a3. Population of work directory failed: Boot binding command failed: exit status: 1

2.23.0 success message

When using /tmp/test/.cache, 2.23.0 succeeds with

Bootstrapping Pants 2.23.0
Installing pantsbuild.pants==2.23.0 into a virtual environment at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
New virtual environment successfully created at /home/ubuntu/.cache/nce/9de06a50ca43f773ddf5463ea534b781a193096c5d9c3cf4fa9687d592fb0986/bindings/venvs/2.23.0
14:47:19.51 [INFO] Initializing scheduler...
14:47:19.53 [INFO] Initializing Nailgun pool for 8 processes...
14:47:20.66 [INFO] Scheduler initialized.
2.23.0
14:47:24.08 [WARN] Executor shutdown took unexpectedly long: tasks were likely leaked!

@benjyw
Contributor

benjyw commented Dec 12, 2024

Thanks for the specific repro instructions. I now reproduce this locally, so will dig in.

@benjyw
Contributor

benjyw commented Dec 13, 2024

This is because pex computes a relative path of the "physical" cache dir /tmp/test/cache/.cache/pex/bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10 relative to the symlinked dir /home/ubuntu/.cache/pex/unzipped_pexes/5d3ad1f48b31f75a4afacd01941825831b5cd152. So we end up with ../../../../../../tmp/test/cache/.cache/pex/bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10 as the relpath (the intention was for that to be ../../bootstraps/ff146c4e9ca2a34371658662bfd7c7714bba8c10).

And so if paths under the symlink are at a different depth than the corresponding "physical" paths, that relpath is incorrect. If they are at the same depth then that ../../../../../../ prefix will climb up to the filesystem root and back down again, which is not what was intended, but will happen to work.
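
The effect described above can be reproduced outside of Pex with a small, self-contained Python sketch. It rebuilds the two layouts from the repro inside a temporary directory (so the temporary directory stands in for the filesystem root), computes the relative path lexically against the symlinked spelling, and checks whether the resulting link resolves. The function names and the `abc`/`def` directory names are placeholders, not Pex internals.

```python
import os
import tempfile


def link_resolves(base: str, cache_target_rel: str) -> bool:
    """Build home/ubuntu/.cache -> <cache_target_rel> under base and test a relative link."""
    home = os.path.join(base, "home", "ubuntu")
    physical_cache = os.path.join(base, *cache_target_rel.split("/"))
    bootstrap = os.path.join(physical_cache, "pex", "bootstraps", "abc")
    os.makedirs(home)
    os.makedirs(bootstrap)
    os.makedirs(os.path.join(physical_cache, "pex", "unzipped_pexes", "def"))
    os.symlink(physical_cache, os.path.join(home, ".cache"))

    # The directory is spelled through the symlink; the target is the physical path.
    symlinked_dir = os.path.join(home, ".cache", "pex", "unzipped_pexes", "def")
    rel = os.path.relpath(bootstrap, symlinked_dir)  # purely lexical, ignores symlinks

    link = os.path.join(symlinked_dir, "__pex__")
    os.symlink(rel, link)
    return os.path.exists(link)  # False when the relative link dangles


def demo() -> dict:
    """Run both layouts from the repro and report whether the link resolves."""
    results = {}
    for target in ("tmp/test/cache/.cache", "tmp/test/.cache"):
        with tempfile.TemporaryDirectory() as base:
            results[target] = link_resolves(base, target)
    return results
```

In this sketch the deeper target (`tmp/test/cache/.cache`) produces a dangling link and the same-depth target (`tmp/test/.cache`) resolves, mirroring the fails/succeeds cases reported above.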

The underlying issue is that we os.path.realpath the pex_root in most cases, but not in at least one case (here it's the fallback value in a call to Variables.PEX_ROOT.value_or(...)).
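
To illustrate the general idea only (this is not the actual Pex change, and `physical_relpath` is a hypothetical helper), resolving both sides with `os.path.realpath` before computing the relative path yields the short sibling-relative form that was intended:

```python
import os


def physical_relpath(target: str, start: str) -> str:
    """Relative path from start to target after resolving symlinks on both sides."""
    # os.path.relpath alone is lexical; realpath first puts both paths in the
    # same "physical" namespace so the result is valid from the real directory.
    return os.path.relpath(os.path.realpath(target), os.path.realpath(start))
```

For a bootstraps directory and an unzipped_pexes directory that live under the same physical cache, this returns a path of the form `../../bootstraps/<hash>` even when one of the inputs is spelled through the symlink.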

I will file and fix over in pex.

@benjyw
Contributor

benjyw commented Dec 13, 2024

Fixed here: pex-tool/pex#2626

jsirois pushed a commit to pex-tool/pex that referenced this issue Dec 13, 2024
In almost all codepaths, a `pex_root` will be a realpath. See, e.g.,


https://github.com/pex-tool/pex/blob/06b8850f35ae67377ad2fe31d62ee1f71ba61eea/pex/pex_info.py#L511

https://github.com/pex-tool/pex/blob/06b8850f35ae67377ad2fe31d62ee1f71ba61eea/pex/variables.py#L320

However there was one codepath by which a non-realpath could
be obtained, namely `boot()` calling `Variables.PEX_ROOT.value_or()`
with a `raw_pex_root` given as the fallback value.

This change ensures that `Variables.PEX_ROOT` is always a realpath.

This manifested as a bug at pex boot time in the presence of a
symlinked cache dir: pantsbuild/pants#21321
@benjyw
Contributor

benjyw commented Dec 15, 2024

#21762 upgrades Pants to use Pex 2.27.1, which includes this fix. This should go out in the next dev release of Pants (2.25.0.dev2).

If you need this fix in an earlier version of Pants you can manually update the Pex version in config:

[pex-cli]
version = "v2.27.1"
known_versions.add = [
  "v2.27.1|macos_arm64 |013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
  "v2.27.1|macos_x86_64|013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
  "v2.27.1|linux_arm64 |013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
  "v2.27.1|linux_x86_64|013a824e5af50f9687f765a43e8ffe94b4faa4fe795d017333c687bf943a4a68|4369121",
]

@benjyw benjyw closed this as completed Dec 15, 2024
benjyw added a commit that referenced this issue Dec 15, 2024
@csqzhang
Contributor

@benjyw I have also been running into this issue for a long time. Happy to see some progress and effort made here. However, I just tried Pants 2.23.0 with Pex 2.27.1 and I still have the same issue. Below is my sample log.

Downloading https://github.com/pantsbuild/pants/releases/download/release_2.23.0/pants.2.23.0-cp39
Downloading https://github.com/pantsbuild/pants/releases/download/release_2.23.0/pants.2.23.0-cp39
Traceback (most recent call last):
  File "/data/user/<userid>/.cache/nce/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/user/<userid>/.cache/nce/nce/f3ff38b1ccae7dcebd8bbf2e533c9a984fac881de0ffd1636fbb61842bd924de/cpython-3.9.18+20231002-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5/__main__.py", line 227, in <module>
    result, should_exit, is_globals = boot(
  File "/users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5/__main__.py", line 216, in boot
    from pex.globals import Globals
ModuleNotFoundError: No module named 'pex'

When I look inside /users/<userid>/.cache/pex/unzipped_pexes/7be46b58df48a5f579dd66d7f7ed2f32307063b5, this is what I see:

 [.... 7be46b58df48a5f579dd66d7f7ed2f32307063b5]$ ls -l
total 20
-rwxr-xr-x. 1 <userid> <usergroup> 7919 Dec 17 17:37 __main__.py
lrwxrwxrwx. 1 <userid> <usergroup> 104 Dec 17 17:37 __pex__ -> ../../../../../../<xyz>/home/<userid>/.cache/pex/user_code/68b87e96476955d5120f0cbfa1ef1141290ead52/__pex__
-rw-r--r--. 1 <userid> <usergroup> 3890 Dec 17 17:37 PEX-INFO
-rw-rw----. 1 <userid> <usergroup> 6 Dec 17 17:37 PEX-LAYOUT
drwxrwx---. 2 <userid> <usergroup> 4096 Dec 17 17:37 __pycache__

It seems to me that __pex__ -> ../../../../../../<xyz>/home/<userid>/.cache/pex/user_code/68b87e96476955d5120f0cbfa1ef1141290ead52/__pex__ is still broken.

In my case /users/<userid> is the same as <xyz>/home/<userid>. I guess that is why the link was created? Do you have any clue about the issue here?
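
One way to confirm whether a link such as `__pex__` really fails to resolve is to scan the cache directory for dangling symlinks. A minimal Python sketch follows; `find_dangling_symlinks` is a hypothetical helper, not a Pants or Pex command, and it uses only the standard library.

```python
import os
from typing import List, Tuple


def find_dangling_symlinks(root: str) -> List[Tuple[str, str]]:
    """Return (link path, raw link target) for every symlink under root that does not resolve."""
    dangling = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink inspects the link itself; exists follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append((path, os.readlink(path)))
    return dangling
```

Pointing it at the `unzipped_pexes` directory from the traceback would list each broken link together with the relative target it was created with.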

@benjyw
Contributor

benjyw commented Dec 18, 2024

I've just realized that upgrading pex in an existing version of Pants won't help, because this relates to the version of Pex that we package Pants with.

Can you try upgrading to the just-released 2.25.0.dev2?

@csqzhang
Contributor

> I've just realized that upgrading pex in an existing version of Pants won't help, because this relates to the version of Pex that we package Pants with.
>
> Can you try upgrading to the just-released 2.25.0.dev2?

Thank you @benjyw . I can confirm that 2.25.0.dev2 solves this issue 💯 . I will wait for the official release.
