8000 Tests on Windows must usually be run in Git Bash or a similar environment · Issue #1359 · GitoxideLabs/gitoxide · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Tests on Windows must usually be run in Git Bash or a similar environment #1359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
EliahKagan opened this issue May 7, 2024 · 8 comments · Fixed by #1712
Closed

Tests on Windows must usually be run in Git Bash or a similar environment #1359

EliahKagan opened this issue May 7, 2024 · 8 comments · Fixed by #1712
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member
EliahKagan commented May 7, 2024

See #1359 (comment) for updated information.

Current behavior 😯

In this project, tests are typically run on Windows in a Git Bash environment. On CI in Windows, bash is Git Bash. This is also the environment from which I have typically run tests. This is a native environment; it should not be confused with WSL or even something like Cygwin. Furthermore, tools maintained by rustup are not specifically connected to this environment, though they are run from it.

All tests are able to pass when run in Git Bash on Windows, and likely would pass if run in similar environments that provide a POSIX-compatible shell for Windows, set environment variables accordingly, and provide access to POSIX versions of common tools.

But running the tests from PowerShell instead produces 627 test failures. This presumably relates to the associated environment rather than the shell itself, since however the tests are run, a test runner subprocess is controlling the run.

The main problem seems to be the use of paths with backslashes in them, and the way that interacts with fixture scripts. Note, however, that this is not specific to the situation where GIX_TEST_IGNORE_ARCHIVES is set. That is to say that this is a separate issue from #1358 (which also involves far fewer failures) and occurs even when GIX_TEST_IGNORE_ARCHIVES is unset.

Even the test summary showing one failure per line is too long for GitHub to allow me to include it in this issue description. This gist has it. The full output can be seen in this other gist.

As of now, I have not figured out the cause, which I think is a prerequisite for making a decision about whether to try to support such environments or to instead document them as unsupported.

Expected behavior 🤔

Either all tests should pass even when run from PowerShell, or the need to run them in a Git Bash environment (or whatever other more precise requirement is known) should be documented.

I think which is better depends on how cumbersome it would be to enable the tests to pass when run from PowerShell and, more importantly, how cumbersome it would be to maintain this state. I suspect it may be feasible, though.

Git behavior

A direct comparison is difficult in that this is about running the tests rather than the functionality of Git and gitoxide. However, it is worth noting that Git for Windows has its own SDK environment that is used for building it and running its tests, and this environment is separate even from the more minimal Git Bash environment.

Thus, if it is infeasible to support running tests from PowerShell, documenting that would still not be very restrictive, even compared to what is needed to develop and test Git for Windows, since the non-SDK "Git Bash" environment works fine for running gitoxide's tests.

Steps to reproduce 🕹

Using Windows, make sure the tests can pass when run from Git Bash using the first of two cargo test-runner commands shown below.

Optionally clean the build with cargo clean or, if preferred, clean everything with gix clean -xde.

There are two approaches. The first approach is to display the output in the console: To do that, in PowerShell, run:

cargo nextest --all --no-fail-fast

However, that is hard to read because of the very large volume of output combined with the staggering of output lines suggesting a problem identifying the terminal width or computing the width of text written.

Therefore, you may instead wish to write both stdout and stderr to a file, and inspect the file during and/or after the run.

cargo nextest --all --no-fail-fast *> ../output.txt

This writes all output to output.txt in the directory above the current directory. This is to avoid creating a new file in the repository while running the tests, in case that were to interfere with something, though it shouldn't since that should only be able to affect a small number of journey test runs, which are not included when running cargo nextest.

The *> operator in PowerShell is similar to the &> or >& operators in some shells and to the effect of 2>&1 > in most shells.

Although Windows comes with Windows PowerShell, I suggest against using it for this, since is somewhat less intuitive, and has fewer convenience features, compared to the newer PowerShell (which is sometimes called PowerShell Core). Windows PowerShell is also less likely to be used by developers on Windows, and thus probably less important to support than recent versions of PowerShell. I used PowerShell 7.4.2 on Windows 10.

@Byron Byron added help wanted Extra attention is needed acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues labels May 7, 2024
< 8000 /div>
@Byron
Copy link
Member
Byron commented May 7, 2024

Thanks for the detailed report!

It appears that gix-testtools are unable to execute any script with backslashes in its path as these will count as escape-character. So for this to work (better), all it might take is to properly escape such paths. There might be more to it though. Maybe a Windows build of gix-testtools could have special mitigations for that built in at some point.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Aug 26, 2024
This runs `cargo nextest ...` and `cargo test --doc` in separate
steps in the `test-fast` job, so that the job fails when
`cargo nextest ...` fails. Otherwise, with `pwsh` on Windows, test
failures (other than doctests) are masked.

Background: Since 89a0567 (GitoxideLabs#1556), doctests are run on all three
major platforms, and not only on the full test job done on Ubuntu.
But the way this was done relied on a script failing as soon as
(or, at least, whenever) any command in the script failed. That
works on Ubuntu and macOS, where a `bash` shell is used by default,
with `-e` passed. But on Windows, GitHub Actions uses `pwsh` as the
default shell. `pwsh` is not run in a way that causes it to stop at
the first failing command.

So, on Windows, when the `cargo nextest` command failed but the
`cargo test --doc` command that followed it in the same script
step passed, the step passed, thus allowing the job and workflow to
pass. This was observed in GitoxideLabs#1429 after a rebase (see comments).

Note that this is not related to the increased use of `nextest`.
While that was also done in GitoxideLabs#1556, it did not affect the
`test-fast` job where the bug was introduced, which was already
using `nextest`.

This fixes the problem by putting the two commands in separate
steps.

This is simpler than doing anything in PowerShell to make the
script stop, such as using `&&` or attempting to produce `-e`-like
behavior.

Another option could be to use `bash` as the shell, which is a Git
Bash environment suitable for running the tests. The reason I
didn't do that is that I think it is valuable to see the results
when the tests are run from a PowerShell environment.

In particular, continuing to use PowerShell here should help in
investigating GitoxideLabs#1359 (and shows that the claim I made is overly
strong, since CI on Windows with `pwsh` not itself started from a
Unix-style shell is not "Git Bash or a similar environment").
@EliahKagan

This comment was marked as off-topic.

@EliahKagan
Copy link
Member Author

As noted in #1559, my claim in this issue is apparently overstated, since tests are able to run in PowerShell on Windows GHA runners, and a pwsh shell that does not have any Unix-style shell as an ancestor in the process tree is decisively not a "Git Bash or similar environment" in the sens that I meant it or that this phrase is likely to be understood. I'll try to narrow down the claim here when I can; for now, this comment serves as a partial correction. That the tests work in pwsh on CI may, through examination of the CI environment help reveal what is going on here.

@EliahKagan
Copy link
Member Author
EliahKagan commented Aug 29, 2024

The reason this works on CI as noted in #1359 (comment), while not working locally, is that bash on a Windows GHA runner is Git Bash even in the PATH used in pwsh, while this is rare locally.

That the current implementation is ever able to use Git Bash on Windows, even when the tests are run from a Git Bash shell, also depends inadvertently on undocumented behavior of std::process::Command that may change in the future. So it's possible Windows tests will break suddenly at some point, even if run from Git Bash. Fixing this bug appears feasible and will address that, too. Details follow.

How the tests fail

Consider the following output, which is typical of the test failures:

        FAIL [   0.557s] gix remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones

--- STDOUT:              gix remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones ---

running 1 test
test remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones ... FAILED

failures:

failures:
    remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 16 filtered out; finished in 0.54s


--- STDERR:              gix remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones ---
Archive at 'tests\fixtures\generated-archives\make_remote_repos.tar' not found, creating fixture using script 'make_remote_repos.sh'
thread 'remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones' panicked at tests\tools\src\lib.rs:566:17:
fixture script of "bash" "C:\\Users\\ek\\source\\repos\\gitoxide\\gix\\tests\\fixtures\\make_remote_repos.sh" failed: stdout:
stderr: /bin/bash: C:Userseksourcereposgitoxidegixtestsfixturesmake_remote_repos.sh: No such file or directory

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

There are three things going on here:

  1. Some generated archives are (intentionally) listed in .gitignore files, so fixture scripts always have to run to create them. Therefore, a number of tests will fail if fixture scripts cannot run successfully.
  2. Running fixture scripts on Windows passes their Windows-style paths to bash. These paths are resolved before use, so they usually begin with drive letters, e.g. with C:. They also use \ (rather than /) characters as directory separators.
  3. Whether bash can tolerate a script path argument that starts with a Windows drive letter, or that uses \ as a directory separator, varies across bash implementations. Which bash is selected is affected by what exists on the system, the value of the PATH environment variable, and subtleties of std::process::Command.

Why is it even running bash?

The sequence is:

  1. Find that a generated archive is not available, so that running the fixture script is necessary.

  2. Attempt to run the script as an executable:

    https://github.com/Byron/gitoxide/blob/1cfe577d461293879e91538dbc4bbfe01722e1e8/tests/tools/src/lib.rs#L553

    On Windows, that always fails, because files that do not match a supported binary executable format must be batch files with a .cmd or .bat extension for std::process::Command to run them automatically with an interpreter (which Command supports because CreateProcessW supports it). The fixture scripts here are bash scripts and they have a .sh suffix.

  3. Check if the error was due to the inability to execute the file:

    https://github.com/Byron/gitoxide/blob/1cfe577d461293879e91538dbc4bbfe01722e1e8/tests/tools/src/lib.rs#L555-L556

    Windows error code 193 means this is a "not a valid Win32 application" error, so we do effectively detect that the script was not able to be executed.

  4. Try again by running bash, with the script path and the arguments for the script as its arguments:

    https://github.com/Byron/gitoxide/blob/1cfe577d461293879e91538dbc4bbfe01722e1e8/tests/tools/src/lib.rs#L558-L559

What is bash on Windows?

Today, there are two typical cases for a bash shell on Windows:

  • The port of bash to Windows that is provided by MSYS2 and similar environments, including Git Bash, tolerates backslashes used as directory separators.

  • The bash associated with WSL does not, since it delegates to bash in an installed distribution for WSL, which either fails because there is no distribution or the distribution does not have bash, or succeeds but runs that distribution's native bash executable that has no custom behavior for compatibility with Windows.

    In addition, even if the paths were normalized so a WSL bash would run the fixture scripts, we should expect the scripts, if run that way, to fail to produce correct test data. This is because they would be run inside a WSL distribution, but these are the scripts that have to be run because their generated archives are in .gitignore, which should only be done if the scripts do not currently produce archives that work when used on another system, including when generated on a GNU/Linux system and used on Windows.

    So any fixture for which this would work reliably should just be omitted from all .gitignore files and have its generated archive committed. Fixing the paths is thus not a worthwhile solution, at least not if done by itself.

Of those, which bash is chosen?

In practice, there seem to be three cases for how bash is resolved on Windows:

  • In an MSYS2-like environment including Git Bash, the PATH will almost always be set in such a way that this environment's bash precedes any other bash.
  • Outside such an environment, the PATH will usually be set so that the bash associated with WSL, usually at C:\Windows\System32\bash.exe, precedes any other bash.
  • Outside such an environment but on a GitHub Actions runner, the PATH is set so that bash is Git Bash.

So when the the tests are not run from Git Bash or a similar environment, they will usually find the wrong bash, the major exception being when run on CI.

Here's a simplified demonstration on my machine, in PowerShell (pwsh), where bash is the WSL-associated bash, and the scoop-installed bash is Git Bash (though the effect is in no way scoop-specific):

C:\Users\ek\tmp> (Get-Command bash).path
C:\Windows\system32\bash.exe
C:\Users\ek\tmp> Get-Content 'a\b'
echo 'Hello, world!'
C:\Users\ek\tmp> bash 'a/b'
Hello, world!
C:\Users\ek\tmp> bash 'a\b'
/bin/bash: ab: No such file or directory
C:\Users\ek\tmp> C:\Users\ek\scoop\shims\bash.exe 'a/b'
Hello, world!
C:\Users\ek\tmp> C:\Users\ek\scoop\shims\bash.exe 'a\b'
Hello, world!

But that's far from the whole story

To run Git Bash, why is it sufficient for the directory that contains the Git Bash bash.exe to precede the System32 directory (that has the WSL-associated bash.exe) in PATH? Why does it search in PATH before other locations?

The above example feels convincing, but it shouldn't. I ran those commands in PowerShell, which searches PATH first. On most OSes, a path search is a PATH search, but that is not generally the case on Windows, nor is this really an accurate characterization of what std::process::Command is doing.

  • Other than shells, which are expected to perform their own path search with different rules from CreateProcessW, most programs on Windows do not do path search themselves and instead let CreateProcessW do it. Most libraries and frameworks that offer process creation facilities delegate fully, including for path search, to CreateProcessW. But CreateProcessW searches various locations before PATH, regardless of whether or where those locations are also listed in PATH. These prioritized locations include the System32 directory, which usually contains the WSL-associated bash.exe.
  • As noted in #1432 (comment), for security, the Rust standard library performs its own PATH search when std::process::Command is used. It passes the resolved absolute path to CreateProcessW, rather than using CreateProcessW for the search. This is mainly to avoid searching in the current directory. It was implemented in rust-lang/rust#87704 and fixes the main concern of rust-lang/rust#87945.
  • But that is not intended to, and usually does not, prevent the System32 directory from taking precedence over PATH directories. As reported in rust-lang/rust#122660, std::process::Command will often choose the bash.exe in System32 rather than a Git Bash bash.exe that has been placed earlier in PATH, for this very reason.
  • But as noted there and elaborated in rust-lang/rust#122660 (comment) (see also rust-lang/rust#37519), this is averted when methods such as env are called on the Command instance to customize the child process environment. This happens because, when customizing its environment, the current (undocumented) behavior of Command is to search the PATH for the child before searching special locations and the PATH of the parent. This is done even if PATH is not one of the environment variables being changed or unset.

The latter condition applies to the way gix-testtools runs fixture scripts on Windows. Returning attention to the code shown above, it calls configure_command, which makes a few env_remove calls and numerous env calls.

https://github.com/Byron/gitoxide/blob/1cfe577d461293879e91538dbc4bbfe01722e1e8/tests/tools/src/lib.rs#L598-L604

There are many more env calls than shown in that quoted portion, but none of them set PATH, nor is that variable ever unset or the environment ever cleared. So the child PATH is the parent's PATH, but since other customizations to the environment have been made, std::process::Command searches this PATH for bash before it searches in any special locations.

A possible fix

Although I tend to think about improvements to how gix-testtools runs fixture scripts as being closely connected to improvements to gix-command and the machinery for running hooks, I think the key to identifying a fix for this bug is to notice that they are different in ways that allow changes to gix-testtools that might not be suitable elsewhere.

However gix-testtools runs fixture scripts, it:

  • Does not need to be compatible with how git runs scripts (though it should use the same shell if possible).
  • Does not need to be compatible with any documented or intended behavior related to how any other gitoxide crates run scripts.
  • Does not need to be able to run scripts that are not shell scripts.
  • Does not need to be able to run binary executables.
  • Can require git, because nearly all of the fixture scripts would fail without it anyway.

Therefore, on Windows, gix-testtools can always safely attempt to use Git Bash to run the script first. If it can't find Git Bash based on information provided by git, then it can either just fail, or fall back to something like what it is currently doing.

Git Bash can usually be located using a git --exec-path-based method like what is done in gitpython-developers/GitPython#1791. Conceptually, this is along the lines of

realpath -- "$(git --exec-path)/../../../bin/bash.exe"

though of course it would be in Rust and not Bash, since the point is to find which bash to use.

@EliahKagan EliahKagan changed the title Tests on Windows require Git Bash or a similar environment Tests on Windows must usually be run in Git Bash or a similar environment Aug 29, 2024
@Byron
Copy link
Member
Byron commented Aug 29, 2024

Thanks so much for researching this, and the very informative and interesting read!

The special-behaviour in std::process::Command is indeed something that could break a lot of people if it ever changes, and if it wasn't intended to kick-in when PATH isn't set, then it might well be considered a bug that will be fixed eventually.

By the looks of it, this does affect gix-command as well, which is exposed to the undocumented behaviour of std::process::Command currently.

Fixing this both in gix-testtools and in gix-command should be possible, I was thinking about using gix_path::env::exe_location() can be used to derive the bash.exe location, which would be nice as it's cached.

However, I wouldn't mind any fix to start with whatever is needed for gix-testtools, and then take it from there.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Dec 1, 2024
Rather than hard-coding `bash` on all systems as the fallback
interpreter when a fixture script cannot be run directly, this
falls back in an operating system specific manner:

- Except on Windows, always fall back to `bash`, as before.

- On Windows, run `git --exec-path` to find the `git-core`
  directory. Then check if a `bash.exe` exists at the expected
  location relative to that. In Git for Windows installations,
  this will usually work. If so, use that path (with `..`
  components resolved away).

- On Windows, if a specific `bash.exe` is not fou
8000
nd in that way,
  then fall back to using the relative path `bash.exe`. This is to
  preserve the ability to run `bash` on Windows systems where it
  may have worked before even without `bash.exe` in an expected
  location provided by a Git for Windows installation.

(The distinction between `bash` and `bash.exe` is only slightly
significant: we check for the existence of the interpreter without
initially running it, and that check requires the full filename.
It is called `bash.exe` elsewhere for consistency both with the
checked-for executable and for consistencey with how we run most
other programs on Windows, e.g., the `git` vs. `git.exe`.)

This fixes GitoxideLabs#1359. That bug is not currently observed on CI, but
this change is verified to fix it on a local test system where it
previously always occurred when running the test suite from
PowerShell in an unmodified environment. The fix applies both with
`GIX_TEST_IGNORE_ARCHIVES` unset, in which case there are now no
failures, and with `GIX_TEST_IGNORE_ARCHIVES=1`, in which case the
failures are now limited to the 15 cases tracked in GitoxideLabs#1358.

Previously, fixture scripts had been run on Windows with whatever
`bash` was found in a `PATH` search, which had two problems:

- On most Windows systems, even if no WSL distribution is installed
  and even if WSL itself is not set up, the `System32` directory
  contains a `bash.exe` program associated with WSL. This program
  attempts to use WSL to run `bash` in an installed distribution.
  The `wsl.exe` program also provides this functionality and is
  favored for this purpose, but the `bash.exe` program is still
  present and is likely to remain for many years for compatibility.

  Even when this `bash` is usable, it is not suited for running
  most shell scripts meant to operate on the native Windows system.
  In particular, it is not suitable for running our fixture
  scripts, which need to use the native `git` to prepare fixtures
  to be used natively, among other requirements that would not be
  satisfied with WSL (except when the tests are actually running in
  WSL).

  Since some fixtures are `.gitignore`d because creating them on
  the test system (rather than another system) is part of the test,
  this has caused breakage in most Windows environments unless
  `PATH` is modified -- either explicitly or by testing in an MSYS2
  environment, such as the Git Bash environment -- whether or not
  `GIX_TEST_IGNORE_ARCHIVES` is set. This was the cause of GitoxideLabs#1359.

- Although using a Git Bash environment or otherwise adjusting the
  path *currently* works, the reasons it works are subtle and rely
  on non-guaranteed behavior of `std::process::Command` path search
  that may change without warning.

  On Windows, processes are created by calling the `CreateProcessW`
  API function. `CreateProcessW` is capable of performing a `PATH`
  search, but this `PATH` search is not secure in most uses, since
  it includes the current directory (and searches it before `PATH`
  directories) unless `NoDefaultCurrentDirectoryInExePath` is set
  in the caller's environment.

  While it is the most relevant to security, the CWD is not the
  only location `CreateProcessW` searches before searching `PATH`
  directories (and regardless of where, if anywhere, they may also
  appear in `PATH`). Another such location is the `System32`
  directory. This is to say that, even when another directory with
  `bash.exe` precedes `System32` in `PATH`, an executable search
  will still find the WSL-associated `bash.exe` in `System32`
  unless it deviates from the algorithm `CreateProcessW` uses.

  To avoid including the CWD in the search, `std::process::Command`
  performs its own path search, then passes the resolved path to
  `CreateProcessW`. The path search it performs is currently almost
  the same the algorithm `CreateProcessW` uses, other than not
  automatically including the CWD. But there are some other subtle
  differences.

  One such difference is that, when the `Command` instance is
  configured to create a modified child environment (for example,
  by `env` calls), the `PATH` for the child is searched early on.
  This precedes a search of the `System32` directory. It is done
  even if none of the customizations of the child environment
  modify its `PATH`.

  This behavior is not guaranteed, and it may change at any time.
  It is also the behavior we rely on inadvertently every time we
  run `bash` on Windows with a `std::process::Command` instance
  constructed by passing `bash` or `bash.exe` as the `program`
  argument: it so happens that we are also customizing the child
  environment, and due to implementation details in the Rust
  standard library, this manages to find a non-WSL `bash` when
  the tests are run in Git Bash, in GitHub Actions jobs, and in
  some other cases.

  If in the future this is not done, or narrowed to be done only
  when `PATH` is one of the environment variables customized for
  the child process, then putting the directory with the desired
  `bash.exe` earlier than the `System32` directory in `PATH` will
  no longer prevent `std::proces::Command` from finding the
  `bash.exe` in `System32` as `CreateProcessW` would and using it.
  Then it would be nontrivial to run the test suite on Windows.

For references and other details, see GitoxideLabs#1359 and comments including:
GitoxideLabs#1359 (comment)

On the approach of finding the Git for Windows `bash.exe` relative
to the `git-core` directory, see the GitPython pull request
gitpython-developers/GitPython#1791.

Two possible future enhancements are *not* included in this commit:

1. This only modifies how test fixture scripts are run. It only
   affects the behavior of `gix-testtools`, and not of any other
   gitoxide crates such as `gix-command`. This is because:

   - While gitoxide uses information from `git` to find out where
     it is installed, mainly so we know where to find installation
     level configuration, we cannot in assume that `git` is present
     at all. Unlike GitPython, gitoxide is usable without `git`.

   - We know our test fixture scripts are all (at least currently)
     `bash` scripts, and this seems likely for other software that
     currently uses this functionality of `gix-testtools`. But
     scripts that are run as hooks, or as custom commands, or
     filters, etc., are often written in other languages, such as
     Perl. (The fallback here does not examine leading `#!` lines.)

   - Although a `bash.exe` located at the usual place relative to
     (but outside of) the `git-core` directory is usually suitable,
     there may be scenarios where running an executable found this
     way is not safe. Limiting it to `gix-testtools` pending
     further research may help mitigate this risk.

2. As in other runs of `git` by `gix-testools`, this calls
   `git.exe`, letting `std::process::Command` do an executable
   search, but not trying any additional locations where Git is
   known sometimes to be installed. This does not find `git.exe` in
   as many situations as `gix_path::env::exe_invocation` does.

   The reasons for not (or not quite yet) including that change are:

   - It would add `gix-path` as a dependency of `gix-testtools`.

   - Finding `git` in a `std::process::Command` path search is an
     established (though not promised) approach in `gix-testtools`,
     including to run `git --exec-path` (to find `git-daemon`).

   - It is not immediately obvious that `exe_invocation` behavior
     is semantically correct for `gix-testtools`, though it most
     likely is reasonable.

     The main issue is that, in many cases where `git` itself runs
     scripts, it prepends the path to the `git-core` directory to
     the `PATH` environment variable for the script. This directory
     has a `git` (or `git.exe`) executable in it, so scripts run
     an equivalent `git` associated with the same installation.

     In contrast, when we run test fixture scripts with a
     `bash.exe` associated with a Git for Windows installation, we
     do not customize its path. Since top-level scripts written to
     use `git` but not to be used *by* `git` are usually written
     without the expectation of such an environment, prepending
     this will not necessarily be an improvement.
@EliahKagan
Copy link
Member Author

However, I wouldn't mind any fix to start with whatever is needed for gix-testtools

I've opened #1712 for this.

@EliahKagan
Copy link
Member Author
EliahKagan commented May 31, 2025

The special-behaviour in std::process::Command is indeed something that could break a lot of people if it ever changes, and if it wasn't intended to kick-in when PATH isn't set, then it might well be considered a bug that will be fixed eventually.

By the looks of it, this does affect gix-command as well, which is exposed to the undocumented behaviour of std::process::Command currently.

It looks like that has happened: rust-lang/rust#137673 makes std::process::Command search the child PATH only when PATH is one of the environment variables that has been set for the child, instead of when any environment variable at all has been set for the child.

As noted in rust-lang/rust#122660 (comment), this change appears in Rust 1.87.0. That also shows how the old behavior can, if necessary, often be restored, by setting the child PATH to the parent PATH explicitly.

I think we do not need to do that. The reasons I think we don't need to do that, in descending order of immediate interest, are:

  1. Nothing seems to have broken for gix-* usage since that change. The new Rust version is used when the test suite runs and for other CI checks (except the MSRV check, of course), and people are using it in production. As far as I know, no new problems related both to gitoxide and to the changed behavior of std::process::Command have been reported.

  2. This issue was fixed for gix-testtools in #1712. One of the motivations for the fix was that a change in std like rust-lang/rust#137673 might eventually be made.

  3. The way we find Git Bash was subsequently improved and seems to cover all common setups (#1864) – though it probably does not cover all significant setups yet (for reasons that overlap #1761 for sh).

  4. When gix-command runs a command with a shell due to an explicit use_shell value of true, it tries to run it with a POSIX-compatible shell sh, which it never attempts to invoke directly with a bash command – even if sh ends up being a symlink, copy, or shim of bash.

    Since c7d06a6 (#1862), it looks for this using gix_path::env::shell(), searching for an sh command provided by Git for Windows and falling back to the simple name sh.exe. Before that, going back to when use_shell was introduced in 8c61b0b (#496), it hard-coded the simple name sh.

    Because there is typically no sh/sh.exe in System32 – and WSL supplies no command within Windows called sh – this has never had the WSL-related problem associated with doing a path search for the bash command.

    However, there is admittedly a case where we would have a problem: if an explicit shell_program value of "bash" is used. It's not obvious what path search behavior the caller would intend, but it would probably be better to avoid the WSL-associated bash.exe.

  5. When gix-command is used with an explicit or implicit use_shell: False to run a text file on Windows that looks like a script with a #! (shebang) line, it attempts to honor the #!. In principle this could cause it to wrongly run the WSL-associated bash command, but I believe that does not currently happen in practice.

    A #! line very well could specify bash, e.g., #!/bin/bash, #!/usr/bin/env bash. But other problems with gix-command shebang handling keep that from working correctly, whether a bash.exe exists in System32 or not. That is, it does not make it far enough to suffer from any WSL-related difficulties.

    I intend to improve how gix-command runs scripts on Windows, including shebang handling, path search, and their overlap. This is actually the main goal I've been working toward with all the shell-related issues and PRs over the last several months. So we will have to figure out whether to adjust the fallback path-search behavior for bash and, if so, how. But I don't think any shebangs that would otherwise work, that people are likely to attempt on Windows, are currently broken because we don't do so.

In addition, if and when we do encounter a problem that seems like it can be worked around by setting the child PATH to the parent PATH, I think we should still prefer to solve it in a different way:

  1. For functionality that returns a std::process::Command object that the caller can further customize – such as let mut cmd: Command = prepare(…).into(); – especially if the caller may be outside of any gix-* crate, this approach is not reliable. The caller may receive the Command object, reset or unset the child PATH, and then use the object to invoke the command, without intending to reprioritize System32.

    use std::process::Command;
    use gix::command::prepare;
    
    let mut cmd: Command = prepare("uname -a") // Or something.
        .command_may_be_shell_script_disallow_manual_argument_splitting()
        .with_shell()
        .shell_program("bash")
        .into();
    cmd.env("PATH", "/usual/child/path"); // A
    // Do some stuff...
    if /* one weird situation where we don't want to customize the child path */ {
        cmd.env_remove("PATH"); // B
    }
    // Do some more stuff...
    cmd.output()

    Due to A, directories are no longer duplicated from the parent path into the child path, so if the bash executable is not found in the child path, System32 will be searched sooner than intended. (A causes this even without B.) In the rarer but still plausible B, the caller is trying merely to conditionally undo a change made by the caller, but ends up also undoing the change gix-command made to cause System32 to be deprioritized. This is not exhaustive.

  2. Even aside from reliability, this approach is not strong enough to do what we would want it to do. As detailed in GitoxideLabs/gitoxide#1359 (comment), in most Windows environments, the System32 directory does appear in PATH, and appears early enough that a search through only PATH directories will find its WSL-associated bash first. But if we prefer to use a bash associated with the Git installation, then we should skip the one in System32 regardless of whether or where System32 appears in PATH.

    (There are good reasons System32 is usually early in PATH. While CreateProcessW searches other directories, including System32, before it searches PATH directories, shells in Windows such as cmd and powershell, as well as various applications, do not. I think Windows users and developers implicitly expect any kind of path search to find some programs that are in System32, such as cmd.exe, even if something of the same name resides elsewhere.)

Instead, there are alternatives that would overcome those problems and that make sense given what we are already doing:

  • For the goal of first (or only) searching PATH, we can do the search explicitly ourselves, at least on Windows. gix-command already does an explicit PATH search on Windows to look up Prepare::command, though it interprets PATH in a subtly different way from what is expected on Windows. (I hope to improve that soon.) One of the effects of this seems to be that the fallback behavior of passing a simple name for Windows path lookup is relied on slightly more often than anticipated. This could be refined, applied to other cases – such as gix_path::env::shell() and Prepare::shell_program – and the fallback behavior eliminated or done under fewer circumstances.

  • For the goal of avoiding the WSL-associated bash in System32, if we are already doing an explicit PATH search, we can special-case looking up bash and bash.exe, skipping the System32 directory, whose path can be obtained from the system. (That is, this approach need not rely on the potentially brittle assumption that it is has the usual value of C:\Windows\System32.)

    We could instead look inside the bash.exe that is found to examine its metadata and discern if it is the Microsoft-signed WSL-associated bash.exe. But I would not rush to do that. Because having a WSL-associated bash.exe in System32 causes problems, it is recommended to use wsl.exe instead, and Microsoft may eventually remove bash.exe from System32, in some future version of Windows. Then users may sometimes place the bash.exe that they do want used by default in System32. If that becomes common, then distinguishing the cases could be justified, but I think currently it is not.

Such changes can probably wait until after greater unification of path lookups and shell execution across gix-path, gix-command, and gix-testtools. (Such unification may include changes such as gix-path providing a means to look up the Git-associated bash that works analogously to gix_path::env::shell() for the Git-associated sh, as is already almost there.)

@Byron
Copy link
Member
Byron commented Jun 1, 2025

Thanks a lot for the exhaustive summary! Without claiming I have grokked it all, I think I understand enough to be looking forward to the anticipated changes to how gix-command operates.

5D8E

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
0