conda extremely slow #7239
Conda will never be as fast as pip, so long as we're doing real environment solves and pip satisfies itself only for the current operation. A more appropriate comparison for performance is yum or apt-get. Unless we've had a major regression, improving performance is a (perfectly legitimate) feature request, not a bug report. |
I guess.. #6174 looked interesting, but 30s+ to install a package is not ideal. Even OS packagers like yum or aptitude aren't that slow... |
Yeah, not saying there’s not room to improve. There is. A lot. But setting performance expectations based on the current pip isn’t reasonable, and is the wrong target. Benchmarking against yum and apt-get is *closer*, but even there we’re doing substantially more work.
Except for filesystem issues on Windows, I think our actual uninstall-install transaction times are reasonable. The areas we should target are probably (1) better approaches to repodata management, (2) faster SAT implementation, (3) concurrent download and extract.
|
Obviously, performance improvements are/will be an ongoing goal. But this issue doesn't add any value and can be closed, IMHO. Apart from the initial download and extraction (which, currently, can mostly only be improved marginally, i.e. by a low constant factor, via parallelization), an install takes < 5 seconds on my machine. Without knowing what "existing environment" from the problem description means, there isn't really much to analyze/improve here. |
@kalefranz - I don't think I actually mentioned pip, that was you ;-) @mbargull - here's a more useful set of examples:
This was up near a minute before the packages were downloaded, but even with them all in the cache, that's still 15s. Okay, let's try a more normal scipy stack:
Over a minute now, but I have more sympathy here as it's a more complete environment and there were some downloads in there... Now here's the kicker: into that test2 environment, installing one extra package:
I'd be intrigued to know why what takes you 5s takes me 11s. My suspicion is still that this is solver slowness, which would explain why in my more complicated production environment this jumps up to 30-40s. |
I agree that problem in particular sounds concentrated in the solver rather than other areas of the code base. It's hard to tell, though, without actual profiling data. There's that likely explanation, then a few dozen other edge-case explanations that could all be in play. Conda interacts with a lot of surface area. |
If you have profiling commands I can usefully run, more than happy to do so :-) |
The best way to isolate repodata handling for benchmarking, etc, is to use `conda search`.
To isolate the solver use `conda create -nt --dry-run` and set `local_repodata_ttl` to something like 3600.
On May 8, 2018, at 3:16 PM, Marcel Bargull wrote:
For anything north of 20 s or so (and excluding the time during download and extraction of packages), I agree it's likely time spent by the solver.
I'd be intrigued to know why what takes you 5s takes me 11s.
That's likely because I didn't include conda-forge in the channel list. conda-forge is a huge channel which gets updated very frequently. Hence, conda will usually need to download and parse the channel's repository index. There is some room for performance improvements regarding the parsing, but that might mean working against the generic and tidy concept the code uses, which wouldn't be very nice. For overall performance improvements, there are other parts (mainly the solver) that provide more potential for gains 😉.
See the following output for execution times that nearly solely consist of index updates -- without and with conda-forge added.
(Those CONDA_CHANNELS= lines are to preload the less often updated parts of the defaults channel.)
The gist is that adding conda-forge adds approx. 6 seconds to the index update phase -- which is exactly the 5 to 11 seconds delta. Looks like our setups perform surprisingly similarly 😄
unset conda
set -x
conda --version
conda clean -yi
time conda create --dry-run -vvnx 2>&1 | grep 'make_request'
time conda create --dry-run -vvnx -c conda-forge 2>&1 | grep 'make_request'
conda clean -yi
time conda create --dry-run -vvnx -c conda-forge 2>&1 | grep 'make_request'
conda clean -yi
CONDA_CHANNELS="$(conda config --show default_channels | sed -n 's/^ - //p' | grep -v 'main$' | paste -sd,)" \
conda create --dry-run -nx 2>&1 >/dev/null
time conda create --dry-run -vvnx 2>&1 | grep 'make_request'
conda clean -yi
CONDA_CHANNELS="$(conda config --show default_channels | sed -n 's/^ - //p' | grep -v 'main$' | paste -sd,)" \
conda create --dry-run -nx 2>&1 >/dev/null
time conda create --dry-run -vvnx -c conda-forge 2>&1 | grep 'make_request'
time conda create --dry-run -vvnx 2>&1 | grep 'make_request'
time conda create --dry-run -vvnx -c conda-forge 2>&1 | grep 'make_request'
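As a side note, the `CONDA_CHANNELS="$(...)"` lines above build a comma-separated list of the defaults sub-channels (minus `main`). The text-processing part of that pipeline can be illustrated standalone, with made-up sample output standing in for `conda config --show default_channels` (no conda needed; the URLs mirror the real defaults sub-channels):

```shell
# Feed sample `conda config --show default_channels`-style output
# through the same sed/grep/paste pipeline used in the script above.
sample='default_channels:
 - https://repo.anaconda.com/pkgs/main
 - https://repo.anaconda.com/pkgs/r
 - https://repo.anaconda.com/pkgs/msys2'
printf '%s\n' "$sample" \
  | sed -n 's/^ - //p' \
  | grep -v 'main$' \
  | paste -sd, -
# prints: https://repo.anaconda.com/pkgs/r,https://repo.anaconda.com/pkgs/msys2
```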
|
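The isolation recipe above (use `conda search` to time repodata handling, a dry-run create with a long `local_repodata_ttl` to time the solver) can be sketched end to end. These are standard conda options, but the package name below is illustrative and timings will vary:

```shell
# Repodata handling only: `conda search` fetches and parses the channel
# index without running an environment solve.
time conda search numpy >/dev/null

# Solver mostly: with a long repodata TTL the cached index is reused,
# so a dry-run create measures little besides the solve itself.
conda config --set local_repodata_ttl 3600
time conda create --name timing-test --dry-run numpy >/dev/null
```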
I agree that number is at the point where we should start considering improvements. Some type of repodata diff scheme is an obvious option. Or we could consider sharding by package name. |
Oh. The Dist removal that just got merged into master might cut 2 sec off the 6. |
Good point, but I was mostly trying to replicate/showcase the most common use case, i.e.
Didn't know about
Nice! Have yet to try |
Sorry, can't confirm that statement:
Looks like it's more than 2 seconds 😉. Good job!!! |
🎉🎉🎉 |
Dear Conda developers, I have three different osx machines where I use conda, one of them brand new, and conda is slow enough that I would ditch it if there was a halfway decent alternative. I work on several different projects and need to switch and create environments constantly. It is not acceptable to say, "oh it's probably your system config". Conda should work out of the box on the machines people actually use. If conda-forge has gotten too large, take it out of the defaults. At least add some documentation about performance. Downloading packages is fast, but |
@erbas conda version 4.6 included some fixes regarding speed improvements. Could you check `conda install conda -c conda-canary` and test if this version does indeed improve your specific problems? |
Thank you! This version is definitely quicker for things like `conda list`, but `Solving environment` is still frustratingly slow.
On May 20, 2018, at 11:42 AM, Gonzalo Peña-Castellanos wrote:
@erbas conda version 4.6 included some fixes regarding speed improvements. Could you check
conda install conda -c conda-canary
And test if this version does indeed improve your specific problems?
|
@erbas indeed, the optimizations went to commands like |
@erbas Can you please provide the output of |
Also confirm that "Solving environment" is painfully slow on a modern machine (>1 minute). I have conda-forge, intel, and defaults as channels. Perhaps this is too much? |
Also confirming that I'm seeing slowness when doing a simple conda install of a local package. I only have defaults as a channel.
|
Indeed. Please keep up the work on optimization and keep us posted. |
Although not recommended, I use the following in my
The |
|
Nehal maybe we could get some profiling/timing numbers? The hashing does add some extra time, although I doubt that’s where most users get tripped up here. It definitely makes a difference when building a lot of packages. For most users I think setting The settings you have there are EXTREMELY dangerous and SHOULD NOT be used by most users. They disable all current and future security guarantees built into conda, and they also disable transactional rollbacks, so your environment will be completely hosed if something goes wrong. |
Haven’t been doing much data science lately, although I work with Python on a daily basis, but I’m curious to understand what exactly anaconda is solving that pip/pipenv/poetry can’t?
What’s a minimal ml/data science example where the environment is easily solved by anaconda and not by the aforementioned package managers?
On 25 Feb 2021, at 22:15, Sylvain Corlay wrote:
Solving your environment with mamba is almost instantaneous.
|
I was thinking that I can't count download time (it's instantaneous if the packages are already there and depends on internet speed), but it took around 3 minutes to solve the environment and start downloading. |
@okomarov -- how about an environment for bioinformatics that uses python, gatk, and samtools? Or, all of your packages and a single R package? Basically, any environment that includes python stuff AND a few things that aren't python, but need to meet the same dependencies. |
@SylvainCorlay I'm jealous. I cannot get mamba to work at all because of "Problem with the SSL CA cert" messages. I've filed an issue with that project (the problem happens often, but not always... which is weird). @abalter I think mine is going significantly faster since I set strict channel priority today:
Not fast, but less slow (37 minutes is a lot better than it was):
|
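For anyone following along, strict channel priority is a single config flag. These are standard conda commands, though the channel shown is just an example:

```shell
# Only the highest-priority channel is considered per package name,
# which can shrink the solver's search space dramatically.
conda config --set channel_priority strict
# Channel order in ~/.condarc determines priority (first = highest):
conda config --prepend channels conda-forge
```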
Oh wait... after 37 minutes, it eventually just died without creating an environment. So that's not ideal. Why the search started considering pypy versions I haven't the faintest idea. The environment.yml is exactly the one listed a couple comments up. No mention of pypy whatsoever in it. |
Ah, ah... There are complaints here about taking more than 30s. I would like that; it is a very short time for me. When building packages, installing takes hours too, when it works. When it doesn't work we retry and it works... Conda versions:
Here is the output of conda-build for python 3.7:
|
@ppoilbarbe you can try using It will definitely fail a long, long time before hitting 24 hours. Also, I would recommend using strict-channel-priority, and removing defaults if possible. That should help a lot, too. There are some incompatibilities between defaults and conda-forge. |
Thanks... it worked... in less than 10 minutes! |
Wow, from 24 hours to 10 minutes? That's a pretty impressive speedup cc @humitos |
Cool, that's great to hear! |
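For readers who want to replicate the switch discussed above, a sketch (assuming an existing conda base environment; the env and package names are examples, not taken from this thread):

```shell
# Install mamba into base, then use it in place of conda for the
# expensive solve/create steps; the CLI is largely drop-in compatible.
conda install -n base -c conda-forge mamba
mamba env create -f environment.yml   # same spec file conda would use
mamba create -n scratch -c conda-forge "python>=3.8" numpy scipy
```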
Could someone help me with a workaround using mamba? I'm trying to upgrade python 3.8.10 => 3.9.5 in my root environment.
Has anyone had a similar problem? |
@mateusz91t you need to get rid of the Anaconda distribution in your base environment ... it pins too many packages. It's better to have a small base environment and create new environments for things you want to use. For example, start with mambaforge: https://github.com/conda-forge/miniforge#mambaforge and then create a new environment with all anaconda packages using |
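The tail of the comment above was cut off; presumably the suggestion is to recreate the full distribution in its own environment via the `anaconda` metapackage, something like the following hedged sketch (the env name `fullanaconda` matches the reply below; the exact command is my assumption):

```shell
# Keep base minimal; put the whole Anaconda package set in a dedicated
# environment via the `anaconda` metapackage instead.
mamba create -n fullanaconda anaconda
conda activate fullanaconda
```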
Thanks @wolfv, your solution fixed my problem with python's version, but now I have the same problem with a newer version of Spyder ;p Below is the result from the fullanaconda env:
|
I guess I'm lucky, as I've so far waited about 2 hours for |
The real folks would have a better answer, but I think the problem is that solving the package dependencies is NOT a process that can easily be multi-threaded: it is a vast network problem. Downloading will just depend on your bandwidth, and even my 50 Kb/s home connection never makes me wait very long at all. So I doubt there would be much to gain anyway. Besides, the bottleneck is bandwidth/IO. I suggest you try |
I'm not "real folks" either, but I think the fundamental issue is that this is a constraint satisfaction problem, and in general, those can take an exponential amount of time to run. (And indeed, running time can vary wildly depending on seemingly insignificant changes in the input.) This implies that multi-threading isn't likely to be much help--one tenth of forever is still forever. I've hit this occasionally, and what has worked is trying some of the advice on the conda pages. In particular, it helps to prune the set of possibilities being considered. Perhaps you can provide some hints on what you already "know" you want. For example, you might "know" that you prefer a setup based on Python 3.6 or later, even if there is some theoretical combination of packages out there that might seem better to the algorithm, even though they're based on Python 3.1 (or 2.6). Add some constraints and see if it helps. |
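Concretely, "adding some constraints" can be done right on the command line with version pins; the package names and bounds below are purely illustrative:

```shell
# Each pin removes whole swaths of version combinations the solver
# would otherwise have to consider.
conda create -n pinned "python>=3.6,<3.10" "numpy>=1.19" pandas
```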
I still get this issue. The issue is closed; is there any solution here? |
@jingpengw One alternative is to use mamba to manage your conda environments. |
I started fresh: deleted Anaconda & everything. Downloaded & installed fresh Anaconda. Opened the Anaconda terminal and the first command I executed was Is this caused by Anaconda loading up everything in the world? Should I install Anaconda (to get all the pkgs installed), then run bare Python, and create a fullanaconda environment to install Anaconda? But I'm not sure how to install Anaconda in an existing environment. Should I dump Anaconda and just install the packages I need? BTW Anaconda itself seems a bit hosed. In my previous install, and in my new fresh install, it wants to update Anaconda to 2.2.0. Last time I tried this, I let it spin for 12 hours before giving up. Related problem? |
@garyfritzz I would heavily recommend to start with a smaller distribution, such as Miniforge / Mambaforge: https://github.com/conda-forge/miniforge |
My So I think I'm done with Anaconda. I deleted Anaconda and installed Mambaforge. That worked fine, but when I tried to install numpy I got
I'm not sure why it wanted to install under C:\ProgramData, but that was the default location. So I uninstalled it and installed basic Miniforge under C:\. As before, first command I ran was Question: I hate using the Windoze command prompt for the Miniforge prompt. I have a bash I can use (Rtools bash) or I could find another one. But how do I get the conda envs stuff to work in bash on Windows? Or am I better off to use pip and pipenv? |
@garyfritzz: At our site, we do a base install of 'miniconda' only. Then, we religiously install packages into conda environments, leaving the base environment completely clean. I think this helps a lot with speeding up the constraint-satisfaction step, as well as making life easier overall. |
@michaelkarlcoleman: Hm OK. I'm still learning how conda works. I had thought I would put my "default" packages in base, and then just add on specific stuff in separate environments. But from what I see, I get the impression it fully installs EVERYthing in each env, not taking advantage of the "default" stuff in base. Well installing miniconda is pretty easy. I can always uninstall it and do the envs right the next time. |
It's not entirely obvious, but everything in the 'base' (default) environment leaks through to all other environments, at least to a degree. So keeping it small probably saves some grief. That said, even with this, some combinations of packages take a long time to resolve. I think bioconda is somewhat notorious for this. In these cases, it can sometimes help to limit the versions of some packages, which limits the number of combinations that the constraint solver has to consider. (If you limit too much, though, you can end up with an unsolvable specification.) |
Well all I added to the base env is mamba, jupyterlab, pandas, pandas-datareader, numpy, matplotlib. Hopefully that doesn't overdo it. There were 40 packages after initial install, 196 now ... |
In 2023 we have a better story for the Python conda installer. Users can add the conda-libmamba-solver package; we have added parallel package download and extraction, repodata.json parser improvements, and the |
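For completeness, switching to the libmamba solver mentioned above takes two commands on conda ≥ 22.11 (recent conda releases make it the default):

```shell
# Install the solver plugin into the base environment, then select it.
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
```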
at this point I have just ditched the conda ecosystem because of extended problems since long time ago. It was nice talking to you all. unsubscribing to this topic. Bye. |
I'm submitting a...
Current Behavior
conda operations appear to be particularly slow; this seems to have gotten worse as the versions have progressed.
Steps to Reproduce
The above currently takes 30-40s when installing into an existing environment.
Expected Behavior
Hopefully done and dusted within a second or two.
Environment Information
`conda info`