Speed up the best-case scenario of dependency resolution. by ugodiggi · Pull Request #37 · pex-tool/pex · GitHub

Speed up the best-case scenario of dependency resolution. #37


Merged 1 commit on Jan 30, 2015


ugodiggi
Contributor

A few small touchups that improve the speed of dependency resolution when
many packages are already in the local cache.

  • when dispatching multiple crawling threads:
    • make each thread a bit more reactive by reducing the polling timeout from
      100ms to 10ms
    • do not wait for each of the worker threads to exit, just wait for them to complete their
      workload.
      Without this change, dependency resolution is guaranteed to take longer than 100ms per
      dependency, which is a large amount of time for just checking a local zipfile's content.
  • cache the results of a couple of calls that are repeated many times:
    • Link.from_filename
    • Package.from_href
      Each of these calls is performed for each file in the cache, for each dependency that is resolved.
      While neither call is especially expensive, repeating them n^2 times across a largish
      local cache * set of dependencies does add up.
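
The threading change in the first bullet can be illustrated with a minimal sketch. This is not the actual pex crawler code: `crawl`, `handle`, and `POLL_TIMEOUT` are hypothetical names, and the sketch assumes a standard-library work queue. Workers poll with a 10ms timeout, and the caller waits on the queue's workload rather than on the threads themselves.

```python
import queue
import threading

POLL_TIMEOUT = 0.01  # 10ms polling interval (hypothetical constant name)


def crawl(items, handle, workers=4):
    """Run handle(item) for every item on worker threads; return the results."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()
    done = threading.Event()

    def worker():
        # Poll with a short timeout so the thread reacts quickly to new work
        # and to the shutdown flag.
        while not done.is_set():
            try:
                item = work.get(timeout=POLL_TIMEOUT)
            except queue.Empty:
                continue
            try:
                value = handle(item)
                with lock:
                    results.append(value)
            finally:
                work.task_done()

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()

    for item in items:
        work.put(item)

    # Wait for the workload to finish, not for the threads to exit. Joining
    # each thread would add up to one polling interval of delay per call.
    work.join()
    done.set()
    return results
```

With this shape, the caller returns as soon as the last queued item is marked done; the daemon threads notice the shutdown flag on their next poll and exit on their own.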

Somewhat unscientific benchmarking on my system shows that the average time for resolving a single
dependency (namely 'pytz==2013b') goes down from 150ms to 30ms.

Running the modified code on the urbancompass codebase produced similarly desirable timings.

ugodiggi force-pushed the fully_cached_performance_improv branch 2 times, most recently from 3a1ac2b to a8841a4 on January 29, 2015 21:46
ugodiggi force-pushed the fully_cached_performance_improv branch from a8841a4 to b660f1f on January 30, 2015 02:30
Contributor
wickman commented Jan 30, 2015

Only minor nit is that the package memoizer should be flushed each time we call .register, but this is purely pedantic. Will merge as-is.
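
To make the nit concrete, here is a minimal sketch of a memoized `from_href` whose cache is cleared on every `register` call. It is not the pex implementation: the class layout, `_registry`, `_href_cache`, `_parse`, and the `Wheel` type are all hypothetical, chosen only to show why a stale memo could matter after a new package type is registered.

```python
class Package:
    """Hypothetical stand-in for a package type with registered subtypes."""

    _registry = []
    _href_cache = {}

    @classmethod
    def register(cls, package_type):
        cls._registry.append(package_type)
        # A newly registered type can change what from_href returns,
        # so drop any previously memoized results.
        cls._href_cache.clear()

    @classmethod
    def from_href(cls, href):
        # Memoize per href, including negative (None) results.
        if href not in cls._href_cache:
            cls._href_cache[href] = cls._parse(href)
        return cls._href_cache[href]

    @classmethod
    def _parse(cls, href):
        for package_type in cls._registry:
            if href.endswith(package_type.suffix):
                return package_type(href)
        return None


class Wheel:
    """Hypothetical package type recognized by its file suffix."""

    suffix = ".whl"

    def __init__(self, href):
        self.href = href
```

Without the `clear()` in `register`, an href looked up before registration would keep returning the memoized `None` even after a matching type becomes available.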

wickman added a commit that referenced this pull request Jan 30, 2015
Speed up the best-case scenario of dependency resolution.
wickman merged commit 258fc73 into pex-tool:master on Jan 30, 2015
@Yasumoto
Contributor

This is awesome! Thanks @ugodiggi !!
