Cache z3 results · Issue #464 · a16z/halmos · GitHub

Open

iirekm opened this issue Feb 20, 2025 · 5 comments
Labels
enhancement New feature or request

Comments

@iirekm
iirekm commented Feb 20, 2025

z3 can be slow, very slow, as the codebase grows. On-disk caching would help a lot during development (only new queries would need to be computed).
I tried ad-hoc caching with the following wrapper around z3:

#!/bin/bash -e

# Cache directory
CACHE_DIR="$HOME/.z3cache"
mkdir -p "$CACHE_DIR"

# Check if SMT file is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <smt-file>" >&2
    exit 1
fi

SMT_FILE="$1"

# Ensure the file exists
if [ ! -f "$SMT_FILE" ]; then
    echo "Error: File '$SMT_FILE' not found!"  >&2
    exit 1
fi

# Generate a hash of the SMT file for caching
FILE_HASH=$(sha256sum "$SMT_FILE" | awk '{print $1}')
CACHE_FILE="$CACHE_DIR/$FILE_HASH.result"

# Check if result is already cached
if [ -f "$CACHE_FILE" ]; then
    echo "[CACHE HIT] Returning cached result."  >&2
    cat "$CACHE_FILE"
    exit 0
fi

# Keep a copy of the query next to the cached result (handy for debugging)
cat "$SMT_FILE" >"$CACHE_DIR/$FILE_HASH.smt"

# Cache miss: run the real Z3 binary (renamed to "<this script>.orig") and cache the result
echo "[CACHE MISS] Running Z3..." >&2
"$0".orig "$SMT_FILE" | tee "$CACHE_FILE"

It almost works, but the problem is that the variable identifiers change from run to run (e.g. it's p_x_uint256_9fdce11_00 in one run, and p_x_uint256_7721c25_00 in another). How can those identifiers be made deterministic? Which piece of code do they come from?

In the long term, a better solution could be built, e.g. with the popular joblib.Memory library.

@iirekm added the enhancement (New feature or request) label Feb 20, 2025
@iirekm
Author
iirekm commented Feb 20, 2025

I made an experimental pull request for this. It needs a lot of improvement, but in general it works.

@0xkarmacoma
Collaborator

I see, the idea is to build a global cache that works across runs. That's a cool idea! diskcache in particular seems really well suited for that (and I see it has a mechanism to keep the cache from growing too big)
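The core of the idea can be illustrated with a minimal, stdlib-only sketch (all class and method names here are hypothetical; the real diskcache library additionally provides eviction, a `size_limit` option, and process-safe access):

```python
import hashlib
import os
import tempfile

# Hypothetical sketch of an on-disk cache keyed by the SMT query text.
class QueryCache:
    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, query: str) -> str:
        # Key each entry on a hash of the full query text.
        key = hashlib.sha256(query.encode()).hexdigest()
        return os.path.join(self.directory, key)

    def get(self, query: str):
        path = self._path(query)
        if os.path.exists(path):
            with open(path) as f:
                return f.read()
        return None  # cache miss

    def put(self, query: str, result: str) -> None:
        with open(self._path(query), "w") as f:
            f.write(result)

cache = QueryCache(tempfile.mkdtemp())
cache.put("(check-sat)", "sat")
print(cache.get("(check-sat)"))  # → sat
```

Hashing the query text is exactly why deterministic identifiers matter: if symbol names change between runs, identical queries hash to different keys and never hit the cache.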

the random-looking parts of the identifiers come from the uuid library; they change across runs because uuid generation is based on the current time. We may be able to seed it or switch to a deterministic id generation mechanism, wdyt @daejunpark ?

@iirekm
Author
iirekm commented Feb 22, 2025

The random part also comes from the .to_smt2() method (replacing it with .sexpr() seems to fix that), and from somewhere else in the code that I couldn't pin down: the Z3 queries generated from run to run still differ sometimes. The net result is that the current code in the PR improves speed on some examples (e.g. ArithTest) but actually worsens it on others, both because not all sources of randomness have been resolved, and because Z3 can be called thousands of times per run, so the diskcache library becomes a performance bottleneck itself.

@daejunpark
Collaborator

@iirekm thanks for your initiative! it's a great idea overall. indeed, we've been thinking about this, but haven't had the chance to work on it.

regarding the two points you mentioned:

  • deterministic variable names: when a test function is executed (or a user function is called via createCalldata()), symbolic argument values are passed to the function, and their symbol names are generated from the parameter names. since the same function may be called multiple times, these names need to be unique across different calls. currently, uuid() is used for simplicity, but a global (or path-specific) counter could be used instead to make them deterministic.
  • to_smt2() vs sexpr(): sexpr() has a critical performance issue when symbolic terms are deeply nested and reused. for example, given a term t * t where t = a + b + c + d + e, to_smt2() generates (let ((t (+ a b c d e))) (* t t)), whereas sexpr() generates (* (+ a b c d e) (+ a b c d e)). as a result, sexpr() terms may grow exponentially, which significantly impacts performance for some queries.
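The counter idea from the first point above could be sketched roughly as follows (a hypothetical illustration, not halmos code: the name format loosely mimics the p_x_uint256_* identifiers quoted in this issue, and the actual naming scheme may differ):

```python
import itertools
from collections import defaultdict

# One monotonically increasing counter per (parameter, type) pair, so the
# generated symbol names are identical on every run, unlike uuid suffixes.
_counters: dict = defaultdict(itertools.count)

def fresh_symbol(param: str, typ: str) -> str:
    n = next(_counters[(param, typ)])
    return f"p_{param}_{typ}_{n:02d}"

print(fresh_symbol("x", "uint256"))  # → p_x_uint256_00
print(fresh_symbol("x", "uint256"))  # → p_x_uint256_01
```

A path-specific counter would work the same way, just keyed additionally on the execution path, so that name assignment doesn't depend on the order in which paths happen to be explored.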

@0xkarmacoma
Collaborator

> the diskcache library becomes a performance bottleneck itself

we could probably use a threshold strategy: for instance, only cache queries that took more than 1s to solve. That way, we would avoid filling the diskcache with queries that can be solved quickly anyway.
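That threshold strategy could look something like this (a hypothetical sketch: `solver` is any callable mapping a query string to a result, and the dict stands in for the on-disk cache):

```python
import time

def solve_with_cache(query, solver, cache, threshold=1.0):
    """Solve `query`, but only persist results for slow queries."""
    if query in cache:
        return cache[query]  # hit: skip solving entirely
    start = time.monotonic()
    result = solver(query)
    elapsed = time.monotonic() - start
    if elapsed > threshold:
        cache[query] = result  # only slow queries are worth caching
    return result
```

Fast queries then pay neither the disk-write cost on a miss nor the lookup cost savings they don't need, while the expensive queries that dominate total runtime are the ones that get reused across runs.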
