-
Notifications
You must be signed in to change notification settings - Fork 42
mirror_ocp_release: fixes for concurrent jobs #626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
from change #626:
|
1 similar comment
from change #626:
|
@@ -11,20 +11,26 @@ | |||
when: | |||
- mor_force or not _mor_target.stat.exists | |||
block: | |||
- name: "Extract installer and metadata from release image" | |||
ansible.builtin.shell: > | |||
flock -x {{ mor_cache_dir }}/{{ mor_version }}/release_extract.lock -c ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this new approach, use of filesystem locks is not needed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered scenarios where two jobs run concurrently, both extracting to a temporary location and writing to mor_cache_dir
? How do you prevent conflicts in such cases? Implementing a mechanism like a lock might help avoid these issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a fair point. I was accepting the risk of facing such scenarios provided moving files around the filesystem is must faster than running the operations directly on the mor_cache_dir.
But at this point it's right what we could just limit the protected zone to the task were we copy the files to the cache directory once they have been processed in the temporary directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here comes another thought. The way this implementation works, the problematic task would be when we run the copy module to move the files from the temporary directory to the cache directory. Alternatively or in combination of the lock usage, we can add the parameter "force: false" to the module call, so ansible won't replace a file that already exists, even if the contents are different.
The question here is whether it'd be safe to assume the artifact won't change between jobs deploying the same OCP release.
My guess is the files won't change if the jobs are running concurrently, but would those artifacts change between jobs running with, say, days of difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I don't find a way of implementing locks that would allows us to run ansible tasks in the locked zone.
In other words, when using locks the lock code must be part of the same shell script.
{{ mor_oc }} adm release extract | ||
--registry-config {{ mor_auths_file }} | ||
--command={{ mor_installer }} | ||
--from {{ mor_pull_url }} | ||
--to "{{ mor_cache_dir }}/{{ mor_version }}"; | ||
--to "{{ _mor_tmp.path }}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Artifacts are first extracted into the temporary directory for the job.
get_checksum: false | ||
register: target | ||
when: | ||
- not mor_force |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're extracting the artifacts on a temporary directory, the file won't exist in advance.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 43s |
- name: Copy artifacts to release directory | ||
ansible.builtin.copy: | ||
src: "{{ _mor_tmp.path }}/" | ||
dest: "{{ mor_cache_dir }}/{{ mor_version }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of the job execution the artifacts are copied to the job's target directory. For this, we use a regular task at the end of the role instead of handlers, so we don't wait for the end of the play to copy the artifacts.
7e5d9f5
to
e7675e7
Compare
from change #626:
|
1 similar comment
from change #626:
|
state: directory | ||
prefix: mor- | ||
register: _mor_tmp | ||
notify: "Remove temporary directory" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a handler to remove the temporary directory so we make sure it's run at the end of the play.
The goal is to have it running on successful and failed jobs. For this to work, the play calling this role must activate the flag "force_handlers: true", otherwise the handler will only be run on success.
We chose this approach to a block with an "always" section, so we don't need to add extra tasks to track the failed step properly.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 53s |
e7675e7
to
903f001
Compare
from change #626:
|
1 similar comment
from change #626:
|
@@ -35,4 +27,5 @@ | |||
ansible.builtin.command: /usr/sbin/restorecon -R "{{ mor_dir }}/{{ mor_uri | basename }}" | |||
become: true | |||
when: _mor_selinux.rc == 0 | |||
# we may need to run this task over the target directory rather than mor_dir (= _mor_tmp.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original scenario, the artifacts are extracted into an httpd served directory, so restoring the contexts is needed for the files to be properly served. Restoring the contexts on the temporary directory may not have any effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the next tasks in the artifacts.yml file after including the fetch.yml set new the new context container_file_t on the extracted artifacts. This I assume is done from here so it'll be applied to both, OCP versions greater or equal than 4.8 and lower than 4.8. But that means the tasks in fetch.yml are not needed.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 48s |
903f001
to
11767d3
Compare
from change #626:
|
1 similar comment
from change #626:
|
- name: "Apply new SELinux file context to file" | ||
ansible.builtin.command: /usr/sbin/restorecon -R "{{ mor_dir }}/{{ mor_uri | basename }}" | ||
become: true | ||
when: _mor_selinux.rc == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restoring the selinux context does not make sense when extracting the artifacts on a temporary directory.
Also, the first tasks in artifacts.yml after including fetch.yml override the context and set it to container_file_t, which should be valid even after moving the artifacts to the target directory served from the cache container.
A different discussion is whether these tasks should be run before or after copying the artifacts to the target directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to restore this block of code, since fetch.yml is also included from images.yml to pull the disk image directly into the cache store (version directory ignored) so then it's directly served by the cache container.
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s |
11767d3
to
4995c13
Compare
from change #626:
|
1 similar comment
from change #626:
|
4995c13
to
13e17c5
Compare
from change #626:
|
1 similar comment
from change #626:
|
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 49s |
13e17c5
to
bd32593
Compare
19e8084
to
b3dc5c9
Compare
from change #626: |
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 50s |
b3dc5c9
to
89d08c8
Compare
from change #626: |
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 51s |
src: "{{ _mor_tmp.path }}/" | ||
dest: "{{ mor_cache_dir }}/{{ mor_version }}/" | ||
mode: preserve | ||
force: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By setting the "force" parameter to "false" we prevent the module to get a file into an inconsistent state if the file already exists.
This is interesting, for instance, to prevent binary execution exceptions if modification of the binary file are detected during the execution.
There are some concerns regarding this approach, though:
- if force is set to true, existing files are only replaced if changed, which should not happen between files belonging in the same release number.
- since Ansible copy module uses atomic moves, the target file path should not suffer hash code changes during the copy process, so it shouldn't be possible for Ansible to mark a file as replaceable just because it's content is in an unstable state yet.
- thus, the force: false option is only an extra precaution we take.
- this feature would prevent files to be updated if they suffer any modification even within the same ocp release number.
89d08c8
to
355027e
Compare
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 49s |
355027e
to
ceeec63
Compare
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s |
ceeec63
to
33a30f4
Compare
from change #626: |
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 48s |
33a30f4
to
eb3c103
Compare
Build succeeded. ✔️ dci-rpm-build-el8 SUCCESS in 2m 46s |
from change #626: |
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days |
WalkthroughThis update refactors several Ansible tasks in the Changes
Sequence Diagram(s)sequenceDiagram
participant Ansible
participant TempDir
participant CacheDir
Ansible->>TempDir: Extract OCP installer and metadata
Ansible->>TempDir: Conditionally extract rhcos.json
Ansible->>TempDir: Fetch rhcos.json for <4.8 (if needed)
Ansible->>TempDir: Check SELinux status
Ansible->>CacheDir: Atomically copy artifacts with rsync, apply SELinux context & permissions (with lock)
Ansible->>TempDir: Remove temporary directory
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (7)
roles/mirror_ocp_release/tasks/fetch.yml (1)
30-31
: Review SELinux context task changed_when logic.
Usingchanged_when: _mor_restorecon.rc == 0
will mark the task as changed wheneverrestorecon
succeeds, even if no context modifications occurred. Consider adding idempotency checks (e.g.,changed_when: _mor_restorecon.stdout
) or using theansible.posix.selinux
module for more accurate change reporting.roles/mirror_ocp_release/tasks/registry.yml (2)
18-26
: Avoid register variable collisions and improve readability.
Theregister: _mor_result
is reused here and again in the manifest generation task, which can lead to confusion and unintended overrides. Consider using distinct register variables (e.g.,_mor_mirror_result
for this task) and splitting inline conditional flags for clarity:- --from={{ mor_ocp_release_data['container_image'] | quote }} + --from={{ mor_ocp_release_data['container_image'] | quote }} @@ - --to={{ mor_registry_url }}/{{ mor_registry_path }}{%- if mor_allow_insecure_registry | bool %} - --insecure{%- endif %} + --to={{ mor_registry_url }}/{{ mor_registry_path }} + {% if mor_allow_insecure_registry | bool %}--insecure{% endif %} @@ - register: _mor_result + register: _mor_mirror_result - until: _mor_result.rc == 0 + until: _mor_mirror_result.rc == 0 - changed_when: _mor_result.rc == 0 + changed_when: _mor_mirror_result.rc == 0
40-44
: Correct whitespace in mirror_instructions and separate flag templating.
The variablemirror_instructions
includes an unintended leading space, which may introduce extra whitespace in the command. Refactor as:- mirror_instructions: "--print-mirror-instructions= {{ mor_is_type | lower }}" + mirror_instructions: "--print-mirror-instructions={{ mor_is_type | lower }}"Then use it cleanly in the command:
{{ mor_version is version("4.14", "<") | ternary("--dry-run", mirror_instructions) }}
This removes extraneous spaces and keeps the flag templating clear.
roles/mirror_ocp_release/tasks/facts.yml (2)
67-75
: Maintain backward compatibility for <4.8 releases
The< 4.8
loop mirrors the ≥4.8 logic with appropriate URI construction and fallback defaults, preserving legacy support.Consider renaming
add_item
to something likeimage_fact
for clearer intent within the loop.
112-116
: Atomic OS image URL overrides
Buildingbootstraposimage
,clusterosimage
, andmetalosimage
fromrhcos_images
guarantees correct artifact references. Verify thatmor_webserver_url
consistently excludes a trailing slash to avoid double-slashes in generated URLs.Optionally, apply
| urlencode
on thesha256
query parameters if downstream URL parsers require strict encoding.roles/mirror_ocp_release/tasks/artifacts.yml (2)
21-32
: Use dedicated temp dir for installer extraction
Extracting the OCP installer into_mor_tmp.path
with retries andchanged_when
makes this step robust.Consider distinct register names (e.g.,
_mor_installer_extract_res
vs_mor_metadata_extract_res
) to avoid overwriting logs and aid debugging.
81-93
: Atomic copy with locking and permissions
Theflock + rsync
wrapper secures the cache directory against concurrent writes and applies SELinux and executable-bit adjustments in one atomic operation.Since this is local, consider dropping
-z
fromrsync -avz
to avoid unnecessary compression overhead.Review the indentation inside the single-quoted
flock -c
block to ensure no unintended whitespace alters command parsing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (7)
roles/mirror_ocp_release/tasks/artifacts.yml
(3 hunks)roles/mirror_ocp_release/tasks/facts.yml
(3 hunks)roles/mirror_ocp_release/tasks/fetch.yml
(2 hunks)roles/mirror_ocp_release/tasks/files.yml
(1 hunks)roles/mirror_ocp_release/tasks/images.yml
(1 hunks)roles/mirror_ocp_release/tasks/main.yml
(1 hunks)roles/mirror_ocp_release/tasks/registry.yml
(2 hunks)
🔇 Additional comments (9)
roles/mirror_ocp_release/tasks/files.yml (1)
16-17
: Consistent internal variable naming for download results.
Renamed the registered variable to_mor_downloaded
and updated the retry condition accordingly to follow the underscore-prefixed convention for internal task results.roles/mirror_ocp_release/tasks/main.yml (1)
15-23
: Consistent internal variable naming for authentication file check.
Renamed the registered variable to_mor_auths_file_check
and updated its reference in the assertion to maintain consistency with underscore-prefixed internal variables across tasks.roles/mirror_ocp_release/tasks/images.yml (1)
5-6
:✅ Verification successful
Update RHCOS image data variable references.
The task now correctly referencesmor_ocp_release_data['rhcos_images']
instead of the oldocp_release_data
. Ensuremor_ocp_release_data
is defined infacts.yml
to avoid undefined variable errors.
Run the following to confirm no leftoverocp_release_data
references:
🏁 Script executed:
#!/bin/bash # Confirm renaming of release data variable rg -n "ocp_release_data" -C2 roles/mirror_ocp_release/tasksLength of output: 3787
Review complete:
mor_ocp_release_data
is correctly defined and noocp_release_data
remains
Confirmed thatmor_ocp_release_data
is set inroles/mirror_ocp_release/tasks/facts.yml
(lines 26–30 & 72–75) and a ripgrep search shows no leftoverocp_release_data
references inroles/mirror_ocp_release/tasks
.roles/mirror_ocp_release/tasks/registry.yml (1)
6-11
: Add idempotent change detection for image availability check.
Thechanged_when: _mor_release_image.rc == 0
marks the inspect as changed on success, whilefailed_when: false
prevents failures if the image is already present. This correctly avoids playbook failures on pullable images.roles/mirror_ocp_release/tasks/facts.yml (2)
16-22
: Leverage slurp for robust JSON retrieval
Switching fromcat
/command
toansible.builtin.slurp
ensures idempotent, platform-agnostic reads ofrhcos.json
. Theno_log: true
flag appropriately hides sensitive content.
26-32
: Unify JSON parsing and fact naming
The introduction ofrhcos_json
(decoding_mor_rhcos.content
) and consolidation undermor_ocp_release_data
aligns with the role’s conventions. Thecombine
withdefault({})
correctly accumulates per-item entries.Please confirm that all downstream tasks (
images.yml
,registry.yml
, etc.) now referencemor_ocp_release_data
instead of the oldocp_release_data
.roles/mirror_ocp_release/tasks/artifacts.yml (3)
10-15
: Create isolated workspace per run
Adding atempfile
directory scoped to each invocation is crucial for thread-safe extraction. Thewhen
clause correctly gates creation on cache misses or forced runs.
34-45
: Extract metadata to the shared temp workspace
The metadata extraction aligns with the installer step, ensuring all artifacts land in the same temp folder before publishing.
95-98
: Cleanup temporary workspace reliably
Thealways
block ensures the temp directory is removed regardless of success or failure, preventing disk-space leaks.
register: _mor_downloaded | ||
until: _mor_downloaded is not failed | ||
when: | ||
- mor_force or not target.stat.exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undefined target
variable in fetch condition causing runtime errors.
The when
clause references target.stat.exists
, but the stat task and target
registration were removed, leading to an undefined variable. You should reintroduce a stat task before this fetch or update the condition to reference an existing variable.
Example fix:
+ - name: "Check if target file exists"
+ ansible.builtin.stat:
+ path: "{{ mor_dir }}/{{ mor_uri | basename }}"
+ register: target
- name: "Fetch file from URL"
ansible.builtin.get_url:
url: "{{ mor_uri }}"
dest: "{{ mor_dir }}"
@@
- when:
- - mor_force or not target.stat.exists
+ when:
+ - mor_force or not target.stat.exists
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
register: _mor_downloaded | |
until: _mor_downloaded is not failed | |
when: | |
- mor_force or not target.stat.exists | |
- name: "Check if target file exists" | |
ansible.builtin.stat: | |
path: "{{ mor_dir }}/{{ mor_uri | basename }}" | |
register: target | |
- name: "Fetch file from URL" | |
ansible.builtin.get_url: | |
url: "{{ mor_uri }}" | |
dest: "{{ mor_dir }}" | |
register: _mor_downloaded | |
until: _mor_downloaded is not failed | |
when: | |
- mor_force or not target.stat.exists |
🤖 Prompt for AI Agents
In roles/mirror_ocp_release/tasks/fetch.yml around lines 15 to 18, the when
condition references target.stat.exists, but the target variable is undefined
because the stat task and its registration were removed. To fix this, add a stat
task before this fetch task to check the existence of the target file or
directory and register it as target, or modify the when condition to use an
existing variable that indicates the target's presence. Ensure the variable used
in the when clause is properly defined to avoid runtime errors.
SUMMARY
Fixes CILAB-2034: when multiple jobs run to mirror the same version on the same host, the status of the mirroring directory may be unstable depending on the order the tasks are run on each job.
This change implements temporary directories per role call so the artifacts are only moved to the final location at the end of the execution.
ISSUE TYPE
Tests
TestBos2Sno: sno sno:components=ocp=4.18.5,
Summary by CodeRabbit
Refactor
Bug Fixes