Releases: LLNL/merlin
Version 1.13.0b2
[1.13.0b2]
Added
- Ability to turn off the auto-restart functionality of the monitor with `--no-restart`
- Tests for the monitor files
Changed
- Refactored the `main.py` module so that it's broken into smaller, more-manageable pieces
Version 1.13.0b1
[1.13.0b1]
Added
- API documentation for Merlin's core codebase
- New `merlin database` command to interact with new database functionality
  - When running locally, SQLite will be used as the database. Otherwise your current results backend will be used
  - `merlin database info`: prints some basic information about the database
  - `merlin database get`: allows you to retrieve and print entries in the database
  - `merlin database delete`: allows you to delete entries in the database
- Added `db_scripts/` folder containing several new files all pertaining to database interaction
  - `data_models`: a module that houses dataclasses that define the format of the data that's stored in Merlin's database
  - `db_commands`: an interface for processing user commands of `merlin database`
  - `merlin_db`: houses the `MerlinDatabase` class, used as the main point of contact for interactions with the database
  - `entities/`: a folder containing modules that define a structured interface for interacting with persisted data
  - `entity_managers/`: a folder containing classes responsible for managing high-level database operations across all entities
- Added `backends/` folder containing a new OOP way to interact with results backend databases
  - `results_backend`: houses an abstract class `ResultsBackend` that defines what every supported backend must implement in Merlin
  - `redis/`: a folder containing the `RedisBackend` class that defines specific interactions with the Redis database
  - `sqlite/`: a folder containing the `SQLiteBackend` class that defines specific interactions with the SQLite database
  - `backend_factory`: houses a factory class `MerlinBackendFactory` that initializes an appropriate `ResultsBackend` instance
- Added `monitors/` folder containing a refactored, OOP approach to handling the `merlin monitor` command
  - `celery_monitor`: houses the `CeleryMonitor` class, a concrete subclass of `TaskServerMonitor` for monitoring Celery task servers
  - `monitor_factory`: houses a factory class `MonitorFactory` that initializes an appropriate `TaskServerMonitor` instance
  - `monitor`: houses the `Monitor` class, used as the top-level point of interaction for the monitor command
  - `task_server_monitor`: houses the `TaskServerMonitor` ABC class, which serves as a common interface for monitoring task servers
- A new celery task called `mark_run_as_complete` that is automatically added to the task queue associated with the final step in a workflow
- Added support for Python 3.12 and 3.13
- Added additional tests for the `merlin run` and `merlin purge` commands
- Aliased types to represent different types of pytest fixtures
- New test condition `StepFinishedFilesCount` to help search for `MERLIN_FINISHED` files in output workspaces
- Added "Unit-tests" GitHub action to run the unit test suite
- Added `CeleryTaskManager` context manager to the test suite to ensure tasks are safely purged from queues if tests fail
- Added `command-tests`, `workflow-tests`, and `integration-tests` to the Makefile
- Added tests and docs for the new `merlin config` options
- Python 3.8 now requires `orderly-set==5.3.0` to avoid a bug with the deepdiff library
- New step 'Reinstall pip to avoid vendored package corruption' to CI workflow jobs that use pip
- New GitHub actions to reduce common code in CI
- COPYRIGHT file for ownership details
- New check for copyright headers in the Makefile
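
The `backends/` entry above describes an abstract `ResultsBackend`, concrete `RedisBackend` and `SQLiteBackend` classes, and a `MerlinBackendFactory` that picks one of them. The sketch below illustrates that general pattern only; it is not Merlin's code. The class names are taken from the notes, but the methods (`save`, `retrieve`, `get_backend`), the lookup keys, and the in-memory storage are all hypothetical stand-ins so the example runs without a database.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class ResultsBackend(ABC):
    """Common interface for backends (method names are hypothetical)."""

    @abstractmethod
    def save(self, key: str, value: str) -> None:
        """Persist a value under a key."""

    @abstractmethod
    def retrieve(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is unknown."""


class _InMemoryBackend(ResultsBackend):
    """Dict-based stand-in so the sketch runs without Redis or SQLite."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._store[key] = value

    def retrieve(self, key: str) -> Optional[str]:
        return self._store.get(key)


class RedisBackend(_InMemoryBackend):
    """Placeholder: a real implementation would talk to a Redis server."""


class SQLiteBackend(_InMemoryBackend):
    """Placeholder: a real implementation would talk to a SQLite file."""


class MerlinBackendFactory:
    """Maps a backend name to a ResultsBackend subclass (keys are assumptions)."""

    _backends: Dict[str, Type[ResultsBackend]] = {
        "redis": RedisBackend,
        "sqlite": SQLiteBackend,
    }

    def get_backend(self, name: str) -> ResultsBackend:
        try:
            backend_cls = self._backends[name.lower()]
        except KeyError:
            raise ValueError(f"Unsupported backend: {name!r}") from None
        return backend_cls()


# Callers depend only on the abstract interface, not on a concrete class.
backend = MerlinBackendFactory().get_backend("sqlite")
backend.save("run-1", "complete")
print(type(backend).__name__, backend.retrieve("run-1"))
```

The point of the pattern is that code using the factory never needs to know which database is in play, which matches the notes' statement that SQLite is used locally and the configured results backend otherwise.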
Changed
- Updated the `merlin monitor` command
  - it will now attempt to restart workflows automatically if a workflow is hanging
  - it utilizes an object oriented approach in the backend now
- Celery's default settings have been updated to add:
  - `interval_max: 300` -> tasks will retry for up to 5 minutes instead of the previous 1 minute
  - new `broker_transport_options`:
    - `socket_timeout: 300` -> increases the socket timeout to 5 minutes instead of the default 2 minutes
    - `retry_policy: {timeout: 600}` -> sets the maximum amount of time that Celery will keep trying to connect to the broker to 10 minutes
  - `broker_connection_timeout: 60` -> establishing a connection to the broker will now wait a full minute before timing out instead of the previous 4 seconds
  - new generic backend settings:
    - `result_backend_always_retry: True` -> backend will now auto-retry in the event of recoverable exceptions
    - `result_backend_max_retries: 20` -> maximum number of retries in the event of recoverable exceptions
  - new Redis specific settings:
    - `redis_retry_on_timeout: True` -> retries read/write operations on TimeoutError to the Redis server
    - `redis_socket_connect_timeout: 300` -> 5 minute socket timeout for connections to Redis
    - `redis_socket_timeout: 300` -> 5 minute socket timeout for read/write operations to Redis
    - `redis_socket_keepalive: True` -> socket TCP keepalive to keep connections healthy to the Redis server
- The `merlin config` command:
  - Now defaults to the LaunchIT setup
  - The configuration file no longer has to be named `app.yaml`
  - New subcommands:
    - `create`: Creates a new configuration file
    - `update-broker`: Updates the `broker` section of the configuration file
    - `update-backend`: Updates the `results_backend` section of the configuration file
    - `use`: Points your active configuration to a new configuration file
- The `merlin server` command no longer modifies the `~/.merlin/app.yaml` file by default. Instead, it modifies the `./merlin_server/app.yaml` file.
- Dropped support for Python 3.7
- Ported all distributed tests of the integration test suite to pytest
  - There is now a `commands/` directory and a `workflows/` directory under the integration suite to house these tests
  - Removed the "Distributed-tests" GitHub action as these tests will now be run under "Integration-tests"
- Removed `e2e-distributed*` definitions from the Makefile
- Modified GitHub CI to use shared testing servers hosted by LaunchIT rather than the jackalope server
- CI to use new actions
- Copyright headers in all files
  - These now point to the LICENSE and COPYRIGHT files
  - LICENSE: Legal permissions (e.g., MIT terms)
  - COPYRIGHT: Ownership, institutional metadata
- Make commands that change version/copyright year have been modified
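
The Celery settings listed above can be easier to read when collected in one place. The following is a sketch only: the values come from the notes, but the layout is an assumption. In particular, the notes do not say which retry policy `interval_max` belongs to, so placing it under Celery's `task_publish_retry_policy` setting is a guess, and the `timeouts_in_minutes` helper is hypothetical.

```python
# Values are taken from the release notes; the nesting is an ASSUMPTION
# based on the bullet hierarchy, not a copy of Merlin's configuration code.
celery_settings = {
    # ASSUMPTION: the notes do not name the parent key for interval_max.
    "task_publish_retry_policy": {"interval_max": 300},
    "broker_transport_options": {
        "socket_timeout": 300,             # 5 minutes
        "retry_policy": {"timeout": 600},  # keep trying the broker for 10 minutes
    },
    "broker_connection_timeout": 60,       # previously 4 seconds
    # Generic results backend settings
    "result_backend_always_retry": True,
    "result_backend_max_retries": 20,
    # Redis-specific settings
    "redis_retry_on_timeout": True,
    "redis_socket_connect_timeout": 300,
    "redis_socket_timeout": 300,
    "redis_socket_keepalive": True,
}


def timeouts_in_minutes(settings: dict) -> dict:
    """Hypothetical helper: report top-level *_timeout values in minutes."""
    return {
        name: value / 60
        for name, value in settings.items()
        if name.endswith("_timeout") and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }


for name, minutes in timeouts_in_minutes(celery_settings).items():
    print(f"{name}: {minutes:g} min")
```

Given a Celery application object `app`, a dict of this shape can be applied with `app.conf.update(celery_settings)`.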
Fixed
- Running Merlin locally no longer requires an `app.yaml` configuration file
- Removed dead lgtm link
- Potential security vulnerabilities related to logging
Deprecated
- The `--steps` argument of the `merlin monitor` command is now deprecated and will be removed in Version 1.14.0.
Version 1.12.2
[1.12.2]
Added
- Conflict handler option to the `dict_deep_merge` function in `utils.py`
- Ability to add module-specific pytest fixtures
- Added fixtures specifically for testing status functionality
- Added tests for reading and writing status files, and status conflict handling
- Added tests for the `dict_deep_merge` function
- Pytest-mock as a dependency for the test suite (necessary for using mocks and fixtures in the same test)
- New github action test to make sure target branch has been merged into the source first, so we know histories are ok
- Check in the status commands to make sure we're not pulling statuses from nested workspaces
- Added `setuptools` as a requirement for python 3.12 to recognize the `pkg_resources` library
- Patch to celery results backend to stop ChordErrors being raised and breaking workflows when a single task fails
- New step return code `$(MERLIN_RAISE_ERROR)` to force an error to be raised by a task (mainly for testing)
  - Added description of this to docs
- New test to ensure a single failed task won't break a workflow
- Several new unit tests for the following subdirectories:
  - `merlin/common/`
  - `merlin/config/`
  - `merlin/examples/`
  - `merlin/server/`
- Context managers for the `conftest.py` file to ensure safe spin up and shutdown of fixtures
  - `RedisServerManager`: context to help with starting/stopping a redis server for tests
  - `CeleryWorkersManager`: context to help with starting/stopping workers for tests
- Ability to copy and print the `Config` object from `merlin/config/__init__.py`
- Equality method to the `ContainerFormatConfig` and `ContainerConfig` objects from `merlin/server/server_util.py`
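
The first item in this list adds a conflict handler option to `dict_deep_merge`. The notes do not show the function's signature, so the sketch below is a generic illustration of the idea rather than Merlin's implementation: the function name `deep_merge`, the `conflict_handler` parameter, its `(key, old, new)` call shape, and the sample status data are all assumptions.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical handler shape: receives the key and both values, returns the winner.
ConflictHandler = Callable[[str, Any, Any], Any]


def deep_merge(
    dict_a: Dict[str, Any],
    dict_b: Dict[str, Any],
    conflict_handler: Optional[ConflictHandler] = None,
) -> Dict[str, Any]:
    """Merge dict_b into dict_a in place, recursing into nested dicts."""
    for key, new_val in dict_b.items():
        if key not in dict_a:
            dict_a[key] = new_val
        elif isinstance(dict_a[key], dict) and isinstance(new_val, dict):
            deep_merge(dict_a[key], new_val, conflict_handler)
        elif dict_a[key] != new_val:
            if conflict_handler is None:
                dict_a[key] = new_val  # default in this sketch: the newer value wins
            else:
                dict_a[key] = conflict_handler(key, dict_a[key], new_val)
    return dict_a


def keep_highest_restart_count(key: str, old: Any, new: Any) -> Any:
    """Example handler: never lower a restart counter; otherwise take the new value."""
    return max(old, new) if key == "restarts" else new


status_a = {"step": {"state": "RUNNING", "restarts": 3}}
status_b = {"step": {"state": "FINISHED", "restarts": 2, "elapsed": 12.5}}
merged = deep_merge(status_a, status_b, conflict_handler=keep_highest_restart_count)
print(merged)
```

A handler like this lets the caller decide per key what happens when two sources disagree, instead of hard-coding one overwrite rule into the merge itself.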
Changed
- `merlin info` is cleaner and gives python package info
- merlin version now prints with every banner message
- Applying filters for `merlin detailed-status` will now log debug statements instead of warnings
- Modified the unit tests for the `merlin status` command to use pytest rather than unittest
- Added fixtures for `merlin status` tests that copy the workspace to a temporary directory so you can see exactly what's run in a test
- Batch block and workers now allow for variables to be used in node settings
- Task id is now the path to the directory
- Split the `start_server` and `config_server` functions of `merlin/server/server_commands.py` into multiple functions to make testing easier
- Split the `create_server_config` function of `merlin/server/server_config.py` into two functions to make testing easier
- Combined `set_snapshot_seconds` and `set_snapshot_changes` methods of `RedisConfig` into one method `set_snapshot`
Fixed
- Bugfix for output of `merlin example openfoam_wf_singularity`
- A bug with the CHANGELOG detection test when the target branch isn't in the ci runner history
- Link to Merlin banner in readme
- Issue with escape sequences in ascii art (caught by python 3.12)
- Bug where Flux wasn't identifying total number of nodes on an allocation
- Not supporting Flux versions below 0.17.0
Version 1.12.2b1
[1.12.2b1]
Added
- Conflict handler option to the `dict_deep_merge` function in `utils.py`
- Ability to add module-specific pytest fixtures
- Added fixtures specifically for testing status functionality
- Added tests for reading and writing status files, and status conflict handling
- Added tests for the `dict_deep_merge` function
- Pytest-mock as a dependency for the test suite (necessary for using mocks and fixtures in the same test)
- New github action test to make sure target branch has been merged into the source first, so we know histories are ok
- Check in the status commands to make sure we're not pulling statuses from nested workspaces
- Added `setuptools` as a requirement for python 3.12 to recognize the `pkg_resources` library
- Patch to celery results backend to stop ChordErrors being raised and breaking workflows when a single task fails
- New step return code `$(MERLIN_RAISE_ERROR)` to force an error to be raised by a task (mainly for testing)
  - Added description of this to docs
- New test to ensure a single failed task won't break a workflow
Changed
- `merlin info` is cleaner and gives python package info
- merlin version now prints with every banner message
- Applying filters for `merlin detailed-status` will now log debug statements instead of warnings
- Modified the unit tests for the `merlin status` command to use pytest rather than unittest
- Added fixtures for `merlin status` tests that copy the workspace to a temporary directory so you can see exactly what's run in a test
- Batch block and workers now allow for variables to be used in node settings
- Task id is now the path to the directory
Fixed
- Bugfix for output of `merlin example openfoam_wf_singularity`
- A bug with the CHANGELOG detection test when the target branch isn't in the ci runner history
- Link to Merlin banner in readme
- Issue with escape sequences in ascii art (caught by python 3.12)
- Bug where Flux wasn't identifying total number of nodes on an allocation
- Not supporting Flux versions below 0.17.0
Version 1.12.1
[1.12.1]
Added
- New Priority.RETRY value for the Celery task priorities. This will be the new highest priority.
- Support for the status command to handle multiple workers on the same step
- Documentation on how to run cross-node workflows with a containerized server (`merlin server`)
Changed
- Modified some tests in `test_status.py` and `test_detailed_status.py` to accommodate bugfixes for the status commands
Fixed
- Bugfixes for the status commands:
  - Fixed "DRY RUN" naming convention so that it outputs in the progress bar properly
  - Fixed issue where a step that was run with one sample would delete the status file upon condensing
  - Fixed issue where multiple workers processing the same step would break the status file and cause the workflow to crash
  - Added a catch for the JSONDecodeError that would potentially crash a run
  - Added a FileLock to the status write in `_update_status_file()` of `MerlinStepRecord` to avoid potential race conditions (potentially related to JSONDecodeError above)
  - Added in `export MANPAGER="less -r"` call behind the scenes for `detailed-status` to fix ASCII error
Version 1.12.0
[1.12.0]
Added
- A new command `merlin queue-info` that will print the status of your celery queues
  - By default this will only pull information from active queues
  - There are options to look for specific queues (`--specific-queues`), queues defined in certain spec files (`--spec`; this is the same functionality as the `merlin status` command prior to this update), and queues attached to certain steps (`--steps`)
  - Queue info can be dumped to outfiles with `--dump`
- A new command `merlin detailed-status` that displays task-by-task status information about your study
  - This has options to filter by return code, task queues, task statuses, and workers
  - You can set a limit on the number of tasks to display
  - There are 3 options to modify the output display
- Docs for all of the monitoring commands
- New file `merlin/study/status.py` dedicated to work relating to the status command
  - Contains the Status and DetailedStatus classes
- New file `merlin/study/status_renderers.py` dedicated to formatting the output for the detailed-status command
- New file `merlin/common/dumper.py` containing a Dumper object to help dump output to outfiles
- Study name and parameter info now stored in the DAG and MerlinStep objects
- Added functions to `merlin/display.py` that help display status information:
  - `display_task_by_task_status` handles the display for the `merlin detailed-status` command
  - `display_status_summary` handles the display for the `merlin status` command
  - `display_progress_bar` generates and displays a progress bar
- Added new methods to the MerlinSpec class:
  - get_worker_step_map()
  - get_queue_step_relationship()
  - get_tasks_per_step()
  - get_step_param_map()
- Added methods to the MerlinStepRecord class to mark status changes for tasks as they run (follows Maestro's StepRecord format mostly)
- Added methods to the Step class:
  - establish_params()
  - name_no_params()
- Added a property paramater_labels to the MerlinStudy class
- Added two new utility functions:
  - dict_deep_merge() that deep merges two dicts into one
  - ws_time_to_dt() that converts a workspace timestring (YYYYMMDD-HHMMSS) to a datetime object
- A new celery task `condense_status_files` to be called when sets of samples finish
- Added a celery config setting `worker_cancel_long_running_tasks_on_connection_loss` since this functionality is about to change in the next version of celery
- Tests for the Status and DetailedStatus classes
  - this required adding a decent amount of test files to help with the tests; these can be found under the tests/unit/study/status_test_files directory
- Pytest fixtures in the `conftest.py` file of the integration test suite
  - NOTE: an export command `export LC_ALL='C'` had to be added to fix a bug in the WEAVE CI. This can be removed when we resolve this issue for the `merlin server` command
- Tests for the `celeryadapter.py` module
- New CeleryTestWorkersManager context to help with starting/stopping workers for tests
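
The `ws_time_to_dt()` utility mentioned above converts a workspace timestring in `YYYYMMDD-HHMMSS` form to a datetime object. One plausible way to do that with the standard library is shown below. Only the format comes from the notes; the function name `workspace_time_to_datetime`, its body, and the sample timestring are assumptions and may differ from Merlin's version.

```python
from datetime import datetime

# strptime pattern matching the YYYYMMDD-HHMMSS layout given in the notes.
WORKSPACE_TIME_FORMAT = "%Y%m%d-%H%M%S"


def workspace_time_to_datetime(ws_time: str) -> datetime:
    """Parse a workspace timestring; raises ValueError on malformed input."""
    return datetime.strptime(ws_time, WORKSPACE_TIME_FORMAT)


# Hypothetical timestring of the kind appended to a study's output workspace name.
started = workspace_time_to_datetime("20240131-235959")
print(started.isoformat())
```

Once parsed, workspace timestamps can be compared or subtracted like any other datetime, which is useful for ordering runs of the same study.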
Changed
- Reformatted the entire `merlin status` command
  - Now accepts both spec files and workspace directories as arguments
  - Removed the --steps flag
  - Replaced the --csv flag with the --dump flag
  - New functionality:
    - Shows step_by_step progress bar for tasks
    - Displays a summary of task statuses below the progress bar
- Split the `add_chains_to_chord` function in `merlin/common/tasks.py` into two functions:
  - `get_1d_chain` which converts a 2D list of chains into a 1D list
  - `launch_chain` which launches the 1D chain
- Pulled the needs_merlin_expansion() method out of the Step class and made it a function instead
- Removed `tabulate_info` function; replaced with tabulate from the tabulate library
- Moved `verify_filepath` and `verify_dirpath` from `merlin/main.py` to `merlin/utils.py`
- The entire documentation has been ported to MkDocs and re-organized
  - Dark Mode
  - New "Getting Started" example for a simple setup tutorial
  - More detail on configuration instructions
  - There's now a full page on installation instructions
  - More detail on explaining the spec file
  - More detail with the CLI page
  - New "Running Studies" page to explain different ways to run studies, restart them, and accomplish command line substitution
  - New "Interpreting Output" page to help users understand how the output workspace is generated in more detail
  - New "Examples" page has been added
  - Updated "FAQ" page to include more links to helpful locations throughout the documentation
  - Set up a place to store API docs
  - New "Contact" page with info on reaching Merlin devs
- The Merlin tutorial defaults to using Singularity rather than Docker for the OpenFoam example. Minor tutorial fixes have also been applied.
Fixed
- The `merlin status` command so that it's consistent in its output whether using redis or rabbitmq as the broker
- The `merlin monitor` command will now keep an allocation up if the queues are empty and workers are still processing tasks
- Add the restart keyword to the specification docs
- Cyclical imports and config imports that could easily cause ci issues
Version 1.11.1
[1.11.1]
Fixed
- Typo in `batch.py` that caused lsf launches to fail (`ALL_SGPUS` changed to `ALL_GPUS`)
Version 1.11.0
[1.11.0]
Added
- New reserved variable:
  - `VLAUNCHER`: The same functionality as the `LAUNCHER` variable, but will substitute shell variables `MERLIN_NODES`, `MERLIN_PROCS`, `MERLIN_CORES`, and `MERLIN_GPUS` for nodes, procs, cores per task, and gpus
Changed
- Hardcoded Sphinx v5.3.0 requirement is now removed so we can use latest Sphinx
Fixed
- A bug where the filenames in iterative workflows kept appending `.out`, `.partial`, or `.expanded` to the filenames stored in the `merlin_info/` subdirectory
- A bug where a skewed sample hierarchy was created when a restart was necessary in the `add_merlin_expanded_chain_to_chord` task
Version 1.10.3
[1.10.3]
Added
- The *.conf regex for the recursive-include of the merlin server directory so that pip will add it to the wheel
- A note to the docs for how to fix an issue where the `merlin server start` command hangs
Changed
- Bump certifi from 2022.12.7 to 2023.7.22 in /docs
- Bump pygments from 2.13.0 to 2.15.0 in /docs
- Bump requests from 2.28.1 to 2.31.0 in /docs
Version 1.10.2
[1.10.2]
Fixed
- A bug where the .orig, .partial, and .expanded file names were using the study name rather than the original file name
- A bug where the openfoam_wf_singularity example was not being found
- Some build warnings in the docs (unknown targets, duplicate targets, title underlines too short, etc.)
- A bug where when the output path contained a variable that was overridden, the overridden variable was not changed in the output_path
- A bug where permission denied errors happened when checking for system scheduler
Added
- Tests for ensuring `$(MERLIN_SPEC_ORIGINAL_TEMPLATE)`, `$(MERLIN_SPEC_ARCHIVED_COPY)`, and `$(MERLIN_SPEC_EXECUTED_RUN)` are stored correctly
- A pdf download format for the docs
- Tests for cli substitutions
Changed
- The ProvenanceYAMLFileHasRegex condition for integration tests now saves the study name and spec file name as attributes instead of just the study name
  - This led to minor changes in 3 tests ("local override feature demo", "local pgen feature demo", and "remote feature demo") with what we pass to this specific condition
- Updated scikit-learn requirement for the openfoam_wf_singularity example
- Uncommented Latex support in the docs configuration to get pdf builds working