Tags: fusionshen/pai
Tags
Initial HivedScheduler (microsoft#3283) Authors: @zhypku Scheduling Algorithm Core @mzmssg Scheduling Algorithm Config Parser @yqwang-ms User Experience/Interface, Scheduling Framework/Architecture, K8S Integration, and others
Release v0.13.0 New Features ------------ * OpenPAI protocol: - Introduce [OpenPAI protocol](./docs/pai-job-protocol.yaml) and job submission v2 ([microsoft#2260](microsoft#2260)) - Add new job submission v2 plugin ([microsoft#2461](microsoft#2461)) * Web portal: - Add login page for guests ([microsoft#2544](microsoft#2544)) - Add user home page ([microsoft#2614](microsoft#2614)) - Job Status - My virtual clusters - Available GPU nodes (whole cluster) - My recent jobs  - Add new user management page ([microsoft#2726](microsoft#2726), [microsoft#2796](microsoft#2796)) - User Management UX refactoring with new layout and themes ([microsoft#2726](microsoft#2726), [microsoft#2796](microsoft#2796)) Improvements ------------ * OpenPAI protocol: - Update example jobs in marketplace v2 for OpenPAI protocol ([microsoft#2827](microsoft#2827)) * Web portal: - Refine styles in job pages ([microsoft#2829](microsoft#2829), [microsoft#2856](microsoft#2856), [microsoft#2858](microsoft#2858), [microsoft#2862](microsoft#2862)) - Refine alert message in job pages ([microsoft#2698](microsoft#2698)) - Reduce the build bundle size to improve webportal performance ([microsoft#2715](microsoft#2715)) * Rest server: - Add job v1 config to v2 converter ([microsoft#2756](microsoft#2756)) - Check default runtime before starting Docker ([microsoft#2754](microsoft#2754)) * Framework launcher: - Upgrade to Hadoop 2.9.0 ([microsoft#2704](microsoft#2704)) * Job exporter: - Change triggering rule for exporter hangs ([microsoft#2766](microsoft#2766)) - Add GPU temperature detection ([microsoft#2757](microsoft#2757)) * Watchdog: - Use `/api/v1/pods` to get all pods ([microsoft#2750](microsoft#2750)) * Deployement: - Allow user to use <kbd>Backspace</kbd> in `paictl` input ([microsoft#2769](microsoft#2769)) - Disable InfiniBand driver installation by default ([microsoft#2595](microsoft#2595)) Documentation ------------- * Refine document of VS Code extension ([microsoft#2707](microsoft#2707)) * Add document for PAI storage ([microsoft#2822](microsoft#2822)) * OpenPAI protocol specification document ([microsoft#2260](microsoft#2260)) * Job submission v2 plugin document ([microsoft#2820](microsoft#2820)) * Update RESTful API document for API v2 ([microsoft#2816](microsoft#2816)) * Fix typos in document ([microsoft#2818](microsoft#2818)) Bug Fixes --------- * Web portal: - Fix text broken when create or edit user ([microsoft#2849](microsoft#2849)) - Fix token authentication bug ([microsoft#2843](microsoft#2843)) - Fix retry count's margin-top ([microsoft#2845](microsoft#2845)) - Fix job clone bug ([microsoft#2836](microsoft#2836)) - Fix home page's responsive layout ([microsoft#2805](microsoft#2805)) - Fix job list page filter bug ([microsoft#2787](microsoft#2787)) - Fix home page failed to load virtual cluster list bug ([microsoft#2774](microsoft#2774)) * Rest server: - Check duplicate job in submission v2 ([microsoft#2837](microsoft#2837)) * Hadoop: - Increase YARN kill container timeout ([microsoft#2778](microsoft#2778)) - Remove cross origin in resource manager ([microsoft#2758](microsoft#2758)) - Fix Haddoop AI matching nvidia-smi regex ([microsoft#2681](microsoft#2681)) Known Issues ------------ * Deployments issues on NVIDIA DGX2 ([microsoft#2742](microsoft#2742))
Release v0.12.0 New Features ------------ * Web portal: - Display error message in job detail page [microsoft#2456](microsoft#2456) - Import users from CSV file directly and show the final results [microsoft#2495](microsoft#2495) - Add TotalGpuCount and TotalTaskCount into job list [microsoft#2499](microsoft#2499) * Deployment - Add cluster version info [microsoft#2528](microsoft#2528) - Check if the nodes are ubuntu 16.04 [microsoft#2520](microsoft#2520) - Check duplicate hostname [microsoft#2403](microsoft#2403) Improvements ------------ * Web portal: - Replace the suffix if a cloned job is resubmited [microsoft#2451](microsoft#2451) - Refine view full log [microsoft#2431](microsoft#2431) - Job list: optimize filter [microsoft#2444](microsoft#2444) - Replace the url module with the querystring module [microsoft#1825](microsoft#1825) * REST server: - Follow REST protocol in job create controller [microsoft#2481](microsoft#2481) - Add task state; Add job's retry details; Refine job config [microsoft#2306](microsoft#2306) - Remove error message [microsoft#2464](microsoft#2464) * Framework Launcher: - Add more info into SummarizedFrameworkInfo [microsoft#2435](microsoft#2435) * Alert manager: - Send resolved email and make user can config repeat interval [microsoft#2438](microsoft#2438) - Monitor process memory consumption and alert for `omiagent` and `omsagent` [microsoft#2419](microsoft#2419) Documentation ------------- - Doc refactoring and update hello-world sample [microsoft#2445](microsoft#2445) - Add Chinese translation [microsoft#2344](microsoft#2344) Bug Fixes --------- * Web portal: - Add validation when submitting job by json [microsoft#2375](microsoft#2375) - Job List-filter UI fix [microsoft#2479](microsoft#2479) - Fix job detail "jobConfig is null" bug [microsoft#2500](microsoft#2500) - Fix job detail page's "retry link" [microsoft#2478](microsoft#2478) - Fix job v2 detail page rendering error [microsoft#2480](microsoft#2480) * REST server: - code_dir_size report incorrect error message [microsoft#2388](microsoft#2388) - fix script entrypoint [microsoft#2522](microsoft#2522) - Fixed jq invocation errors with numeric taskRoles [microsoft#2405](microsoft#2405) * Hadoop: - Remove duplicate diagnostics [microsoft#2527](microsoft#2527) * Alart manager: - Fix alert label error [microsoft#2521](microsoft#2521) * Drivers: - Add an optional configuration to skip ib drivers installation. [microsoft#2514](microsoft#2514) - Fix delete script of rollback nvidia runtime [microsoft#2370](microsoft#2370) - Fix driver parse [microsoft#2458](microsoft#2458) * Storage plugin - Add environment and handle corner cases [microsoft#2525](microsoft#2525) Known Issues ------------ N/A Upgrading from Earlier Release ------------------------------ Please follow the [Upgrading to v0.12](./docs/upgrade/upgrade_to_v0.12.md) for detailed instructions.
[WIP] 0.11.0 release notes (microsoft#2410) * add release note * refine layout * refine layout * refine layout * replace images * Revert "replace images" This reverts commit 361afac. * replace images * typo * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * fix * remove tail # * remove trailing space * add know issue * add retry link missing * add nfs wiki link
WIP: Draft a v0.10.1 release note (microsoft#2156) * Draft a v0.10.0 release note * version * fix H1 * Contrib packages * Reverse list in order to merge time * Fix * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Update RELEASE_NOTES.md * Move upgrading instructions to separately doc
v0.9.1: Feb. 2019 Release Bug Fixes: * REST Server: Fix admin permission, Closes microsoft#2172
[VS Code] Readme update (microsoft#2108) * readme update * update screenshot picture
PreviousNext