Auto-detect bf16 support for CUDA #993
Conversation
@Mergifyio rebase
❌ Unable to rebase: user
@Mergifyio rebase
❌ Base branch update has failed
Force-pushed from d646d59 to ecc1a38
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 8e3d568 to 7999e89
Force-pushed from 5f01310 to 0519a1d
@tiran what's the status on this? Thanks!
@leseb I have rebased the PR. Let's see if tests are now passing.
I'm currently putting this to the test; I'll report my results shortly.
Kindly requesting an edit to the newly added CHANGELOG.md file, since this looks like a nice improvement for CPU-only systems. Thanks!
Sharing my unimpressive results here. Without the patch, I stopped at 29% after 1h28min. With this patch, I also stopped at 29% after 1h28min, so the results were identical on my machine.
On a test system with 64 GB RAM, this memory calculation came out as 62, not 64, so check for 60 instead of 64. Obviously this is not very scientific, as we're making very rough assumptions about what is required. It would be better to enhance the code further to actually calculate a memory requirement based on the model instead of just hard-coding a rough guess. Signed-off-by: Russell Bryant <rbryant@redhat.com>
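A minimal, runnable sketch of the kind of check described in this commit message, assuming psutil is installed; the 60 GiB threshold is taken from the commit, but the print-based branching is illustrative only, not the exact code in the PR (the actual diff hunk appears below):

```python
import psutil

# Total system RAM in GiB; a nominal 64 GB machine typically reports ~62 GiB,
# so the threshold is deliberately set below 64.
total_memory = psutil.virtual_memory().total / (1024**3)

if total_memory < 60:
    # Rough guess: below ~60 GiB we assume there is not enough headroom
    # and stick with the more conservative defaults.
    print(f"Only {total_memory:.1f} GiB of RAM detected; keeping conservative defaults.")
else:
    print(f"{total_memory:.1f} GiB of RAM detected; enabling the larger-memory code path.")
```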
I spoke with @leseb on Slack and we determined that the memory check came out to ~62 GiB on a 64 GB system.
Here are the results I've been waiting for :) on the same system as commented in #993 (comment). Previously it took 1h28min to barely reach 29% of the training; now the whole training took 1h19min.
torch_dtype = "auto" if device.type == "cuda" else None
if device.type == "cpu":
    total_memory = psutil.virtual_memory().total / (1024**3)
    if total_memory < 60:
Suggested change: if total_memory < 60: → if total_memory < 62:
A system with 64 GB of RAM will report:
>>> import psutil
>>> mem = psutil.virtual_memory()
>>> mem
svmem(total=67228049408, available=31099351040, percent=53.7, used=35383861248, free=468701184, active=27983499264, inactive=37159084032, buffers=1079336960, cached=30296150016, shared=2109440, slab=1340628992)
So we have 67228049408 bytes; converted to GiB, 67228049408 / 1024**3 ≈ 62.6 GiB.
# There's more going on here and needs deeper exploration to find
# the right parameters to be checking for choosing the best
# configuration.
# Anecdotally, 64 GB seems to be enough, but this calculation
A system with 64 GB of RAM will report ~62.6 GiB, so we base our calculation on 62.
Since it's such a rough guess, 60 still seems fine? We need to actually do some math at some point ...
I'll share my math in a few :) stay tuned!
Some more numbers (see the sketch after this list):
- The training part takes ~30 GB of RAM to process; there is a very small chance that this could work on a very minimal Linux installation, by minimal I mean only system-critical services running and nothing else.
- The inference part takes ~35 GB of RAM.
Essentially, a system with 48 GB of RAM should be able to run both training and inference, although 48 GB of RAM is not very common.
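Purely as an illustration, these anecdotal numbers could be folded into a small pre-flight check; the `check_ram` helper and the 30/35 GiB thresholds are assumptions taken from this comment, not code from the PR:

```python
import psutil

# Anecdotal requirements from local testing (GiB); rough guesses, not measured limits.
TRAIN_RAM_GIB = 30
INFERENCE_RAM_GIB = 35

def check_ram(required_gib: float) -> bool:
    """Return True if total system RAM meets the rough requirement."""
    total_gib = psutil.virtual_memory().total / (1024**3)
    return total_gib >= required_gib

if not check_ram(TRAIN_RAM_GIB):
    print("Warning: less RAM than the ~30 GiB observed during training.")
if not check_ram(INFERENCE_RAM_GIB):
    print("Warning: less RAM than the ~35 GiB observed during inference.")
```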
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.
This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Hi @tiran! Are you still working on this PR? We're looking to do some housekeeping and close out stale PRs, including drafts. If we don't hear back within 7 days, we will close this PR, but please know that you are more than welcome to reopen it if you'd like! Thank you!
Changes
Which issue is resolved by this Pull Request:
See #647
Description of your changes:
bf16 (bfloat16) is not available on CUDA versions older than 11.0, nor on devices with CUDA compute capability below 8.0. linux_train now detects and reports bf16 support; training on CUDA falls back to fp16 (half-precision float) when bf16 is unavailable.
Also closes #1006.
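For reference, a minimal sketch of how bf16 support can be probed on CUDA with PyTorch; the `cuda_supports_bf16` helper is hypothetical and shows the general technique (CUDA build >= 11 and compute capability >= 8.0), not necessarily the exact code added to linux_train:

```python
import torch

def cuda_supports_bf16() -> bool:
    """Rough bf16 capability probe for the current CUDA device."""
    if not torch.cuda.is_available():
        return False
    cuda_version = torch.version.cuda  # e.g. "12.1"; None on non-CUDA builds
    if cuda_version is None or int(cuda_version.split(".")[0]) < 11:
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8

# Fall back to fp16 when bf16 is not supported.
dtype = torch.bfloat16 if cuda_supports_bf16() else torch.float16
print(f"Training dtype on CUDA: {dtype}")
```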