Auto-detect bf16 support for CUDA by tiran · Pull Request #993 · instructlab/instructlab · GitHub

Auto-detect bf16 support for CUDA #993

Draft: wants to merge 4 commits into base: main

Contributor
@tiran tiran commented Apr 25, 2024

Changes

Which issue is resolved by this Pull Request:
See #647

Description of your changes:

bf16 (bfloat16) is not available with CUDA versions older than 11.0, nor on devices with a CUDA compute capability below 8.0. linux_train now detects and reports bf16 support. When bf16 is unavailable, training on CUDA falls back to fp16 (half-precision float).


Also closes #1006.
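
The rule stated in the description can be sketched as a small pure-Python function. This is only an illustration of the two stated conditions, not the code from this PR: the function name `pick_cuda_dtype` and its parameters are hypothetical. In a real PyTorch program the inputs would likely come from `torch.version.cuda` and `torch.cuda.get_device_capability()`, and PyTorch also offers `torch.cuda.is_bf16_supported()` as a direct check.

```python
from typing import Optional, Tuple


def pick_cuda_dtype(
    cuda_version: Optional[Tuple[int, int]],
    compute_capability: Tuple[int, int],
) -> str:
    """Return "bfloat16" if toolkit and device both support it, else "float16".

    Hypothetical helper: encodes only the two conditions from the PR
    description (CUDA >= 11.0 and compute capability >= 8.0).
    """
    toolkit_ok = cuda_version is not None and cuda_version >= (11, 0)
    device_ok = compute_capability >= (8, 0)
    return "bfloat16" if toolkit_ok and device_ok else "float16"


# Example inputs (illustrative values, not taken from the PR).
print(pick_cuda_dtype((12, 1), (8, 6)))  # capability 8.6 -> bfloat16
print(pick_cuda_dtype((12, 1), (7, 5)))  # capability 7.5 -> float16
```

Comparing version tuples rather than strings avoids ordering bugs such as "9.0" sorting after "11.0".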

Contributor Author
tiran commented May 2, 2024

@Mergifyio rebase

Contributor
mergify bot commented May 2, 2024

rebase

❌ Unable to rebase: user tiran is unknown.

Please make sure tiran has logged in Mergify dashboard.

Contributor Author
tiran commented May 2, 2024

@Mergifyio rebase

Contributor
mergify bot commented May 2, 2024

rebase

❌ Base branch update has failed

tiran does not have write access to the forked repository.

@tiran tiran force-pushed the cuda_bf16 branch 7 times, most recently from d646d59 to ecc1a38 Compare May 6, 2024 04:28
Contributor
mergify bot commented May 6, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added needs-rebase This Pull Request needs to be rebased and removed needs-rebase This Pull Request needs to be rebased labels May 6, 2024
Contributor
mergify bot commented May 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label May 7, 2024
@tiran tiran force-pushed the cuda_bf16 branch 2 times, most recently from 8e3d568 to 7999e89 Compare May 7, 2024 16:41
@mergify mergify bot added the testing Relates to testing label May 7, 2024
@tiran tiran force-pushed the cuda_bf16 branch 2 times, most recently from 5f01310 to 0519a1d Compare May 7, 2024 18:05
Contributor
leseb commented May 23, 2024

@tiran what's the status on this? Thanks!

Contributor Author
tiran commented May 23, 2024

@leseb I have rebased the PR. Let's see if tests are now passing.

Contributor
@leseb leseb left a comment

I'm currently putting this to the test; I'll report my results shortly.

Contributor
@leseb leseb left a comment

Could you please also update the newly added CHANGELOG.md file? This looks like a nice improvement for CPU-only systems. Thanks!

Contributor
leseb commented Jun 5, 2024

Sharing my unimpressive results here. My machine:

>>> torch.backends.cpu.get_cpu_capability()
'AVX2'

(venv) [leseb@tarox~/cli][main] ilab sysinfo
instructlab.version: 0.16.1.dev60
sys.version: 3.10.7 (main, Sep  7 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)]
sys.platform: linux
os.name: posix
platform.release: 5.19.16-100.fc35.x86_64
platform.machine: x86_64
os-release.ID: fedora
os-release.VERSION_ID: 35
os-release.PRETTY_NAME: Fedora Linux 35 (Thirty Five)
torch.version: 2.3.0+cu121
torch.backends.cpu.capability: AVX2
torch.version.cuda: 12.1
torch.version.hip: None
torch.cuda.available: False
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
llama_cpp_python.version: 0.2.75
llama_cpp_python.supports_gpu_offload: False

Output of lshw:

(venv) [leseb@tarox~/cli][main] sudo lshw
tarox
    description: Desktop Computer
    product: Small Desktop (SFF) (SKU)
    vendor: TAROX
    version: 082016
    serial: 1526767
    width: 64 bits
    capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
    configuration: boot=normal chassis=desktop family=To be filled by O.E.M. sku=SKU uuid=609C45B0-BA72-E311-A505-3497F69AB602
  *-core
       description: Motherboard
       product: Z170M-PLUS
       vendor: ASUSTeK COMPUTER INC.
       physical id: 0
       version: Rev X.0x
       serial: 160470422200617
       slot: Default string
     *-firmware
          description: BIOS
          vendor: American Megatrends Inc.
          physical id: 0
          version: 0704
          date: 02/18/2016
          size: 64KiB
          capacity: 16MiB
          capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
     *-cache:0
          description: L1 cache
          physical id: 40
          slot: L1 Cache
          size: 128KiB
          capacity: 128KiB
          capabilities: synchronous internal write-back data
          configuration: level=1
     *-cache:1
          description: L1 cache
          physical id: 41
          slot: L1 Cache
          size: 128KiB
          capacity: 128KiB
          capabilities: synchronous internal write-back instruction
          configuration: level=1
     *-cache:2
          description: L2 cache
          physical id: 42
          slot: L2 Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: synchronous internal write-back unified
          configuration: level=2
     *-cache:3
          description: L3 cache
          physical id: 43
          slot: L3 Cache
          size: 8MiB
          capacity: 8MiB
          capabilities: synchronous internal write-back unified
          configuration: level=3
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
          vendor: Intel Corp.
          physical id: 44
          bus info: cpu@0
          version: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
          serial: To Be Filled By O.E.M.
          slot: LGA1151
          size: 3900MHz
          capacity: 4200MHz
          width: 64 bits
          clock: 100MHz
          capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp x86-64 constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp cpufreq
          configuration: cores=4 enabledcores=4 threads=8
     *-memory
          description: System Memory
          physical id: 45
          slot: System board or motherboard
          size: 64GiB
        *-bank:0
             description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
             product: 16ATF2G64AZ-2G1B1
             vendor: Micron
             physical id: 0
             serial: 18905267
             slot: DIMM_A1
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:1
             description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
             product: 16ATF2G64AZ-2G1B1
             vendor: Micron
             physical id: 1
             serial: 18905271
             slot: DIMM_A2
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:2
             description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
             product: 16ATF2G64AZ-2G1B1
             vendor: Micron
             physical id: 2
             serial: 18905269
             slot: DIMM_B1
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:3
             description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
             product: 16ATF2G64AZ-2G1B1
             vendor: Micron
             physical id: 3
             serial: 18905270
             slot: DIMM_B2
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
     *-pci
          description: Host bridge
          product: Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 07
          width: 32 bits
          clock: 33MHz
          configuration: driver=skl_uncore
          resources: irq:0
        *-display
             description: VGA compatible controller
             product: HD Graphics 530
             vendor: Intel Corporation
             physical id: 2
             bus info: pci@0000:00:02.0
             version: 06
             width: 64 bits
             clock: 33MHz
             capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
             configuration: driver=i915 latency=0
             resources: irq:125 memory:f6000000-f6ffffff memory:e0000000-efffffff ioport:f000(size=64) memory:c0000-dffff
        *-usb
             description: USB controller
             product: 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller
             vendor: Intel Corporation
             physical id: 14
             bus info: pci@0000:00:14.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi xhci bus_master cap_list
             configuration: driver=xhci_hcd latency=0
             resources: irq:124 memory:f7030000-f703ffff
           *-usbhost:0
                product: xHCI Host Controller
                vendor: Linux 5.19.16-100.fc35.x86_64 xhci-hcd
                physical id: 0
                bus info: usb@1
                logical name: usb1
                version: 5.19
                capabilities: usb-2.00
                configuration: driver=hub slots=16 speed=480Mbit/s
           *-usbhost:1
                product: xHCI Host Controller
                vendor: Linux 5.19.16-100.fc35.x86_64 xhci-hcd
                physical id: 1
                bus info: usb@2
                logical name: usb2
                version: 5.19
                capabilities: usb-3.00
                configuration: driver=hub slots=10 speed=5000Mbit/s
        *-communication
             description: Communication controller
             product: 100 Series/C230 Series Chipset Family MEI Controller #1
             vendor: Intel Corporation
             physical id: 16
             bus info: pci@0000:00:16.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: driver=mei_me latency=0
             resources: irq:126 memory:f704d000-f704dfff
        *-sata
             description: SATA controller
             product: Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode]
             vendor: Intel Corporation
             physical id: 17
             bus info: pci@0000:00:17.0
             logical name: scsi0
             version: 31
             width: 32 bits
             clock: 66MHz
             capabilities: sata msi pm ahci_1.0 bus_master cap_list emulated
             configuration: driver=ahci latency=0
             resources: irq:123 memory:f7048000-f7049fff memory:f704c000-f704c0ff ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:f704b000-f704b7ff
           *-disk
                description: ATA Disk
                product: SanDisk SD8SN8U5
                physical id: 0.0.0
                bus info: scsi@0:0.0.0
                logical name: /dev/sda
                version: 0000
                serial: 162024802552
                size: 476GiB (512GB)
                capabilities: gpt-1.00 partitioned partitioned:gpt
                configuration: ansiversion=5 guid=4279a938-b23a-4258-8601-d93e4fda1e1a logicalsectorsize=512 sectorsize=512
              *-volume:0 UNCLAIMED
                   description: Windows FAT volume
                   vendor: mkfs.fat
                   physical id: 1
                   bus info: scsi@0:0.0.0,1
                   version: FAT16
                   serial: 9766-f794
                   size: 198MiB
                   capacity: 199MiB
                   capabilities: boot fat initialized
                   configuration: FATs=2 filesystem=fat name=EFI System Partition
              *-volume:1
                   description: EXT4 volume
                   vendor: Linux
                   physical id: 2
                   bus info: scsi@0:0.0.0,2
                   logical name: /dev/sda2
                   logical name: /boot
                   version: 1.0
                   serial: e95e916b-8cb5-4928-b948-012d01ba5d19
                   size: 1GiB
                   capabilities: journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized
                   configuration: created=2017-12-18 11:32:55 filesystem=ext4 lastmountpoint=/boot modified=2024-04-26 14:18:02 mount.fstype=ext4 mount.options=rw,seclabel,relatime mounted=2024-04-26 14:18:02 state=mounted
              *-volume:2
                   description: LVM Physical Volume
                   vendor: Linux
                   physical id: 3
                   bus info: scsi@0:0.0.0,3
                   logical name: /dev/sda3
                   serial: pQZ93q-MZqT-hABD-XEGb-yJAT-AN14-dnvlDz
                   size: 475GiB
                   capabilities: multi lvm2
        *-pci:0
             description: PCI bridge
             product: 100 Series/C230 Series Chipset Family PCI Express Root Port #17
             vendor: Intel Corporation
             physical id: 1b
             bus info: pci@0000:00:1b.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:120
        *-pci:1
             description: PCI bridge
             product: 100 Series/C230 Series Chipset Family PCI Express Root Port #1
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:121
        *-pci:2
             description: PCI bridge
             product: 100 Series/C230 Series Chipset Family PCI Express Root Port #9
             vendor: Intel Corporation
             physical id: 1d
             bus info: pci@0000:00:1d.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:122
        *-isa
             description: ISA bridge
             product: Z170 Chipset LPC/eSPI Controller
             vendor: Intel Corporation
             physical id: 1f
             bus info: pci@0000:00:1f.0
             version: 31
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master
             configuration: latency=0
        *-memory UNCLAIMED
             description: Memory controller
             product: 100 Series/C230 Series Chipset Family Power Management Controller
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 31
             width: 32 bits
             clock: 33MHz (30.3ns)
             capabilities: bus_master
             configuration: latency=0
             resources: memory:f7044000-f7047fff
        *-multimedia
             description: Audio device
             product: 100 Series/C230 Series Chipset Family HD Audio Controller
             vendor: Intel Corporation
             physical id: 1f.3
             bus info: pci@0000:00:1f.3
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: driver=snd_hda_intel latency=32
             resources: irq:128 memory:f7040000-f7043fff memory:f7020000-f702ffff
        *-serial
             description: SMBus
             product: 100 Series/C230 Series Chipset Family SMBus
             vendor: Intel Corporation
             physical id: 1f.4
             bus info: pci@0000:00:1f.4
             version: 31
             width: 64 bits
             clock: 33MHz
             configuration: driver=i801_smbus latency=0
             resources: irq:16 memory:f704a000-f704a0ff ioport:f040(size=32)
        *-network
             description: Ethernet interface
             product: Ethernet Connection (2) I219-V
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 31
             serial: 34:97:f6:9a:b6:02
             size: 1Gbit/s
             capacity: 1Gbit/s
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=5.19.16-100.fc35.x86_64 duplex=full firmware=0.8-4 ip=192.168.1.192 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
             resources: irq:127 memory:f7000000-f701ffff
     *-pnp00:00
          product: PnP device PNP0c02
          physical id: 1
          capabilities: pnp
          configuration: driver=system
     *-pnp00:01
          product: PnP device PNP0400
          physical id: 2
          capabilities: pnp
          configuration: driver=parport_pc
     *-pnp00:02
          product: PnP device PNP0501
          physical id: 3
          capabilities: pnp
          configuration: driver=serial
     *-pnp00:03
          product: PnP device PNP0c02
          physical id: 4
          capabilities: pnp
          configuration: driver=system
     *-pnp00:04
          product: PnP device PNP0c02
          physical id: 5
          capabilities: pnp
          configuration: driver=system
     *-pnp00:05
          product: PnP device PNP0b00
          physical id: 6
          capabilities: pnp
          configuration: driver=rtc_cmos
     *-pnp00:06
          product: PnP device INT3f0d
          vendor: Interphase Corporation
          physical id: 7
          capabilities: pnp
          configuration: driver=system
     *-pnp00:07
          product: PnP device PNP0c02
          physical id: 8
          capabilities: pnp
          configuration: driver=system
     *-pnp00:08
          product: PnP device PNP0c02
          physical id: 9
          capabilities: pnp
          configuration: driver=system
     *-pnp00:09
          product: PnP device PNP0c02
          physical id: a
          capabilities: pnp
          configuration: driver=system
     *-pnp00:0a
          product: PnP device PNP0c02
          physical id: b
          capabilities: pnp
          configuration: driver=system
  *-power UNCLAIMED
       description: To Be Filled By O.E.M.
       product: To Be Filled By O.E.M.
       vendor: To Be Filled By O.E.M.
       physical id: 1
       version: To Be Filled By O.E.M.
       serial: To Be Filled By O.E.M.
       capacity: 32768mWh

Without the patch:

LINUX_TRAIN.PY: SANITY CHECKING THE BASE MODEL
 29%|██████████████████████████████████▋                                                                                   | 5/17 [1:08:15<2:32:16, 761.38s/it]^C
Aborted!
 29%|██████████████████████████████████▋                                                                                   | 5/17 [1:13:08<2:55:31, 877.63s/it]
Command exited with non-zero status 1
10659.64user 133.46system 1:28:49elapsed 202%CPU (0avgtext+0avgdata 15378148maxresident)k8inputs+28296112outputs (67major+4854057minor)pagefaults 0swaps

I stopped at 29% after 1h28min.

With this patch:

LINUX_TRAIN.PY: SANITY CHECKING THE BASE MODEL
 18%|█████████████████████▏                                                                                                  | 3/17 [47:09<3:20:29, 859.28s/it]
 29%|██████████████████████████████████▋                                                                                   | 5/17 [1:15:00<2:48:16, 841.38s/it]^C^C
Aborted!
 29%|██████████████████████████████████▋                                                                                   | 5/17 [1:15:40<3:01:37, 908.17s/it]
Command exited with non-zero status 1
10520.19user 101.56system 1:28:01elapsed 201%CPU (0avgtext+0avgdata 15396308maxresident)k0inputs+2608outputs (64major+4539971minor)pagefaults 0swaps

I stopped at 29% after 1h28min.

So the results were identical on my machine.

My bench.sh script:

#!/usr/bin/env -S bash -ex

taxonomy() {
    test -d taxonomy || git clone https://github.com/instructlab/taxonomy || true

    mkdir -p taxonomy/knowledge/sports/overview/softball

    cp /home/leseb/cli/scripts/test-data/basic-workflow-fixture-qna.yaml taxonomy/knowledge/sports/overview/softball/qna.yaml
    head taxonomy/knowledge/sports/overview/softball/qna.yaml | grep --color '1st base'

    ilab diff
}
pushd ..
[ -f config.yaml ] || ilab init --non-interactive
taxonomy
ilab download
ilab generate --num-instructions 5
ilab train
popd

I run it like so: /usr/bin/time .idea/bench.sh.

On a test system with 64 GB RAM, this memory calculation came out as
62, not 64. Check for 60 instead of 64.

Obviously this is not very scientific as we're making very rough
assumptions about what is required. It would be better to enhance the
code further to actually calculate a memory requirement based on the
model instead of just hard-coding a rough guess.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
@mergify mergify bot added the ci-failure PR has at least one CI failure label Jun 5, 2024
Member
russellb commented Jun 5, 2024

I spoke with @leseb on Slack and we determined that the memory check came out to 62 on his 64 GB system, so I've changed the rough check in the code from < 64 to < 60. I'd like to see if that now gets him a boost, as his system should work with dtype=None (using float32).

Contributor
leseb commented Jun 6, 2024

Here are the results I've been waiting for :), on the same system as described in #993 (comment):

Previously it took 1h28min to barely reach 29% of the training; now the training run below completed in about 1min19s (train_runtime: 79.75 s):

LINUX_TRAIN.PY: TRAINING
{'train_runtime': 79.7499, 'train_samples_per_second': 0.075, 'train_steps_per_second': 0.075, 'train_loss': 1.6997551918029785, 'epoch': 1.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [01:19<00:00, 13.29s/it]
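
The figures in this log can be cross-checked with a little arithmetic. The sketch below assumes that `train_runtime` is reported in seconds, which is consistent with 6 steps at 13.29 s/it. The helper name `as_clock` is made up for illustration.

```python
from datetime import timedelta

# Values copied from the training log above.
metrics = {
    "train_runtime": 79.7499,          # assumed to be seconds
    "train_steps_per_second": 0.075,
}
steps = 6                 # from the "6/6" progress bar
seconds_per_step = 13.29  # from the "13.29s/it" progress bar


def as_clock(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS (fractions truncated)."""
    return str(timedelta(seconds=int(seconds)))


print(as_clock(metrics["train_runtime"]))          # 0:01:19
print(round(steps * seconds_per_step, 2))          # 79.74, close to train_runtime
print(round(steps / metrics["train_runtime"], 3))  # 0.075, matches the log
```

Under that assumption the runtime, the step count and the per-step time all agree with the `01:19` shown in the progress bar.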

torch_dtype = "auto" if device.type == "cuda" else None
if device.type == "cpu":
    total_memory = psutil.virtual_memory().total / (1024**3)
    if total_memory < 60:
Contributor

Suggested change
- if total_memory < 60:
+ if total_memory < 62:

A system with 64 GB of RAM will report:

>>> import psutil
>>> mem = psutil.virtual_memory()
>>> mem
svmem(total=67228049408, available=31099351040, percent=53.7, used=35383861248, free=468701184, active=27983499264, inactive=37159084032, buffers=1079336960, cached=30296150016, shared=2109440, slab=1340628992)

And we have 67228049408 bytes, which converted to GiB is 67228049408 / 1024 ** 3 ≈ 62.6 GiB.
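
The conversion above, and how the result compares against the thresholds discussed in this thread, can be reproduced with a short sketch. The byte count is the `total` field from the `psutil` output quoted above; the helper name `bytes_to_gib` is hypothetical.

```python
GIB = 1024**3  # bytes per GiB, the same divisor used in the diff


def bytes_to_gib(total_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return total_bytes / GIB


total = 67228049408  # svmem.total reported on the 64 GB machine
gib = bytes_to_gib(total)
print(f"{gib:.1f} GiB")

# Which "total_memory < threshold" checks would this machine trip?
for threshold in (64, 62, 60):
    print(f"< {threshold}: {gib < threshold}")
```

Only the original `< 64` check is true for this machine; both `< 62` and `< 60` let it through, which is why either value resolves the case reported here.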

# There's more going on here and needs deeper exploration to find
# the right parameters to be checking for choosing the best
# configuration.
# Anecdotally, 64 GB seems to be enough, but this calculation
Contributor

A system with 64GB of RAM will report ~62.6 GiB so we base our calculation on 62.

Member

Since it's such a rough guess, 60 still seems fine? We need to actually do some math at some point ...

Contributor

I'll share my math in a few :) stay tuned!

Contributor

Screenshot 2024-06-07 at 14 23 04

Some more numbers:

  • The training part takes ~30 GB of RAM. There is a very small chance that this could work on a very minimal Linux installation, meaning one where only system-critical services run and nothing else.
  • The inference part takes ~35GB of RAM

Essentially, a system with 48 GB of RAM should be able to run both training and inference, although 48 GB of RAM is not a very common configuration.
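
The reasoning behind these numbers can be written down as a rough budget check. This is a sketch under stated assumptions: training and inference run one after the other, so the peak is the larger of the two footprints rather than their sum. The RAM sizes in the loop are illustrative, and the helper names and the `reserved_gb` parameter are hypothetical.

```python
TRAIN_GB = 30  # approximate training footprint quoted above
INFER_GB = 35  # approximate inference footprint quoted above


def fits(total_gb: float, needed_gb: float, reserved_gb: float = 0.0) -> bool:
    """True if needed_gb fits after setting aside reserved_gb for the OS."""
    return total_gb - reserved_gb >= needed_gb


def can_run_both(total_gb: float, reserved_gb: float = 0.0) -> bool:
    """Assumes the two phases run sequentially, so only the peak matters."""
    return fits(total_gb, max(TRAIN_GB, INFER_GB), reserved_gb)


# Illustrative RAM sizes (assumptions, not measurements from this thread).
for size in (32, 48, 64):
    print(size, fits(size, TRAIN_GB), can_run_both(size))
```

With nothing reserved for the operating system, a 32 GB machine only just covers the training footprint, which matches the "very small chance" caveat above, while 48 GB covers both phases.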

@tiran tiran marked this pull request as draft July 9, 2024 08:41
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jul 9, 2024
Contributor
mergify bot commented Jul 9, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

github-actions bot commented Oct 8, 2024

This pull request has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Oct 8, 2024
@mergify mergify bot added the dependencies Relates to dependencies label Oct 8, 2024
@github-actions github-actions bot removed the stale label Oct 9, 2024
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Jan 6, 2025
Contributor
mergify bot commented Jan 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Jan 6, 2025
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Feb 16, 2025
Contributor
mergify bot commented Feb 16, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Feb 16, 2025
@courtneypacheco
Contributor

Hi @tiran! Are you still working on this PR? We're looking to do some housekeeping and close out stale PRs, including drafts.

If we don't hear back within 7 days, we will close this PR, but please know that you are more than welcome to reopen it if you'd like! Thank you!

@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Mar 27, 2025
Contributor
mergify bot commented Mar 27, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Mar 27, 2025
@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label Apr 28, 2025
Contributor
mergify bot commented Apr 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label Apr 28, 2025
Labels
ci-failure PR has at least one CI failure dependencies Relates to dependencies needs-rebase This Pull Request needs to be rebased testing Relates to testing
Development

Successfully merging this pull request may close these issues.

Optimize CPU training on Linux
6 participants