Uplift third_party/tt-metal to 41ce500767a364f66034f6924837dabc133e8d4d 2025-06-04 #3669
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #3669      +/-   ##
==========================================
- Coverage   72.65%   72.56%    -0.09%
==========================================
  Files         211      211
  Lines       28952    28978       +26
==========================================
- Hits        21035    21028        -7
- Misses       7917     7950       +33

☔ View full report in Codecov by Sentry.
I'm pretty sure that emitc is not going to compile either. This would require that every user who links against _ttnn.so manually link Python for no apparent reason. Are these dependencies coming from the ttnn pybind?
You're right, I'm now getting the error from the emitc unit tests locally as well after trying to run the regression tests.
I'm pretty sure standalone will hit the same error. There are instructions on how standalone should be built (ttnn-standalone.md). Adding @svuckovicTT for context around emitc. A quick way to check for the stray dependency is sketched below.
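To make the failure mode concrete, here is a minimal sketch of how one might verify the unwanted Python dependency on the shared object (the build path below is an assumption for illustration, not the actual layout):

```sh
# Inspect the dynamic dependencies recorded in _ttnn.so.
# If a libpython entry shows up as NEEDED, every consumer of the
# library (emitc unit tests, ttnn-standalone) must link Python too.
readelf -d build/lib/_ttnn.so | grep NEEDED

# Cross-check against the resolved runtime dependencies.
ldd build/lib/_ttnn.so | grep -i python
```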
Brings 98 metal commits:

git log --oneline 3c4aedc9ff67e3a247cd2f30a13c5e525214a51e..41ce500767a364f66034f6924837dabc133e8d4d
41ce500767 #22205: Implement on host data preparation for ND sharding for future perf improvements (#22847)
44d76e744b FDKernel updates for running dispatch on fabric (#22838)
f6837d46c0 Added a reshard test to dm tests that uses existing reshard tests to check perf (#22826)
13e51419df #22829: Update check for erisc app to only run on eth cores that are not running cooperative mode (#22883)
2455717d59 Add sharding support for RM tensors in ttnn.copy (#22792)
3a99e8bff1 #21564: convert some TT_ASSERT statements to TT_FATAL statements (#22869)
6a75624662 Remove old implementation of mistral7b (#22688)
ac1aeeef72 Re-attempt: Add ring + 4 links support to the llama_reduce_scatter (#22827)
4e68ecd814 #0: [skip ci] Why am I dumb? I'm not sure - but clearly vanilla != profiler (#22886)
d6e5c653a8 #22258: Remove tracking of per-device tensor spec (#22763)
fd106d8dba Make CPM_SOURCE_CACHE respect env var. (#22854)
149e50176a [skip ci] Remove duplicate includes (#22761)
f9415ce497 fix the rs fusion program cache issue by updating qkv buffer addresses. (#22855)
d774e34713 Support running the Fabric Mux on idle eth cores (#22486)
aeb53a57c0 [DM]: Move get_noc_addr calls outside of transaction loop (#22420)
8610c085e2 [skip ci] Remove dead code that breaks things (#22881)
f021bf9941 [skip ci] Unity builds for TT-NN sublibs (#22877)
3a255e7ac9 #22020: (Part 1) Add full sharding support for untilize - single core (#22735)
e8dc08634e [Fabric] Updates for handling client connections on TG gateways (#22844)
1f116d2873 Add tensor support for start and end arguments in ttnn.slice (#20393)
c6eb2768a4 Enable weights double buffering on height sharded conv2d (#22791)
c7005e8557 Check for Duplicate Cores - Issue #6754 (#20907)
6629a0da51 Clean up of height sharded conv2d weights kernel reader (#22793)
a38d750a00 Move all profiling dependencies to install_dependencies.sh (#22726)
88e82763a0 Add 6U perf targets for Llama70b decode (#22615)
51d9b1df3d Ring optimizations (#20607)
d498249e58 ttnn round op fix (#22302)
14a750e20c #19062: Make the output of slice sharded in Conv2D DRAM (#22585)
c7a1213ef7 #19609: Fix eltwise backward ops (#22135)
972ab2c8c1 Add statically allocated region on DRAM for the profiler (#22733)
38be15600f #21824: Add uint16 support for eqz and nez (#22207)
55d10fda2a #22460 Model test tests/nightly/single_card/yolov8s_world/test_yolov8… (#22849)
95d90700db #22259: Multi-host aware serialization format for Tensors (#22749)
a0e56ef081 Revert "Revert "Modify active links from UMD for partial cluster" (#22675)" (#22795)
e166f593d5 [tt-train] Profile_no_op implementation (#22788)
7c065e0fe3 #22781: Fixes to support uneven ND sharding (#22782)
214fed58c5 #20712: Update external cable check to support N300/T3K and add report to health check (#22830)
16549d9d52 #21738: [skip ci] Use the unpacked version of weights for Llama BH tests, rather than HF_HOME (#22602)
b5b57d1587 Allow submitting binaries and other kernel data to separate addresses in traces. (#22370)
dec990cb2f #19531: add extra perf info to sweeps message (#22824)
740b4e5907 #22056: [skip ci] Refactor out latest image publishing to a separate action (to be used in build-docker-artifact as well later) and use it in upstream tests. Give option to force publish latest (#22737)
0b3d85fb45 Clean up build-artifact.yaml and add packages: write permissions when calling it (#22798)
53455a0d10 Revert "Add ring + 4 links support to the llama_reduce_scatter" (#22825)
c3c0bb8d0f Add ring + 4 links support to the llama_reduce_scatter (#22457)
8a6dcc91d3 #0: Cleanup public API (#22770)
15c72f744a [DM]: Enhancements for Core Locations test (#22646)
87a66f34ab #20712: Report tray and N-id for UBBs and add option to specify minimum number of links to check to system health check (#22744)
6f3d5d4db6 Revert "[skip ci] Enhance build-artifact.yaml" (#22797)
15163cf252 Revert "Add zero padding for matmul ops (#22333)" (#22747)
3c6ee52094 #21054: Fix Conv2D when input dtype != output dtype (#22605)
f7e34ba2fa #21826: Migrate tanhshrink as device op (#22304)
4c4d1b49e9 UMD bump (#22755)
616fa2c906 #22666: Expose ND sharding to Python (#22724)
60caa90632 Int32 support for binary logical or, xor (#22668)
2a5617e5e2 #22660: Update MeshTrace region size when traces are released (#22662)
2d138d7ee1 [DM] Implemented Directed Ideal Test Cases for the One To One, One From One, and One To All Data Movement Tests (#22700)
12529d2960 [tt-train] MPI UFLM fixes for modern ubuntu (#22645)
57462f4315 Fix misplaced const (delete dead code) (#22777)
14ff0b24fd #22739 No circular buffer with id exists in Program (#22762)
0026600ba3 [skip ci] Enhance build-artifact.yaml (#22769)
482946233c #21846: Throw when zero dimension on shard shape encountered (#22764)
c990008ce7 #22651: Add DRAM support for ND sharding (#22669)
77e8a0297d Remove global torch imports in ttnn (#22477)
ca167eaf8d Lint all headers (#22719)
59a1250ee5 Revert "Embedding perf test (#22681)" (#22748)
32af58813a #22307 binary ng fix for updating sharded tensor cb dynamic address (#22715)
10552adc45 Improved how we provide the << operator for Layout (#22619)
e37877ed9d #22146: BH ttnn unit failure in test for sort fix (#22340)
99914e3ac4 Check for null device in wait_for_fabric_router_sync (#22537)
9e8861d0b5 #0: Remove single device bindings from core.py, serialization code (#22717)
c1b4454eb7 Embedding perf test (#22681)
c8ae5d6295 #21732: [skip ci] Add prefill + decode llama demo to WH 6U upstream tests (#22692)
61faf99a05 Update perf margins in Llama ops (#22730)
314f8b6fed SDXL: optimised conv configs (#22727)
acd7665e10 Llama70B - Update tsu range for 6U in Galaxy quick (#22728)
8c7219a163 #0: Enable deallocate activation in conv2d dram (#22721)
d14dc3b8d8 #13973: Add BFLOAT8_B dtype check for logaddexp2_bw (#22604)
f739251516 SDXL: VAE conv fix (#22712)
1c53c238a5 #18200:Add missing template arguments to unpack_tilizeA_B_block and correct number of input rows to tilize reduce (#22642)
80ab79488e Delete outdated dependencies section from TT-NN doc (#22707)
d56e49e245 #22696: Add benchmarking tests for ND ShardedAccessor (#22699)
89fed2dd55 [skip ci] Actually build each API header on its own (#22704)
6b51044d03 [tt-train] add theta to llama 3 yaml (#22540)
4d844a4a91 Create a microbenchmark with 100 ttnop to experiment with gathering (#22408)
812666db74 [skip ci] Allow usage of system sfpi for firmware build (#22713)
cf7575259a [tt-train] Cross entropy backward pass (#21703)
2185b5f6f2 [skip ci] Do not attempt to load tt-fabric tests. Presumed static object badness inside. (#22470)
27e970a8ed [skip ci] Push TensixTestSubDeviceAllocations out of Smoke and into Basic (#22468)
d4034512ba #0: [skip ci] Add blackhole demos to package and release (#22701)
eba6adfb4f Make BH demo job names consistent with rest of post-commit (#22694)
9dbf84f623 Validate "DEVICE KERNEL FIRST TO LAST START" in llama model perf tests (#22552)
7803dbb1e7 Reduce number of tests in MaxPool2D nightly (#19004)
ff411c0812 BH fabric. Take care of 2 risc cores in 1 eth core (#22171)
68e4263e75 #21299: Fix slice_write channels%16!=0 (#22472)
4289aa0d30 Fix DRAM address offset bug in direct core read/write operations (#22603)
5382d2bbc0 Llama TG: fix page table padding (#22683)
5188355817 Raw ethernet ubench. Add information on summary row (#22658)
db61ab4185 #0: reland "remove path reserve arg from all mcast api" (#22569) FE CI runs: |
This PR uplifts third_party/tt-metal to 41ce500767a364f66034f6924837dabc133e8d4d.
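For context, a minimal sketch of the uplift flow itself, assuming tt-metal is pinned by commit hash somewhere under third_party/ (the pin location and the surrounding steps are assumptions, not taken from this PR):

```sh
# Old and new tt-metal pins for this uplift.
OLD=3c4aedc9ff67e3a247cd2f30a13c5e525214a51e
NEW=41ce500767a364f66034f6924837dabc133e8d4d

# Review what the uplift brings in (run inside a tt-metal checkout).
git log --oneline "${OLD}..${NEW}"

# Update the pinned hash under third_party/, rebuild, and run the
# regression tests (regtests) before merging.
```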