-
Notifications
You must be signed in to change notification settings - Fork 1.1k
store_in(MemoryType::Stack) should use alloca if the size is small #6289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
and add a test case that illustrates why this matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
src/CodeGen_Posix.cpp
Outdated
Value *size_zero = ConstantInt::get(size_type, 0); | ||
Value *alloca_size = builder->CreateSelect(returned_null, llvm_size, size_zero); | ||
// Allocate it. It's zero most of the time. | ||
Value *stack_ptr = builder->CreateAlloca(i8_t->getPointerTo(), alloca_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is alloca(0) guaranteed to be a no-op? The docs say: https://llvm.org/docs/LangRef.html#id203
Allocating zero bytes is legal, but the returned pointer may not be unique.
So it's allowed to be a no-op, but not sure if that will always be the case...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From that explanation. I would expect it it to return the current stack pointer without doing anything else. Whether that is a nop or not is debatable, but I think that serves the purpose here. I gather the worry is it may instead allocate the smallest size or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we should just go ahead with this and assume that it is a no-op. I can't imagine any target doing something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the alloca behind a branch instead was cheaper anyway in the common case of not reallocating.
src/CodeGen_Internal.cpp
Outdated
@@ -262,6 +262,7 @@ bool function_takes_user_context(const std::string &name) { | |||
|
|||
bool can_allocation_fit_on_stack(int64_t size) { | |||
user_assert(size > 0) << "Allocation size should be a positive number\n"; | |||
// Should match the threshold defined in runtime/pseudostack.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worthwhile to create a runtime/constants.h
that's just a bunch of #define
s or constexpr
s or something that records constants that need to be kept consistent between the compiler and the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but can you think of any others to put in there? I've been hunting and coming up blank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the target features enum belongs there? There are also a few places we include all of HalideRuntime.h for a single macro (e.g. HALIDE_ALWAYS_INLINE
). I still think it's worthwhile even for one constant, though, since it's a path towards enforcing this and future constant consistency in the code rather than a reminder in a comment.
Lots of failures, apparently |
builds are clean. ptaal |
// of total stack used per Func, we bail and start making heap | ||
// allocations instead. | ||
|
||
// The following would use 200 mb of stack if we just kept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
200mb of stack ought to be enough for anybody
If an allocation for a Func marked as store_in MemoryType::Stack is just made once (e.g. because it's compute_at just inside a parallel loop), then we currently just call malloc and there's no win from placing it on the stack.
This PR changes the behavior of MemoryType::Stack to call alloca instead when the size is small. This helps a lot in the single-use case.
It's the same pseudostack logic as in master. It allocates on first use (even if inside a loop), and reallocates only if the required size grows. Stack is never freed. If the cumulative stack usage for all reallocations for one Func exceeds our can_fit_on_stack threshold, we switch to heap for all future reallocations of this Func.
The following three runtimes are from a test of the single-use case where in the first case we statically know the allocation size, so we can just size the original stack frame appropriately. In the second case we use the new behavior in this PR and subtract from the stack pointer by a computed amount. In the third case we call malloc, as we would on master.
The runtimes are close together because allocating isn't the only thing this pipeline does. You should view the first of the three figures as the zero, which means the new behavior is about 2.5x faster than the old behavior when it comes to the cost of allocating.
Those numbers are from linux. On macos malloc is slower and the win is 7x.