-
Notifications
You must be signed in to change notification settings - Fork 347
[LLHD] Update Deseq pass to work with process results #8403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed this in super detail, but it's better than what we currently have, so good to merge and iterate on.
I'm quite curious about how the DNF vs truth table approaches perform for various inputs parameterized by number of expression terms and number of primitives. Both are exponential, both can't handle processing of arbitrary/big enable conditions, etc. My guess would be that there is no significant difference in practice, but it'd be cool to see actual numbers 🤓
@@ -91,6 +91,7 @@ def FinalOp : LLHDOp<"final", [ | |||
def WaitOp : LLHDOp<"wait", [ | |||
AttrSizedOperandSegments, | |||
HasParent<"ProcessOp">, | |||
Pure, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning of making this op pure? It suspends execution, stores the current values of the block arguments, makes the process hold the values of the yields, etc. Which all sounds implicitly side-effecting to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I added this because it simplified a check in deseq that ensures a process has no side-effects besides what's visible through its results. But you definitely raise a good point, the actual op sounds very side-effecting. I'll revert this and make the check in deseq a bit more careful.
// Technically also non-constants can be reset values. However, since | ||
// async resets are level sensitive (even though Verilog describes them as | ||
// edge sensitive), the reset value would have to be part of the wait op's | ||
// observed values. We don't check for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason we don't check for that or is it just not implemented yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just not implemented yet. In practice this doesn't really come up, since almost all flip-flop standard cells have a set or reset input that sets them to a constant 1 or 0, respectively. There are ways to cook up resets to a dynamic value, but they are pretty obscure and not really used in practice AFAIK. They are also pretty hard to describe in Verilog, since it generally gets the level-sensitivity of resets wrong.
namespace circt { | ||
namespace llhd { | ||
static llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const DNF &dnf) { | ||
dnf.printWithValues(os); | ||
return os; | ||
} | ||
} // namespace llhd | ||
} // namespace circt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: shouldn't this be in DNF.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point. I'll move that over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered why this is here. The unit tests want to define their own operator with different value printing. Since the Deseq pass is the only thing that uses this specific style of printing, and only for debugging, the <<
overload is in here.
propagate(cast<Block *>(node)); | ||
} | ||
LLVM_DEBUG(llvm::dbgs() << "- Finished propagation\n"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computing the DNF for all values is quite inefficient as the runtime is exponential in the number of terms. I see two options:
- Change the DNF datastructure to just collect the terms and only reorder them to form a DNF and optimize it when asked to do so instead of on-the-fly as soon as a new term is added
- Construct the DNFs backward from values that we need to know the DNF of
But we can just merge this and do some benchmarking/profiling as well as collecting statistics about expression sizes and number of primitive values to properly optimize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point. I initially had an implementation that performed a depth-first traversal of the interesting signals to determine DNFs for the trigger signals specifically. I think I'll change Deseq over to that approach again in a second iteration. The pass currently has pretty poor scaling for exactly the reasons you mention.
Add an updated version of the desequentialization pass. This new version expects all probes and drives to have been hoisted out of processes, and for the processes to produce the interesting signals as a result. Resets can now be distinguished from clocks more reliably. The previous version of the pass is left untouched for now, since it is used in the circt-verilog pipeline. Once process lowering also works with process results, we can get rid of the old pass implementations in one go. This pass is an absolute headache to implement because of the fuzzy and ill-defined Verilog and VHDL semantics it has to match. I expect us to iterate heavily on this in the future, along the definition of `llhd.process` and potentially process flavors that are more convenient to analyze. In particular, the pass has to do a lot of things in one go: it checks whether a process has only a single wait, if the wait carries past values into the present as block arguments, whether the IR expresses a posedge/negedge detection based on past and present values, whether the process properly observes the necessary triggers values, and it has to match the detected triggers against a very narrow list of things that a register can represent. Life would be easier if we had a more restricted form of processes that already express the basic structure of a loop through a single wait, observed triggers, and which past values are required. This would then also allow us to detect and generate latches, which are missing completely right now. The pass currently performs a data flow analysis by propagating DNFs and tables of potential result values across a lattice. I'm not too happy with this implementation, and doing a depth-first traversal of the IR may be a better approach in the future. We should probably revisit this once we have improved the process op a bit and update this pass again. This commit also adds an implementation of the Disjunctive Normal Form as a way of expressing and manipulating boolean functions. This is necessary to reliably detect if a process computes the posedge or negedge of a value. In the future we may want to switch this to a more compact representation of the underlying truth table, which can be done fairly efficiently using APInt. The current implementation of `DNF::optimize()` already does this internally. Note that DNFs are very inefficient at representing arbitrary boolean functions, as they have a tendency to grow exponentially. Therefore care must be taken that only operations on potential trigger values of sequential processes are captured in the DNF, and all other values are kept as opaque values. This keeps the complexity bounded.
29a51d2
to
274f8b2
Compare
Going to land this and iterate on some of the performance stuff in follow-up PRs. |
Add an updated version of the desequentialization pass. This new version expects all probes and drives to have been hoisted out of processes, and for the processes to produce the interesting signals as a result. Resets can now be distinguished from clocks more reliably. The previous version of the pass is left untouched for now, since it is used in the circt-verilog pipeline. Once process lowering also works with process results, we can get rid of the old pass implementations in one go. This pass is an absolute headache to implement because of the fuzzy and ill-defined Verilog and VHDL semantics it has to match. I expect us to iterate heavily on this in the future, along the definition of `llhd.process` and potentially process flavors that are more convenient to analyze. In particular, the pass has to do a lot of things in one go: it checks whether a process has only a single wait, if the wait carries past values into the present as block arguments, whether the IR expresses a posedge/negedge detection based on past and present values, whether the process properly observes the necessary triggers values, and it has to match the detected triggers against a very narrow list of things that a register can represent. Life would be easier if we had a more restricted form of processes that already express the basic structure of a loop through a single wait, observed triggers, and which past values are required. This would then also allow us to detect and generate latches, which are missing completely right now. The pass currently performs a data flow analysis by propagating DNFs and tables of potential result values across a lattice. I'm not too happy with this implementation, and doing a depth-first traversal of the IR may be a better approach in the future. We should probably revisit this once we have improved the process op a bit and update this pass again. This commit also adds an implementation of the Disjunctive Normal Form as a way of expressing and manipulating boolean functions. This is necessary to reliably detect if a process computes the posedge or negedge of a value. In the future we may want to switch this to a more compact representation of the underlying truth table, which can be done fairly efficiently using APInt. The current implementation of `DNF::optimize()` already does this internally. Note that DNFs are very inefficient at representing arbitrary boolean functions, as they have a tendency to grow exponentially. Therefore care must be taken that only operations on potential trigger values of sequential processes are captured in the DNF, and all other values are kept as opaque values. This keeps the complexity bounded.
Add an updated version of the desequentialization pass. This new version expects all probes and drives to have been hoisted out of processes, and for the processes to produce the interesting signals as a result. Resets can now be distinguished from clocks more reliably. The previous version of the pass is left untouched for now, since it is used in the circt-verilog pipeline. Once process lowering also works with process results, we can get rid of the old pass implementations in one go.
This pass is an absolute headache to implement because of the fuzzy and ill-defined Verilog and VHDL semantics it has to match. I expect us to iterate heavily on this in the future, along the definition of
llhd.process
and potentially process flavors that are more convenient to analyze.In particular, the pass has to do a lot of things in one go: it checks whether a process has only a single wait, if the wait carries past values into the present as block arguments, whether the IR expresses a posedge/negedge detection based on past and present values, whether the process properly observes the necessary triggers values, and it has to match the detected triggers against a very narrow list of things that a register can represent.
Life would be easier if we had a more restricted form of processes that already express the basic structure of a loop through a single wait, observed triggers, and which past values are required. This would then also allow us to detect and generate latches, which are missing completely right now.
The pass currently performs a data flow analysis by propagating DNFs and tables of potential result values across a lattice. I'm not too happy with this implementation, and doing a depth-first traversal of the IR may be a better approach in the future. We should probably revisit this once we have improved the process op a bit and update this pass again.
This commit also adds an implementation of the Disjunctive Normal Form as a way of expressing and manipulating boolean functions. This is necessary to reliably detect if a process computes the posedge or negedge of a value. In the future we may want to switch this to a more compact representation of the underlying truth table, which can be done fairly efficiently using APInt. The current implementation of
DNF::optimize()
already does this internally. Note that DNFs are very inefficient at representing arbitrary boolean functions, as they have a tendency to grow exponentially. Therefore care must be taken that only operations on potential trigger values of sequential processes are captured in the DNF, and all other values are kept as opaque values. This keeps the complexity bounded.