Week 10: Finish the Implementation of Regularized Random Forest Framework, Tune Parameters and if possible implement Pruning of the Trees · Issue #20 · azmfaridee/mothur · GitHub

Week 10: Finish the Implementation of Regularized Random Forest Framework, Tune Parameters and if possible implement Pruning of the Trees #20

Open
azmfaridee opened this issue Jun 29, 2012 · 2 comments
@azmfaridee (Owner):
Related Issues: #3, #14, #15, #16, #17, #19

As per issue #19, I have been experimenting with the outcomes of running the algorithm on some real-life data. The new concerns are tree pruning for better performance and dealing with over-fitting, as well as the scheduled implementation of the Regularized Random Forest algorithm.

This week I'll try to follow up on all of these ideas that have been swirling around.

End of Week Deliverables

  • Complete the implementation of Regularized Random Forest and compare its performance/accuracy with the normal Random Forest (a sketch of the regularized split rule follows this list)
  • Possible performance improvement with pruned trees
  • Investigate other ways of performance tuning
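
For context on the first deliverable: the Regularized Random Forest (Deng and Runger) penalizes the information gain of features not yet used by the ensemble, so splits favor a compact feature subset. A minimal sketch of that split rule follows; the names and data structures are illustrative, not mothur's actual classes.

```cpp
// Sketch of the RRF regularized-gain split rule (illustrative names only).
#include <set>
#include <vector>

// Features already used anywhere in the forest keep their full gain;
// unused features are penalized by lambda in (0, 1], which discourages
// adding new features unless they are clearly better.
double regularizedGain(double gain, int feature,
                       const std::set<int>& usedFeatures, double lambda) {
    return usedFeatures.count(feature) ? gain : lambda * gain;
}

// Pick the split feature with the highest regularized gain and record it
// in the shared set of used features.
int pickSplitFeature(const std::vector<double>& gains,
                     std::set<int>& usedFeatures, double lambda) {
    int best = -1;
    double bestGain = -1.0;
    for (int f = 0; f < (int)gains.size(); f++) {
        double g = regularizedGain(gains[f], f, usedFeatures, lambda);
        if (g > bestGain) { bestGain = g; best = f; }
    }
    if (best >= 0) usedFeatures.insert(best);
    return best;
}
```

With lambda = 1 this reduces to the ordinary Random Forest split, so the comparison in the first deliverable amounts to sweeping lambda below 1.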
@ghost assigned azmfaridee Jun 29, 2012
azmfaridee added a commit that referenced this issue Jul 1, 2012
azmfaridee added a commit that referenced this issue Jul 1, 2012
… of features selected in each split now can be fine tuned between log2 and square root of number of features
azmfaridee added a commit that referenced this issue Jul 3, 2012
azmfaridee added a commit that referenced this issue Jul 3, 2012
azmfaridee added a commit that referenced this issue Jul 3, 2012
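
(The Jul 1 commit above mentions tuning the number of candidate features tried at each split between log2 and the square root of the total number of features. A minimal illustration of the two choices, with a hypothetical helper name; the rounding convention is an assumption:)

```cpp
// Hypothetical helper for the two common per-split feature-count choices
// mentioned in the commit message above; mothur's actual option may differ.
#include <algorithm>
#include <cmath>

int featuresPerSplit(int numFeatures, bool useSqrt) {
    double m = useSqrt ? std::sqrt((double)numFeatures)
                       : std::log2((double)numFeatures);
    return std::max(1, (int)std::floor(m));  // rounding convention varies
}
```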
@azmfaridee (Owner, Author):

End of Week Update:

  • The implementation of Regularized Random Forest is not 100% complete; there are a lot of classes that need to be recycled, so I need to be extra careful not to break existing code.
  • As for performance tuning, I've been reading the literature again and discovered that we should use Information Gain Ratio rather than basic Information Gain, which would boost accuracy a bit more (see the sketch after this list).
  • The optimum number of features selected at each split can be fine-tuned between log2-based and square-root-based measures.
  • As I did not get sufficient learning materials (the books that I mentioned), I could not implement pruning.
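
A minimal sketch of the Gain Ratio computation mentioned in the second bullet, assuming plain per-class count vectors rather than mothur's actual data structures: Gain Ratio divides the Information Gain of a split by the entropy of the branch sizes (the "split information"), which corrects plain gain's bias toward many-valued features.

```cpp
// Sketch: Information Gain Ratio (C4.5-style), illustrative types only.
#include <cmath>
#include <vector>

// Shannon entropy of a discrete distribution given as counts.
double entropy(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    double h = 0.0;
    for (int c : counts) {
        if (c == 0) continue;
        double p = (double)c / total;
        h -= p * std::log2(p);
    }
    return h;
}

// branchClassCounts[b][k]: number of class-k samples routed to branch b.
double gainRatio(const std::vector<int>& parentClassCounts,
                 const std::vector<std::vector<int>>& branchClassCounts) {
    int total = 0;
    for (int c : parentClassCounts) total += c;

    double childEntropy = 0.0;
    std::vector<int> branchSizes;
    for (const std::vector<int>& branch : branchClassCounts) {
        int n = 0;
        for (int c : branch) n += c;
        branchSizes.push_back(n);
        childEntropy += (double)n / total * entropy(branch);
    }
    double infoGain = entropy(parentClassCounts) - childEntropy;
    double splitInfo = entropy(branchSizes);  // "split information"
    return splitInfo > 0.0 ? infoGain / splitInfo : 0.0;
}
```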

@azmfaridee (Owner, Author):

@kdiverson I've uploaded a PDF titled Pruning Decision Trees and Lists.pdf to the Pruning folder in Dropbox. It appears to be a PhD thesis and has some good examples of popular pruning methods. Take a look.
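
For a concrete taste of one classic method covered there, here is a minimal sketch of reduced-error pruning; the Node layout and the precomputed validation-error counts are assumptions for illustration, not mothur's tree representation.

```cpp
// Sketch of reduced-error pruning: walk the tree bottom-up and collapse
// any subtree into a leaf when doing so does not increase error on a
// held-out validation set. errorsAsLeaf is assumed precomputed by
// routing the validation samples through the tree.
#include <memory>

struct Node {
    std::unique_ptr<Node> left, right;  // both null for a leaf
    int majorityLabel = 0;              // prediction if this node is a leaf
    int errorsAsLeaf = 0;               // validation errors if collapsed
    bool isLeaf() const { return !left && !right; }
};

// Returns the validation error of the (possibly pruned) subtree.
int reducedErrorPrune(Node* node) {
    if (node->isLeaf()) return node->errorsAsLeaf;
    int childErrors = reducedErrorPrune(node->left.get()) +
                      reducedErrorPrune(node->right.get());
    if (node->errorsAsLeaf <= childErrors) {
        node->left.reset();   // collapse: this node becomes a leaf
        node->right.reset();
        return node->errorsAsLeaf;
    }
    return childErrors;
}
```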

azmfaridee added a commit that referenced this issue Aug 8, 2012
azmfaridee added a commit that referenced this issue Aug 9, 2012
…utput + some function call parameter change
azmfaridee added a commit that referenced this issue Aug 10, 2012
…eems to be giving more consistent results than more aggressive value of 1
azmfaridee added a commit that referenced this issue Aug 10, 2012
azmfaridee added a commit that referenced this issue Aug 10, 2012
…ess are automatically discarded (error threshold defaults to 40% now).
azmfaridee added a commit that referenced this issue Sep 8, 2012
…k inside mothur, plus added mothurOut class in all of the RF classes to easy debug message anywhere
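
(The Aug 10 commit above mentions automatically discarding trees whose error exceeds a threshold, with the default now at 40%. A minimal sketch of such a filter, using hypothetical names for the tree type and its out-of-bag error:)

```cpp
// Hypothetical sketch of the weak-tree filter described in the commit
// message above; DecisionTree and oobErrorRate are illustrative names.
#include <vector>

struct DecisionTree {
    double oobErrorRate;  // fraction of misclassified out-of-bag samples
};

std::vector<DecisionTree> filterWeakTrees(const std::vector<DecisionTree>& forest,
                                          double errorThreshold = 0.40) {
    std::vector<DecisionTree> kept;
    for (const DecisionTree& tree : forest) {
        if (tree.oobErrorRate <= errorThreshold) kept.push_back(tree);
    }
    return kept;
}
```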