Running with Python 3.x · Issue #1 · SongDark/FPgrowth · GitHub

Running with Python 3.x #1

Open
jolasman opened this issue Nov 20, 2018 · 2 comments
Comments

@jolasman

Hi! :) I changed some of the code so it runs on Python 3; however, I still have some issues.
I cannot find a library with a working FP-growth implementation. I tried the pyspark one and the FP-growth one. With pyspark, I end up with Spark connection errors after some runs: it worked at first, but then it blew up. The second one cannot handle my dataset due to memory problems.

By the way, after I fixed some dict problems with iteritems() and has_key(), the mineFPtree function gives me an error that I do not understand:

    bigL = [v[0] for v in sorted(headerTable.items(), key=lambda p: p[1])]  # (sort header table)
    AttributeError: 'NoneType' object has no attribute 'items'

Any thoughts?

Thanks in advance
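
For readers following along, the dict changes mentioned above usually look like the sketch below. It uses a toy header table (the name `header_table` and its contents are made up for illustration, not taken from the repository) and shows only the standard Python 3 replacements for `iteritems()` and `has_key()`, plus the `list()` snapshot needed when deleting keys during a loop.

```python
# Toy header table: item -> support count (hypothetical data, not from the repo).
header_table = {"a": 3, "b": 1, "c": 2}

# Python 2: for item, count in header_table.iteritems():
pairs = [(item, count) for item, count in header_table.items()]

# Python 2: if header_table.has_key("a"):
has_a = "a" in header_table

# In Python 2, .keys() returned a list, so deleting inside the loop was safe.
# In Python 3, .keys() is a live view, and deleting while iterating over it
# raises RuntimeError, so take a snapshot with list() first.
for item in list(header_table.keys()):
    if header_table[item] < 2:
        del header_table[item]

print(pairs)
print(has_a)
print(header_table)
```

These changes only port the syntax; they do not affect which items survive the support threshold.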

@Inger-Chao

The error happens because the createFPtree method returns None for headerTable:

    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del headerTable[k]  # delete items that do not meet the minimum support
    freqItemSet = set(headerTable.keys())  # frequent items that meet the minimum support
    if len(freqItemSet) == 0:
        return None, None

Every headerTable[k] entry was deleted, so freqItemSet is empty and createFPtree returns None for headerTable.
The author set n = 20000 in the demo, which may be too big for your dataset; I decreased n to make the demo work on my dataset.
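
A minimal sketch of how the caller could guard against that None, assuming a simplified header-table builder. The functions `build_header_table` and `frequent_items_sorted` are hypothetical names written for this example, not the repository's API; only the filtering loop and the sort expression mirror the snippets quoted in this thread.

```python
def build_header_table(transactions, min_sup):
    """Count item support and drop items below min_sup.

    Returns None when no item survives, like the `return None, None` branch.
    """
    header_table = {}
    for trans in transactions:
        for item in trans:
            header_table[item] = header_table.get(item, 0) + 1
    # Snapshot the keys so deleting inside the loop is safe in Python 3.
    for item in list(header_table.keys()):
        if header_table[item] < min_sup:
            del header_table[item]
    return header_table or None


def frequent_items_sorted(transactions, min_sup):
    """Return surviving items sorted by ascending support, or [] if none."""
    header_table = build_header_table(transactions, min_sup)
    if header_table is None:
        # Nothing met the threshold: skip the sort instead of calling .items() on None.
        return []
    return [item for item, _ in sorted(header_table.items(), key=lambda p: p[1])]


data = [["a", "b"], ["a", "c"], ["a"]]  # hypothetical toy transactions
low_threshold = frequent_items_sorted(data, 2)
high_threshold = frequent_items_sorted(data, 20000)
print(low_threshold)
print(high_threshold)
```

With a threshold larger than any item count, the guarded version returns an empty list rather than raising AttributeError, which makes a too-high minimum support easier to spot.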

@WissenY
WissenY commented Apr 24, 2020

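I would like the author to explain how, with a support count of 100000, this finishes in 13 seconds on a Mac (as your Chinese blog post says). After I changed your code to Python 3.7, it still ran for more than ten minutes on an 8th-generation i7 with 16 GB of RAM.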
