Tags · S4M8/intel-extractor · GitHub
Tags: S4M8/intel-extractor

v0.2.4

add cross-platform browser compatibility

v0.2.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Scraping improvements (#4)

* wip/curious script
Update the scraping strategy to query the API directly instead of scrolling the DOM. Add functions that verify the org exists. Determine the number of pages to scrape for citizen dossiers. Collect all parsed citizen URLs in an array.

* wip/add sqlite database for citizen data storage and export
Add Sequelize package. Create Citizen model with required fields. Add database initialization on startup.

* update scraping method and csv structure

* wip/updated csv structure

* various improvements

* cleanup

* feat/updated scraping method and csv structure

* chore/update version

* cleanup and formatting

* fix/reduce org name to only SID; update csv to remove mainOrg from affiliation list
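
The v0.2.3 notes above mention determining the number of pages to scrape, reducing an org name to its SID, and dropping mainOrg from the affiliation list in the CSV. A minimal sketch of those three steps follows. It is not the repository's code: the `Citizen` shape, every function name, the page size, the `;` separator, and the idea that the SID is the last path segment of an org reference are all assumptions made for illustration.

```typescript
// Hypothetical record shape; field names are assumptions, not taken from the repo.
interface Citizen {
  handle: string;
  mainOrg: string;        // SID of the citizen's main org
  affiliations: string[]; // SIDs of all listed orgs, possibly including mainOrg
}

// Number of pages needed to cover `total` entries at `pageSize` entries per page.
function pageCount(total: number, pageSize: number): number {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  return Math.ceil(Math.max(total, 0) / pageSize);
}

// Reduce an org reference to its SID.
// Assumption: the SID is the last non-empty path segment of the reference.
function toSid(orgRef: string): string {
  const parts = orgRef.trim().split("/").filter((p) => p.length > 0);
  return parts.pop() ?? "";
}

// Quote a CSV field only when it contains a comma, quote, or newline.
function csvEscape(field: string): string {
  return /[",\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

// Build one CSV row: handle, main org, then the other affiliations.
// mainOrg is filtered out of the affiliation column so it appears only once.
function toCsvRow(c: Citizen): string {
  const others = c.affiliations.filter((sid) => sid !== c.mainOrg);
  return [c.handle, c.mainOrg, others.join(";")].map(csvEscape).join(",");
}

// Example with made-up data.
const example: Citizen = {
  handle: "jane",
  mainOrg: toSid("https://example.test/orgs/ABC"),
  affiliations: ["ABC", "XYZ", "QRS"],
};
console.log(pageCount(65, 32)); // 3
console.log(toCsvRow(example)); // jane,ABC,XYZ;QRS
```

Keeping these steps as pure functions, separate from the network and database code, makes them easy to check without touching the API.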