add some popular http library & scraping framework by Zerorigin · Pull Request #2524 · coreruleset/coreruleset · GitHub

add some popular http library & scraping framework #2524


Merged
merged 2 commits into from
Jun 13, 2022

Zerorigin
Contributor
  1. Add some popular HTTP client libraries to scripting-user-agents.data
# https://docs.aiohttp.org/en/stable/
# User-Agent from: https://github.com/aio-libs/aiohttp/blob/3.0/aiohttp/http.py
# User-Agent: Python/VERSION aiohttp/VERSION
aiohttp/


# https://pkg.go.dev/net/http
# old User-Agent from: https://codereview.appspot.com/7532043
# new User-Agent from: https://cs.opensource.google/go/go/+/refs/tags/go1.10rc1:src/net/http/request.go;l=462
# User-Agent: Go http package
# User-Agent: Go 1.1 package http
# User-Agent: Go-http-client/VERSION
Go http package
Go 1.1 package http
Go-http-client/


# https://www.python-httpx.org/
# User-Agent: python-httpx/VERSION
python-httpx/


# https://kong.github.io/unirest-java/
# User-Agent: unirest-java/VERSION
unirest-java/
  2. Add some popular scraping frameworks to crawlers-user-agents.data
# https://ache.readthedocs.io/en/latest/
# User-Agent from: https://github.com/VIDA-NYU/ache/blob/master/ache-tools/src/main/java/achecrawler/tools/GenerateTLDLists.java
# User-Agent: Mozilla/5.0 (compatible; ACHE/VERSION; +OPERATOR_CONTACT_URL; +OPERATOR_CONTACT_EMAIL)
ACHE/


# http://go-colly.org/
# User-Agent from: https://github.com/gocolly/colly/blob/master/colly.go
# User-Agent: colly - https://github.com/gocolly/colly/v2
colly -


# https://github.com/yasserg/crawler4j
# User-Agent from: https://github.com/yasserg/crawler4j#user-agent-string
# User-Agent: crawler4j (https://github.com/yasserg/crawler4j/)
crawler4j


# https://github.com/internetarchive/heritrix3
# User-Agent from: https://github.com/internetarchive/heritrix3/blob/master/modules/src/main/java/org/archive/modules/CrawlMetadata.java
# User-Agent: Mozilla/5.0 (compatible; heritrix/VERSION +OPERATOR_CONTACT_URL)
heritrix/


# http://docs.seattlerb.org/mechanize/Mechanize.html
Mechanize


# https://nutch.apache.org/
# User-Agent from: https://github.com/apache/nutch/blob/2.1/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
# User-Agent: NutchCVS/VERSION (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
NutchCVS/


# https://docs.pyspider.org/en/latest/
# User-Agent from: https://docs.pyspider.org/en/latest/tutorial/AJAX-and-more-HTTP/#user-agent
# User-Agent: pyspider/VERSION (+http://pyspider.org/)
pyspider/


# https://scrapy.org/
# User-Agent from: https://docs.scrapy.org/en/latest/topics/settings.html?highlight=user-agent#user-agent
# User-Agent: Scrapy/VERSION (+https://scrapy.org)
Scrapy/
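Each non-comment line in these data files is a phrase that is looked for inside the request's User-Agent header. The CRS scanner-detection rules consume such files with ModSecurity's `@pmFromFile` operator, which the ModSecurity reference manual describes as a case-insensitive phrase match. The Python sketch below only approximates that idea with a naive substring loop, under two assumptions: lines starting with `#` and blank lines are ignored, and surrounding whitespace is stripped. It is not ModSecurity's implementation (which uses Aho-Corasick), and the concrete version numbers in the sample User-Agents are hypothetical values filled in for the `VERSION` placeholders.

```python
"""Naive approximation of matching a CRS .data phrase list against a User-Agent.

Assumption: this mimics a case-insensitive substring match; it is NOT the
actual ModSecurity @pmFromFile implementation.
"""


def load_phrases(text: str) -> list[str]:
    """Return the non-comment, non-blank lines of a .data file (Python 3.9+)."""
    phrases = []
    for line in text.splitlines():
        entry = line.strip()
        # '#' lines document where each entry comes from; skip them and blanks.
        if not entry or entry.startswith("#"):
            continue
        phrases.append(entry)
    return phrases


def matching_phrases(user_agent: str, phrases: list[str]) -> list[str]:
    """Return every phrase that occurs, ignoring case, inside user_agent."""
    ua = user_agent.lower()
    return [p for p in phrases if p.lower() in ua]


# A few entries copied from the two data-file excerpts in this PR.
SAMPLE_DATA = """
# User-Agent: Python/VERSION aiohttp/VERSION
aiohttp/
# User-Agent: Go-http-client/VERSION
Go-http-client/
# User-Agent: colly - https://github.com/gocolly/colly/v2
colly -
# User-Agent: Scrapy/VERSION (+https://scrapy.org)
Scrapy/
"""

if __name__ == "__main__":
    phrases = load_phrases(SAMPLE_DATA)
    # Hypothetical User-Agent values; the last one is an ordinary browser.
    for ua in (
        "Python/3.10 aiohttp/3.8.1",
        "Go-http-client/1.1",
        "Scrapy/2.6.1 (+https://scrapy.org)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0",
    ):
        print(f"{ua!r}: {matching_phrases(ua, phrases)}")
```

Because matching is by substring, the trailing `/` in entries such as `aiohttp/` and `Scrapy/` (and the ` -` in `colly -`) presumably narrows each entry to the product token as it appears in the default User-Agent, whereas bare entries such as `Mechanize` or `crawler4j` match anywhere in the header.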

Member
dune73 commented Jun 13, 2022

Have not heard any additional comment on this PR. Looks good to me, merging now.

Thanks @Zerorigin for your welcome contribution. Well done.

dune73 merged commit 6cb1f1d into coreruleset:v4.0/dev on Jun 13, 2022
Labels
None yet
Projects
None yet
5 participants