Toimik · GitHub
@toimik

Toimik

Open source projects for an upcoming web search engine
  • Singapore

Pinned

  1. WarcProtocol Public

    Parser for WARC (aka WebArchive) files

    C# 12 3

  2. CommonCrawl Public

    Common Crawl's processing tools

    C# 10

  3. UrlNormalization Public

    URL normalizer that canonicalizes (standardizes) the text representation of a URL to determine whether differently formatted URLs are identical

    C# 5

  4. SitemapsProtocol Public

    Parsers for sitemap / sitemap index (aka Sitemaps Protocol)

    C#

  5. RobotsProtocol Public

    Parsers for robots.txt (aka Robots Exclusion Standard / Robots Exclusion Protocol), Robots Meta Tag, and X-Robots-Tag

    C#

Repositories

Showing 7 of 7 repositories
