outpoot/vyntr: Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com

Vyntr.com - the independent search engine.

Privacy Policy | Terms of Service | License | YouTube video

Vyntr is a search engine project with multiple components:

Components

  • Genesis - Web crawler and content analyzer
  • Pulse - Search indexing system using Tantivy
  • Lexicon - WordNet-based dictionary lookup service
  • Website - Frontend interface at vyntr.com

Setup

  1. Create a .env file in the root directory:

     # Database
     PRIVATE_DB_URL="postgresql://postgres:your_password@serverip:port/postgres"

     # AWS S3/Compatible Storage
     S3_ENDPOINT="https://s3.eu-central-1.amazonaws.com"
     S3_REGION="eu-central-1"
     S3_BUCKET="vyntr"
     AWS_ACCESS_KEY_ID="your-key-id"
     AWS_SECRET_ACCESS_KEY="your-secret-key"

  2. Set up the database:

     cd genesis/tools/database
     docker compose up -d

  3. Set up individual components:
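
Before starting the components, it can help to confirm that the .env file defines every variable listed in the first step. The following is a minimal sketch and is not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles the simple `KEY="value"` form used in the sample above.

```python
from pathlib import Path

# Keys taken from the sample .env shown in the Setup section.
REQUIRED_KEYS = [
    "PRIVATE_DB_URL",
    "S3_ENDPOINT",
    "S3_REGION",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    else:
        missing = missing_keys(parse_env(env_path.read_text()))
        print("missing:", missing if missing else "none")
```

Run it from the repository root. It only checks that the keys are present; it does not verify that the database or the S3 endpoint is reachable.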

Pipeline

  1. Genesis crawler collects and analyzes web pages
  2. Data is stored in partitioned JSONL files in S3
  3. Content is cleaned by the dataset tools.
  4. Cleaned content is processed by the embedding tools (vector search) or by Pulse (full-text search).
  5. Website frontend provides search interface.
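
The second step stores crawl output as partitioned JSONL. This README does not describe the partitioning scheme or the record schema, so the following is only an illustrative sketch: the `url` and `text` fields, the hash-based partition choice, and all function names are assumptions. It works on in-memory strings and does not touch S3.

```python
import hashlib
import json
from collections import defaultdict


def partition_for(url: str, num_partitions: int) -> int:
    """Pick a stable partition index from a hash of the URL (assumed scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def to_partitioned_jsonl(records, num_partitions: int) -> dict[int, str]:
    """Group records by partition; each value is JSONL text, one JSON object per line."""
    rows = defaultdict(list)
    for record in records:
        index = partition_for(record["url"], num_partitions)
        rows[index].append(json.dumps(record, ensure_ascii=False))
    return {index: "\n".join(lines) + "\n" for index, lines in rows.items()}


def read_jsonl(text: str) -> list[dict]:
    """Parse JSONL text back into a list of records, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Hashing the URL keeps a given page in the same partition across runs, which is one common reason to partition this way. The actual Genesis output may be partitioned differently, for example by crawl time or domain.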

Requirements

  • Python with the uv package manager
  • Node.js
  • PostgreSQL with pgvector
  • Docker
  • Bun runtime (for Lexicon service)
  • Rust toolchain
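
A quick way to see which of these tools are available is to look for their executables on `PATH`. This is a rough sketch, not a script from the repository: the executable names in `TOOLS` are assumptions about typical installs, and the check cannot tell whether PostgreSQL has the pgvector extension enabled.

```python
import shutil

# Assumed executable names for the requirements listed above.
TOOLS = ["uv", "node", "docker", "bun", "cargo"]


def find_missing(tools, which=shutil.which):
    """Return the tools whose executable cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(TOOLS)
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all listed tools were found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it uses `shutil.which` from the standard library.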

Dataset

The Vyntr dataset is not publicly available. For licensing inquiries, please contact contact@outpoot.com.

You may also use the official API provided at https://vyntr.com/api.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). See the LICENSE file for details.

Individual components may have additional licensing requirements. See their respective directories for specific licensing information.

WordNet data used in Lexicon is subject to the WordNet License.
