Web Crawler built using NodeJS
A recursive web crawler built using NodeJS that harvests all reachable hyperlinks belonging to a particular domain (default: medium.com) and stores them in CSV files.
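The core idea can be sketched as follows; this is a minimal illustration rather than the repo's actual implementation, assuming Node 18+ for the global `fetch` and using a naive regex in place of a real HTML parser:

```js
// Minimal sketch of a recursive, same-domain crawl (hypothetical code).
const visited = new Set();

async function crawl(url, domain) {
  if (visited.has(url)) return;        // skip pages we have already seen
  visited.add(url);
  const html = await (await fetch(url)).text();
  // Naive link extraction; the actual project may use an HTML parser instead.
  const hrefs = [...html.matchAll(/href="([^"#]+)"/g)].map((m) => m[1]);
  for (const href of hrefs) {
    try {
      const next = new URL(href, url); // resolve relative links against the page
      if (next.hostname.endsWith(domain)) await crawl(next.href, domain);
    } catch {}                         // ignore unparsable hrefs
  }
}

crawl('https://medium.com', 'medium.com').catch(console.error);
```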
To get started, clone this repository locally and move inside the repository directory.

Prerequisites:
- NodeJS
- In the project directory, run `npm install` to install all the packages and dependencies.
- Rename `.env.example` to `.env` and assign a PORT number (e.g. 3000) to it; a sample `.env` is shown after this list.
- To configure the default URL to be crawled, open `config.js` and update the value of the `url` key to the URL you want to crawl (see the sketch after this list).
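For reference, the `.env` from the step above only needs the port assignment (3000 here is just the example value from this README):

PORT=3000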
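And a `config.js` along these lines would match the configuration step; the `url` key is the one this README names, but the rest of the structure is an assumed sketch:

```js
// config.js — a minimal sketch; the `url` key is what this README describes,
// any other structure here is an assumption about the repo.
module.exports = {
  url: 'https://medium.com', // default domain to crawl, per this README
};
```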
To run the app, type the following command:
node index.js
Then visit localhost:PORT/crawl (replacing PORT with the value from your `.env`) to start crawling.
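For example, with PORT=3000 you can trigger a crawl from another terminal:

curl http://localhost:3000/crawl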
The application will create a CSV file under the data/ directory (data/*.csv) for every different page that it visits.
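As an illustration of that output step, writing one page's harvested links to its own CSV might look like this (a hypothetical helper, not the repo's actual code):

```js
// Sketch of per-page CSV output (hypothetical, not the repo's code):
// writes the hyperlinks harvested from one page into its own file under data/.
const fs = require('fs');
const path = require('path');

function saveLinks(pageName, links) {
  fs.mkdirSync('data', { recursive: true });
  // One quoted URL per row, with embedded quotes doubled per CSV rules.
  const rows = links.map((url) => `"${url.replace(/"/g, '""')}"`);
  fs.writeFileSync(path.join('data', `${pageName}.csv`), ['url', ...rows].join('\n'));
}

saveLinks('medium-com', ['https://medium.com/about', 'https://medium.com/jobs']);
```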
Are you stuck while working with the app, or still have doubts about how something works? Please feel free to open an issue anytime.