kearch is a distributed search engine. You can set up your own search engine with kearch and connect it to other search engines.
You can access our search engine at https://kearch.info.
kearch has two types of search engines: specialist search engines and meta search engines. A specialist search engine is specialized in one topic, for example history, a programming language, or anything else you want.
A meta search engine, on the other hand, connects specialist search engines. You can connect any specialist search engines to a meta search engine. For example, by connecting specialist search engines about Lisp, Haskell, C#, and so on, you get a search engine covering several programming languages.
If you want to set up your own specialist search engine, read section 1, Specialist search engine. If you want to set up your own meta search engine, read section 2, Meta search engine.
First of all, you need to prepare a server for a specialist search engine. The minimum specification is as follows.
- RAM: 8GiB
- SSD/HDD: 100GiB
- CPU: Dual core processor
- OS: Ubuntu 18.04
- Global IP address or domain
- SSH login using public key authentication
You can get a suitable server from Sakura Cloud, AWS, GCP, or Microsoft Azure.
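Before deploying, you may want to confirm that a server meets these figures. The following Python sketch is one way to do that, assuming a Linux server with Python 3; `MinimumSpec`, `shortfalls` and `local_resources` are hypothetical helpers, not part of kearch. It also includes the lower RAM figure given later for a meta search engine. Note that `os.cpu_count()` counts logical CPUs (a single hyperthreaded core shows up as 2), and the OS usually reports slightly less RAM than the nominal size, so judge near misses with care.

```python
import os
import shutil
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class MinimumSpec:
    ram_gib: int
    disk_gib: int
    cpu_cores: int


# Figures taken from this README: 8 GiB RAM for a specialist search engine,
# 4 GiB for a meta search engine, 100 GiB disk and a dual-core CPU for both.
SPECIALIST_SPEC = MinimumSpec(ram_gib=8, disk_gib=100, cpu_cores=2)
META_SPEC = MinimumSpec(ram_gib=4, disk_gib=100, cpu_cores=2)


def shortfalls(spec, ram_bytes, disk_bytes, cpu_cores):
    """Return the requirements that are not met (an empty list means OK)."""
    problems = []
    if ram_bytes < spec.ram_gib * GIB:
        problems.append(f"RAM: {ram_bytes / GIB:.1f} GiB < {spec.ram_gib} GiB")
    if disk_bytes < spec.disk_gib * GIB:
        problems.append(f"disk: {disk_bytes / GIB:.1f} GiB < {spec.disk_gib} GiB")
    if cpu_cores < spec.cpu_cores:
        problems.append(f"CPU: {cpu_cores} core(s) < {spec.cpu_cores}")
    return problems


def local_resources(path="/"):
    """Read total RAM, disk size and logical CPU count of this machine (Linux)."""
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk = shutil.disk_usage(path).total
    cores = os.cpu_count() or 1
    return ram, disk, cores


if __name__ == "__main__":
    # Use META_SPEC instead when preparing a meta search engine server.
    print(shortfalls(SPECIALIST_SPEC, *local_resources()) or "OK")
```

Copy it to the server and run it with `python3`; it prints `OK` or the list of unmet requirements.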
Second, deploy a specialist search engine using Ansible. If Ansible is not installed on your local machine, install it first with one of the following commands.
- Debian/Ubuntu:
sudo apt install ansible
- Mac:
brew install ansible
Then clone this repository to your local machine with the following command.
~$ git clone https://github.com/kearch/kearch.git
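Finally, deploy a specialist search engine using Ansible. Replace <HOSTNAME> and <USERNAME> to match your environment: <HOSTNAME> is usually the IP address of your server, and <USERNAME> is the user you log in as over SSH. Don't forget the comma after <HOSTNAME>; it makes Ansible treat the value as a list of hosts rather than as the path of an inventory file. Because of --ask-become-pass, Ansible will ask for that user's sudo password. Deployment takes some time, so I recommend taking a coffee break.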
~/kearch$ ansible-playbook sp-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
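If you deploy from a script rather than typing the command, the following Python sketch assembles the same argument list. `playbook_command` is a hypothetical helper, and the host and user in the example are placeholders. It adds the trailing comma to the host when it is missing; pass `me-playbook.yml` instead of `sp-playbook.yml` for a meta search engine.

```python
import shlex


def playbook_command(playbook, hostname, username):
    """Build the ansible-playbook argv used in this README."""
    # The trailing comma makes Ansible read -i as an inline host list,
    # not as the path of an inventory file.
    inventory = hostname if hostname.endswith(",") else hostname + ","
    return [
        "ansible-playbook", playbook,
        "-i", inventory,
        "-u", username,
        "--ask-become-pass",  # prompt for the sudo (become) password
        "-vvv",               # verbose output
    ]


if __name__ == "__main__":
    # 203.0.113.10 and "ubuntu" are placeholder values.
    cmd = playbook_command("sp-playbook.yml", "203.0.113.10", "ubuntu")
    print(shlex.join(cmd))
```

You could hand the list to `subprocess.run(cmd, cwd=...)` pointing at the cloned `kearch` directory; passing a list rather than a string avoids shell quoting problems.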
Open http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32700 in your browser. If the setup succeeded, you will see the admin page.
The default username and password are "root" and "password". We strongly recommend changing the password immediately after logging in.
After updating the password, set the engine name.
Then set the global IP address of your server.
Now you can set a topic for your specialist search engine. There are two ways to do this: using word frequency dictionaries (Method A) or using URLs (Method B). You must choose one of them; I recommend Method A.
Method A: choose a language, then fill in "Word frequencies in your crawling topic" and "Word frequencies in random topic".
In "Word frequencies in your crawling topic", enter words that characterize your topic together with their ratios. If entering them by hand is tedious, see Appendix 4, which describes an easy way to generate this input.
In "Word frequencies in random topic", you would ideally enter every word on the Web with its ratio, which is very difficult in practice. I therefore recommend checking "use default dict".
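The README does not say whether the form expects raw counts or normalized ratios; since the Appendix 4 script prints raw counts, counts are presumably accepted as-is. To illustrate how the two relate (ratio = count / total), the sketch below parses `word count` lines and normalizes them. `parse_frequencies` and `to_ratios` are hypothetical helpers; the sample numbers are the Haskell example from Appendix 4.

```python
def parse_frequencies(text):
    """Parse 'word count' lines, e.g. the output of generate_frequencies_from_URLs.py."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything like '...'
        word, value = parts
        try:
            counts[word] = counts.get(word, 0) + int(value)
        except ValueError:
            continue  # second field is not an integer count
    return counts


def to_ratios(counts):
    """Turn raw counts into ratios that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    sample = "haskell 213\nlanguage 55\nprogramming 43\nghc 42\n"
    for word, ratio in to_ratios(parse_frequencies(sample)).items():
        print(f"{word} {ratio:.3f}")
```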
Method B: choose a language and enter some URLs related to your topic in "URLs in your crawling topic". Then enter some URLs about random topics in "URLs in random topic".
Although this method is easier than Method A, its results are rougher, which is why I recommend Method A.
Then start crawling by specifying some starting URLs on the admin page.
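The Method B fields and the starting URLs for crawling all take lists of URLs. The README does not state how kearch treats malformed entries, so a cautious approach is to clean a list before pasting it. The sketch below assumes one URL per line (as in the URL list file described in Appendix 4); `clean_url_list` is a hypothetical helper that keeps only absolute http(s) URLs, drops blank lines and duplicates, and preserves order.

```python
from urllib.parse import urlsplit


def clean_url_list(text):
    """Keep absolute http(s) URLs, one per line, dropping blanks and duplicates."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # not an absolute web URL (e.g. missing "https://")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    pasted = """https://www.haskell.org/

    www.haskell.org/documentation
    https://www.haskell.org/
    https://wiki.haskell.org/Tutorials
    """
    print("\n".join(clean_url_list(pasted)))
```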
Your specialist search engine is now available at http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32550.
There are two ways to connect a specialist search engine and a meta search engine: the specialist search engine sends the connection request, or the meta search engine sends it.
Case 1: you send a connection request from your specialist search engine.
After you send the request, the administrator of the meta search engine approves it, and the two search engines are connected. You can confirm the connection on the admin page.
Case 2: you receive a connection request from a meta search engine. When a meta search engine sends a connection request to your specialist search engine, the request is displayed on your admin page.
To approve the request, just click the approve button.
First of all, you need to prepare a server for a meta search engine. The minimum specification is as follows.
- RAM: 4GiB
- SSD/HDD: 100GiB
- CPU: Dual core processor
- OS: Ubuntu 18.04
- Global IP address or domain
- SSH login using public key authentication
You can get a suitable server from Sakura Cloud, AWS, GCP, or Microsoft Azure.
Second, deploy a meta search engine using Ansible. If Ansible is not installed on your local machine, install it first with one of the following commands.
- Debian/Ubuntu:
sudo apt install ansible
- Mac:
brew install ansible
Then clone this repository to your local machine with the following command.
~$ git clone https://github.com/kearch/kearch.git
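Finally, deploy a meta search engine using Ansible. Replace <HOSTNAME> and <USERNAME> to match your environment: <HOSTNAME> is usually the IP address of your server, and <USERNAME> is the user you log in as over SSH. Don't forget the comma after <HOSTNAME>; it makes Ansible treat the value as a list of hosts rather than as the path of an inventory file. Because of --ask-become-pass, Ansible will ask for that user's sudo password. Deployment takes some time, so I recommend taking a coffee break.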
~/kearch$ ansible-playbook me-playbook.yml -i <HOSTNAME>, -u <USERNAME> --ask-become-pass -vvv
Open http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32600 in your browser (32600 is the admin settings page port of meta search engines). If the setup succeeded, you will see the admin page.
The default username and password are "root" and "password". We strongly recommend changing the password immediately after logging in.
After updating the password, set the global IP address of your server.
There are two ways to connect a meta search engine and a specialist search engine: the meta search engine sends the connection request, or the specialist search engine sends it.
Case 1: you send a connection request from your meta search engine.
After you send the request, the administrator of the specialist search engine approves it, and the two search engines are connected. You can confirm the connection on the admin page.
Case 2: you receive a connection request from a specialist search engine. When a specialist search engine sends a connection request to your meta search engine, the request is displayed on your admin page.
To approve the request, just click the approve button.
Your meta search engine is now available at http://HOSTNAME-OR-IP-ADDRESS-OF-YOUR-SERVER:32450.
git clone https://github.com/kearch/kearch.git
cd kearch
./sp_deploy.sh spdb spes all
./me_deploy.sh medb all
- 32700: Admin settings page port for specialist search engines
- 32600: Admin settings page port for meta search engines
- 32500: Gateway port for specialist search engines
- 32400: Gateway port for meta search engines
- 32550: Search engine front page port for specialist search engines
- 32450: Search engine front page port for meta search engines
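Since these ports are fixed, a small lookup table is handy when scripting against a deployment. The sketch below maps the ports above to URLs and checks whether a TCP port accepts connections from your machine. `PORTS`, `engine_url` and `port_is_open` are hypothetical names, and `203.0.113.10` is a documentation placeholder address; only the admin and front pages are described in this README as pages you open in a browser.

```python
import socket

# Default kearch ports, as listed above.
PORTS = {
    ("specialist", "admin"): 32700,
    ("meta", "admin"): 32600,
    ("specialist", "gateway"): 32500,
    ("meta", "gateway"): 32400,
    ("specialist", "front"): 32550,
    ("meta", "front"): 32450,
}


def engine_url(host, engine, page):
    """Build e.g. http://HOST:32550 for the specialist front page."""
    return f"http://{host}:{PORTS[(engine, page)]}"


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    host = "203.0.113.10"  # placeholder: your server's IP address or domain
    for engine, page in PORTS:
        print(engine, page, engine_url(host, engine, page))
```

For example, `port_is_open("203.0.113.10", 32550)` tells you whether the specialist front page port is reachable from where you run it.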
Check the specialist DB.
./sp_db_checker.sh
Check the meta DB.
./me_db_checker.sh
You can easily generate frequencies from URLs using generate_frequencies_from_URLs.py in the utils directory.
$ cd utils
$ python3 generate_frequencies_from_URLs.py haskell_list
haskell 213
language 55
programming 43
ghc 42
...
Replace haskell_list with your own URL list to generate your own frequencies. A URL list is simply a text file with one URL per line.
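The README does not describe how generate_frequencies_from_URLs.py extracts and tokenizes pages. As a rough, hypothetical approximation of its output format (`word count` per line, most frequent first), the sketch below counts words in text you have already fetched. The lowercase regex tokenizer and the `STOP_WORDS` set are assumptions of this sketch, not the script's actual behavior, and HTML extraction is left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real script may filter differently or not at all.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def word_frequencies(text, top=None):
    """Count lowercase alphanumeric words, most frequent first (ties keep first-seen order)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


def format_frequencies(pairs):
    """Render (word, count) pairs as 'word count' lines."""
    return "\n".join(f"{word} {count}" for word, count in pairs)


if __name__ == "__main__":
    text = "Haskell is a language. The Haskell language uses GHC. Haskell!"
    print(format_frequencies(word_frequencies(text)))
```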