📃 Paper (KDD 2023) • 🌐 中文 README
This is the official implementation of WebGLM. If you find our open-source efforts useful, please star 🌟 the repo to encourage our future development!
[Demo video: demo.mp4]
WebGLM aspires to provide an efficient and cost-effective web-enhanced question-answering system using the 10-billion-parameter General Language Model (GLM). It aims to improve real-world application deployment by integrating web search and retrieval capabilities into the pre-trained language model.
- LLM-augmented Retriever: Enhances the retrieval of relevant web content to better aid in answering questions accurately.
- Bootstrapped Generator: Generates human-like responses to questions, leveraging the power of the GLM to provide refined answers.
- Human Preference-aware Scorer: Estimates the quality of generated responses by prioritizing human preferences, ensuring the system produces useful and engaging content.
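The three components above form a retrieve → generate → score pipeline. The sketch below illustrates that flow only; every function body is a hypothetical stand-in (word overlap for retrieval, string templates for generation, length for scoring), not the actual WebGLM implementation.

```python
# Minimal sketch of the WebGLM pipeline shape; all function bodies are
# illustrative stand-ins, not the real models.

def retrieve(question, web_results):
    """LLM-augmented retriever stand-in: rank snippets by word overlap."""
    q_words = set(question.lower().split())
    scored = sorted(web_results,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return scored[:3]  # keep the top references

def generate(question, references):
    """Bootstrapped generator stand-in: one templated draft per reference."""
    return [f"Based on '{r}': answer to '{question}'" for r in references]

def score(drafts):
    """Human preference-aware scorer stand-in: prefer the longest draft."""
    return max(drafts, key=len)

snippets = ["GLM is a 10B-parameter language model.",
            "Unrelated text about cooking.",
            "WebGLM augments GLM with web search."]
refs = retrieve("What is WebGLM?", snippets)
print(score(generate("What is WebGLM?", refs)))
```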
Clone this repo and install the Python requirements.
pip install -r requirements.txt
Install Nodejs.
apt install nodejs # If you use Ubuntu
Install Playwright dependencies.
playwright install
If browser environments are not installed on your host, you need to install them. Don't worry: Playwright will print instructions the first time you run it if they are missing.
During the search process, we use SerpAPI to get search results. You need to get a SerpAPI key from here. Then set the environment variable SERPAPI_KEY to your key.
export SERPAPI_KEY="YOUR KEY"
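A missing key typically surfaces only once a search is attempted, so it can help to fail fast at startup. The helper below is a small illustrative check (the function name `require_env` is our own, not part of WebGLM):

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running WebGLM.")
    return value

# Example usage before launching the demo (hypothetical pattern):
# key = require_env("SERPAPI_KEY")
```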
Download the checkpoint from Tsinghua Cloud by running the command below. You can manually specify the path to save the checkpoint with --save SAVE_PATH.
python download.py retriever-pretrained-checkpoint
Before you run the code, make sure your device has enough free disk space.
Export the environment variable WEBGLM_RETRIEVER_CKPT to the path of the retriever checkpoint. If you downloaded the checkpoint to the default path, you can simply run the command below.
export WEBGLM_RETRIEVER_CKPT=./download/retriever-pretrained-checkpoint
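A typo in this variable otherwise shows up later as a load error, so a quick sanity check before launching can save time. This helper is our own illustrative addition (the function name `check_retriever_ckpt` is not part of WebGLM):

```python
import os
import pathlib

def check_retriever_ckpt():
    """Return the checkpoint path if WEBGLM_RETRIEVER_CKPT is a real directory."""
    path = os.environ.get("WEBGLM_RETRIEVER_CKPT", "")
    if not pathlib.Path(path).is_dir():
        raise FileNotFoundError(
            f"WEBGLM_RETRIEVER_CKPT={path!r} is not a directory; "
            "re-run download.py or fix the variable.")
    return path
```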
You can try the WebGLM-2B model with:
python cli_demo.py -w THUDM/WebGLM-2B
Or run the WebGLM-10B model directly:
python cli_demo.py