This repository contains some Python scripts that allow you to
- generate an embedded corpus of knowledge base articles
- query it using natural language questions from the command line
- serve a web app that can also query the corpus
# set your OpenAI API key
$> export OPENAI_API_KEY=sk-q9u...
# install dependencies
$> virtualenv venv
$> source venv/bin/activate
$> pip install -r requirements.txt
# generate knowledge base
$> python generate_corpus.py # generates file 'sample-content/corpus.csv'
# issue natural language questions against knowledge base
$> ./query_corpus.py "what operations exist on the openweathermap api?"
The OpenWeatherMap API offers a variety of operations ...
# Ask a question that the corpus hasn't seen
$> ./query_corpus.py "what is lilys main goal?"
I don't know.
# Add a new piece of info to the knowledge base.
# For example, write a new post to posts.csv about Lily the sailor.
# Regenerate the corpus
$> ./generate_corpus.py
id title body
# ...
7 Lily the sailor Once upon a time, there was a young woman name...
# ...
# Ask the original question
$> ./query_corpus.py "what is lilys main goal?"
Lily's main goal is to lead her own crew and set out on her own adventure.
The generate_corpus.py script reads the CSV, Markdown, and YAML files listed in corpus_source_files.txt, which contain the knowledge base articles, and generates a corpus.csv file that stores each article together with its embedding, computed with an OpenAI embeddings model. The embeddings capture the semantic meaning of the articles, so a question can be matched to the relevant articles even when it does not use their exact wording.
To use generate_corpus.py, list the input files in corpus_source_files.txt and run the script. The resulting corpus.csv file is what query_corpus.py searches when answering natural language questions.
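A minimal sketch of what the corpus-generation step might look like, assuming each non-blank line of corpus_source_files.txt is a path relative to that file, that CSV sources have title and body columns (as in the posts.csv example), and that each embedding is stored as a JSON list in an embedding column of corpus.csv. The names read_articles, build_corpus, and the embed callback are hypothetical; in the real script the embedding would come from the OpenAI API, which is left out here so the sketch runs offline. YAML parsing is also omitted, since it needs a third-party package such as PyYAML.

```python
import csv
import json
from pathlib import Path
from typing import Callable, List


def read_articles(path: Path) -> List[dict]:
    """Parse one source file into {'title', 'body'} records (hypothetical layout)."""
    if path.suffix == ".csv":
        # Assumes columns named 'title' and 'body'; any other columns (e.g. 'id') are ignored.
        with path.open(newline="", encoding="utf-8") as f:
            return [{"title": row["title"], "body": row["body"]} for row in csv.DictReader(f)]
    if path.suffix == ".md":
        # Treat a whole Markdown file as one article, titled by its file name.
        return [{"title": path.stem, "body": path.read_text(encoding="utf-8")}]
    raise ValueError(f"unsupported file type: {path}")


def build_corpus(list_file: Path, out_file: Path,
                 embed: Callable[[str], List[float]]) -> int:
    """Embed every article named in list_file, write them to out_file, return the count."""
    sources = []
    for line in list_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments in the list file.
        if line and not line.startswith("#"):
            sources.append(line)

    rows = []
    for src in sources:
        # Resolve each entry relative to the directory that holds list_file.
        for article in read_articles(list_file.parent / src):
            text = f"{article['title']}\n{article['body']}"
            # Store the vector as JSON so it survives the round trip through CSV.
            rows.append({**article, "embedding": json.dumps(embed(text))})

    with out_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "body", "embedding"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Passing embed in as a parameter keeps the file handling separate from the API call, so the same function can be exercised with a stub embedder before spending API credits.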
The query_corpus.py script takes a natural language question, computes its embedding with OpenAI, and compares it against the embeddings in corpus.csv to find the most relevant articles. It then passes the relevant content, together with the question, to the OpenAI text-davinci-003 model, which generates the answer.
To use query_corpus.py, pass the question as a command-line argument. The script outputs the most relevant article and the answer to the question.
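The retrieval step can be sketched as follows, assuming corpus.csv stores each embedding as a JSON list in an embedding column (a hypothetical layout) and that articles are ranked by cosine similarity to the question's embedding. The function names, the k=3 cutoff, and the prompt wording are illustrative only; the calls that embed the question and send the prompt to text-davinci-003 are omitted.

```python
import csv
import json
import math
from typing import List


def load_corpus(path: str) -> List[dict]:
    """Read corpus.csv, decoding the JSON-encoded embedding column."""
    with open(path, newline="", encoding="utf-8") as f:
        return [{**row, "embedding": json.loads(row["embedding"])} for row in csv.DictReader(f)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_articles(question_vec: List[float], corpus: List[dict], k: int = 3) -> list:
    """Return the k (score, article) pairs most similar to the question, best first."""
    scored = [(cosine(question_vec, art["embedding"]), art) for art in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, articles: list) -> str:
    """Put the retrieved articles into a context block and ask the model to
    say it does not know when the context has no answer."""
    context = "\n\n".join(f"{a['title']}\n{a['body']}" for _, a in articles)
    return (
        "Answer the question using only the context below. "
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Limiting the prompt to the top-ranked articles keeps it within the model's context window, and the "I don't know" instruction is one way to get the behavior shown in the Lily example when nothing relevant is retrieved.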
Used together, these two scripts let you build a knowledge base of articles and query it with natural language questions. The embeddings match questions to relevant articles by meaning rather than exact wording, and text-davinci-003 writes an answer from the retrieved content; in the Lily example above, it replied that it did not know until the corpus contained an article that answered the question.
serve_kbq.py is a Python script that serves a Flask web app for querying the corpus; the app answers questions by calling query_corpus. By default it listens on port 7777, and it must be restarted whenever the corpus is regenerated so that it uses the new corpus.csv.