Knowledge Base Query

This repository contains Python scripts that let you:

- generate an embedded corpus of knowledge base articles
- query it using natural language questions from the command line
- serve a web app that can also query the corpus

Quick start

# set your OpenAI API key
$> export OPENAI_API_KEY=sk-q9u...

# install dependencies
$> virtualenv venv
$> source venv/bin/activate
$> pip install -r requirements.txt

# generate knowledge base
$> python generate_corpus.py # generates file 'sample-content/corpus.csv'

# issue natural language questions against knowledge base
$> ./query_corpus.py "what operations exist on the openweathermap api?"
The OpenWeatherMap API offers a variety of operations ...

The Workflow

# Ask a question that the corpus hasn't seen
$> ./query_corpus.py "what is lilys main goal?"
I dont know.

# Add a new piece of info to the knowledge base. 
# For example, write a new post to posts.csv about Lily the sailor.

# Regenerate the corpus
$> ./generate_corpus.py 
   id           title            body
# ... 
   7            Lily the sailor  Once upon a time, there was a young woman name...
# ... 

# Ask the original question
$> ./query_corpus.py "what is lilys main goal?"
Lilys main goal is to lead her own crew and set out on her own adventure.

Generate corpus

The generate_corpus.py script reads the CSV, MD, and YAML files listed in corpus_source_files.txt, which contain the knowledge base articles, and generates a corpus.csv file that holds each article together with its embedding, generated using the OpenAI GPT-3 language model. The embeddings capture the semantic meaning of the articles, so a question can be matched to articles by meaning rather than by exact keywords.

To use generate_corpus.py, list the input files in corpus_source_files.txt and run the script. The resulting corpus.csv file is then used by the query_corpus.py script to perform natural language queries.
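The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `toy_embed` is a hypothetical offline stand-in for the OpenAI embedding request, the two sample articles are made up for the example, and the `embedding` column (a JSON-encoded vector) is an assumed layout added to the `id`, `title`, `body` columns shown in the workflow output.

```python
import csv
import hashlib
import io
import json
import math


def toy_embed(text, dims=16):
    """Hypothetical stand-in for an embedding API call.

    Hashes each word into one of `dims` buckets and L2-normalizes the counts.
    The real script would request an embedding from OpenAI instead.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_corpus(articles, embed=toy_embed):
    """Return corpus rows: each article plus its embedding serialized as JSON."""
    rows = []
    for article in articles:
        vector = embed(article["title"] + " " + article["body"])
        rows.append({**article, "embedding": json.dumps(vector)})
    return rows


def write_corpus(rows, handle):
    """Write corpus rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["id", "title", "body", "embedding"])
    writer.writeheader()
    writer.writerows(rows)


# Sample articles invented for this sketch.
articles = [
    {"id": "1", "title": "Weather API", "body": "Operations on the weather API."},
    {"id": "7", "title": "Lily the sailor", "body": "Lily wants to lead her own crew."},
]
buffer = io.StringIO()
write_corpus(build_corpus(articles), buffer)
corpus_csv = buffer.getvalue()
```

Storing the vector as a JSON string keeps the corpus in a single CSV file that any reader can parse back with `json.loads`; the actual script may serialize embeddings differently.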

Query corpus

The query_corpus.py script takes in a natural language question, calculates its embedding using the OpenAI GPT-3 language model, and queries the corpus.csv file to find the most relevant articles. The script then passes the relevant content to the OpenAI Davinci-003 model, which generates an answer to the question.

To use query_corpus.py, pass the question as a command-line argument. The script outputs the answer generated from the most relevant article.
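The retrieval step can be illustrated with a small sketch. It assumes cosine similarity is used to compare the question embedding with the article embeddings, which is a common choice but is not confirmed by this README. The function names, the prompt wording, and the hand-made three-dimensional vectors are all hypothetical; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(question_vec, corpus, top_k=1):
    """Return the top_k corpus rows most similar to the question embedding."""
    scored = sorted(
        corpus,
        key=lambda row: cosine_similarity(question_vec, row["embedding"]),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(question, articles):
    """Assemble a completion prompt from the matched articles (assumed wording)."""
    context = "\n\n".join(f"{a['title']}\n{a['body']}" for a in articles)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hand-made 3-dimensional vectors stand in for real embeddings.
corpus = [
    {"title": "Weather API", "body": "Lists forecast operations.",
     "embedding": [0.9, 0.1, 0.0]},
    {"title": "Lily the sailor", "body": "Lily wants to lead her own crew.",
     "embedding": [0.1, 0.9, 0.2]},
]
question = "what is lilys main goal?"
question_vec = [0.0, 1.0, 0.1]  # pretend embedding of the question

best = rank_articles(question_vec, corpus, top_k=1)
prompt = build_prompt(question, best)
```

In this sketch `prompt` is the text that would be sent to the completion model; restricting the model to the retrieved context is one way to get an "I don't know" style answer when the corpus has no relevant article.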

Used together, these two scripts let you create a knowledge base of articles and query it with natural language questions. The embeddings generated by the OpenAI GPT-3 language model match questions with relevant articles, and the OpenAI Davinci-003 model generates an answer from the matched content.

Serve KBQ

serve_kbq.py is a Python script that serves a Flask web app for querying the corpus; the app calls query_corpus to answer questions. By default it is served on port 7777. The server must be restarted whenever the corpus is regenerated.
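The README does not say why a restart is needed. One plausible explanation, offered here only as an assumption, is that the server reads corpus.csv into memory once at startup and keeps answering from that snapshot. The sketch below shows that behavior with a hypothetical `CorpusStore` class and a temporary file; it is not taken from serve_kbq.py.

```python
import csv
import os
import tempfile


class CorpusStore:
    """Reads a corpus CSV once and serves every query from the in-memory copy."""

    def __init__(self, path):
        self.path = path
        self.rows = self._load()

    def _load(self):
        with open(self.path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))

    def reload(self):
        """Re-read the file; restarting the process has the same effect."""
        self.rows = self._load()


def write_rows(path, titles):
    """Write a tiny corpus file with one row per title."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "title"])
        for i, title in enumerate(titles, start=1):
            writer.writerow([i, title])


with tempfile.TemporaryDirectory() as tmp:
    corpus_path = os.path.join(tmp, "corpus.csv")
    write_rows(corpus_path, ["Weather API"])
    store = CorpusStore(corpus_path)  # corresponds to starting the server

    # The corpus is regenerated on disk while the "server" keeps running.
    write_rows(corpus_path, ["Weather API", "Lily the sailor"])
    stale_count = len(store.rows)  # still the snapshot taken at startup

    store.reload()  # corresponds to restarting the server
    fresh_count = len(store.rows)
```

Under this assumption, a question about a newly added article would keep returning "I dont know." until the server process is restarted.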

