Q&A Bot

Install Docker

The project runs in Docker, so Docker must be installed before anything else. Installation instructions are available for Mac, Linux and Windows.

How to run

Install docker and docker-compose
Download the data files (CQADupStack, see Data) and un-tar them.
Unzip the "gaming" zip-file and place it in resources so that the path is like resources/gaming/<all the json-files>
Run Elasticsearch by issuing docker-compose up -d elasticsearch
Build all the packages by issuing docker-compose build
Index the answers by issuing docker-compose run --rm indexer gaming answers
Index the questions by issuing docker-compose run --rm indexer gaming questions
Run the server by issuing docker-compose up -d server
Open the html-client (client/index.html) in a web-browser and search for something.
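The docker-compose steps above can also be scripted. This is a minimal sketch, assuming the service names (elasticsearch, indexer, server) and indexer arguments exactly as listed above; the helpers pipeline_commands and run_pipeline are hypothetical and not part of the repository.

```python
import subprocess


def pipeline_commands(forum="gaming"):
    """Return the docker-compose commands from the steps above, in order."""
    return [
        ["docker-compose", "up", "-d", "elasticsearch"],
        ["docker-compose", "build"],
        ["docker-compose", "run", "--rm", "indexer", forum, "answers"],
        ["docker-compose", "run", "--rm", "indexer", forum, "questions"],
        ["docker-compose", "up", "-d", "server"],
    ]


def run_pipeline(forum="gaming", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in pipeline_commands(forum):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_pipeline(dry_run=True)
```

Elasticsearch may need a few seconds before it accepts connections, so a real script would probably wait between starting it and running the indexer.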

Project Description

From the project description:

There are so many forums out there where people answer each other’s questions. Do we really still need people to do that? Aren't most questions answered already? For some topics it sure feels like all questions must have been answered by now. By eliminating the need for human interaction, your answer should be available way quicker! The idea behind this project is to create a page where you state your question and get an automatic answer (within a specific area) automatically, by indexing questions and answers already written by others in a forum.

The assignment:

Fetch questions and answers (I have a large batch of familjeliv.se data collected that can be used).
Use elasticsearch (https://github.com/elastic/elasticsearch), or another search engine of your choice, to index the questions and answers.
Build a query analyzer that can help you create a query to your search engine by, for example, selecting the most important words, or comparing the entire question to questions in the index.
Query your search engine and select the best answer available.
Present answer in a web interface.
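As an illustration of the "selecting the most important words" idea in the assignment, here is a minimal sketch of a query analyzer. It is not the analyzer used in server/; the stop-word list, the field name title and the helper names are assumptions made for the example. It only builds the request body for an Elasticsearch match query and does not contact a server.

```python
import re

# Tiny illustrative stop-word list (assumption); a real analyzer would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "how", "what",
              "i", "my", "to", "in", "of", "for", "on", "and", "can"}


def important_words(question):
    """Lower-case the question and drop stop words, keeping word order."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_query(question, field="title", size=5):
    """Build an Elasticsearch request body with a match query on one field.

    The field name "title" is hypothetical; use whatever the index defines.
    """
    keywords = important_words(question)
    # Fall back to the full question if every token was a stop word.
    text = " ".join(keywords) if keywords else question
    return {"size": size, "query": {"match": {field: text}}}


if __name__ == "__main__":
    print(build_query("How do I fix the lag in Skyrim?"))
```

Comparing the entire question to indexed questions, the other option mentioned above, would amount to skipping the stop-word step and sending the raw text.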

An interesting extension is to try to devise a scoring function that determines how sure the system is of the answer.
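One possible way to sketch such a scoring function is to look at how far the best hit stands out from the runner-up. This is only an illustrative heuristic, not the scoring used in this project; the function name answer_confidence is hypothetical, and the input is assumed to be the relevance scores of the top hits returned by the search engine.

```python
def answer_confidence(scores):
    """Return a value in [0, 1] from the relevance scores of the top hits.

    Heuristic (assumption): the relative gap between the best and the
    second-best score. A clear winner gives a value near 1, a tie gives 0.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked or ranked[0] <= 0:
        return 0.0
    if len(ranked) == 1:
        # Only one candidate: nothing competes with it.
        return 1.0
    gap = (ranked[0] - ranked[1]) / ranked[0]
    # Clamp in case a negative score is passed in.
    return max(0.0, min(1.0, gap))
```

A threshold on this value could decide whether the client shows the answer or reports that no confident answer was found.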

Project structure

indexer/ contains the part that reads raw data and puts it in Elasticsearch.

server/ contains the actual searcher, together with an HTTP server that lets it communicate with the client.

client/ contains the front-end that connects to the server and displays the answer to a given question.

report/ contains the report (for now written in LaTeX).

Docker

Docker lets us run everything without running into dependency conflicts between machines. Setting up Docker and running the hello-world example takes 30 minutes at most. This page was very helpful in explaining how to write Dockerfiles and how to run containers.

Compile

To compile with docker-compose, issue the command docker-compose build <packagename>. To build everything, leave out the package name.

Elasticsearch

To get Elasticsearch up and running, install docker-compose and issue the command:

docker-compose start elasticsearch

in the root of the project (where the docker-compose.yml file is).

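Note: The start command only starts containers that already exist, so it might not work the first time docker-compose is run for this project. In that case use docker-compose up -d elasticsearch, which creates the container and starts it.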

Run indexer

To run the indexer, make sure that the Elasticsearch container is running. This can be checked by issuing docker ps.
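docker ps only shows that the container exists, not that Elasticsearch is ready to accept requests. A minimal sketch of a readiness check follows, assuming that docker-compose.yml publishes Elasticsearch on its default port 9200 on localhost (check the file to confirm); the helper names are hypothetical.

```python
import json
import urllib.request


def cluster_is_ready(health):
    """True if a _cluster/health response reports a usable cluster."""
    return health.get("status") in ("yellow", "green")


def fetch_health(base_url="http://localhost:9200", timeout=5):
    """Fetch _cluster/health; return None if Elasticsearch is unreachable.

    The base URL is an assumption about how the container is exposed.
    """
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health",
                                    timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and HTTP errors,
        # ValueError covers a response that is not valid JSON.
        return None


if __name__ == "__main__":
    health = fetch_health()
    print("ready" if health and cluster_is_ready(health) else "not ready")
```

A single-node setup usually reports yellow rather than green, which is why both are accepted here.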

To see how to index, run docker-compose run --rm indexer --help

Note the --rm flag: it makes Docker remove the container after the indexer has shut down, instead of leaving one container behind per execution.

Data

Please download CQADupStack and put the uncompressed data in resources (see indexer.py for the layout it expects).
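Before indexing, it can help to verify that the data ended up in the expected place. This is a minimal sketch, assuming the resources/<forum>/<json files> layout described in How to run; the helper names are hypothetical and indexer.py remains the authority on what is actually read.

```python
from pathlib import Path


def json_files(forum, resources="resources"):
    """Return the sorted JSON files found directly under resources/<forum>/."""
    return sorted((Path(resources) / forum).glob("*.json"))


def check_forum(forum, resources="resources"):
    """Return the forum's JSON files, or raise if none are present."""
    files = json_files(forum, resources)
    if not files:
        raise FileNotFoundError(
            f"no JSON files in {Path(resources) / forum}; "
            "was the forum zip-file extracted there?"
        )
    return files


if __name__ == "__main__":
    for path in check_forum("gaming"):
        print(path)
```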

TODO (some ideas)

Use a topic model (trained offline) to estimate the probability that a query belongs to each forum, and search only the forums whose probability exceeds a threshold (this requires indexing all 14 forums first, which might take roughly 3 hours).
Design a better query in find_similar_query().
Use link analysis such as PageRank to give each user (id) a global score, and take that score into account when ranking questions.
Comments are discarded at the moment; maybe they are useful?
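The forum-selection idea in the first item could look roughly like the sketch below. It assumes the topic model already produced one probability per forum; the function name select_forums, the default threshold and the example numbers are all made up for illustration.

```python
def select_forums(forum_probs, threshold=0.1):
    """Return forums whose probability exceeds the threshold, most likely first."""
    chosen = [(forum, p) for forum, p in forum_probs.items() if p > threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [forum for forum, _ in chosen]


if __name__ == "__main__":
    # Hypothetical topic-model output for one query.
    probs = {"gaming": 0.7, "android": 0.2, "english": 0.1}
    print(select_forums(probs, threshold=0.15))
```

If no forum passes the threshold, the list is empty, and a fallback such as searching every indexed forum would be needed.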

Reference

Scripts for querying CQADupStack data
Supervised Learning of Universal Sentence Representations
