ke-01/CitaLaw


Implementation of CitaLaw

This is the PyTorch implementation of the paper "CitaLaw: Enhancing LLM with Citations in Legal Domain".

File Structure

.
├── datasets  # * dataset path
│   ├── layperson # * question path 
│   ├── practitioner # * question path 
│   └── corpus
│       ├── law article  # * corpus path and index path
│       └── precedent case # * corpus path and index path
└── benchmark  # * evaluation benchmark
    ├── models  # * evaluation models
    │   ├── flashrag # * required code from FlashRAG
    │   ├── citation_attach # * code for adding citations
    │   ├── closebook.py # * code for CloseBook
    │   ├── cgg.py # * code for citation-guided generation
    │   ├── arg.py # * code for answer refinement generation
    │   └── utils # * utility code
    ├── evaluation  # * code for evaluation
    │   ├── global-level # * code for global-level metrics
    │   └── syllogism-level # * code for syllogism-level metrics
    └── shell  # * scripts for quick evaluation
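
As a quick sanity check before running anything, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` tuple are hypothetical, and only the three `benchmark` sub-directories from the tree are checked.

```python
from pathlib import Path

# Sub-directories taken from the benchmark part of the tree above.
# The tuple name and the helper below are illustrative, not repository code.
EXPECTED_DIRS = (
    "benchmark/models",
    "benchmark/evaluation",
    "benchmark/shell",
)


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dirs(".")` from the repository root returns an empty list when all three directories are present.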

Requirements

Create a conda environment and install the dependencies listed in requirements.txt:

conda create -n CitaLaw python=3.10
conda activate CitaLaw
pip install -r requirements.txt

Quick Start

Model preparation

bge-base-en-v1.5, Llama3-8b-Instruct, and Qwen2-7b-Instruct can be downloaded from Hugging Face.

Legal LLMs should be downloaded from their respective project links.

Data preparation

See the datasets folder for details.

Retrieval and Generation

cd benchmark/shell
# for layperson
sh layperson.sh

# for practitioner
sh practitioner.sh

Example for the layperson dataset:

# legal LLMs
models=("lexilaw" "lawgpt_zh" "fuzi" "hanfei" "tailing" "zhihai" "disc_lawllm")
model_paths=("lexilaw_path" "lawgpt_zh_path" "fuzi_path" "hanfei_path" "tailing_path" "zhihai_path" "disc_path")
# Null if there is no lora path
lora_paths=("lexilaw_lora_path" "lawgpt_zh_lora_path" "fuzi_lora_path" "hanfei_lora_path" "tailing_lora_path" "zhihai_lora_path" "disc_lora_path")

len=${#models[@]}

for ((i=0; i<$len; i++)); do
  model=${models[$i]}
  model_path=${model_paths[$i]}
  lora_path=${lora_paths[$i]}
  
  echo "Running model: $model with model_path: $model_path and $lora_path" 
  
  python cgg.py \
    --data_dir ../../datasets/layperson \
    --dataset_name layperson \
    --split "layperson_test" \
    --index_path ../../datasets/corpus/bge_law_article.index \
    --corpus_path ../../datasets/corpus/law_article_corpus.jsonl \
    --gpu_id 2 \
    --model_path "$model_path" \
    --generator_model "$model" \
    --generator_lora_path "$lora_path"
done

# open-domain LLM
# qwen2
# closebook
python closebook.py --data_dir ../../datasets/layperson --dataset_name layperson --split "layperson_test" --gpu_id 2 --model_path Qwen2_path --generator_model Qwen2

# cgg
python cgg.py --data_dir ../../datasets/layperson --dataset_name layperson --split "layperson_test" --index_path ../../datasets/corpus/bge_law_article.index --corpus_path ../../datasets/corpus/law_article_corpus.jsonl --gpu_id 2 --model_path Qwen2_path --generator_model Qwen2

# arg-q
python arg.py --input_file closebook_output_file --output_file /qwen_arg_q_lay.json

# arg-qa
# step1: retrieve using q+a
python closebook.py --data_dir ../../datasets/layperson --dataset_name layperson --split "layperson_test_qa" --gpu_id 2 --model_path Qwen2_path --generator_model Qwen2

# step2: arg
python arg.py --input_file closebook_qa_output_file --output_file /qwen_arg_qa_lay.json

See the models folder for details.

Citation attachment

Place the result file in the specified location first, then run the citation attachment step.

cd benchmark/shell
sh citation_attach.sh

See the models/citation_attach folder for details.

Evaluation

Place the result file in the specified location first, then run the evaluation.

cd benchmark/shell
sh evaluation.sh

See the evaluation folder for details.

Reference

CitaLaw is built on the following projects:

Environments

We ran the experiments in the following environment:

  • CUDA Version: 11.4
  • torch version: 2.2.0
  • OS: Ubuntu 18.04.5 LTS
  • GPU: NVIDIA Geforce RTX A6000
  • CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
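
To compare a local setup against the versions listed above, something like the following can report what is installed. It is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and it reads package metadata only, so it does not check the CUDA version or the GPU.

```python
import platform
from importlib import metadata


def installed_versions(packages=("torch",)):
    """Map each package name to its installed version, or None if absent."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Depending on how torch was installed, the reported version may carry a local build suffix after the version number.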
