sctpublic

sctpublic is an open-source project that provides a scalable framework to evaluate clinical reasoning in large language models (LLMs) using Script Concordance Tests (SCTs). In this project, we compare the performance of various state-of-the-art LLMs (including GPT-4o, o1-preview, Claude 3.5 Sonnet, and Gemini-1.5-Pro) against clinician benchmarks on a diverse set of SCT questions.

Overview

Script Concordance Testing is a validated medical assessment tool designed to evaluate clinical reasoning under uncertainty. Unlike traditional multiple-choice questions, SCTs measure how new information alters diagnostic and treatment hypotheses—a critical aspect of real-world clinical decision-making.
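For illustration, an SCT item presents a short clinical vignette, a working hypothesis, and a new finding, then asks how the finding shifts the hypothesis on a five-point Likert scale. The item below is invented for illustration and is not drawn from the benchmark:

    Vignette:    A 45-year-old presents with acute pleuritic chest pain.
    Hypothesis:  Pulmonary embolism
    New finding: The D-dimer assay is negative.
    Effect on the hypothesis: -2 (much less likely) ... +2 (much more likely)

Responses are typically scored against the answer distribution of an expert panel rather than a single correct key, which is what makes SCTs well suited to assessing reasoning under uncertainty.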

Key highlights of this project:

  • Benchmark Composition: 750 SCT questions drawn from diverse international datasets.
  • Model Evaluation: Analysis of LLM performance (zero-shot and few-shot, with or without reasoning).
  • Human Comparison: Comparisons against performance metrics of medical students, residents, and attending physicians.

SCT Question Access & Data Clarification

This public repository distributes SCT questions exclusively from the Open Medical SCT and Adelaide SCT datasets. These questions are openly available for use and distribution.

Access to the full set of SCT questions, including additional proprietary or sensitive datasets, is not provided here. Please refer to the competition guidelines in our project paper for instructions on how to submit models for testing against this final set.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/sctpublic.git
    cd sctpublic
  2. Set up a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate       # On Windows use: venv\Scripts\activate
  3. Install the required packages:

    pip install -r requirements.txt

    If you are using a notebook environment (e.g., Colab), install additional packages as mentioned at the top of the notebooks.

  4. Environment Variables:

    Create a .env file in the project root with the following keys:

    • OPENAI_API_KEY
    • ANTHROPIC_API_KEY
    • GOOGLE_APPLICATION_CREDENTIALS (should point to your JSON credentials file for Google Vertex AI)
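
    A minimal .env might look like this (the values below are placeholders, not working keys):

        OPENAI_API_KEY=sk-...
        ANTHROPIC_API_KEY=sk-ant-...
        GOOGLE_APPLICATION_CREDENTIALS=/path/to/vertex-credentials.json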

Usage

The project is structured as a combination of Python scripts and Jupyter notebooks:

  • Data Processing and Prompt Generation:
    Check out modeling.ipynb or run modeling.py to load SCT data, generate prompt templates, and process prompts for each question (a minimal sketch of this step appears after this list).

  • Model Evaluation:
    Use the notebooks (e.g., modeling.ipynb and finalizer.ipynb) to send prompts to your LLM endpoints and record responses.

  • Analysis:
    dataanalysis.ipynb provides tools to compute statistics, compare model performances, and generate visualizations.

You can also run evaluation scripts from the command line if desired.
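
As a rough illustration of the prompt-generation and scoring flow, here is a minimal, self-contained sketch. The file name, record fields, and helper functions are assumptions made for illustration; they do not reflect the repository's actual schema, which lives in modeling.py and the notebooks:

    import json

    def build_prompt(item: dict) -> str:
        # Hypothetical SCT record: vignette, working hypothesis, new finding.
        return (
            f"Clinical vignette: {item['vignette']}\n"
            f"Working hypothesis: {item['hypothesis']}\n"
            f"New information: {item['new_information']}\n"
            "On a scale from -2 (much less likely) to +2 (much more likely), "
            "how does the new information change the hypothesis? "
            "Reply with a single integer."
        )

    def score_response(response: int, panel_answers: list[int]) -> float:
        # Standard SCT aggregate scoring: credit is proportional to how many
        # expert panelists chose the same answer, normalized by the modal answer.
        modal_count = max(panel_answers.count(a) for a in set(panel_answers))
        return panel_answers.count(response) / modal_count

    with open("questions.json") as f:  # illustrative path, not the repo's actual file
        items = json.load(f)

    prompt = build_prompt(items[0])
    # Send `prompt` to an LLM endpoint of your choice, parse the integer reply,
    # then score it against the panel, e.g.:
    print(score_response(0, panel_answers=[0, 0, 1, -1, 0]))  # -> 1.0

The aggregate scoring rule shown is the standard SCT method; whether this repository uses the same variant can be confirmed in dataanalysis.ipynb.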

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests. When contributing, please follow the coding conventions and ensure your changes are covered by tests where applicable.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

We extend our gratitude to the research teams and medical experts who have contributed their expertise and data. Special thanks to all the authors and collaborators whose support has enabled this work.
