sctpublic is an open-source project that provides a scalable framework to evaluate clinical reasoning in large language models (LLMs) using Script Concordance Tests (SCTs). In this project, we compare the performance of various state-of-the-art LLMs (including GPT-4o, o1-preview, Claude 3.5 Sonnet, and Gemini-1.5-Pro) against clinician benchmarks on a diverse set of SCT questions.
Script Concordance Testing is a validated medical assessment tool designed to evaluate clinical reasoning under uncertainty. Unlike traditional multiple-choice questions, SCTs measure how new information alters diagnostic and treatment hypotheses—a critical aspect of real-world clinical decision-making.
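As a purely illustrative example (this item is hypothetical and not drawn from the benchmark datasets), an SCT item pairs a short vignette and an initial hypothesis with a new finding, then asks how that finding shifts the hypothesis on a five-point Likert scale:

```python
# A hypothetical SCT item, shown only to illustrate the question format.
sct_item = {
    "vignette": "A 58-year-old man presents to the emergency department with acute chest pain.",
    "hypothesis": "If you were thinking of pulmonary embolism...",
    "new_information": "...and you learn the pain is fully reproducible on palpation,",
    "question": "this hypothesis becomes:",
    # Standard five-point SCT Likert scale
    "scale": {
        -2: "ruled out or almost ruled out",
        -1: "less likely",
        0: "neither more nor less likely",
        +1: "more likely",
        +2: "certain or almost certain",
    },
}
```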
Key highlights of this project:
- Benchmark Composition: 750 SCT questions drawn from diverse international datasets.
- Model Evaluation: Analysis of LLM performance (zero-shot and few-shot, with or without reasoning).
- Human Comparison: Comparisons against performance metrics of medical students, residents, and attending physicians.
This public repository distributes SCT questions exclusively from the Open Medical SCT and Adelaide SCT datasets. These questions are openly available for use and distribution.
Access to the full set of SCT questions, including additional proprietary or sensitive datasets, is not provided here. Please refer to the competition guidelines in our project paper for instructions on how to submit models for evaluation against this final set.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/sctpublic.git
  cd sctpublic
  ```

- Set up a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use: venv\Scripts\activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

  If you are using a notebook environment (e.g., Colab), install the additional packages mentioned at the top of the notebooks.
- Environment Variables: Create a `.env` file in the project root with the following keys:

  - `OPENAI_API_KEY`
  - `ANTHROPIC_API_KEY`
  - `GOOGLE_APPLICATION_CREDENTIALS` (should point to your JSON credentials file for Google Vertex AI)
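For reference, the `.env` file uses plain `KEY=value` lines; the values below are placeholders:

```
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-vertex-credentials.json
```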
The project is structured as a combination of Python scripts and Jupyter notebooks:
- Data Processing and Prompt Generation: Check out `modeling.ipynb` or run `modeling.py` to load SCT data, generate prompt templates, and process prompts for each question.
- Model Evaluation: Use the notebooks (e.g., `modeling.ipynb` and `finalizer.ipynb`) to send prompts to your LLM endpoints and record responses; a minimal sketch follows this list.
- Analysis: `dataanalysis.ipynb` provides tools to compute statistics, compare model performances, and generate visualizations; an illustrative scoring example appears further below.
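As a rough illustration of the prompt-and-record loop, the sketch below assumes an OpenAI-style chat endpoint and a hypothetical CSV export of the SCT items (the file name and column names are assumptions); the actual prompt templates and data handling live in `modeling.py` and the notebooks.

```python
# Minimal sketch of the prompt -> response loop, assuming an OpenAI-style chat endpoint.
# "questions.csv" and its columns are hypothetical placeholders for the SCT data export.
import os

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY (and the other keys) from .env
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

questions = pd.read_csv("questions.csv")

responses = []
for _, row in questions.iterrows():
    prompt = (
        f"{row['vignette']}\n"
        f"Hypothesis: {row['hypothesis']}\n"
        f"New information: {row['new_information']}\n"
        "On a scale from -2 (ruled out) to +2 (almost certain), how does the new "
        "information change the hypothesis? Answer with a single integer."
    )
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    responses.append(completion.choices[0].message.content)

questions["model_response"] = responses
questions.to_csv("responses_gpt-4o.csv", index=False)
```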
You can also run evaluation scripts from the command line if desired.
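For the analysis step, the exact statistics are implemented in `dataanalysis.ipynb`. As a hedged illustration of how SCT answers are conventionally scored (not necessarily the scoring used in this repository), the standard aggregate method credits each response in proportion to the number of expert panelists who chose it, relative to the modal panel answer:

```python
from collections import Counter

def sct_aggregate_score(answer: int, panel_answers: list[int]) -> float:
    """Standard SCT aggregate scoring: an answer earns the number of panelists who
    chose it divided by the count of the modal (most frequent) panel answer, so the
    modal answer earns 1.0 and an answer no panelist chose earns 0.0."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return counts.get(answer, 0) / modal_count

# Example: a panel of 10 experts rated an item -1 (x6), 0 (x3), +1 (x1).
panel = [-1] * 6 + [0] * 3 + [+1]
print(sct_aggregate_score(-1, panel))  # 1.0 (modal answer, full credit)
print(sct_aggregate_score(0, panel))   # 0.5 (partial credit)
print(sct_aggregate_score(+2, panel))  # 0.0 (no panelist chose it)
```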
Contributions are welcome! Feel free to open issues or submit pull requests. When contributing, please follow the coding conventions and ensure your changes are covered by tests where applicable.
This project is licensed under the MIT License. See the LICENSE file for more details.
We extend our gratitude to the research teams and medical experts who have contributed their expertise and data. Special thanks to all the authors and collaborators whose support has enabled this work.