Official repository of the paper "Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective". The project explores methods for fine-tuning large language models (LLMs) on observational data, addressing challenges such as spurious correlations and confounding. We propose DECONFOUNDLM, a novel approach that mitigates these issues by correcting for known confounders in the reward signals.
You can find the project webpage at deconfoundlm.github.io.
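To make the deconfounding idea concrete, the sketch below shows one plausible form such a correction could take: residualizing the reward against known confounder features before fine-tuning. This is a minimal sketch under a linear-adjustment assumption; the function name `deconfound_rewards` and the OLS residualization are illustrative and are not the implementation released in this repository.

```python
# A minimal sketch, assuming the correction amounts to removing the
# component of the reward linearly explained by known confounders.
# Illustrative only -- not the paper's actual method.
import numpy as np

def deconfound_rewards(rewards: np.ndarray, confounders: np.ndarray) -> np.ndarray:
    """Remove the reward variation linearly explained by known confounders.

    rewards:     shape (n,)   raw reward signal, one value per sample
    confounders: shape (n, k) observed confounder features per sample
    """
    # Center the confounders so the adjustment preserves the mean reward.
    centered = confounders - confounders.mean(axis=0)
    # Ordinary least squares fit: reward ~ confounders.
    beta, *_ = np.linalg.lstsq(centered, rewards, rcond=None)
    # Subtract the confounder-predicted component, keeping the residual.
    return rewards - centered @ beta
```

A synthetic usage example (also illustrative), where the raw reward is contaminated by one confounder:

```python
rng = np.random.default_rng(0)
c = rng.normal(size=(1000, 2))               # known confounders
r = 0.8 * c[:, 0] + rng.normal(size=1000)    # reward contaminated by c[:, 0]
r_adj = deconfound_rewards(r, c)
print(np.corrcoef(r_adj, c[:, 0])[0, 1])     # correlation near zero after correction
```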