Cultural LLM Research Resources

This is a list of tutorials, workshops, papers, and resources on computational linguistic approaches to cultural research. The list will be updated over time. You are welcome to send a pull request to update the list and become one of the contributors!

📌 I plan to collect theses and books on Cultural LLM research and list them here. If you have one, don't hesitate to contact me (Farid) or send a pull request!

🚀 Highlights

If you are new to Cultural LLM research or looking for a new research direction, we have written a comprehensive survey paper on Cultural LLMs: Towards Measuring and Modeling "Culture" in LLMs: A Survey [Paper]. Feel free to read it and let us know if you have any suggestions! Thanks to Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Ashutosh Dwivedi, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, and Monojit Choudhury for making this possible 😊

⭐ Citation

If you find this paper and repo helpful for your research, please cite the survey as follows:

@misc{adilazuarda2024measuring,
      title={Towards Measuring and Modeling "Culture" in LLMs: A Survey}, 
      author={Muhammad Farid Adilazuarda and Sagnik Mukherjee and Pradhyumna Lavania and Siddhant Singh and Ashutosh Dwivedi and Alham Fikri Aji and Jacki O'Neill and Ashutosh Modi and Monojit Choudhury},
      year={2024},
      eprint={2403.15412},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}

Content

1. 🏫 Workshops
2. 🌳 Culture Taxonomy
- Taxonomy Based on the Definition of Culture
- Taxonomy Based on the Methods Used
  - A. Black-Box Approaches
  - B. White-Box Approaches
3. 📚 Books

🏫 Workshops

This is a list of Cultural LLM workshops:

First Workshop on Cross-Cultural Considerations in NLP (C3NLP), EACL 2023 [Website]
Second Workshop on Cross-Cultural Considerations in NLP, ACL 2024 [Website]
Cultures in AI/AI in Culture, NeurIPS 2022 [Website]

🌳 Culture Taxonomy in LLM Studies

As defined in Adilazuarda et al. (2024), we structure this curated list into three distinct taxonomies under the "Culture" umbrella. Note that these taxonomies are based on existing work, are not exhaustive, and should not be taken as a roadmap.

Furthermore, "Linguistic-Culture Interaction" and the proxies defined here are not mutually exclusive. They are different ways of categorizing how the community approaches the study of culture. For example, Common Ground (i.e., the information shared among people of a certain culture) can be studied through the lens of regionality (a demographic proxy) or across religions (another demographic proxy).

1. Taxonomy Based on the Definition of Culture

A. Semantic Proxies

Emotions and Values

Hershcovich, et al. (2022) Challenges and Strategies in Cross-Cultural NLP. ACL [Paper]
Kovac, et al. (2023) Large Language Models as Superpositions of Cultural Perspectives. ArXiv [Paper]
Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Wibowo, et al. (2023) COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. ArXiv [Paper]
Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
Tanmay, et al. (2023) Probing the Moral Development of Large Language Models through Defining Issues Test. ArXiv [Paper]
Zhang, et al. (2023) The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages. EMNLP [Paper]
Shaikh, et al. (2023) Modeling Cross-Cultural Pragmatic Inference with Codenames Duet. ACL Findings [Paper]
Jiang, et al. (2022) Can Machines Learn Morality? The Delphi Experiment. ArXiv [Paper]
Talat, et al. (2022) A Word on Machine Ethics: A Response to Jiang et al. (2021). ArXiv [Paper]
Huang and Yang (2023) Culturally Aware Natural Language Inference. EMNLP Findings [Paper]
Naous, et al. (2023) Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. ArXiv [Paper]
Wu, et al. (2023) Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales. EMNLP Main [Paper]
Fung, et al. (2023) NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly. EMNLP Main [Paper]
Mukherjee, et al. (2023) Global Voices, Local Biases: Socio-Cultural Prejudices across Languages. EMNLP Main [Paper]
Santy, et al. (2023) NLPositionality: Characterizing Design Biases of Datasets and Models. ACL Main [Paper]

Food and Drink

Palta and Rudinger (2023) FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models. ACL Findings [Paper]
Cao, et al. (2024) Cultural Adaptation of Recipes. TACL [Paper]

Social and Political Relations

Wang, et al. (2024) SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning. ArXiv [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]
Shaikh, et al. (2023) Modeling Cross-Cultural Pragmatic Inference with Codenames Duet. ACL Findings [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]
Feng, et al. (2023) From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. ACL [Paper]

Basic Actions and Technology

Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]

Names

Wu, et al. (2023) Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales. EMNLP Main [Paper]
Quan, et al. (2020) RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling. EMNLP Main [Paper]

B. Demographic Proxies

Ethnicity

Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]
Shaikh, et al. (2023) Modeling Cross-Cultural Pragmatic Inference with Codenames Duet. ACL Findings [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]
Feng, et al. (2023) From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. ACL [Paper]

Education

Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Quan, et al. (2020) RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling. EMNLP Main [Paper]
Shaikh, et al. (2023) Modeling Cross-Cultural Pragmatic Inference with Codenames Duet. ACL Findings [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]
Wu, et al. (2023) Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales. EMNLP Main [Paper]
Santy, et al. (2023) NLPositionality: Characterizing Design Biases of Datasets and Models. ACL Main [Paper]

Religion

Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]
Das, et al. (2023) Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]

Race

Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]

Gender

Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
An, et al. (2023) SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models. EMNLP [Paper]
Wu, et al. (2023) Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales. EMNLP Main [Paper]

Language

Hershcovich, et al. (2022) Challenges and Strategies in Cross-Cultural NLP. ACL [Paper]
Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Kovac, et al. (2023) Large Language Models as Superpositions of Cultural Perspectives. ArXiv [Paper]
Wibowo, et al. (2023) COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. ArXiv [Paper]
Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Tanmay, et al. (2023) Probing the Moral Development of Large Language Models through Defining Issues Test. ArXiv [Paper]
Huang and Yang (2023) Culturally Aware Natural Language Inference. EMNLP Findings [Paper]
Zhang, et al. (2023) The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages. EMNLP [Paper]
Kabra, et al. (2023) Multi-lingual and Multi-cultural Figurative Language Understanding. ACL Findings [Paper]
Naous, et al. (2023) Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. ArXiv [Paper]
Shaikh, et al. (2023) Modeling Cross-Cultural Pragmatic Inference with Codenames Duet. ACL Findings [Paper]
Zhou, et al. (2023) Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features. EMNLP Findings [Paper]
Mukherjee, et al. (2023) Global Voices, Local Biases: Socio-Cultural Prejudices across Languages. EMNLP Main [Paper]
CH-Wang, et al. (2023) Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment. EMNLP Main [Paper]
Dev, et al. (2023) Building Socio-culturally Inclusive Stereotype Resources with Community Engagement. ArXiv [Paper]
Khanuja, et al. (2023) Evaluating the Diversity, Equity, and Inclusion of NLP Technology: A Case Study for Indian Languages. ArXiv [Paper]
Santy, et al. (2023) NLPositionality: Characterizing Design Biases of Datasets and Models. ACL Main [Paper]
Das, et al. (2023) Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]

Region

Seth, et al. (2024) DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures. LREC-COLING [Paper]
Hershcovich, et al. (2022) Challenges and Strategies in Cross-Cultural NLP. ACL [Paper]
Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Wibowo, et al. (2023) COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. ArXiv [Paper]
Wang, et al. (2024) SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning. ArXiv [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
An, et al. (2023) SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models. EMNLP [Paper]
Zhang, et al. (2023) The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages. EMNLP [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]
Jha, et al. (2023) SeeGULL: A stereotype benchmark with broad geo-cultural coverage leveraging generative models. ACL [Paper]
Ramezani and Xu (2023) Knowledge of cultural moral norms in large language models. ACL [Paper]
Kabra, et al. (2023) Multi-lingual and Multi-cultural Figurative Language Understanding. ACL Findings [Paper]
Zhou, et al. (2023) Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features. EMNLP Findings [Paper]
Mukherjee, et al. (2023) Global Voices, Local Biases: Socio-Cultural Prejudices across Languages. EMNLP Main [Paper]
CH-Wang, et al. (2023) Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment. EMNLP Main [Paper]
Dev, et al. (2023) Building Socio-culturally Inclusive Stereotype Resources with Community Engagement. ArXiv [Paper]
Khanuja, et al. (2023) Evaluating the Diversity, Equity, and Inclusion of NLP Technology: A Case Study for Indian Languages. ArXiv [Paper]
Santy, et al. (2023) NLPositionality: Characterizing Design Biases of Datasets and Models. ACL Main [Paper]
Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Dwivedi, et al. (2023) EtiCor: Corpus for Analyzing LLMs for Etiquettes. EMNLP Main [Paper]

C. Linguistic-Culture Interaction

Hershcovich et al. (2022) propose categorizing the study of culture in language technology by how language and culture interact, along three key dimensions:

Aboutness: Focuses on culturally relevant topics, showing how interests and interpretations vary across cultures. This impacts sentiment analysis and domain relevance in NLP, highlighting how cultural contexts shape communication content.
Common Ground: Refers to shared knowledge and assumptions within a culture, affecting language conceptualization and semantic categories (like kinship and spatial relations). It includes the understanding of traditions, social norms, and expected behaviors.
Objectives and Values: Concerns the goals and ethical principles guiding behavior and communication. This dimension underscores the need for NLP technologies to align with cultural ethics and values, addressing fairness and bias minimization.

Aboutness

No studies have been found for this taxon.

Common Ground

Koto, et al. (2023) Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. EMNLP [Paper]
Wibowo, et al. (2023) COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. ArXiv [Paper]
Wang, et al. (2024) SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning. ArXiv [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Quan, et al. (2020) RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling. EMNLP Main [Paper]
Zhang, et al. (2023) The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages. EMNLP [Paper]
Bauer, et al. (2023) Social Commonsense for Explanation and Cultural Bias Discovery. EACL Main [Paper]
Naous, et al. (2023) Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. ArXiv [Paper]
Wu, et al. (2023) Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales. EMNLP Main [Paper]
Nguyen, et al. (2023) Extracting Cultural Commonsense Knowledge at Scale. ACM Web Conference [Paper]
Huang and Yang (2023) Culturally Aware Natural Language Inference. EMNLP Findings [Paper]

Objectives and Values

Kovac, et al. (2023) Large Language Models as Superpositions of Cultural Perspectives. ArXiv [Paper]
Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Johnson, et al. (2022) The Ghost in the Machine has an American accent: value conflict in GPT-3. ArXiv [Paper]
Dev, et al. (2023) Building Socio-culturally Inclusive Stereotype Resources with Community Engagement. ArXiv [Paper]
Gupta, et al. (2024) Self-Assessment Tests are Unreliable Measures of LLM Personality. ArXiv [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
Tanmay, et al. (2023) Probing the Moral Development of Large Language Models through Defining Issues Test. ArXiv [Paper]
An, et al. (2023) SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models. EMNLP [Paper]
Sorensen, et al. (2023) Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. ArXiv [Paper]
Durmus, et al. (2023) Towards Measuring the Representation of Subjective Global Opinions in Language Models. ArXiv [Paper]

2. Taxonomy Based on the Methods Used

A. Black-box Approaches

Generative Probing

Nadeem, et al. (2021) StereoSet: Measuring stereotypical bias in pretrained language models. ACL [Paper]
Nangia, et al. (2020) CrowS-pairs: A challenge dataset for measuring social biases in masked language models. EMNLP [Paper]
Wan, et al. (2023) Are personalized stochastic parrots more dangerous? Evaluating persona biases in dialogue systems. EMNLP [Paper]
Jha, et al. (2023) SeeGULL: A stereotype benchmark with broad geo-cultural coverage leveraging generative models. ACL [Paper]

Discriminative Probing

Cao, et al. (2023) Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study. Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) [Paper]
Tanmay, et al. (2023) Probing the Moral Development of Large Language Models through Defining Issues Test. ArXiv [Paper]
Rao, et al. (2023) Ethical reasoning over moral alignment: A case and framework for in-context ethical policies in LLMs. EMNLP [Paper]
Kovac, et al. (2023) Large Language Models as Superpositions of Cultural Perspectives. ArXiv [Paper]

B. White-box Approaches

Mechanistic Interpretability

Wichers, et al. (2024) Gradient-Based Language Model Red Teaming. ArXiv [Paper]

📚 Books

Inglehart, R. & Welzel, C. (2005). Modernization, Cultural Change and Democracy: The Human Development Sequence. New York: Cambridge University Press.
Edmonds, D. (2016). Philosophers Take on the World. Oxford University Press UK.
