Many environmental data scientists primarily engage with large language models (LLMs) through simple prompting and chat interactions with popular artificial intelligence (AI) providers like OpenAI's ChatGPT and Microsoft Copilot. However, LLMs have many untapped applications in environmental data science that can expand capacity and improve data pipelines.
Unfamiliarity with LLMs within environmental data science limits scientists' ability to leverage these tools for coding and data analysis. This project seeks to bridge that knowledge gap and open the black box of LLM applications in environmental science by providing resources and real-world use cases of LLMs in scientific workflows... and to show that LLMs can be more than a chatbot 🤖
We know that every person has a different level of experience with AI, data analysis, and coding, so we have targeted resources for three broad groups. Use these definitions to determine your level:
- Level 1: You are an expert in your field but have little or no experience using AI tools or writing code.
- Level 2: You have dabbled in coding and feel comfortable in at least one language. You may have also tested the web interface of some large language models and want to learn more. You know that IDE stands for integrated development environment, and you want to know how to enhance your IDE with AI.
- Level 3: You already use code to enhance your workflows, and you want to learn how to incorporate AI models into your code (a short sketch follows this list).
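For readers at Level 3, "incorporating AI models into your code" can be as simple as calling a hosted LLM from a script, so that a repetitive task runs inside your data pipeline instead of in a chat window. The sketch below is one illustration, assuming the `openai` Python client (v1+) is installed and an `OPENAI_API_KEY` environment variable is set; the model name and the species-name-cleaning task are placeholders, not recommendations.

```python
# Minimal sketch: calling an LLM from a script rather than a chat window.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY environment
# variable; the model name below is an example, not a requirement.
from openai import OpenAI

client = OpenAI()

# Hypothetical task: standardize abbreviated species names from field notes.
field_notes = ["wstrn redcedar", "Doug fir", "bigleaf mpl"]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[
        {
            "role": "user",
            "content": (
                "Return the accepted common name for each of these "
                f"abbreviated tree names, one per line: {field_notes}"
            ),
        }
    ],
)

print(response.choices[0].message.content)
```

Embedding a call like this in a loop or data-cleaning function is one small example of moving LLM use out of the chat window and into a reproducible workflow.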
This webpage was brought to you by the "More Than a Chatbot" breakout group formed during the 3rd Annual Environmental Data Science (EDS) Summit hosted by the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara, CA from February 4-6, 2025.
The Summit's theme was *The Future of AI in Conservation & Management*.
Team Members during the EDS Summit:
- Andrew Huang, Anaconda, a.holoviz.dev@gmail.com
- Kirk Klausmeyer, The Nature Conservancy, kklausmeyer@tnc.org
- Amy Kendig, Minnesota Department of Natural Resources, amy.kendig@state.mn.us
- Kelly van Woesik, North Carolina State University, kjvanwoe@ncsu.edu
- Wenxin Yang, University of California Santa Barbara, wenxinyang@ucsb.edu
- Anish Dulal, University of Oregon, anishd@uoregon.edu
- Raissa Mendonca, Kent State University, rmarques@kent.edu
- Trishala Thakur, University of Colorado Boulder, trishala.thakur@colorado.edu
- Aji John, University of Washington, ajijohn@uw.edu
- Kelly Easterday, The Nature Conservancy, kelly.easterday@tnc.org
- Glenn Moncrieff, The Nature Conservancy, glenn.moncrieff@tnc.org
We welcome and appreciate community contributions to this documentation! Whether you're discovering novel prompting techniques, finding interesting edge cases, or developing new tools around LLMs, your insights can help others.
If you encounter a bug, have a feature request, or would like to share feedback, please open an issue on our GitHub issues page. Your input helps us understand how we can improve and adapt to the rapidly evolving LLM landscape.
Please note that this is very much a work in progress. As we explore new capabilities and integrate emerging technologies, there may be occasional rough edges. Your patience and constructive feedback are invaluable as we work to refine and enhance this documentation.