Goodnight Moon, Hello Early Literacy Screening

Goal of the Competition

Literacy (the ability to read, write, and comprehend language) is a fundamental skill that underlies personal development, academic success, career opportunities, and active participation in society. Many children in the United States need more support with their language skills. In order to provide effective early literacy intervention, teachers must be able to reliably identify the students who need support. Currently, teachers across the U.S. are tasked with administering and scoring literacy screeners, which are written or verbal tests that are manually scored following detailed rubrics. Manual scoring methods not only take time, but they may be unreliable, producing different results depending on who is scoring the test and how thoroughly they were trained.

In this challenge, solvers implemented machine learning models to score audio recordings from literacy screener exercises completed by students in kindergarten through 3rd grade. The models can help teachers quickly and reliably identify children in need of early literacy intervention.

This competition also included a bonus for explainability write-ups that described methods to determine where in the audio stream error(s) occur and provided insight into the model decision-making rationale.

What's in this Repository

This repository contains code from winning competitors in the Goodnight Moon, Hello Early Literacy Screening DrivenData challenge. Code for all winning solutions is open source under the MIT License.

Winning code for other DrivenData competitions is available in the competition-winners repository.

Winning Submissions

Place	User	Public Score	Private Score	Summary of Model	Explainability Bonus
1	sheep	0.2011	0.2042	Fine-tuned OpenAI’s Whisper model with a custom loss function. For each word in `expected_text`, its token was fed into the decoder and the binary cross-entropy (BCE) loss between the token’s logit and its corresponding score was computed.	1st Place
2	dylanliu	0.2088	0.2098	Combined various multimodal classification models, including Whisper-medium (encoder only) as the speech feature extraction model and bge-large as the text feature extraction model, as well as a text-to-speech model. Used a cross-validation (CV) based script to search for weights automatically.	2nd Place
3	vecxoz	0.2156	0.2137	Used 4 modalities of the data: original audio, audio generated from expected text (by SpeechT5), original expected text, and text transcribed from original audio (by Whisper fine-tuned on positive examples). Built a multimodal classifier which used the encoder from the fine-tuned Whisper model as audio feature extractor and DeBERTa-v3 as text feature extractor.
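
Two ideas mentioned in the summaries above, a BCE loss on a per-word logit and a CV-based weight search, can be illustrated with a small standard-library sketch. This is not the winners' code: the function names (`bce_with_logits`, `search_blend_weights`), the grid step, the use of log loss as the search objective, and the reading of "weights" as blending weights over model predictions are all assumptions made for illustration. The actual implementations are in each submission's directory.

```python
import math
from itertools import product


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit/label pair."""
    # Equivalent to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def log_loss(probs, targets, eps=1e-7):
    """Mean binary log loss over probabilities, clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def search_blend_weights(model_probs, targets, step=0.1):
    """Grid-search non-negative weights that sum to 1 (hypothetical helper).

    model_probs holds one list of held-out (e.g. out-of-fold) predictions
    per model; the weights minimising log loss on them are returned.
    """
    n_steps = round(1 / step)
    best_weights, best_loss = None, float("inf")
    for combo in product(range(n_steps + 1), repeat=len(model_probs)):
        if sum(combo) != n_steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / n_steps for c in combo]
        blended = [
            sum(w * probs[i] for w, probs in zip(weights, model_probs))
            for i in range(len(targets))
        ]
        loss = log_loss(blended, targets)
        if loss < best_loss:
            best_weights, best_loss = weights, loss
    return best_weights, best_loss


# Toy data: model_a is informative, model_b always predicts 0.5.
targets = [1, 0, 1, 0]
model_a = [0.9, 0.1, 0.8, 0.2]
model_b = [0.5, 0.5, 0.5, 0.5]
weights, loss = search_blend_weights([model_a, model_b], targets)
```

On this toy data the search puts all weight on the informative model. An exhaustive grid grows quickly with the number of models, so a real script would more likely use a coarser grid or a numerical optimizer.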

Additional solution details can be found in the reports folder inside the directory for each submission.

Winners Blog Post: Meet the Winners of the Goodnight Moon, Hello Literacy Screening Challenge

Benchmark Blog Post: Goodnight Moon, Hello Early Literacy Screening Benchmark
