Dataset engineering · Public ML challenge
Pediatric Speech Dataset
A curated child speech dataset and preprocessing pipeline for automated early literacy assessment.
- Challenge: Public ML competition
- Submissions: 315
- Top score: 0.97 AUROC
Project Snapshot
This project converted anonymized pediatric speech recordings into a dataset for a public machine learning challenge. The work covered dataset curation, preprocessing, feature extraction, validation, and packaging, so that outside teams could build models for automated speech recognition and early literacy assessment.
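
As an illustration of what the validation and packaging steps can look like in a challenge dataset of this kind, here is a minimal Python sketch. It is not the project's actual pipeline: the `Clip` fields, the duration and sample-rate thresholds, and the hash-based train/test assignment are all assumptions made for the example. It drops clips that fail basic sanity checks and then assigns each speaker to exactly one split, so that no child's recordings appear in both training and test data.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: str        # hypothetical anonymized clip identifier
    speaker_id: str     # hypothetical anonymized speaker identifier
    duration_s: float   # clip length in seconds
    sample_rate: int    # samples per second


def is_valid(clip, min_s=0.5, max_s=30.0, expected_rate=16000):
    """Basic sanity check; the thresholds here are illustrative assumptions."""
    return min_s <= clip.duration_s <= max_s and clip.sample_rate == expected_rate


def assign_split(speaker_id, test_fraction=0.2):
    """Deterministically map a speaker to 'train' or 'test' via a hash of the ID."""
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def build_splits(clips):
    """Filter invalid clips, then group the rest into speaker-disjoint splits."""
    splits = {"train": [], "test": []}
    for clip in clips:
        if is_valid(clip):
            splits[assign_split(clip.speaker_id)].append(clip)
    return splits
```

Hashing the speaker ID rather than shuffling clips keeps the assignment reproducible across runs and prevents speaker leakage between splits, which would otherwise inflate scores on the held-out set.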