Dataset engineering · Public ML challenge

Pediatric Speech Dataset

A curated child speech dataset and preprocessing pipeline for automated early literacy assessment.

Challenge: Public ML competition
Submissions: 315
Top score: 0.97 AUROC

Project Snapshot

This project converted anonymized pediatric speech recordings into a dataset for a public machine learning challenge. The work covered dataset curation, preprocessing, feature extraction, validation, and packaging, so that outside teams could build models for automated speech recognition and early literacy assessment.
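
As a rough illustration of the preprocessing, feature extraction, and validation steps named above, here is a minimal Python sketch. It is not the project's actual pipeline: the `Clip` record, its field names, the 0.5-second minimum duration, and the choice of RMS energy and zero-crossing rate as features are all assumptions made for the example, and only the standard library is used.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    """Hypothetical record for one anonymized recording (field names are assumed)."""
    clip_id: str
    sample_rate: int       # samples per second
    samples: List[float]   # mono waveform, nominally in [-1.0, 1.0]


def peak_normalize(samples: List[float], target_peak: float = 0.95) -> List[float]:
    """Scale the waveform so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty clip: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def frame_features(samples: List[float], frame_len: int, hop: int) -> List[Tuple[float, float]]:
    """Return (RMS energy, zero-crossing rate) for each full frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def validate(clip: Clip, min_seconds: float = 0.5) -> List[str]:
    """Return a list of problems; an empty list means the clip passed these checks."""
    problems = []
    if clip.sample_rate <= 0:
        problems.append("non-positive sample rate")
    if not clip.samples:
        problems.append("empty waveform")
    elif clip.sample_rate > 0 and len(clip.samples) / clip.sample_rate < min_seconds:
        problems.append("too short")
    if any(math.isnan(s) or math.isinf(s) for s in clip.samples):
        problems.append("non-finite sample")
    return problems


def make_tone(clip_id: str, seconds: float, rate: int = 16000, freq: float = 220.0) -> Clip:
    """Synthetic sine tone standing in for a real recording."""
    n = int(seconds * rate)
    return Clip(clip_id, rate, [0.5 * math.sin(2 * math.pi * freq * i / rate) for i in range(n)])


if __name__ == "__main__":
    clip = make_tone("demo-0001", seconds=1.0)
    if not validate(clip):
        normalized = peak_normalize(clip.samples)
        features = frame_features(normalized, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
        print(len(features), "frames; first frame:", features[0])
```

Running validation before feature extraction keeps malformed clips out of the packaged data, which matters when outside teams cannot inspect the raw recordings themselves.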