TSD 2022 Tutorial

Speech recognition on the edge
Prof. Daniel Hromada; Hyungjoong Kim
DigiEduBerlin team :: University of the Arts / Einstein Center Digital Future

Keywords: Speech-to-text; DeepSpeech; RaspberryPi; NVIDIA Jetson; Python; Linux; Websockets

In this tutorial, participants will be introduced to several ways of running speech-to-text (STT) inference on local, non-cloud (i.e. edge-computing) hardware. They will learn the intricacies of running two different types of ASR systems (DeepSpeech and Random Forests) on three different hardware architectures: RaspberryPi Zero (armv6), RaspberryPi 4 (armv7, without CUDA), and NVIDIA Jetson Xavier (armv8 / aarch64, with CUDA). Over the course of this 90-minute hands-on tutorial, participants will see each of the three hardware platforms turned into a low-cost local STT inference engine.
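
As a rough illustration of how a single script might adapt to the three boards listed above, the following Python sketch classifies the CPU architecture string reported by the operating system and picks an inference mode accordingly. The function names, the backend labels ("gpu", "cpu-lite", "cpu") and the CUDA heuristic are assumptions made for this example only; they are not taken from the tutorial materials or from any STT library.

```python
import platform
import shutil

# Hypothetical mapping from `platform.machine()` strings to the
# architecture families named in the abstract.
ARCH_FAMILIES = {
    "armv6l": "armv6",   # e.g. RaspberryPi Zero
    "armv7l": "armv7",   # e.g. RaspberryPi 4 running a 32-bit OS
    "aarch64": "armv8",  # e.g. NVIDIA Jetson Xavier
    "arm64": "armv8",
}


def classify_arch(machine: str) -> str:
    """Return the architecture family for a machine string, or 'other'."""
    return ARCH_FAMILIES.get(machine.lower(), "other")


def pick_backend(machine: str, has_cuda: bool) -> str:
    """Choose an (illustrative) inference backend label for this board."""
    arch = classify_arch(machine)
    if arch == "armv8" and has_cuda:
        return "gpu"       # CUDA-capable board: run the model on the GPU
    if arch in ("armv6", "armv7", "armv8"):
        return "cpu-lite"  # small ARM board: prefer a lightweight CPU model
    return "cpu"           # anything else: plain CPU inference


def cuda_probably_available() -> bool:
    """Heuristic only: look for CUDA/Jetson command-line tools on PATH."""
    return any(shutil.which(tool) for tool in ("nvcc", "tegrastats"))


if __name__ == "__main__":
    machine = platform.machine()
    backend = pick_backend(machine, cuda_probably_available())
    print(f"machine={machine} backend={backend}")
```

Keeping the decision logic in pure functions that take the machine string as an argument means it can be checked on a development laptop before being deployed to any of the boards.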
