UWC research team one step closer to revolutionary translator
A team of computer scientists from the University of the Western Cape (UWC), part of the Assistive Technologies Research Group (ATRG), is developing what it describes as a real-time intelligent translator for mobile phones.
This research focuses on recognising sign language in ordinary video: locating the signer, tracking the signer’s hands, and identifying the smaller sub-units of sign language gestures and facial expressions - all using a normal, inexpensive webcam.
The technology will revolutionise how South Africa’s hearing-impaired and hearing communities communicate with one another.
The system will allow a hearing-impaired person to record sign language with a mobile phone’s camera; the recording will be instantly converted into English audio. Communication will be fully bi-directional: the system will also convert English audio into sign gestures performed by a 3D-modelled avatar.
The project will assist communication between the 700 000 to two million South Africans believed to be able to use South African Sign Language (SASL), and those unable to do so.
“Considerable research has been conducted in this field,” team leader Mehrdad Ghaziasgar has noted, “but none of the research has managed to produce full-fledged, or even partial translation systems.”
One difficulty for developers is the sheer complexity of sign language.
Sign language essentially comprises five ‘parameters’ – hand shape (often thought to be the only component), hand orientation, hand motion, hand location and facial expressions. A working system would require that a phone camera pick up the signer, track the signer’s hands, and then recognise hand gestures and facial expressions.
UWC alumnus Kurt Jacobs recently completed his master’s thesis within the project. His study, South African Sign Language Hand Shape and Orientation Recognition on Mobile Devices Using Deep Learning, focused on converting sign language into audio.
Kurt worked on the identification of hand shapes and hand orientation. Specifically, he sought to test systems that could recognise six hand shapes, each in five different orientations.
His work built on other research within the UWC group, employing machine-learning techniques such as deep learning and convolutional neural networks (CNNs), which are designed to mimic learning in the human brain rather than simply follow step-by-step instructions.
“The use of machine-learning techniques to recognise SASL gestures is necessary to convert gestures, using image processing, into sign writing, which will then be converted into spoken English,” explains Kurt.
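The conversion Kurt describes - gestures recognised by image processing, transcribed into sign writing, then rendered as spoken English - can be pictured as a pipeline of stages. The sketch below is purely illustrative: the stage names, labels and stub dictionaries are assumptions for this example, not the ATRG’s actual implementation.

```python
# Illustrative pipeline only: function names, labels and mappings are
# assumptions for this sketch, not the ATRG's actual system.

def recognise_gesture(frame: bytes) -> str:
    """Image-processing stage: map a video frame to a gesture label."""
    # Placeholder: a real system would run a trained classifier here.
    return "gesture:hello"

def gesture_to_signwriting(gesture: str) -> str:
    """Transcribe a recognised gesture into a sign-writing symbol."""
    signwriting = {"gesture:hello": "SW:hello"}
    return signwriting.get(gesture, "SW:unknown")

def signwriting_to_english(symbol: str) -> str:
    """Render the sign-writing transcription as spoken-English text."""
    english = {"SW:hello": "hello"}
    return english.get(symbol, "")

def translate(frame: bytes) -> str:
    """Compose the stages: video frame -> sign writing -> English."""
    return signwriting_to_english(gesture_to_signwriting(recognise_gesture(frame)))

print(translate(b""))  # prints "hello" for this stub pipeline
```

In a deployed system the text output of the final stage would be passed to a speech synthesiser to produce the English audio.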
CNNs have been successfully applied to analysing visual imagery - something a sign-language translator would have to do, naturally.
Among his objectives, Kurt wanted to test how accurate CNNs were at classifying images of hand orientation and hand shape. He conducted testing on Apple iOS.
His results with machine-learning techniques such as CNNs, dropout and generative adversarial networks (GANs) are promising.
“I concluded that deep learning - and specifically CNNs - performs very well in recognising hand shapes across multiple hand orientations on mobile devices,” Kurt says.
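The classification task - six hand shapes, each in five orientations - amounts to a 30-class image classifier. The numpy sketch below shows the shape of a single CNN forward pass (convolution, ReLU, pooling, dense layer, softmax) for that task; the layer sizes, 32×32 input and random weights are illustrative assumptions, not the trained network from the thesis.

```python
import numpy as np

# Minimal sketch of a CNN forward pass for the 6-hand-shape x 5-orientation
# task (30 classes). All sizes and weights here are illustrative assumptions.

rng = np.random.default_rng(0)
N_SHAPES, N_ORIENTATIONS = 6, 5
N_CLASSES = N_SHAPES * N_ORIENTATIONS  # 30

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def forward(image, kernel, weights, bias):
    """conv -> ReLU -> global average pool -> dense -> softmax."""
    feat = np.maximum(conv2d(image, kernel), 0.0)  # conv + ReLU
    pooled = feat.mean()                           # global average pool
    logits = pooled * weights + bias               # dense layer, 30 outputs
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum()

image = rng.random((32, 32))          # stand-in for a 32x32 hand crop
kernel = rng.standard_normal((3, 3))  # one untrained 3x3 filter
weights = rng.standard_normal(N_CLASSES)
bias = rng.standard_normal(N_CLASSES)

probs = forward(image, kernel, weights, bias)
# Encode the 30 classes as (shape, orientation) pairs.
shape_idx, orient_idx = divmod(int(probs.argmax()), N_ORIENTATIONS)
print(probs.shape)  # (30,) - one probability per shape/orientation pair
```

A trained model would stack several such convolutional layers (with dropout between them, as in Kurt’s experiments) and learn the filter and dense-layer weights from labelled hand images rather than drawing them at random.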
Although he is now a software engineer with 24.com and won’t continue with the work anytime soon, Kurt’s findings will feed into future research on the SASL translator at the ATRG.