- Title
- Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition
- Creator
- Marais, Marc, Brown, Dane L, Connan, James, Boby, Alden
- Subject
- To be catalogued
- Date
- 2023
- Type
- text
- Type
- article
- Identifier
- http://hdl.handle.net/10962/463478
- Identifier
- vital:76412
- Identifier
- xlink:href="https://ieeexplore.ieee.org/abstract/document/10220534"
- Description
- Sign language is a vital tool of communication for individuals who are deaf or hard of hearing. Sign language recognition (SLR) technology can assist in bridging the communication gap between deaf and hearing individuals. However, existing SLR systems are typically signer-dependent, requiring training data from the specific signer for accurate recognition. This presents a significant challenge for practical use, as collecting data from every possible signer is not feasible. This research focuses on developing a signer-independent isolated SLR system to address this challenge. The system implements two model variants on the signer-independent datasets: an R(2+1)D spatiotemporal convolutional block and a Video Vision Transformer (ViViT). These models learn to extract features from raw sign language videos from the LSA64 dataset and classify signs without needing handcrafted features, explicit segmentation or pose estimation. Overall, the R(2+1)D model architecture significantly outperformed the ViViT architecture for signer-independent SLR on the LSA64 dataset. The R(2+1)D model achieved a near-perfect accuracy of 99.53% on the unseen test set, while the ViViT model yielded an accuracy of 72.19%, demonstrating that spatiotemporal convolutions are effective for signer-independent SLR.
- Format
- computer, online resource, application/pdf, 1 online resource (5 pages), pdf
- Publisher
- IEEE Xplore
- Language
- English
- Relation
- Marais, M., Brown, D., Connan, J. and Boby, A., 2023, August. Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition. In 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (pp. 1-6). IEEE
- Rights
- Publisher
- Rights
- Use of this resource is governed by the terms and conditions of the IEEE Xplore Terms of Use Statement (https://ieeexplore.ieee.org/Xplorehelp/overview-of-ieee-xplore/terms-of-use)
- Rights
- Closed Access
File | Description | Size | Format
---|---|---|---
SOURCE1 | Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition.pdf | 657 KB | Adobe Acrobat PDF