Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition

Marais, Marc; Brown, Dane L; Connan, James; Boby, Alden

Title: Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition
Creator: Marais, Marc; Brown, Dane L; Connan, James; Boby, Alden
Subject: To be catalogued
Date: 2023
Type: text
Type: article
Identifier: http://hdl.handle.net/10962/463478
Identifier: vital:76412
Identifier: https://ieeexplore.ieee.org/abstract/document/10220534
Description: Sign language is a vital tool of communication for individuals who are deaf or hard of hearing. Sign language recognition (SLR) technology can assist in bridging the communication gap between deaf and hearing individuals. However, existing SLR systems are typically signer-dependent, requiring training data from the specific signer for accurate recognition. This presents a significant challenge for practical use, as collecting data from every possible signer is not feasible. This research focuses on developing a signer-independent isolated SLR system to address this challenge. The system implements two model variants on the signer-independent datasets: an R(2+1)D spatiotemporal convolutional block and a Video Vision Transformer (ViViT). These models learn to extract features from raw sign language videos from the LSA64 dataset and classify signs without needing handcrafted features, explicit segmentation or pose estimation. Overall, the R(2+1)D model architecture significantly outperformed the ViViT architecture for signer-independent SLR on the LSA64 dataset. The R(2+1)D model achieved a near-perfect accuracy of 99.53% on the unseen test set, while the ViViT model yielded an accuracy of 72.19%, demonstrating that spatiotemporal convolutions are effective at signer-independent SLR.
Format: computer, online resource, application/pdf, 1 online resource (5 pages), pdf
Publisher: IEEE Xplore
Language: English
Relation: 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD); Marais, M., Brown, D., Connan, J. and Boby, A., 2023, August. Spatiotemporal Convolutions and Video Vision Transformers for Signer-Independent Sign Language Recognition. In 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (pp. 1-6). IEEE; 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), p. 1, 2023
Rights: Publisher
Rights: Use of this resource is governed by the terms and conditions of the IEEE Xplore Terms of Use Statement (https://ieeexplore.ieee.org/Xplorehelp/overview-of-ieee-xplore/terms-of-use)
Rights: Closed Access
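
Note on the Description: "signer-independent" evaluation means that no signer contributes videos to both the training and the test set. The sketch below illustrates one way such a split could be built. It is not the authors' code: the function name, the record layout (video id, sign label, signer id) and the toy identifiers are all assumptions made for illustration, and the record does not state which signers were held out.

```python
def signer_independent_split(samples, test_signers):
    """Partition (video_id, sign_label, signer_id) records by signer.

    Every record whose signer is in `test_signers` goes to the test set,
    so no signer appears on both sides of the split.
    """
    held_out = set(test_signers)
    train, test = [], []
    for sample in samples:
        _, _, signer_id = sample
        (test if signer_id in held_out else train).append(sample)
    return train, test


# Hypothetical toy data: 3 signs x 4 signers x 2 repetitions = 24 records.
samples = [
    (f"{sign:03d}_{signer:03d}_{rep:03d}", sign, signer)
    for sign in range(1, 4)
    for signer in range(1, 5)
    for rep in range(1, 3)
]

# Hold out signer 4 entirely (an arbitrary choice for this example).
train, test = signer_independent_split(samples, test_signers={4})

# Sanity check: the two sets share no signer.
assert {s[2] for s in train}.isdisjoint({s[2] for s in test})
print(len(train), len(test))  # prints "18 6"
```

Splitting by signer rather than by video is what makes the reported test accuracy a measure of generalisation to unseen people; a random per-video split would let the same signer appear in both sets.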


Collections

Connan, James (Mr)

Brown, Dane A (Associate Prof)

RU Department of Computer Science

SDG 09. Industry, Innovation and Infrastructure
