How2Sign Dataset
Continuous American Sign Language · Multiview videos · Depth data · 2D & 3D skeletons · Gloss annotation · English translation

First large-scale multimodal and multiview continuous American Sign Language dataset

Download Sample | Download Full Dataset

ABOUT

We introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
A three-hour subset was further recorded in the Panoptic Studio, enabling detailed 3D pose estimation.
This dataset is publicly available for research purposes only.

Download

This section is under construction.
We will be releasing the other modalities soon!
The dataset is publicly available for research purposes only.

Download the videos, annotations and metadata separately


You can also use our script to automatically create the folder structure and download the necessary modalities (see the sketch after the list below)

Green Screen RGB videos (frontal view)
Green Screen RGB videos (side view)
Green Screen RGB clips* (frontal view)
Green Screen RGB clips* (side view)
Body-Face-Hands (B-F-H) 2D Keypoints clips* (frontal view)
English Translation (manually re-aligned)
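
For illustration only, here is a minimal sketch of what such a download helper might look like in Python. The base URL, archive names, and folder layout below are placeholders (assumptions, not the real endpoints); use the official script for actual downloads.

# Minimal sketch of a download helper. BASE_URL and the archive names
# are hypothetical placeholders -- use the official script for the
# real downloads.
import urllib.request
from pathlib import Path

BASE_URL = "https://example.com/how2sign/"  # placeholder host (assumption)

# Maps a target folder to a hypothetical archive name.
MODALITIES = {
    "video_level/rgb_front": "rgb_front_videos.zip",
    "video_level/rgb_side": "rgb_side_videos.zip",
    "annotations": "english_translation_realigned.zip",
}

def download_all(root="How2Sign"):
    for folder, archive in MODALITIES.items():
        target = Path(root) / folder
        target.mkdir(parents=True, exist_ok=True)  # create the folder structure
        urllib.request.urlretrieve(BASE_URL + archive, target / archive)

if __name__ == "__main__":
    download_all()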

Copyright

The dataset on this webpage is copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may NOT use the material for commercial purposes.

Disclaimer

The How2Sign dataset was collected as a tool for research; however, it is worth noting that the dataset may have unintended biases (including those of a societal, gender, or racial nature). For more information about the biases the dataset might present, please refer to the published paper.

Green Screen RGB clips*

The Green Screen RGB clips were segmented using the original timestamps from the How2 dataset. Each clip corresponds to one sentence of the English translation. Note that the alignment between the ASL video and the English translation may not be perfect, due to differences between the two languages.
Manually re-aligned clips can be obtained by segmenting the *Green Screen RGB videos* with the timestamps available in the *English Translation (manually re-aligned)* file.
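
As an illustration, here is a minimal sketch of that segmentation using ffmpeg. The column names (VIDEO_NAME, SENTENCE_NAME, START, END) and the tab delimiter are assumptions; check the header of the released file before using it.

# Minimal sketch: cut sentence-level clips out of the full Green Screen
# videos using the manually re-aligned timestamps. Column names and the
# tab delimiter are assumptions -- check the released file's header.
import csv
import subprocess
from pathlib import Path

def segment_clips(realigned_tsv, video_dir, out_dir):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(realigned_tsv, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            src = Path(video_dir) / (row["VIDEO_NAME"] + ".mp4")
            dst = Path(out_dir) / (row["SENTENCE_NAME"] + ".mp4")
            # Placing -ss/-to after -i gives slow but frame-accurate seeking.
            subprocess.run(
                ["ffmpeg", "-i", str(src),
                 "-ss", row["START"], "-to", row["END"],
                 "-c:v", "libx264", str(dst)],
                check=True,
            )

Re-encoding with libx264 (rather than stream-copying with -c copy) lets a clip start on any frame instead of snapping to the nearest keyframe.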

How2 Data*

For copyright reasons, we are not allowed to redistribute the How2 data. Please refer to the original repository to request and download the entire How2 dataset if needed.
Note that How2Sign follows the original train/validation/test splits of the How2 dataset.

Reporting Issues

If you have any problems with the data, please let us know by creating a "New issue" on the GitHub repo.

Publication

How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, and Xavier Giró-i-Nieto
CVPR, 2021
[PDF] [1' video] [Poster]

When using the How2Sign Dataset please reference:

@inproceedings{Duarte_CVPR2021,
    title={{How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language}},
    author={Duarte, Amanda and Palaskar, Shruti and Ventura, Lucas and Ghadiyaram, Deepti and DeHaan, Kenneth and Metze, Florian and Torres, Jordi and Giro-i-Nieto, Xavier},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2021}
}

Video Summary



Developed by


Supported by