Data Sets

The competition will be carried out on a subset of the QUWI database, which comprises 4,068 handwritten text images contributed by 1,017 different writers. Each writer provided four samples: two in Arabic and two in English. The first page contains Arabic text composed from the writer's own imagination, while the second page contains a predefined Arabic text copied by all writers. Similarly, page three contains an arbitrary English text, while page four contains a fixed English text written by all writers.
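The page layout above can be captured as a small lookup table. This is an illustrative sketch only: the names PageInfo, PAGE_LAYOUT and describe_page are hypothetical, and numbering the pages 1 to 4 in the order described above is an assumption. It also confirms that four pages per writer account for the stated image total (1,017 × 4 = 4,068).

```python
from typing import NamedTuple

class PageInfo(NamedTuple):
    script: str   # "Arabic" or "English"
    content: str  # "free" (writer's own text) or "fixed" (text copied by all writers)

# Hypothetical lookup: page number (assumed 1-4, in the order described) -> layout.
PAGE_LAYOUT = {
    1: PageInfo("Arabic", "free"),
    2: PageInfo("Arabic", "fixed"),
    3: PageInfo("English", "free"),
    4: PageInfo("English", "fixed"),
}

NUM_WRITERS = 1017
PAGES_PER_WRITER = len(PAGE_LAYOUT)

# Four pages from each of the 1,017 writers give the 4,068 images in the database.
TOTAL_IMAGES = NUM_WRITERS * PAGES_PER_WRITER

def describe_page(page: int) -> PageInfo:
    """Return the script and content type of a page number in 1..4."""
    if page not in PAGE_LAYOUT:
        raise ValueError(f"page must be in 1..4, got {page}")
    return PAGE_LAYOUT[page]

if __name__ == "__main__":
    print(TOTAL_IMAGES)      # 4068
    print(describe_page(3))  # PageInfo(script='English', content='free')
```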

Database Distribution for Writer Identification Tasks (1A – 1D)

The writer identification tasks (1A – 1D) will be carried out on writing samples from 800 writers in the database, with 400 writers in each of the four proposed tasks. For each task, 100 writing samples will be provided as a validation set, while 200 test samples will be used to evaluate system performance.

The training data comprise 400 samples in Arabic and 400 in English from a total of 800 different writers, while the validation data contain Arabic and English handwriting samples of 100 writers each. Images are named AAAA_B, where AAAA is the writer ID and B is the sample number. The training and validation data will be grouped by task.
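A minimal sketch of parsing the AAAA_B naming convention and grouping images by writer. It assumes the writer ID is exactly four digits, the sample number is one or more digits, and the name may carry a directory and an image file extension (the extension is not specified here, so any is stripped). The helper names parse_image_name and group_by_writer are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed pattern: four-digit writer ID, an underscore, then a numeric sample number.
_NAME_RE = re.compile(r"^(\d{4})_(\d+)$")

def parse_image_name(filename: str) -> tuple[str, int]:
    """Split an 'AAAA_B' image name into (writer_id, sample_number)."""
    stem = PurePath(filename).stem  # drop any directory and file extension
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"not an AAAA_B image name: {filename!r}")
    writer_id, sample = match.groups()
    # Keep the writer ID as a string so leading zeros are preserved.
    return writer_id, int(sample)

def group_by_writer(filenames) -> dict[str, list[int]]:
    """Map each writer ID to the sorted sample numbers found for that writer."""
    groups = defaultdict(list)
    for name in filenames:
        writer_id, sample = parse_image_name(name)
        groups[writer_id].append(sample)
    return {writer: sorted(samples) for writer, samples in groups.items()}

if __name__ == "__main__":
    print(parse_image_name("0042_2.jpg"))  # ('0042', 2)
    print(group_by_writer(["0042_2.jpg", "0042_1.jpg", "0107_1.jpg"]))
    # {'0042': [1, 2], '0107': [1]}
```

Keeping the writer ID as a string avoids turning "0042" into 42, so IDs round-trip unchanged when results are written back out.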

The test set will comprise 500 unlabeled handwritten images, 250 in Arabic and 250 in English. The test data will also be grouped by task and provided to participants, who will run their systems on it and submit the results.

Database Distribution for Gender Classification Tasks (2A – 2D)

For each gender classification task, 500 writing samples will be provided for training, 250 for validation and 250 for testing. In addition to the handwriting images, a separate text file giving the gender of the writer of each training and validation image will be provided, where 1 denotes a male writer and 0 a female writer.
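The layout of the label file is not specified here, so the sketch below assumes one "image_name label" pair per line, separated by whitespace; adjust parse_labels to the actual file. The 1 = male, 0 = female encoding comes from the task description. Accuracy on the validation set is used only as an example check; the competition's evaluation metric is not stated in this section. All function names are hypothetical.

```python
from pathlib import Path

# Encoding stated in the task description: 1 = male, 0 = female.
GENDER_NAMES = {1: "male", 0: "female"}

def parse_labels(text: str) -> dict[str, int]:
    """Parse 'image_name label' lines (ASSUMED layout) into {image_name: 0 or 1}."""
    labels = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, value = line.split()
        label = int(value)
        if label not in (0, 1):
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label}")
        labels[name] = label
    return labels

def load_labels(path: str) -> dict[str, int]:
    """Read and parse a label file from disk."""
    return parse_labels(Path(path).read_text())

def accuracy(predicted: dict[str, int], truth: dict[str, int]) -> float:
    """Fraction of ground-truth images whose predicted label matches."""
    if not truth:
        raise ValueError("no ground-truth labels")
    # An image missing from `predicted` counts as wrong (get() returns None).
    correct = sum(predicted.get(name) == label for name, label in truth.items())
    return correct / len(truth)

if __name__ == "__main__":
    truth = parse_labels("0001_1 1\n0002_1 0\n")
    print(GENDER_NAMES[truth["0002_1"]])                       # female
    print(accuracy({"0001_1": 1, "0002_1": 1}, truth))        # 0.5
```

Since the handedness tasks use the same 1/0 encoding, the same parser applies there with a display mapping such as {1: "right-handed", 0: "left-handed"}.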

For tasks 2A and 2B, the training, validation and test sets will all be in the same script (English or Arabic), while for tasks 2C and 2D, the training data will be in one script and the validation and test data in the other.
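The difference between the same-script and cross-script protocols can be expressed as a small consistency check on a split. This is a sketch: the Split class and follows_protocol function are hypothetical, and which script each individual task uses is not specified here, so the check only tests the relationship between partitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Split:
    """Scripts present in each partition, e.g. frozenset({"Arabic"})."""
    train: frozenset[str]
    validation: frozenset[str]
    test: frozenset[str]

def follows_protocol(split: Split, cross_script: bool) -> bool:
    """Check a split against the same-script or cross-script protocol.

    Same-script (e.g. 2A/2B): every partition uses one identical script.
    Cross-script (e.g. 2C/2D): training uses one script, validation and test the other.
    """
    parts = (split.train, split.validation, split.test)
    if any(len(p) != 1 for p in parts):
        return False  # each partition must contain exactly one script
    if split.validation != split.test:
        return False  # validation and test always share a script
    same_script = split.train == split.validation
    # Same-script tasks need same_script True; cross-script tasks need it False.
    return same_script != cross_script

if __name__ == "__main__":
    ar, en = frozenset({"Arabic"}), frozenset({"English"})
    print(follows_protocol(Split(en, en, en), cross_script=False))  # True
    print(follows_protocol(Split(ar, en, en), cross_script=True))   # True
```

The handedness tasks 3A – 3D follow the same pattern, so the check applies to them unchanged.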

Database Distribution for Handedness Classification Tasks (3A – 3D)

For each handedness classification task, 52 writing samples will be provided for training, 38 for validation and 38 for testing. In addition to the handwriting images, a separate text file giving the handedness of the writer of each training and validation image will be provided, where 1 denotes a right-handed writer and 0 a left-handed writer.

For tasks 3A and 3B, the training, validation and test sets will all be in the same script (English or Arabic), while for tasks 3C and 3D, the training data will be in one script and the validation and test data in the other.
