GRK-Papyri: A Dataset of Greek Handwriting on Papyri
The dataset is derived directly from research questions in papyrology, and the samples were selected by experts in the field. It consists of 50 samples of Greek handwriting on papyri dating approximately from the 6th century A.D., written by 10 different scribes. It is freely available for non-commercial research, together with confirmed ground-truth information for the task of writer identification. Writer identification served as the use case that guided the selection of the data; nevertheless, the dataset can also be used for other research topics, such as image enhancement, binarisation, and line/word segmentation.
Details about the images
All samples in the dataset are in JPG format, with image heights ranging from 796 to 6818 pixels and widths from 177 to 7938 pixels. Some images are greyscale and others are in the RGB colour space. All samples suffer from heavy degradation, including low contrast, holes, and folding marks, and some are further affected by reflections from the glass that covers them for preservation purposes. The samples are not evenly distributed across the 10 scribes: each scribe has between 4 and 7 samples. Further details are given in section 2.2 of the ICDAR 2019 article by H. Mohammed et al. cited below.
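Because the collection mixes greyscale and colour JPEGs of very different sizes, it can help to inventory each file before any preprocessing. The following is a minimal, standard-library-only sketch (not part of the dataset distribution) that reads a JPEG's height, width and number of colour components from its start-of-frame (SOF) header; one component indicates a greyscale image and three a colour (YCbCr/RGB) image. The function names `jpeg_info` and `jpeg_info_from_file` are hypothetical, and the parser assumes ordinary JFIF/EXIF files in which the frame header precedes the first scan. In practice an imaging library such as Pillow reports the same information.

```python
import struct
from typing import Tuple

# Start-of-frame markers. 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) share the
# 0xCn range but do not describe the frame, so they are excluded.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_info(data: bytes) -> Tuple[int, int, int]:
    """Return (height, width, n_components) read from a JPEG's SOF header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at byte offset {i}")
        while i < len(data) and data[i] == 0xFF:  # skip optional fill bytes
            i += 1
        if i >= len(data):
            break
        marker = data[i]
        i += 1
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:
            continue  # standalone markers (TEM, RSTn, SOI) have no length field
        if marker in (0xD9, 0xDA):
            break  # EOI or start of scan reached without a frame header
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[i:i + 2])
        if marker in SOF_MARKERS:
            # SOF payload: precision (1 byte), height (2), width (2), components (1).
            _precision, height, width, n_comp = struct.unpack(
                ">BHHB", data[i + 2:i + 8])
            return height, width, n_comp
        i += length  # jump over this segment to the next marker
    raise ValueError("no start-of-frame (SOF) marker found")


def jpeg_info_from_file(path: str) -> Tuple[int, int, int]:
    """Hypothetical convenience wrapper: read a file and parse its header."""
    with open(path, "rb") as f:
        return jpeg_info(f.read())
```

Applying `jpeg_info_from_file` to every image in the unzipped folders gives a quick check of the height and width ranges and of which images are greyscale (1 component) rather than colour (3 components).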
The correspondence between the dataset samples and the usual denominations of the papyri is given in an Excel file available for download. The bibliographical references mentioned there are:
J.M. DIETHART and K.A. WORP, Notarsunterschriften im Byzantinischen Ägypten, 1986.
L. DEL CORSO, "Le scritture di Dioscoro" in J.-L. Fournet (ed.), Les archives de Dioscore d'Aphrodité cent ans après leur découverte. Histoire et culture dans l'Égypte byzantine, 2008.
The dossier of Mênas, a Hermopolite notary, will be published by I. Marthot-Santaniello (forthcoming).
Except for Mênas 1 and 2 (provided by the BNUS, Strasbourg) and Mênas 3, 4, and 5 (British Library), all images are taken from the Bank of Papyrus Images of Byzantine Aphrodite (BIPAb), created by Prof. Jean-Luc Fournet and funded by the Association Internationale de Papyrologues and the University of Strasbourg. The rights of reproduction and online display have been acquired, and each set of images is accompanied by the photographic credit and copyright notice of its owner.
Reference
H. Mohammed, I. Marthot-Santaniello and V. Märgner, "GRK-Papyri: A Dataset of Greek Handwriting on Papyri for the Task of Writer Identification," 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019, pp. 726-731.
Links to datasets
The "Training-test" version contains two folders: a training folder of 20 images (two per scribe) and a test folder of the remaining 30 images (between 2 and 5 per scribe). It is available as Training-test.zip.
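The folders in Training-test.zip already fix which images are used for training, and the description above does not say how the two training images per scribe were chosen. For experiments that need a comparable split (for example on a subset of scribes), the "two images per scribe for training, the rest for testing" scheme can be sketched as below. The sketch assumes a hypothetical list of (filename, scribe) pairs built from the ground-truth labels, and its selection rule (the first two filenames per scribe in sorted order) is an assumption, not necessarily the rule behind the official split.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (filename, scribe label); hypothetical layout


def split_per_scribe(samples: Sequence[Sample], n_train: int = 2
                     ) -> Tuple[List[Sample], List[Sample]]:
    """Put n_train images per scribe into training and the rest into test."""
    by_scribe: Dict[str, List[str]] = defaultdict(list)
    for filename, scribe in samples:
        by_scribe[scribe].append(filename)

    train: List[Sample] = []
    test: List[Sample] = []
    for scribe in sorted(by_scribe):
        # Hypothetical selection rule: first n_train filenames in sorted order.
        files = sorted(by_scribe[scribe])
        if len(files) <= n_train:
            raise ValueError(f"scribe {scribe!r} has no image left for testing")
        train += [(f, scribe) for f in files[:n_train]]
        test += [(f, scribe) for f in files[n_train:]]
    return train, test
```

With 10 scribes and 4 to 7 samples each, this yields 20 training and 30 test images, matching the sizes of the official folders, though not necessarily the same images.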