Dataset (summer 2024 version) | Ancient History | D-Scribes

This dataset represents the older version of the Hell-Date dataset. For the latest version, see https://zenodo.org/records/15083590

This older version comprises 187 images of 155 papyri from the Hellenistic period (3rd to 1st c. BCE, more precisely from -310 to -3), each precisely dated (to within two years).

For each papyrus, the following identifiers are used:

TM numbers: unique identifiers of texts according to the Trismegistos database (https://www.trismegistos.org/about_how_to_cite.php)
Checklist identifiers: short abbreviations indicating the publication of the edition of the papyrus (https://papyri.info/docs/checklist)

The Hell-Date.zip archive contains the following files:

data.csv gives access to the 187 images and lists, for each image, a standard name, the location, the collection name, the inventory number, and a link to the online file.
- Names are standardised across the csv as TMnumber_checklistAbbreviation. Some papyri appear in more than one image; in that case, the name contains additional information to distinguish the images (e.g., two fragments of the same papyrus preserved in different collections, or the recto and verso of the same papyrus);
- A Python script is provided with the .csv to automate the download process.
metadata.csv contains metadata for each image. The columns of the file are:
- image_name: name of the file for the image of the papyrus;
- checklist: checklist identifier of the papyrus (usual way to refer to the papyrus in papyrology);
- TM: TM number as unique identifier of the text;
- Year post: the year before which the papyrus cannot have been written (terminus post quem);
- Year ante quem: the year after which the papyrus cannot have been written (terminus ante quem);
- Production Nome (supposed): the geographical region where the papyrus was written;
- Function: the type of document (e.g. a contract or a letter); this field may be a comma-separated list.
downloader.py automatically downloads all the images of the dataset, fetching each one from its original archive.
How to download the dataset.pdf briefly describes the simple procedure for downloading the images with the downloader.py script.
Requirements.txt lists the requirements of the Python environment needed to run the script correctly.
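As an illustration of how the metadata.csv columns described above could be read, here is a minimal sketch using only the Python standard library. It assumes that the file is comma-delimited, that the header row uses exactly the column names listed above, and that years are stored as signed integers (negative for BCE). The function name parse_metadata and the sample rows are invented for the example and are not part of the archive.

```python
import csv
import io


def parse_metadata(handle):
    """Parse a metadata.csv-like file into a list of dictionaries.

    Assumes the header names given in the dataset description.
    """
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "image_name": row["image_name"].strip(),
            "checklist": row["checklist"].strip(),
            # Kept as text, since the exact TM format in the file is not assumed.
            "tm": row["TM"].strip(),
            "year_post": int(row["Year post"]),
            "year_ante": int(row["Year ante quem"]),
            "nome": row["Production Nome (supposed)"].strip(),
            # "Function" may hold several values separated by commas.
            "functions": [f.strip() for f in row["Function"].split(",") if f.strip()],
        })
    return records


# Invented placeholder rows, NOT taken from the dataset.
SAMPLE = '''image_name,checklist,TM,Year post,Year ante quem,Production Nome (supposed),Function
0001_P.Example 1,P.Example 1,0001,-250,-249,Example nome,"contract, receipt"
0002_P.Example 2,P.Example 2,0002,-112,-112,Example nome,letter
'''

records = parse_metadata(io.StringIO(SAMPLE))
for rec in records:
    print(rec["tm"], rec["year_post"], rec["year_ante"], rec["functions"])
```

With the real file, the same function could be called on an open file handle, e.g. open("metadata.csv", newline="", encoding="utf-8"); the encoding is also an assumption.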

Some caveats concerning the images:

The images are downloaded from the web, from the online collections of the holding institutions (see the list of licenses below).
The images are neither pre-processed nor harmonised in format, resolution, colour scale, or scale.
Five images, labelled _GreekOnly, were cropped to remove their large Egyptian Demotic text and focus on the Greek text.
The image of TM3563 was divided into nine individual images, one for each column of text.
If the images are not of sufficient quality, better-quality images can in some cases be obtained through IIIF servers.
As already mentioned, one papyrus can have two images.
For some papyri, one of the two images has very little text on it. If pertinent, one can exclude it from the dataset.
Some texts were written by more than one scribe, yet we have not differentiated between hands, either in the images or in the spreadsheet.
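Because one papyrus can correspond to several images, it can be useful to group images by TM number, for instance to count papyri rather than images or to keep all images of one papyrus together. The sketch below assumes that every image name starts with the TM number followed by an underscore, as in the TMnumber_checklistAbbreviation convention, and tolerates an optional "TM" prefix. The helper names and the sample image names are invented for illustration.

```python
from collections import defaultdict


def tm_of(image_name):
    """Return the TM number at the start of a standardised image name."""
    tm_part = image_name.split("_", 1)[0]
    # Tolerate an optional "TM" prefix (an assumption about the naming).
    if tm_part.startswith("TM"):
        tm_part = tm_part[2:]
    return tm_part


def group_by_tm(image_names):
    """Map each TM number to the list of its image names."""
    groups = defaultdict(list)
    for name in image_names:
        groups[tm_of(name)].append(name)
    return dict(groups)


# Invented placeholder names, NOT taken from the dataset.
names = [
    "0001_P.Example 1_recto",
    "0001_P.Example 1_verso",
    "0002_P.Example 2_GreekOnly",
]
groups = group_by_tm(names)
print(len(names), "images of", len(groups), "papyri")
```

The same grouping could be used to drop a near-empty image of a papyrus while keeping its other image.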

Concerning provenance, most documents come from Egypt, but there are a few outliers from the Near East.

The chronological coverage is balanced at around 50 papyri per century over the period considered (3rd to 1st c. BCE); only the earliest decades are not covered, and the 250s BCE are overrepresented.
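The distribution per century and per decade can be checked with a short tally. This sketch assumes that a negative year n stands for the year |n| BCE (not astronomical year numbering) and uses one year per papyrus, for example the earlier date bound; both choices are assumptions, and the sample values are invented.

```python
from collections import Counter


def century_bce(year):
    """Century BCE for a negative year, e.g. -250 -> 3 (3rd c. BCE)."""
    if year >= 0:
        raise ValueError("expected a negative (BCE) year")
    return (abs(year) - 1) // 100 + 1


def decade_bce(year):
    """Decade label BCE for a negative year, e.g. -255 -> 250 (the 250s BCE)."""
    if year >= 0:
        raise ValueError("expected a negative (BCE) year")
    return abs(year) // 10 * 10


# Invented (TM number -> year) pairs, NOT taken from the dataset.
papyri = {"0001": -250, "0002": -255, "0003": -112, "0004": -30}

per_century = Counter(century_bce(y) for y in papyri.values())
per_decade = Counter(decade_bce(y) for y in papyri.values())
print(dict(per_century))
print(dict(per_decade))
```

Keying the tally by TM number rather than by image avoids counting a papyrus with several images more than once.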

Users of this dataset must comply with the licenses provided by the various websites that give access to the images. Please take note that some of them do not allow reuse, or commercial reuse, of the images, and that credits are mostly required. By using this dataset, you confirm that you have read and understood the following licenses:

Ann Arbor, University of Michigan: Creative Commons “Public Domain 1.0”
Berkeley, University of California: https://www.lib.berkeley.edu/about/permissions-policies
Berlin, Staatliche Museen zu Berlin: https://berlpap.smb.museum/nutzungshinweise/
Cairo, Egyptian Museum Cairo: unknown license
Cologne, Universität zu Köln: Creative Commons “BY 4.0”
Durham (NC), Duke University: Creative Commons “BY-NC 3.0”
Florence, Biblioteca Medicea Laurenziana: https://psi-online.it/rightpermission
Genova, Università di Genova: http://www.pug.unige.net/Home/Contatti
Hamburg, Staats- und Universitätsbibliothek Hamburg: Creative Commons “Public Domain 1.0”
Heidelberg, Universität Heidelberg: “free access, no reuse”
London, British Library: Public Domain
Manchester, John Rylands Library: unknown license
New York, Columbia University: Creative Commons “BY-NC 3.0”
New York, Pierpont Morgan Library: unknown license
Oxford, Art, Archaeology and Ancient World Library: https://rightsstatements.org/page/InC/1.0/?language=en
Paris, Musée du Louvre - Antiquités égyptiennes: https://collections.louvre.fr/en/page/cgu
Paris, Sorbonne Université - Institut de Papyrologie: Creative Commons “BY-NC 4.0” and https://papyrologie.sorbonne-universite.fr/la-collection/conditions-dutilisation-des-photos/
St. Louis, Washington University: Creative Commons “Public Domain 1.0”
Turin, Museo Egizio: Creative Commons “CC0”
Vienna, Österreichische Nationalbibliothek: https://www.onb.ac.at/nutzung
Warsaw, University of Warsaw: http://www.papyrology.uw.edu.pl/copyright.htm

The research behind Hell-Date would not have been possible without the data provided by Papyri.info (https://papyri.info/, CC BY 3.0), Trismegistos (https://www.trismegistos.org/, CC BY-SA 4.0) and PapPal (https://pappal.info/ - many thanks to Rodney Ast for sharing their data).
