We regularly organise contests open to researchers and others. The aim is to compare state-of-the-art document image analysis methods. We also benchmark the submissions against leading commercial and open-source tools.
Evaluation of page segmentation and region classification methods for documents with complex layouts. The images and ground truth were taken from the PRImA Layout Analysis dataset, which contains a wide selection of contemporary documents (with complex as well as simple layouts) together with extensive metadata. The emphasis is on magazines, which make up most of the set, and technical articles, as these are likely to be the focus of digitisation efforts.
RDCL is now a continuous competition. The evaluation is integrated in the Aletheia tool and can be used offline. New results can be submitted at any time and will be published once validated.
The British Library’s collection of Arabic manuscripts is internationally recognised as one of the largest and finest in Europe and North America, comprising almost 15,000 works in some 14,000 volumes. Since 2012, the Library, in partnership with The Qatar Foundation and Qatar National Library, has digitised and made freely available over 950,000 images and counting, featuring the cultural and historical heritage of the Gulf and wider region, on Qatar Digital Library (QDL).
Ranging from the early eighth century CE to the nineteenth century, the manuscripts are drawn from both Arab countries and other countries with Arab or Muslim communities including India, China, Indonesia, Malaysia, and West Africa, and they display fascinating variations in style and script.
As part of this project we would like to pose a challenge focussing on finding an optimal solution for accurately and automatically transcribing our vast and growing digital archive of historical Arabic scientific handwritten manuscripts within the QDL. Our aim is to improve accessibility of this rich content by enabling full-text search and discovery, as well as enabling large-scale text analysis.
The British Library is currently undertaking a groundbreaking project, Two Centuries of Indian Print, to digitise and make available as open access 4,000 early printed Indian books (1713-1914) written in Bengali. The Quarterly Lists, complementary material consisting of catalogue records for all books published in India between 1867 and 1967, will also be made openly available through the project.
As part of this project we would like to pose a challenge to find an optimal solution for accurately and automatically transcribing the Bengali books and Quarterly Lists, to form a unique dataset that can be used with computational tools and methods, and to enable full-text search and discovery.
Historical books represent a large proportion of libraries’ holdings and continue to be the focus of large-scale digitisation projects. A number of distortions frequently manifest themselves in scans of historical books, hindering layout analysis and text recognition. The motivation of the competition is to evaluate existing approaches using a realistic dataset and an objective performance analysis system.
HBR2013 followed the successful running of all previous ICDAR Page Segmentation competitions (2001, 2003, 2005, 2007, 2009 and 2011). The competition expanded the scope to historical books with distortions. (The historical documents in the ICDAR2011 dataset were largely distortion-free, so that the segmentation step could be evaluated in isolation.) The competition was also broadened to cover text recognition.
Historical newspapers pose a series of challenges due to the way they were produced (inexpensive paper, inconsistent inking, varying layouts, etc.) as well as the presence of ageing and usage artefacts. Newspapers are increasingly the main focus of large-scale digitisation projects (e.g. Europeana Newspapers), as they contain information of wide interest to the general public and, at the same time, are rapidly deteriorating in storage. The motivation of the competition is to evaluate existing approaches using a realistic dataset and an objective performance analysis system.
HNLA2013 followed the successful running of all previous ICDAR Page Segmentation competitions (2001, 2003, 2005, 2007, 2009 and 2011). The competition expanded the scope to historical newspapers.
Biennial ICDAR competitions since 2001, providing snapshots of page recognition methods.
2011 - Historical Document Layout Analysis Competition (ICDAR publication)
2009 - Page Segmentation Competition (ICDAR publication)
2007 - Handwriting Segmentation Contest (ICDAR publication)
2007 - Page Segmentation Competition (ICDAR publication)
2005 - Page Segmentation Competition (ICDAR publication)
2003 - Page Segmentation Competition (ICDAR publication)
2001 - First International Newspaper Page Segmentation Contest (ICDAR publication)