Digitisation of the Complete Works of E. Kriaras in the Portal for the Greek Language

John N. Kazazis, Rania Voskaki, Athanasia Margoni, Christos Andras

International Document Image Processing Summer School, Fourni, Greece, 3-7 June 2013

Centre for the Greek Language

Portal for the Greek Language

Main aim: overall support and promotion of the Greek language in Greece and abroad

Ancient, Medieval, Modern Greek

online dictionaries

text corpora

anthologies

studies

Objectives

• Digitisation of the scholarly works of E. Kriaras

• Categorisation of the uploaded material into thematic units

• Representation of the digitised material in human-readable format

• Creation of an electronic index

Ultimate goal: easy access to Kriaras’s works for both the specialised researcher and the average reader

Thematic Units

• Dictionary of Medieval Vulgar Greek Literature

• Medieval Studies

• On Language

• Correspondence – Autobiographical Works – Other Documents

    • books with E. Kriaras’s correspondence

    • autobiographical works

    • rare documents

• Monographs – Book Reviews

• Journals

• Audio-visual material

Digitised data

a. Quantity

Thematic unit                                           Works   Pages  Average pages per work
Dictionary of Medieval Vulgar Greek Literature             17   6,852  403
Medieval Studies                                           13   3,451  265
On Language                                                27   7,948  294
Correspondence, Autobiographical Works, Other Documents    15   3,556  237
Monographs, Book Reviews                                   13   1,380  106
Journals                                                  238   2,182  9
Total                                                     323  25,369  78.541
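The totals and per-work averages in the quantity table can be re-derived from the row figures. The following is a minimal sketch of that arithmetic, assuming each average is simply pages divided by works, rounded to the nearest whole page. The names `ROWS`, `summarise` and `row_averages` are illustrative and do not come from the project.

```python
# Figures copied from the quantity table:
# (thematic unit, number of works, pages in total, reported average).
ROWS = [
    ("Dictionary of Medieval Vulgar Greek Literature", 17, 6852, 403),
    ("Medieval Studies", 13, 3451, 265),
    ("On Language", 27, 7948, 294),
    ("Correspondence, Autobiographical Works, Other Documents", 15, 3556, 237),
    ("Monographs, Book Reviews", 13, 1380, 106),
    ("Journals", 238, 2182, 9),
]


def summarise(rows):
    """Return (total works, total pages, overall pages per work)."""
    total_works = sum(works for _, works, _, _ in rows)
    total_pages = sum(pages for _, _, pages, _ in rows)
    return total_works, total_pages, total_pages / total_works


def row_averages(rows):
    """Pages per work for each thematic unit, rounded to a whole page."""
    return {name: round(pages / works) for name, works, pages, _ in rows}


total_works, total_pages, overall_average = summarise(ROWS)
averages = row_averages(ROWS)

for name, works, pages, reported in ROWS:
    print(f"{name}: computed {averages[name]}, reported {reported}")
print(total_works, total_pages, overall_average)
```

Under this assumption the row sums come to 323 works and 25,369 pages, matching the Total row, and each rounded row average matches the reported figure.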

Digitised data

b. Quality

• Medium resolution: 300 dpi

• Pages: book size, scanned one at a time

• Evenly lit images

• File format: PNG

• Scanned pages: mostly deskewed, except for a few that are slightly skewed

• Overall material quality: mostly good, except for a few old printings of poor quality

[Figure slides 1/3 to 3/3: Initial SQL database]

[Figure slides 1/2 to 2/2: Greek Portal’s database]

Term search engine

163,000 term index entries

• Retrieval of a given term from the entire online material

• Retrieval of the terms included in a specific work

• Links from each term to the page or pages of the work it refers to
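The three retrieval functions listed above can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the class `TermIndex`, its method names, the work identifiers `work-a` and `work-b`, the sample terms and the link format are all hypothetical, and nothing here reflects the Portal's actual database schema or implementation.

```python
from collections import defaultdict


class TermIndex:
    """Toy inverted index: term -> {work -> set of page numbers}."""

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, term, work, page):
        """Record that a term occurs on a given page of a work."""
        self._postings[term.lower()][work].add(page)

    def find_term(self, term):
        """Every work and page on which a term occurs, across all material."""
        works = self._postings.get(term.lower(), {})
        return {work: sorted(pages) for work, pages in works.items()}

    def terms_in_work(self, work):
        """All indexed terms that occur in one work, in alphabetical order."""
        return sorted(t for t, works in self._postings.items() if work in works)

    def page_links(self, term, work):
        """Hypothetical link targets for the pages a term refers to."""
        pages = self.find_term(term).get(work, [])
        return [f"{work}/page/{page}" for page in pages]


# Made-up sample entries, for illustration only.
index = TermIndex()
index.add("demotic", "work-a", 12)
index.add("demotic", "work-a", 3)
index.add("demotic", "work-b", 40)
index.add("romance", "work-b", 7)

print(index.find_term("Demotic"))
print(index.terms_in_work("work-b"))
print(index.page_links("demotic", "work-a"))
```

Storing page numbers as a set per work keeps repeated occurrences on the same page from producing duplicate links, and lower-casing terms on insertion and lookup makes the search case-insensitive. Both are design choices of this sketch rather than documented behaviour of the Portal.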

[Figure slides 1/2 to 2/2: Sample of obtained results]

Evaluation

New visitors: 68,656 (43.6% of visits)

Returning visitors: 88,859 (56.4% of visits)
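The two percentages are consistent with each count being a share of the combined figure. A short check, assuming the percentages were computed as count divided by the sum of both counts (the function name `shares` is illustrative):

```python
NEW_VISITORS = 68_656
RETURNING_VISITORS = 88_859


def shares(new, returning):
    """Return the combined count and each group's percentage of it."""
    total = new + returning
    return total, 100 * new / total, 100 * returning / total


total, new_pct, returning_pct = shares(NEW_VISITORS, RETURNING_VISITORS)
print(total, round(new_pct, 1), round(returning_pct, 1))
```

Rounded to one decimal place, the shares agree with the 43.6% and 56.4% reported on the slide.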

Prospects

• Improvement of the image processing technique

• High-quality conversion of image files to machine-readable files

• Optimisation of search performance

• Integration of the Dictionary of Medieval Vulgar Greek Literature with the Greek Portal’s online dictionaries