Document Engineering for a Digital Library: PDF recompression using JBIG2 and other optimization of PDF documents
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F10%3A00040554" target="_blank" >RIV/00216224:14330/10:00040554 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Document Engineering for a Digital Library: PDF recompression using JBIG2 and other optimization of PDF documents
Original language description
Several innovative document transformations and tools developed in the process of building the Digital Mathematical Library DML-CZ http://dml.cz are described. The main result is our new PDF re-compression tool, developed using an enhanced jbig2enc library. Combining it with pdfsizeopt.py by Péter Szabó, we have managed to decrease PDF storage size and transmission needs by 62%: using both programs, we reduced the original, already compressed PDFs to 38% of their size. We briefly describe the workflow and tools developed for creating the digital library. The batch digital signature stamper, the document similarity metrics, which use four different methods, a [meta]data validation process and math OCR tools represent some of the main [by]products. Such document engineering, together with Google Scholar indexing optimization, has led to the success of serving digitized and born-digital scientific math documents to the public in DML-CZ, and is also being employed in The European Digital Mathematics
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
The result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>S - Specific research at universities
Others
Publication year
2010
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of DocEng 2010 conference
ISBN
978-1-4503-0231-9
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
—
Publisher name
ACM
Place of publication
Manchester, UK
Event location
Manchester, UK
Event date
Sep 21, 2010
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000286949400002