Towards New Czechoslovak Hyphenation Patterns
Zpravodaj Československého sdružení uživatelů TeXu, Volume 30 (2020), no. 3-4, pp. 118-126
This article was harvested from the Czech Digital Mathematics Library.


Space- and time-efficient segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, and mobile rendering system. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem for a single language without compromise. In this article, we show how we applied the marvelous effectiveness of patgen to generate new Czechoslovak hyphenation patterns that cover two languages, Czech and Slovak. We show that developing more universal hyphenation patterns is feasible and allows for significant quality improvements and space savings. We evaluate the new approach and the new Czechoslovak hyphenation patterns.
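As a rough illustration of how hyphenation patterns of the kind produced by patgen are applied to a word, the following minimal sketch implements the competing-pattern lookup described in Liang's thesis: digits inside a pattern score the gaps between letters, the highest score at each gap wins, and odd scores permit a break. The function names, the `left_min`/`right_min` parameters, and the `TOY_PATTERNS` list are assumptions made up for this example; the patterns are not taken from the authors' Czechoslovak pattern set, and diacritics are dropped from the sample word for simplicity.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'e2s1k' into its letters and gap levels.

    Returns ('esk', [0, 2, 1, 0]): levels[i] scores the gap before letters[i],
    and the final entry scores the gap after the last letter.
    """
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert hyphens into `word` using competing patterns (illustrative only)."""
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[g] is the gap before padded[g]
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                # The highest level seen at a gap wins.
                scores[start + offset] = max(scores[start + offset], level)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # The gap before word[i] is scores[i + 1] because of the leading dot.
        if scores[i + 1] % 2 == 1:         # odd allows a break, even forbids it
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)

# Hypothetical toy patterns, invented for this example only.
TOY_PATTERNS = ["e1", "o1", "e2s1k", "e2n1s"]

print(hyphenate("ceskoslovensko", ["e1", "o1"]))    # ce-sko-slo-ve-nsko
print(hyphenate("ceskoslovensko", TOY_PATTERNS))    # ces-ko-slo-ven-sko
```

The two calls show why patterns compete: the general patterns `e1` and `o1` alone place a break after every "e" and "o", while the longer patterns `e2s1k` and `e2n1s` override them with an even level at one gap and move the break one letter to the right. A real pattern set encodes a whole hyphenation dictionary in this layered way.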
DOI: 10.5300/2020-3-4/118
Keywords: hyphenation; hyphenation patterns; patgen; syllabification; syllabic hyphenation; Czech; Slovak; Czechoslovak patterns; Czechoslovak hyphenation; effective segmentation; syllabic hyphenation for multiple languages
@article{10_5300_2020_3_4_118,
     author = {Sojka, Petr and Sojka, Ond\v{r}ej},
     title = {Towards {New} {Czechoslovak} {Hyphenation} {Patterns}},
     journal = {Zpravodaj \v{C}eskoslovensk\'eho sdru\v{z}en{\'\i} u\v{z}ivatel\r{u} TeXu},
     pages = {118--126},
     year = {2020},
     volume = {30},
     number = {3-4},
     doi = {10.5300/2020-3-4/118},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.5300/2020-3-4/118/}
}
Sojka, Petr; Sojka, Ondřej. Towards New Czechoslovak Hyphenation Patterns. Zpravodaj Československého sdružení uživatelů TeXu, Volume 30 (2020), no. 3-4, pp. 118-126. doi: 10.5300/2020-3-4/118

