Querying texts with the AntConc corpus software

Authors

G. P. W. Rajeg, L. P. Laksminy, I. M. Rajeg

DOI:

https://doi.org/10.36002/litera.v12i1.4987

Keywords:

AntConc, Corpus Linguistics, Indonesian, Reduplication, Regular Expressions

Abstract

AntConc is an open-source corpus linguistics tool that offers the basic analytical techniques of the field. One of its strengths is the range of pattern-searching methods it supports. This paper describes AntConc’s search methods and illustrates them with queries for phonological, morphological, and syntactic phenomena. It also discusses the more advanced method, Regular Expressions (RegEx), and illustrates its use for querying a rather complex morphological phenomenon in Indonesian, namely reduplication. The RegEx search output is compared with the output of the basic Wildcard search to highlight the sophistication and flexibility of RegEx in returning more targeted and specific results. Overall, the paper underscores the importance of mastering AntConc’s query methods, which differ in complexity, for investigating a wider range of data-intensive linguistic questions.
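The contrast between a Wildcard query and a RegEx query for reduplication can be sketched outside AntConc. The following is a minimal illustration in Python, not the queries used in the paper: the token list is hypothetical, `fnmatch` merely stands in for a wildcard search, and Python's `re` syntax is assumed here and may differ in detail from the patterns AntConc accepts.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical sample tokens for illustration (not taken from the paper's corpus).
tokens = ["anak-anak", "buku-buku", "sayur-mayur", "bolak-balik", "non-formal", "rumah"]

# Wildcard-style query: any token that contains a hyphen.
# This also returns hyphenated forms that are not full reduplications.
wildcard_hits = [t for t in tokens if fnmatchcase(t, "*-*")]

# RegEx query: a word, a hyphen, then exactly the same word again.
# The backreference \1 repeats whatever the first group captured.
FULL_REDUPLICATION = re.compile(r"^(\w+)-\1$")
regex_hits = [t for t in tokens if FULL_REDUPLICATION.match(t)]

print("Wildcard:", wildcard_hits)
print("RegEx:   ", regex_hits)
```

In this sketch the wildcard pattern returns every hyphenated token, whereas the backreference pattern keeps only forms whose two halves are identical. Forms with sound change, such as `sayur-mayur`, would need a separate, looser pattern.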

Published

2025-01-14

How to Cite

Rajeg, G. P. W., Laksminy, L. P., & Rajeg, I. M. (2025). Querying texts with the AntConc corpus software. LITERA : Jurnal Bahasa Dan Sastra, 12(1), 1–19. https://doi.org/10.36002/litera.v12i1.4987
