Building a corpus of tourism scientific texts: foundations, methodology and potentialities

Keywords: Parallel corpus, Comparable corpus, Tourism, Terminology, Scientific writing


This work presents the theoretical foundations and methodology used to build TEXTur, a corpus of scientific abstracts in Tourism that comprises a Portuguese-English parallel corpus (abstracts and their translations) and a comparable corpus (abstracts originally written in English). The corpus is intended for scientific writing practice, teaching and research, and we highlight its applications and potentialities. Three highly ranked Brazilian tourism journals were selected to extract abstracts published in Brazilian Portuguese and their translations into English, and three highly ranked international tourism journals were selected to extract abstracts published in English. The collected texts were divided into sentences, and each sentence was annotated with its rhetorical move (Introduction/Gap/Objective/Methodology/Result/Conclusion), based on the model proposed by Swales and Feak (2009) and enhanced by Feltrim (2004). In the parallel corpus, original and translated texts were aligned side by side. The time frame covers issues published in the last four years (2018-2021). In this paper we present partial results, with data extracted from one of the national journals (Revista Brasileira de Pesquisa em Turismo, RBTur; Brazilian Journal of Tourism Research) and one of the international journals (Journal of Travel Research, JTR). A total of 92 pairs of abstracts and translations extracted from RBTur were aligned, divided into sentences and annotated. Within the same time frame, 360 abstracts extracted from JTR were divided into sentences and annotated. As practical implications of this research, we point out the potential application of the corpus as (1) a teaching resource in Methodology and Academic Writing classes; (2) a source for data extraction and observation in different types of Linguistics research; (3) an input for the development of scientific writing tools; and (4) a source of information for students, professors, and researchers.
The originality of the research lies in the fact that the corpus consists of scientific texts exclusively from the field of tourism.
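
As an illustration only, the annotation scheme described in the abstract (sentences labelled with one of six rhetorical moves, optionally aligned with a translation) could be represented roughly as in the sketch below. The class names, the `move_counts` helper, and the example sentences are all hypothetical and invented for this sketch; they do not reproduce the TEXTur file format or any actual corpus data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Move(Enum):
    """The six rhetorical moves named in the abstract."""
    INTRODUCTION = "Introduction"
    GAP = "Gap"
    OBJECTIVE = "Objective"
    METHODOLOGY = "Methodology"
    RESULT = "Result"
    CONCLUSION = "Conclusion"


@dataclass
class AnnotatedSentence:
    """One sentence of an abstract with its rhetorical move.

    `translation` is filled only for the parallel corpus; for the
    comparable (English-only) corpus it stays None.
    """
    text: str
    move: Move
    translation: Optional[str] = None


def move_counts(sentences):
    """Count how many sentences were annotated with each move."""
    return Counter(s.move for s in sentences)


# Invented example sentences, not taken from the corpus.
sample_abstract = [
    AnnotatedSentence(
        "O turismo rural cresce no Brasil.",
        Move.INTRODUCTION,
        "Rural tourism is growing in Brazil.",
    ),
    AnnotatedSentence(
        "Este estudo analisa a oferta de hospedagem.",
        Move.OBJECTIVE,
        "This study analyses the supply of accommodation.",
    ),
    AnnotatedSentence(
        "Foram realizadas entrevistas com gestores.",
        Move.METHODOLOGY,
        "Interviews were conducted with managers.",
    ),
]

counts = move_counts(sample_abstract)
for move in Move:
    print(f"{move.value}: {counts[move]}")
```

Keeping the translation on the same record as the source sentence mirrors the side-by-side alignment mentioned above, while the optional field lets the same structure hold sentences from the comparable corpus.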



Author Biographies

Ivanir Delvizio, São Paulo State University

She holds a doctorate and a master's degree in Linguistic Studies and a degree in Translation from Universidade Estadual Paulista (UNESP), where she is currently a lecturer.

Yanae Pereira da Silva, Universidade Estadual Paulista

Undergraduate student in Tourism at São Paulo State University (UNESP). She is carrying out undergraduate research and holds a scholarship from the Tutorial Education Program (PET).

Mariana Nascimento Jordão, Universidade Estadual Paulista

Undergraduate student in Tourism at São Paulo State University (UNESP) and Scientific Initiation fellow under the PIBIC programme.


References

Antiqueira, L., Feltrim, V. D., Nunes, M. D. G. V. (2003). Projeto e implementação do sistema SciPo. São Carlos, Brasil. Série de Relatórios Técnicos do Instituto de Ciências Matemáticas e de Computação (nº 223).
Aranha, S. (2007). A busca de modelos retóricos mais apropriados para o ensino da escrita Acadêmica. Revista Do GEL, 4(2), 97–114.
Baker, M. (1993). Corpus Linguistics and translation studies: implications and applications. In Baker, M.; Francis, G.; Tognini-Bonelli, E. (org.). Text and technology: in honour of John Sinclair. Amsterdam: John Benjamins.
Baker, M. (1995). Corpora in translation studies: an overview and some suggestions for future research. Target, 7(2), 223-243.
Baker, M. (1996). Corpus-based translation studies: the challenges that lie ahead. In Somers, H. (ed.). Terminology, LSP and translation studies in language engineering: in honour of Juan C. Sager. Amsterdam: John Benjamins, p. 177-186.
Berber Sardinha, A. P. (2000). Linguística de corpus: histórico e problemática. Delta: documentação de estudos em linguística teórica e aplicada, São Paulo, 16(2), 323-367.
Carvalho, C. T. De; Laranha, L. A. N.; Pinto, P. T. (2021). DIY corpora: o que são e para quem são? Tradterm, 37(1), 64-87.
Caseli, H. De M.; Nunes, M. das G. V. (2004). Corpus paralelo e corpus paralelo alinhado: propriedades e aplicações. Estudos Lingüísticos, 33, p. 581-586.
Gil, B.; Aranha, S. (2017). Um estudo do gênero abstract na disciplina de Antropologia: a heterogeneidade da(s) área(s). Delta, 33(3), 843-871.
Marcuschi, L. A. (2002). Gêneros Textuais: definição e funcionalidade. In: Dionisio, A. P.; Machado, A. R.; Bezerra, M. A. (org.). Gêneros Textuais e Ensino. Rio de Janeiro: Lucerna.
Marquiafável, V. S. (2007). Um processo para a geração de recursos lingüísticos aplicáveis em ferramentas de auxílio à escrita científica. 273 f. 2007. Dissertação (Mestrado em Linguística) - Universidade Federal de São Carlos, São Carlos.
Rejovsky, M. (2010). Produção Científica em Turismo: análise de estudos referenciais no exterior e no Brasil. Turismo em análise, 21(2), 224-246.
Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Tagnin, S. E. O. (2013). O jeito que a gente diz: combinações consagradas em inglês e português. Barueri: Disal.
Tagnin, S. E. O. (2011). Linguística de Corpus e Fraseologia: uma feita para a outra. In Ortiz, M. L. A.; Unternbaumen, E. H. (Org.). Uma (re)visão da teoria e da pesquisa fraseológicas. Campinas: Ponte, p. 227-302.
Tanikaki, S. de F. B.; Souza, J. W. da C. (2021). Criação e Anotação do corpus de resumos científicos de Ciências Sociais Aplicadas. Anais do 13º Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL), Evento Online. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 437-441.

How to Cite

Delvizio, I., Silva, Y. P. da, & Jordão, M. N. (2023). Building a corpus of tourism scientific texts: foundations, methodology and potentialities. Ateliê do Turismo, 7(2), 103-126.