Gabrielatos, C. (2007). Selecting query terms to build a specialised corpus from a restricted-access database. ICAME Journal 31, 5–43.

Abstract / Introduction

This paper proposes an accessible measure of the relevance of additional terms to a given query, describes and comments on the steps leading to its development, and discusses its utility. The measure, termed relative query term relevance (RQTR), draws on techniques used in information retrieval, and can be combined with a technique used in creating corpora from the World Wide Web, namely keyword analysis. It is independent of reference corpora, and does not require knowledge of the number of (relevant) documents in the database. Although it does not make use of user/expert judgements of document relevance, it does allow for subjective decisions. However, these subjective decisions are triangulated against two objective indicators: keyness and, primarily, RQTR.
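Keyness, one of the two objective indicators mentioned above, is commonly computed with the two-corpus log-likelihood (G2) statistic, which compares a word's frequency in a study corpus with its frequency in a reference corpus. The sketch below shows that standard calculation; it assumes log-likelihood as the keyness measure, which this page does not specify, and it does not implement RQTR itself. The function name `log_likelihood_keyness` is hypothetical.

```python
import math


def log_likelihood_keyness(freq_study: int, freq_ref: int,
                           size_study: int, size_ref: int) -> float:
    """Two-corpus log-likelihood (G2) keyness of one word.

    Assumption: keyness is measured with the common log-likelihood statistic.
    freq_*: occurrences of the word in each corpus; size_*: corpus sizes in tokens.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common (relative to size) in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    def contribution(observed: int, expected: float) -> float:
        # By convention 0 * ln(0) = 0, so an absent word contributes nothing.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2 * (contribution(freq_study, expected_study)
                + contribution(freq_ref, expected_ref))


# A word occurring 100 times in a 10,000-token study corpus and 100 times
# in a 100,000-token reference corpus is markedly overused in the study corpus.
print(round(log_likelihood_keyness(100, 100, 10_000, 100_000), 2))
```

Higher scores indicate a larger frequency difference than chance would predict; candidate query terms could then be ranked by this score before being checked against RQTR.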

Key words

Corpus, query terms, database, keywords, keyness, term relevance

Relevant details

An earlier version was presented at the Lancaster University Corpus Research Group on 30 October 2006.

Articles on the same topic

Baroni, Marco and Silvia Bernardini. 2003. The BootCaT toolkit: Simple utilities for bootstrapping corpora and terms from the web, version 0.1.2.

Baroni, Marco and Serge Sharoff. 2005. Creating specialized and general corpora using automated search engine queries. Paper presented at Corpus Linguistics 2005, Birmingham University, 14–17 July 2005.

Baroni, Marco, Adam Kilgarriff, Jan Pomikálek and Pavel Rychlý. 2006. Web-BootCaT: Instant domain-specific corpora to support human translators. Proceedings of EAMT 2006, 247–252.

Sinclair, John. 2004. Appendix: How to build a corpus. In M. Wynne (ed.). Developing linguistic corpora: A guide to good practice, 79–83. Oxford: Oxbow Books.

Boughanem, Mohand, Yannick Loiseau and Henri Prade. 2006. Rank-ordering documents according to their relevance in information retrieval using refinements of ordered-weighted aggregations. In M. Detyniecki, J.M. Jose, A. Nürnberger and C.J. van Rijsbergen (eds.). Adaptive multimedia retrieval: User, context, and feedback. Third International Workshop, AMR 2005, Glasgow, UK, July 28–29, 2005: Revised selected papers, 44–54. Berlin: Springer.

Xu, Jinxi and W. Bruce Croft. 1996. Query expansion using local and global document analysis. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’96), Zurich, Switzerland, August 18–22, 1996, 4–11. New York: ACM Press.

If you know of any related publications or discussions freely available online, please contact me.