This volume highlights the ways in which recent developments in corpus linguistics and natural language processing can engage with topics across language studies, humanities and social science disciplines.
New approaches have emerged in recent years that blur disciplinary boundaries, facilitated by factors such as the application of computational methods, access to large data sets, and the sharing of code, as well as continual advances in technologies related to data storage, retrieval, and processing. The “march of data” denotes not only the border region between linguistics, humanities, and social science disciplines, but also the inevitable development of the underlying technologies that drive analysis in these subject areas.
Organized into three sections, the chapters are connected by the underlying thread of linguistic corpora: how they can be created, how they can shed light on varieties or registers, and how their metadata can be utilized to better understand the internal structure of similar resources. While some chapters in the volume make use of well-established existing corpora, others analyze data from platforms such as YouTube, Twitter or Reddit. The volume provides insight into the diversity of methods, approaches, and corpora that inform our understanding of the “border regions” between the realms of data science, language/linguistics, and social or cultural studies.
List of Figures
List of Tables
List of Contributors
Acknowledgements
Introduction, Steven Coats (University of Oulu, Finland) and Veronika Laippala (University of Turku, Finland)
Part I. Methods for Data Collection, Analysis, and Visualization
1. Using Automatic Speech Recognition Transcripts for Linguistic Research, Steven Coats (University of Oulu, Finland)
2. Low-code Data Science Tools for Linguistics: Swiss Army Knives or Pretty Black Boxes? Jukka Tyrkkö and Daniel Ihrmark (Linnaeus University, Sweden)
3. The Visualisation and Evaluation of Semantic and Conceptual Maps, Gerold Schneider (University of Zurich, Switzerland)
Part II. Corpus Construction, Registers, and Genres
4. Toward Automatic Register Classification in Unrestricted Databases of Historical English, Liina Repo (University of Turku, Finland), Brett Hashimoto (Brigham Young University, USA), Aatu Liimatta (University of Helsinki, Finland), Lassi Saario (University of Helsinki, Finland), Tanja Säily (University of Helsinki, Finland), Iiro Tiihonen (University of Helsinki, Finland), Mikko Tolonen (University of Helsinki, Finland), and Veronika Laippala (University of Turku, Finland)
5. Exploring the Interplay of Registers and Topicality in a Web-Scale Corpus, Valtteri Skantsi, Veronika Laippala, and Aku Kyroläinen (University of Turku, Finland)
6. Towards ‘Large and Tidy’: Establishing Internal Structure in Mega-Corpora, Axel Bohmann (University of Freiburg, Germany)
Part III. Social Media, Discourse, and Meanings
7. Multi-Modal Considerations for Social Media Discourse Analysis: A Specialised Corpus of Twitter Commentary on ‘Working from Home’, Christopher Fitzgerald (Mary Immaculate College, Ireland), Geraldine Mark (Cardiff University, Wales), Anne O’Keeffe (Mary Immaculate College, Ireland), Dawn Knight (Cardiff University, Wales), Justin McNamara (Mary Immaculate College, Ireland), Svenja Adolphs (University of Nottingham, England), Benjamin Cowan (University College Dublin, Ireland), Tania Fahey Palma (University of Aberdeen, Scotland), Fiona Farr (University of Limerick, Ireland), and Sandrine Peraldi (University College Dublin, Ireland)
8. Exploring Patterns of Self-Identification in the LGBTQ+ Reddit Corpus, Laura Hekanaho (University of Helsinki/Tampere University, Finland), Turo Hiltunen (University of Helsinki, Finland), Minna Palander-Collin (University of Helsinki, Finland), and Helmiina Hotti (University of Helsinki, Finland)
Index
Series editors: Mikko Laitinen (University of Eastern Finland, Finland) and Martin Schweinberger (University of Queensland, Australia)
Editorial Board:
Jennifer Edmond, Associate Professor of Digital Humanities (Trinity College Dublin, Ireland)
Jacob Eisenstein, Assistant Professor of Computational Linguistics (Georgia Institute of Technology, USA)
Bruno Gonçalves, Vice President of Data Science and Finance/Fellow (JP Morgan Chase & Co/ISI Foundation, Italy)
Jack Grieve, Professor of Corpus Linguistics (University of Birmingham, UK)
Martin Hilpert, Assistant Professor of English Linguistics (Université de Neuchâtel, Switzerland)
Andreas Kerren, Professor of Computer Science (Linnaeus University, Sweden)
Haidee Kotze, Professor of Translation Studies (KU Leuven, Belgium)
Krister Linden, Adjunct Professor of Language Technology (University of Helsinki, Finland)
Dong Nguyen, Visiting Researcher of Text Mining Methods (Alan Turing Institute, UK)
Kari-Jouko Räihä, Professor of Computer Science (Tampere University, Finland)
Ute Römer, Assistant Professor of Applied Linguistics and English as a Second Language (Georgia State University, USA)
David Shepard, Professor of Germanic Languages, Comparative Literature, and Digital Humanities (University of California Los Angeles, USA)
Benedikt Szmrecsanyi, Associate Professor of Linguistics (KU Leuven, Belgium)
The growing availability of computer-readable language data, increasing computational power and rapidly evolving statistical methodologies have had a profound effect on how scholars study and analyse human language use. However, the fields of linguistics, computer science and digital humanities have largely developed their own separate approaches and paradigms, often failing to communicate across disciplines in an effective way.
Language, Data Science and Digital Humanities bridges these disciplinary gaps by publishing monographs and edited volumes that explore disciplinary synergies and introduce new theoretical principles. Written in clear and transparent language, these books offer cutting-edge digital methodologies and create new opportunities for understanding how problems and research questions can be approached from different perspectives.
The methodological range of the series covers empirical linguistics, natural language processing, machine learning, data visualization, text mining, mark-up and annotation, statistical tools in analysing language data, and multimodal analysis. The volumes explain methodological solutions in detail using worked examples, and are supported by companion websites, allowing authors to share primary data, scripts, sophisticated data visualizations and other digital content.
Biographical note
Steven Coats is a Lecturer in English at the University of Oulu, Finland.
Veronika Laippala is Professor of Digital Language Studies at the University of Turku, Finland.