NFDI4DS | UHH-SEMS - Publication Details

Tanja Säily

ORCID: 0000-0003-4407-8929

About

Contact & Profiles

A5000404764

Research Areas

Natural Language Processing Techniques
Linguistic Variation and Morphology
Lexicography and Language Studies
Linguistics and language evolution
Authorship Attribution and Profiling
Gender Studies in Language
Syntax, Semantics, Linguistic Variation
Linguistics, Language Diversity, and Identity
Topic Modeling
Second Language Acquisition and Learning
Language, Discourse, Communication Strategies
Digital Humanities and Scholarship
Data Visualization and Analytics
Phonetics and Phonology Research
Multilingual Education and Policy
Mathematics, Computing, and Information Processing
EFL/ESL Teaching and Learning
Speech and dialogue systems
Spanish Linguistics and Language Studies
Speech Recognition and Synthesis
Linguistic research and analysis
Language and cultural evolution
Islamic Finance and Banking Studies
Digital Communication and Language
Organizational Management and Leadership

University of Helsinki
2015-2024

Linnaeus University
2021

Clinical Research Center Kiel
2020

Significance testing of word frequencies in corpora

OPENALEX - Publications

Jefrey Lijffijt, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki and 1 more

Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76), the use of the χ² and log-likelihood ratio tests is problematic in this context, as they are based on the assumption that all samples are statistically independent of each other. However, words within a text are not independent. As pointed out in Kilgarriff (Comparing corpora, International...

10.1093/llc/fqu064 article EN Digital Scholarship in the Humanities 2014-12-08

Variation in morphological productivity in the BNC: Sociolinguistic and methodological considerations

Tanja Säily

The first aim of this work is to examine gender-based variation in the productivity of the nominal suffixes -ness and -ity in present-day British English. Possible interpretations are presented for the findings that -ity is used less productively by women, while with -ness there is no gender difference. The second aim is to analyse the validity of hapax-based productivity measures in sociolinguistic research. It is discovered that they require a significantly larger corpus than type-based ones, and that the category-conditioned degree of productivity P is unusable when comparing subcorpora based...

10.1515/cllt.2011.006 article EN Corpus Linguistics and Linguistic Theory 2011-01-01

New methods for analysing diachronic suffix competition across registers

Paula Rodríguez-Puente, Tanja Säily, Jukka Suomela

Abstract This paper tracks stylistic variation in the use of two roughly synonymous suffixes, the Romance -ity and the native -ness, during the Early Modern English period. We seek to verify from a statistical viewpoint the claims of Rodríguez-Puente (2020), who reports on a decrease in the use of -ness in favour of -ity in registers representative of the speech-written and formal-informal continua at that time. To this end, we develop new methods of statistical and visual analysis that enable diachronic comparisons of competing processes across subcorpora, building upon an earlier...

10.1075/ijcl.22014.rod article EN cc-by International Journal of Corpus Linguistics 2022-08-19

Variation in noun and pronoun frequencies in a sociohistorical corpus of English

Tanja Säily, Terttu Nevalainen, Harri Siirtola

Tanja Säily (Department of Modern Languages, University of Helsinki, Finland), Terttu Nevalainen, Harri Siirtola (Computer Sciences, Tampere). Literary and Linguistic Computing, Volume 26, Issue 2, June 2011, Pages 167–188, https://doi.org/10.1093/llc/fqr004. Published: 06 May 2011.

10.1093/llc/fqr004 article EN Literary and Linguistic Computing 2011-05-06

Sociolinguistic variation in morphological productivity in eighteenth-century English

Tanja Säily

Abstract This paper presents ongoing work on Säily and Suomela’s (

10.1515/cllt-2015-0064 article EN Corpus Linguistics and Linguistic Theory 2015-12-08

Wrangling with Non-Standard Data

Eetu Mäkelä, Krista Lagus, Leo Lahti, Tanja Säily, Mikko Tolonen and 3 more

Research in the digital humanities and computational social sciences requires overcoming complexity in research data, methodology, and research questions. In this article, we show through case studies of three different digital humanities and computational social science projects that these problems are prevalent, multiform, as well as laborious to counter. Yet, without facilities for acknowledging, detecting, handling and correcting for such bias, any results based on the material will be faulty. Therefore, we argue for the need for a wider recognition and acknowledgement...

10.5617/dhnbpub.11180 article EN Digital Humanities in the Nordic and Baltic Countries Publications 2020-06-01

Text Variation Explorer

Harri Siirtola, Tanja Säily, Terttu Nevalainen, Kari‐Jouko Räihä

This paper reviews the gap between the current methods of text visualization and the needs of corpus-linguistic research, and introduces a tool that takes a step towards bridging the gap. Current text visualization methods tend to treat the problem as a data-encoding issue only, and do not strive for interactive, tightly coupled representations of text that would foster discovery. The paper argues that such visualizations should always be linked to the text, with effortless movement between the text and its visualization, and that the controls should provide continuous and immediate feedback to facilitate exploration. We introduce a tool,...

10.1075/ijcl.19.3.05sii article EN International Journal of Corpus Linguistics 2014-09-01

Revisiting NMT for Normalization of Early English Letters

Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä. Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. 2019.

10.18653/v1/w19-2509 article EN 2019-01-01

Language Change Database: A new online resource

Terttu Nevalainen, Turo Vartiainen, Tanja Säily, Joonas Kesäniemi, Agata Dominowska and 1 more

Abstract We introduce the Language Change Database (LCD), which provides access to the results of previous corpus-based research dealing with change in the English language. The LCD will be published on an open-access linked data platform that will allow users to enter information about their own publications into the database and to conduct searches based on linguistic and extralinguistic parameters. Both metadata and numerical data from the original publications will be available for download, enabling systematic reviews, meta-analyses, replication...

10.1515/icame-2016-0006 article EN ICAME journal 2016-03-01

Registerial Adaptation vs. Innovation Across Situational Contexts: 18th Century Women in Transition

Stefania Degaetano‐Ortlieb, Tanja Säily, Yuri Bizzoni

Endeavors to computationally model language variation and change are ever increasing. While analyses of recent diachronic trends are frequently conducted, long-term trends accounting for sociolinguistic variation are less well-studied. Our work sheds light on the temporal dynamics of language use of British 18th century women as a group in transition across two situational contexts. Our findings reveal that in formal contexts women adapt to register conventions, while in informal contexts they act as innovators of change in language use, influencing others. [...] adopted from other...

10.3389/frai.2021.609970 article EN cc-by Frontiers in Artificial Intelligence 2021-06-01

Interactive Text Visualization with Text Variation Explorer

Harri Siirtola, Poika Isokoski, Tanja Säily, Terttu Nevalainen

Digitalization is changing how research is carried out in all areas of science. Humanities is no exception: materials that used to be hand-written or printed on paper are increasingly available in digital form. This development is changing how scholars are interacting with their material. We are addressing the problem of interactive text visualization in the context of sociolinguistic language study. When a scholar is reading and analyzing text from a computer screen instead of paper, we can support this by providing a dashboard for reading, creating...

10.1109/iv.2016.57 article EN 2016 20th International Conference Information Visualisation (IV) 2016-07-01

Explorations into the social contexts of neologism use in early English correspondence

Tanja Säily, Eetu Mäkelä, Mika Hämäläinen

Abstract This paper describes ongoing work towards a rich analysis of the social contexts of neologism use in historical corpora, in particular the Corpora of Early English Correspondence, with research questions concerning the innovators, meanings and diffusion of neologisms. To enable this kind of study, we are developing new processes, tools and ways of combining data from different sources, including the Oxford English Dictionary, the Historical Thesaurus, and contemporary published texts. Comparing neologism candidates across these sources is...

10.1075/pc.18001.sai article EN Pragmatics & Cognition 2018-12-31
