Data & methods · Pronunciation Matters

What data is produced

Pronunciation Matters works with speech recordings from learners and reference speakers. Each recording is accompanied by metadata that allows scholarly classification without disclosing real names in the web app. Depending on the speaker group, this may include target language, speaker status, language level, first language, gender, year of recording, recording context, and time spent in the target-language region; for reference speakers, it may include information on origin and standard variety.

The data is managed in pseudonymised form. Real names, consent forms, and organisational documents are kept separate from the research environment. The web app never handles real names; it uses stable person and session IDs instead. This separation makes it possible to use the data for research while managing it in accordance with data-protection requirements.
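The separation between identifying documents and research data can be made concrete as a record type. The following Python sketch is an illustration, not the project's actual data model: the field names, the `RecordingMetadata` class, and the list of forbidden fields are hypothetical. It shows a metadata record keyed only by pseudonymous IDs and a check that rejects records carrying identifying fields.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class RecordingMetadata:
    """Pseudonymised metadata for one recording (hypothetical field set)."""
    person_id: str                       # stable pseudonym, never a real name
    session_id: str                      # derived from checked session data
    target_language: str
    speaker_status: str                  # e.g. "learner" or "reference"
    language_level: Optional[str] = None
    first_language: Optional[str] = None
    recording_year: Optional[int] = None

# Identifying fields that must never reach the research side (assumed list).
FORBIDDEN_FIELDS = {"name", "first_name", "last_name", "email", "consent_form"}

def check_pseudonymised(record: dict) -> None:
    """Raise ValueError if a record carries any identifying field."""
    leaked = FORBIDDEN_FIELDS.intersection(record)
    if leaked:
        raise ValueError(f"identifying fields present: {sorted(leaked)}")

record = RecordingMetadata("P0042", "P0042-es-20240517-01", "es", "learner")
check_pseudonymised(asdict(record))  # passes: no identifying fields
```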

In addition to the audio recordings, structured accompanying data is produced: task lists, item IDs, transcripts, timestamps, alignment data, interview segments, and material references. The aim is not simply to collect audio files, but to create an analysable corpus in which recordings, tasks, metadata, and web app functions fit together.

Task formats

The project corpora use several task formats. Three are central across the project: the wordlist, the sentence or text task, and the interview. The exact design may vary by language and is described in the respective corpus areas.

The wordlist is used for the controlled elicitation of isolated pronunciation. It is not practice material, but an elicitation instrument. Items are selected so that relevant pronunciation phenomena occur repeatedly and under comparable conditions. The focus is on intelligibility, systematic realisations, and contrasts, not on maximum proximity to a single native-speaker ideal.

The sentence or text task complements the wordlist. It examines how pronunciation patterns appear under sentence-prosodic conditions or in connected speech. Depending on the corpus, it may be implemented as a sentence list or as a connected text. In either case, the material is not an arbitrary text but is tailored to the research logic of the respective corpus.

The interview adds a reflective and less tightly controlled component to the controlled reading tasks. Learners can talk about their own pronunciation, perceived difficulties, and notable points in the material. This makes it possible not only to document how certain forms are realised, but also to examine how learners perceive and describe pronunciation.

Learner-oriented item selection

Task items are not selected solely according to lists of linguistic phenomena. They must also be manageable for learners. An item may be phonologically interesting and still unsuitable if it is unnecessarily rare, morphologically complex, strongly culture-specific, or difficult for the target group to read.

A central lesson from MAR.ELE was that existing research designs and established material lists provide important points of connection but do not automatically fit a learner-oriented pronunciation corpus. Pronunciation Matters builds on this experience. Where existing projects, corpora, or task formats are suitable, they can serve as references. Where they create too many additional problems for learners, they are adapted or replaced by controlled formats developed for the project.

For the wordlists, this means that item selection is phonologically motivated, but checked for readability and suitability for the target group. Relevant phenomena should occur repeatedly without the list being openly ordered by phenomenon. This reduces strategic or metalinguistically controlled reading.
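One way to let phenomena recur without grouping them visibly is to shuffle the list until no two neighbouring items target the same phenomenon. The sketch below assumes each item is an `(item_id, phenomenon)` pair with hypothetical labels; the function `order_items` and the rejection-sampling approach are illustrative choices, not the project's documented procedure.

```python
import random

def order_items(items, seed=0, max_tries=1000):
    """Shuffle (item_id, phenomenon) pairs so that no two adjacent items share a phenomenon.

    Rejection sampling: reshuffle with a seeded generator until the constraint holds.
    """
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; relax the constraint or add filler items")

wordlist = [("w01", "rr"), ("w02", "rr"), ("w03", "b/v"),
            ("w04", "b/v"), ("w05", "s"), ("w06", "s")]
print(order_items(wordlist))
```

A fixed seed keeps the order reproducible across recording sessions. For long lists dominated by one phenomenon, rejection sampling may fail, and a constructive ordering would be needed instead.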

For sentence and text tasks, this means that the materials should be comprehensible and formally controlled. In sentence lists, wordlist items reappear under sentence-prosodic conditions, and no new pronunciation phenomena are introduced in an uncontrolled way. In connected texts, the structure of the material is checked for fit with the respective research question.

Audio processing and annotation

For wordlist and sentence or text tasks, the temporal structure of the recordings is not derived from general-purpose automatic transcription. Instead, a controlled audio and alignment workflow is used.

First, the audio recordings are prepared in a lossless working format. Relevant recordings are cleaned, and standardised pauses are inserted between items. These pauses are more than a minor editing step: they are what allows the recordings to be segmented reliably into item or sentence units.
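The standardised pauses make segmentation largely a matter of finding long silences. Praat offers this directly (for example via Sound: To TextGrid (silences)...); the Python sketch below only illustrates the underlying idea on raw samples. The frame length, RMS threshold, and minimum pause duration are assumed values that would need tuning to the actual recordings.

```python
import math

def sounding_intervals(samples, sr, frame_ms=10, threshold=0.02, min_pause_s=0.5):
    """Return (start_s, end_s) intervals separated by pauses of at least min_pause_s.

    samples: sequence of floats in [-1, 1]; sr: sample rate in Hz.
    A frame is silent if its RMS is below `threshold`; shorter silences are bridged.
    """
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = len(samples) // frame
    voiced = []
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        voiced.append(math.sqrt(sum(x * x for x in chunk) / frame) >= threshold)

    min_pause_frames = math.ceil(min_pause_s * sr / frame)
    intervals, start, silent_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # first silent frame of the pause
                intervals.append((start * frame / sr, end * frame / sr))
                start, silent_run = None, 0
    if start is not None:  # close an interval that runs to the end of the file
        intervals.append((start * frame / sr, (n_frames - silent_run) * frame / sr))
    return intervals
```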

Praat is used for segmentation. Praat annotations and scripts can identify item boundaries and connect them with the fixed master lists of the respective material. In wordlists, the sounding intervals can be assigned to individual items. In sentence lists and text segments, the segment boundaries are checked against the canonical material catalogues.
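Once sounding intervals are found, assigning them to a master list is positional: the n-th interval corresponds to the n-th item. A minimal sketch, assuming a master list of dictionaries with hypothetical `item_id` and `text` keys. It refuses to guess when the counts differ, since a mismatch usually means an item was skipped or repeated and needs manual review.

```python
def assign_items(intervals, master_list):
    """Pair each (start, end) interval with the master-list item at the same position."""
    if len(intervals) != len(master_list):
        raise ValueError(
            f"{len(intervals)} intervals but {len(master_list)} master-list items"
        )
    return [
        {"item_id": item["item_id"], "text": item["text"], "start": s, "end": e}
        for (s, e), item in zip(intervals, master_list)
    ]
```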

For sentence and text tasks, the Montreal Forced Aligner (MFA) is also used. Audio, transcript or master text, acoustic model, and pronunciation dictionary are combined to generate word boundaries within the segments. The resulting timestamps can mark not only whole items but, where data quality permits, individual words within sentences or text passages.
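MFA expects a corpus directory in which each audio file is accompanied by a transcript with the same base name, for example a plain-text `.lab` file, typically grouped in one subfolder per speaker. The sketch below prepares such a directory from segmented audio and master texts and builds the `mfa align` call (corpus directory, dictionary, acoustic model, output directory). The segment tuples and file names are hypothetical, and the argument order should be checked against the installed MFA version.

```python
import shutil
from pathlib import Path

def prepare_mfa_corpus(segments, corpus_dir):
    """Write one .wav/.lab pair per segment, grouped in one folder per speaker.

    segments: iterable of (person_id, segment_id, wav_path, master_text).
    MFA pairs each audio file with the transcript that shares its base name.
    """
    corpus = Path(corpus_dir)
    for person_id, seg_id, wav_path, text in segments:
        speaker_dir = corpus / person_id
        speaker_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(wav_path, speaker_dir / f"{seg_id}.wav")
        (speaker_dir / f"{seg_id}.lab").write_text(text.strip() + "\n", encoding="utf-8")
    return corpus

def mfa_align_command(corpus_dir, dictionary, acoustic_model, output_dir):
    """Argument list for `mfa align`; run it with subprocess.run(cmd, check=True)."""
    return ["mfa", "align", str(corpus_dir), str(dictionary),
            str(acoustic_model), str(output_dir)]
```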

The results are transferred into structured target formats. TextGrid files, alignment data, and canonical JSON structures form the basis for the later player, for highlighting in the text, and for targeted comparison functions in the web app.
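A canonical JSON structure for the player could look roughly like the following. The schema (`segment_id`, `item_id`, `words` with `w`, `start`, `end`) is a hypothetical illustration, not the project's actual format. The sketch drops empty aligner labels (silences) and any word that falls outside the segment boundaries, and rounds times to milliseconds.

```python
import json

def to_canonical_json(segment_id, item_id, start, end, words):
    """Build a canonical segment record from aligned (word, start_s, end_s) tuples."""
    record = {
        "segment_id": segment_id,
        "item_id": item_id,
        "start": round(start, 3),
        "end": round(end, 3),
        "words": [
            {"w": w, "start": round(s, 3), "end": round(e, 3)}
            for w, s, e in words
            if w.strip() and start <= s and e <= end  # skip silences and stray words
        ],
    }
    return json.dumps(record, ensure_ascii=False, indent=2)
```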

Interview transcription

Interviews follow a different logic from wordlists and sentence or text tasks. The interview is not primarily material for phonetic fine alignment but a content-oriented conversation about pronunciation, task perception, and subjective difficulties.

The interview transcripts follow a simple content-oriented transcription scheme based on Dresing/Pehl. In addition, a small number of phenomena relevant to the project are retained in standardised form: filled pauses, self-repairs and cut-offs, relevant short pauses, and relevant laughter or sighing. Conversation-analytic fine notation, detailed prosodic marking, and phonetic detail transcription are deliberately not used.

For interview processing, an automatically generated raw transcript can be used as a working basis. This raw transcript is reviewed editorially. Speaker assignment, segmentation, punctuation, filled pauses, and relevant material references are corrected or added. The export is then transformed by script into a canonical PROMAT interview JSON.
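The final script step can be sketched as a small parser. This assumes a hypothetical export format with one line per speaker turn, such as `[00:01:23] I: text`, where unmarked lines continue the previous turn; the real export and the PROMAT interview JSON schema may differ.

```python
import json
import re

# Hypothetical export line format: "[hh:mm:ss] SPEAKER: text"
LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(\w+):\s*(.*)$")

def export_to_interview_json(lines, session_id):
    """Turn reviewed export lines into segment-based interview JSON."""
    segments = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            if segments and raw.strip():
                segments[-1]["text"] += " " + raw.strip()  # continuation line
            continue
        h, mi, s, speaker, text = m.groups()
        start = int(h) * 3600 + int(mi) * 60 + int(s)
        segments.append({"speaker": speaker, "start": start, "text": text})
    # Each segment ends where the next begins; the last one stays open (None).
    for cur, nxt in zip(segments, segments[1:]):
        cur["end"] = nxt["start"]
    if segments:
        segments[-1]["end"] = None
    return json.dumps({"session_id": session_id, "segments": segments}, ensure_ascii=False)
```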

The interview data is segment-based. This means that speaker changes and conversation segments are the primary structure. Token times can be retained for display, search, highlighting, or later extensions, but they are not presented as phonetically precise fine-alignment data.
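Because the interview data is segment-based, highlighting during playback only needs to find the segment that contains the current time. A sketch, assuming segments sorted by start time with `start` and `end` in seconds, where an `end` of `None` marks an open final segment (a hypothetical convention):

```python
from bisect import bisect_right

def active_segment(segments, t):
    """Return the index of the segment playing at time t (seconds), or None in a gap."""
    starts = [seg["start"] for seg in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i < 0:
        return None
    end = segments[i]["end"]
    return i if end is None or t < end else None
```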

Intake, pseudonymisation, and data integration

Before data is imported into the web app, it passes through an intake process, which is used to collect and check participant data, session data, document references, and recording-related information in a controlled way. It is not the research database itself, but a preparatory working and review layer.

Real names and consent documents remain in the secure area. Pseudonymised person data, session data, and exposure information are recorded separately. A stable person_id connects these layers without transferring real names into the research data. The final session_id is not assigned by hand but generated from the checked session information.
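Generating the session_id from checked fields can be illustrated with a deterministic function. The ID scheme below (person_id, language code, recording date, sequence number) and the "P plus four digits" person_id format are assumptions for illustration only.

```python
import re
from datetime import date

PERSON_ID = re.compile(r"P\d{4}")  # hypothetical person_id format

def make_session_id(person_id: str, target_language: str,
                    recorded_on: date, sequence: int) -> str:
    """Derive a session_id from checked session fields (hypothetical scheme).

    The same checked inputs always yield the same ID, so it is never typed by hand.
    """
    if not PERSON_ID.fullmatch(person_id):
        raise ValueError(f"unexpected person_id: {person_id!r}")
    if not re.fullmatch(r"[a-z]{2}", target_language):
        raise ValueError(f"expected a two-letter language code, got {target_language!r}")
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    return f"{person_id}-{target_language}-{recorded_on:%Y%m%d}-{sequence:02d}"
```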

After collection, audio, annotation, and metadata are transferred into a target structure. There, raw data, processed working files, alignment files, web derivatives, and item-level audio files are kept separate. Scripts integrate the data into the web app and research data structure. This keeps it traceable which files are original recordings, which files are processing results, and which artefacts are provided for web use.
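The separation of layers can be mirrored directly in the directory tree, so that a file's role follows from its location. The layer names and the per-session layout in this sketch are hypothetical.

```python
from pathlib import Path

# Hypothetical layer names, one subtree per artefact type.
LAYERS = ("raw", "processed", "alignment", "web", "items")

def session_paths(root, session_id):
    """Map each layer to its own directory under the session folder."""
    base = Path(root) / session_id
    return {layer: base / layer for layer in LAYERS}

def classify(path, root):
    """Return the layer a file belongs to, judged by its position in the tree."""
    rel = Path(path).relative_to(root)
    layer = rel.parts[1] if len(rel.parts) > 2 else None
    return layer if layer in LAYERS else None
```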

The web app as a research instrument

The web app is not merely a storage location for audio files. It is a working instrument for research and university teaching.

Users can access recordings, switch between task formats, and compare speakers in targeted ways. The player connects audio, timestamps, and material texts so that wordlists, sentence lists, texts, and interviews can each be used in an appropriate display format.

The web app provides dedicated research surfaces for comparative analyses. Recordings can be investigated by person, session, task format, or phenomenon-based selection. For the analysis of specific pronunciation phenomena, item sets can be preselected, created, and modified. These sets can then be used in comparison views or in the player without requiring users to edit the underlying data structures by hand.
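A phenomenon-based item set can be modelled as a named, editable list of item IDs derived from a tagged catalogue. This sketch assumes catalogue entries with hypothetical `item_id`, `task`, and `phenomena` fields; it is not the web app's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ItemSet:
    """A named, editable selection of item IDs (hypothetical model)."""
    name: str
    item_ids: list[str] = field(default_factory=list)

    def add(self, *ids):
        for i in ids:
            if i not in self.item_ids:  # keep order, avoid duplicates
                self.item_ids.append(i)

    def remove(self, *ids):
        self.item_ids = [i for i in self.item_ids if i not in ids]

def preselect(catalogue, phenomenon, task=None):
    """Build a set of all items tagged with `phenomenon`, optionally limited to one task."""
    ids = [it["item_id"] for it in catalogue
           if phenomenon in it["phenomena"] and (task is None or it["task"] == task)]
    return ItemSet(name=phenomenon, item_ids=ids)
```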

Reference recordings have a special role in this context. They represent important standard pronunciations in the respective languages and do not serve as an independent object of investigation. Their function is to provide a tertium comparationis: learner pronunciation and target pronunciation can be compared acoustically and systematically using the same items. The reference recordings are therefore not a simple normative template, but a controlled axis of comparison for research, teaching, and material development.
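Because learner and reference recordings share the same items, the comparison axis reduces to a join on item ID. A minimal sketch, assuming each side is a mapping from a hypothetical `item_id` to an audio reference; items present on only one side are returned separately so that gaps stay visible.

```python
def pair_with_reference(learner_items, reference_items):
    """Pair learner and reference realisations of the same item_id.

    Returns (pairs, missing): pairs as (item_id, learner_audio, reference_audio),
    missing as the sorted item IDs found on only one side.
    """
    shared = sorted(learner_items.keys() & reference_items.keys())
    pairs = [(i, learner_items[i], reference_items[i]) for i in shared]
    missing = sorted(learner_items.keys() ^ reference_items.keys())
    return pairs, missing
```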

These functions are especially relevant for university teaching. Students can do more than listen to isolated examples; they can work systematically with selected data, for example to investigate segmental contrasts, prosodic patterns, typical learner difficulties, or differences between learner and reference recordings.

Access, protection, and publication

Pronunciation Matters distinguishes between protected research data and publicly released materials.

Protected research data includes, in particular, speech recordings, pseudonymised metadata, player access, detailed comparison views, and work surfaces for phenomenon-based selection. These areas are not freely available on the public web because voice and metadata remain sensitive research data even in pseudonymised form.

Public content includes project information, general methodology descriptions, language-specific design information, and released teaching materials. Teaching materials can emerge from the research work, but they are published only after disciplinary and legal review.

In this way, the platform combines transparency with protection. The project aims to be comprehensible and to make materials available for teaching without publishing personal research data in an uncontrolled way.