Spanish learner pronunciation: elicitation design and task protocol

Starting point

The Spanish corpus of Pronunciation Matters is situated at the intersection of two developments: corpus-phonological elicitation designs that aim to document pronunciation in controlled and comparable ways, and ELE didactics as well as university-level phonetics and phonology, where pronunciation as an object of learning and learner pronunciation as an object of research are increasingly being addressed explicitly.¹ Empirical data on learner pronunciation do not emerge without prior structuring: they are shaped by task formats that determine which linguistic units are read or spoken, under what conditions this takes place, and how comparable the resulting recordings are. The aim is therefore a task protocol that elicits central phenomena of Spanish learner pronunciation systematically, without overburdening the elicitation with unnecessarily difficult vocabulary, complex reading passages, or an implicit orientation toward first-language target norms.

The fact that pronunciation as an object of learning and learner pronunciation as an object of research are increasingly being integrated into Spanish phonetics and phonology is also visible in major introductions within German-language Hispanic linguistics: Pustka (2021) includes a dedicated chapter on pronunciation in foreign language teaching at the beginning of her corpus-linguistic introduction; Gabriel/Meisenburg/Selig (2025), in the second, revised edition, add a new chapter on Spanish as a foreign language in the German-speaking context, with sections on segmental aspects and prosody. Pronunciation Matters builds on this development and complements it with a platform that makes controlled learner data available for empirical research, research-oriented university teaching, and didactically oriented follow-up questions.

The didactic perspective also follows foundational work on pronunciation teaching in ELE, especially Gil Fernández (2007), as well as practice-oriented contributions on foreign-accent features in the classroom such as Benet/Pešková (2017). More recent work in ELE didactics likewise defines pronunciation not primarily through approximation to first-language norms, but through intelligibility, comprehensibility, and communicative function (cf. Santamaría Busto 2024; Bárkányi/Galindo Merino/Pérez-Bernabeu 2024).

This double positioning leads to the central design decision of the Spanish task protocol: existing models provide important groundwork, but can only partly be transferred directly to the targeted study of learner pronunciation. This concerns lexical difficulty, content-related distraction in reading passages, and the question of which pronunciation phenomena need to be elicited repeatedly and under controlled conditions. The guiding principles are therefore intelligibility, controlled elicitation, empirical comparability, and materials that make sense for learners.

Previous work and empirical starting point

An important intermediate step was the earlier project MAR.ELE – Corpus sobre la pronunciación del español por aprendientes de ELE en Marburg. In that smaller pilot project, 22 recordings were made with students from the University of Marburg. MAR.ELE made Spanish learner pronunciation accessible as corpus material and open to empirical analysis. At the same time, practical work with this corpus revealed the limits of a design that had been adopted too directly. Those experiences were decisive for the more extensive revision in the present project and are closely tied to the broader project development outlined on the project page What this project is about.

For MAR.ELE, the wordlist from the (I)FEC project was adopted in full in order to remain compatible with an established corpus-phonological design for Spanish. Methodologically, that made sense, but it also exposed clear problems in work with learners: some items turned out to be unnecessary lexical stumbling blocks, while phenomena that are particularly revealing for learner pronunciation were not distributed optimally or were not represented strongly enough. The reading passage used in MAR.ELE was useful as a comparison text, but turned out to be too demanding for the target group. The current design is therefore not the result of a mere preference for a different model, but a response to concrete experience gained in practice.

Methodological starting point

A central goal of corpus-phonological projects is to elicit speech data in ways that make them comparable across projects. This logic also underlies established protocols such as PFC/IPFC for French and the related Spanish project (I)FEC. The starting point is Labov’s distinction between different degrees of self-monitoring or attention paid to speech: controlled reading tasks, wordlists, texts, and more open speech formats produce different types of data which, taken together, make it possible to describe phonological variation more comprehensively (cf. Labov 1972; Detey/Durand/Laks/Lyche 2016; Racine/Zay/Detey/Kawaguchi 2012; Pustka et al. 2018).

This kind of standardisation is methodologically useful because it creates comparability. For Pronunciation Matters, however, this ideal had to be balanced against a different research interest: the aim is not a general reference corpus for the Spanish sound system, but a learner corpus that provides data on pronunciation patterns, intelligibility, and didactically relevant contrasts with as few interfering factors as possible.

Design of the wordlist

The (I)FEC wordlist was designed for a broad corpus-phonological programme. It aims to capture numerous phenomena of the Spanish sound system, regional variation, and possible contrasts. For a learner corpus, this breadth is only partly useful. Some items are phonologically interesting, but turned out to be unnecessary stumbling blocks for learners or had little practical analytic value: morphologically complex forms such as estudiéis, rare diphthongs or triphthongs in items such as bou, miau, and guau, loanwords such as kétchup and iceberg, or lexically marginal words such as ñandú, yunque, and ciempiés.² Such words can shift the task away from pronunciation and toward lexical knowledge, reading confidence, or uncertainty when dealing with unfamiliar forms.

The new wordlist therefore deliberately departs from the ideal of adopting the full protocol unchanged, while remaining strongly indebted to (I)FEC: 58 different word forms from (I)FEC were retained.³ The sequence of a randomised main part with individual lexical items followed by a final block of minimal or pseudo-minimal pairs was also kept. In addition, 32 word forms were newly added in order to adapt the list more closely to learners and to the planned analyses.⁴ These include items that improve the coverage of specific consonantal phenomena, for instance word-final /d/ in ciudad, usted, and verdad. In (I)FEC, final /d/ is not covered systematically; in Pronunciation Matters, coverage of this phenomenon was therefore deliberately expanded.

The wordlist was therefore not reinvented, but revised in a targeted way. What matters are intelligibility, systematic realisations, and relevant contrasts, not orientation toward a first-language target norm or maximal coverage of the sound system. The list comprises 92 items: a main part with individual lexical items (86) and a clearly separated final block of minimal or pseudo-minimal pairs (6). The items are selected on phonological grounds, but filtered for learner suitability. Preference was given to high-frequency words, words appropriate to early learning levels, and forms that are as orthographically transparent as possible. Relevant phenomena should be represented repeatedly, usually by three to five items, so that individual tokens are not overinterpreted.

The selection of target phenomena is based on central descriptive domains of Spanish phonetics and phonology: segmental contrasts, syllable structure, word stress, grapheme–phoneme relations, and prosodic embedding. For the linguistic description of these domains, Pustka (2021) and Gabriel/Meisenburg/Selig (2025) provide important reference points. At the same time, the selection of items takes into account didactic work that understands the oral form as part of lexical knowledge and closely links pronunciation, phonological awareness, and vocabulary work (cf. Hidalgo Gallardo/Pérez Serrano 2024).

In Spanish, the interface between orthography and pronunciation is particularly relevant because the apparent transparency of sound–spelling correspondences should not be overestimated didactically (cf. Díez Plaza 2024).

The Spanish wordlist contains 92 items: 86 individual lexical items and 6 minimal or pseudo-minimal pairs.

1 mesa
2 reloj
3 viuda
4 tabúes
5 neutro
6 querría
7 caída
8 ciudad
9 lavar
10 avión
11 jamón
12 numeró
13 toros
14 gente
15 regla
16 flor
17 ríe
18 hoy
19 juzgar
20 signo
21 labio
22 deuda
23 queja
24 euforia
25 oír
26 ladrón
27 club
28 vainilla
29 número
30 usted
31 ángel
32 giro
33 cuidado
34 caza
35 logro
36 solo
37 mismo
38 vino
39 admirar
40 sueño
41 vacío
42 traer
43 jefe
44 álbum
45 vida
46 ustedes
47 chico
48 algo
49 ahí
50 enfermo
51 diablo
52 nadie
53 causa
54 tirar
55 llave
56 perro
57 carro
58 cuidar
59 tierra
60 baile
61 drama
62 vienes
63 gracias
64 oído
65 casa
66 ración
67 tampoco
68 muchacho
69 salud
70 quería
71 paz
72 champán
73 hambre
74 obtiene
75 oye
76 reír
77 suave
78 lleno
79 barrio
80 Europa
81 allí
82 numero
83 otros
84 verdad
85 caro
86 bienes
87 número – numero – numeró
88 caro – carro
89 ahí – allí
90 pero – perro
91 ola – hola
92 bienes – vienes
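The composition of the list can be checked mechanically. The following Python sketch is only an illustration and not part of the project's tooling: the names MAIN, PAIRS, ends_in_d and pair_only_forms are hypothetical, and spelling with final <d> is used as a simple orthographic proxy for word-final /d/, which is an assumption.

```python
# Main part of the wordlist (items 1-86), copied from the list above.
MAIN = """
mesa reloj viuda tabúes neutro querría caída ciudad lavar avión jamón numeró
toros gente regla flor ríe hoy juzgar signo labio deuda queja euforia oír
ladrón club vainilla número usted ángel giro cuidado caza logro solo mismo
vino admirar sueño vacío traer jefe álbum vida ustedes chico algo ahí enfermo
diablo nadie causa tirar llave perro carro cuidar tierra baile drama vienes
gracias oído casa ración tampoco muchacho salud quería paz champán hambre
obtiene oye reír suave lleno barrio Europa allí numero otros verdad caro bienes
""".split()

# Final block (items 87-92): minimal or pseudo-minimal sets.
PAIRS = [
    ("número", "numero", "numeró"),
    ("caro", "carro"),
    ("ahí", "allí"),
    ("pero", "perro"),
    ("ola", "hola"),
    ("bienes", "vienes"),
]


def ends_in_d(items):
    """Return items spelled with final <d> (orthographic proxy, an assumption)."""
    return [w for w in items if w.lower().endswith("d")]


def pair_only_forms(main, pairs):
    """Return forms that occur in the final block but not in the main part."""
    main_set = {w.lower() for w in main}
    return sorted({w for group in pairs for w in group if w not in main_set})


if __name__ == "__main__":
    print("main items:", len(MAIN))
    print("final-block items:", len(PAIRS))
    print("total:", len(MAIN) + len(PAIRS))
    print("final <d>:", ends_in_d(MAIN))
    print("only in final block:", pair_only_forms(MAIN, PAIRS))
```

Run on the list above, the orthographic check finds four items with final <d> (ciudad, usted, salud, verdad), and three forms (hola, ola, pero) occur only in the final block. A phenomenon-by-phenomenon coverage check would additionally require an annotation of target phenomena per item, which is not given here.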

Limitations of traditional reading passages

For the second task area, too, the question was whether an existing reading passage should be adopted. In the phonetic tradition, The North Wind and the Sun is an established comparison text that has long been used in the context of the International Phonetic Association (cf. International Phonetic Association 1999). The advantage is obvious: if many projects use the same text, data become easier to compare. At the same time, this textual tradition is itself not unproblematic. For the Spanish version, Coloma (2015) shows why the standard text is not optimally balanced and proposes a modified version that contains additional phonemes, has fewer word repetitions, and is phonetically more balanced. The discussion of alternative versions of such comparison texts also shows that even established standard texts are always phonologically and didactically selective (cf. Deterding 2006; Coloma 2015).

The (I)FEC project likewise does not simply use the IPA standard text, but has developed its own reading passage. This text is useful for a general corpus-phonological protocol, but only partially suitable for a learner corpus. It contains 381 words and is therefore very long for many learners; the reading task may accordingly become tiring. In addition, it contains numerous lexically, syntactically, or semantically demanding passages, such as se agrava, perrera, lugar de los hechos, inspeccionar, suntuosa, or sentences such as Tan suntuosa por fuera, y por dentro parece un zoológico obtenido por la caza nocturna en selvas y pantanos. Comparable problems also appeared in the MAR.ELE text, whose selection was inspired by Andrea Pešková’s Archivo de los acentos en el ELE. The modified excerpt from El Principito used there, designed partly with very different phenomena in mind, contains forms such as sinnúmero, lúgubre, inquirió, compadecido, and vergüenza.

Such texts are not intended to test vocabulary, text comprehension, literary reading experience, attention, or fatigue. For learners who read them aloud, however, these factors become part of the task in practice. They can interfere with the pronunciation data and make interpretation more difficult: it is not always clear whether a striking realisation reflects a pronunciation pattern, uncertainty while reading, unfamiliar vocabulary, or overload caused by the text.

There is also a structural problem: connected texts rarely cover relevant phenomena evenly. Some phenomena occur repeatedly, while others are absent or appear only by chance. Learner-specific pronunciation issues in particular are not represented systematically enough. A reading passage is therefore useful for observing global read speech, rhythm, and prosodic patterns, but it is not optimal when specific learner phenomena need to be elicited repeatedly and under controlled conditions.

The sentence list as a controlled alternative

For Pronunciation Matters, a sentence list was therefore developed instead of a traditional reading passage. It does not replace the wordlist, but complements it functionally: it tests whether pronunciation patterns observed in isolated words remain stable under simple sentence-prosodic conditions. To that end, each sentence contains exactly two items from the wordlist. These items appear in the same lexical form; inflection is only allowed if it remains phonologically neutral and does not introduce a new target phenomenon. In this way, the sentence list introduces no new pronunciation phenomena, but recombines known items in controlled sentence contexts.

The sentence list thus follows a progression from controlled production toward more complex sentence contexts and treats intelligibility and comprehensibility as central reference points (cf. Marrero 2024; Santamaría Busto 2024). The link between phonological awareness and oral production in communicative tasks also provides a connection to Fullana/Pujolà (2024).

The list contains 50 sentences: 30 declarative sentences, 10 yes/no questions, and 10 wh-questions. The sentences mostly contain 8–14 words, are syntactically simple, and are oriented toward A1–B1 vocabulary. Embedded clauses, chains of subordinate clauses, idiomatic expressions, stylistically marked formulations, and semantically distracting content are avoided.

This does not create a substitute for free speech data. Freer speech is often collected additionally in corpus-phonological traditions, but for beginners and many learner groups it is not always the right context for testing specific segmental phenomena systematically. The sentence list is rather a controlled alternative to a traditional reading passage: less complex than a connected narrative text, but more informative than isolated individual words, because sentence rhythm, sentence stress, and intonation come into play and become at least partly analysable.

The Spanish sentence list contains 50 sentences: 30 declaratives, 10 yes/no questions, and 10 wh-questions. Each sentence contains exactly two items from the wordlist.

D1 Hoy miro el reloj con calma antes de salir.
D2 La viuda vive en una casa tranquila cerca del centro.
D3 Los tabúes influyen mucho en la gente joven.
D4 Nadie se quedó en la ciudad después.
D5 Cuidar un perro exige tiempo y atención diaria.
D6 El médico pudo salvar la vida al enfermo.
D7 Es difícil juzgar solo por lo que se oye.
D8 El champán acompañó el baile elegante del evento.
D9 Algo tan caro no siempre vale la pena.
D10 El avión bajó hacia tierra sin problemas graves.
D11 El jamón quedó sobre la mesa después de cenar.
D12 La vainilla dejó un aroma suave en la cocina.
D13 Él quería llevar a su hijo a ver los toros.
D14 Por la mañana siento dolor en el labio y el oído.
D15 Mi tío se compró un carro y su deuda aumentó.
D16 La queja llegó finalmente al jefe responsable.
D17 La euforia le hizo traer más bebidas de lo necesario.
D18 Puedo oír bien cuando ella se ríe bajito.
D19 El ladrón buscó la llave correcta sin éxito.
D20 Un giro rápido lo llevó a una caída muy dolorosa.
D21 Con mucho cuidado se puso a lavar la ropa.
D22 Su logro es un motivo para admirar su esfuerzo.
D23 Al despertar, tenía sueño y un vacío extraño.
D24 Con su familia es un ángel y con sus amigos un diablo.
D25 La flor del jardín tiene un color neutro y bonito.
D26 El muchacho habla sobre la caza con su abuelo.
D27 Cuando el chico tiene hambre, pierde el control.
D28 El álbum familiar les hizo reír toda la tarde.
D29 Ahí termina la calle y allí comienza otra.
D30 El profesor explicó la regla y numeró ejemplos.
QY1 ¿El vaso está lleno de vino ahora?
QY2 ¿Vienes mañana a ver la casa?
QY3 ¿La salud importa más que la paz?
QY4 ¿Ustedes dicen “hola” al entrar al aula?
QY5 ¿Tampoco los otros quieren venir?
QY6 ¿Querría otra ración de tortilla?
QY7 ¿El drama continúa dentro del club?
QY8 ¿En Europa la verdad importa?
QY9 ¿Usted espera respuesta ahí?
QY10 ¿Viajó solo con el mismo plan de siempre?
QW1 ¿Por qué dices gracias pero no respondes?
QW2 ¿Cuándo vienen los otros hoy?
QW3 ¿Dónde sirve el vino el jefe?
QW4 ¿Por qué esa causa genera tanto drama?
QW5 ¿Cómo se obtiene la llave correcta?
QW6 ¿Cómo de lleno está el barrio?
QW7 ¿Por qué cuidas los bienes y no la salud?
QW8 ¿Cómo numero los puntos que obtiene cada equipo?
QW9 ¿Cuál es el signo de la paz?
QW10 ¿Por qué quieres tirar ese número de teléfono?
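The link between the sentence list and the wordlist can be inspected with a simple token match. The following Python sketch is a hedged illustration, not the project's own procedure: the names WORDLIST, SAMPLE, tokens and list_items are hypothetical, the tokenisation rule (lowercased, letter-only tokens) is an assumption, and only four of the 50 sentences are included as samples.

```python
import re
import unicodedata

# All word forms of the wordlist: the 86 main items plus the three forms
# that occur only in the final block (pero, ola, hola).
WORDLIST = {
    unicodedata.normalize("NFC", w).lower()
    for w in """
    mesa reloj viuda tabúes neutro querría caída ciudad lavar avión jamón
    numeró toros gente regla flor ríe hoy juzgar signo labio deuda queja
    euforia oír ladrón club vainilla número usted ángel giro cuidado caza
    logro solo mismo vino admirar sueño vacío traer jefe álbum vida ustedes
    chico algo ahí enfermo diablo nadie causa tirar llave perro carro cuidar
    tierra baile drama vienes gracias oído casa ración tampoco muchacho salud
    quería paz champán hambre obtiene oye reír suave lleno barrio Europa allí
    numero otros verdad caro bienes pero ola hola
    """.split()
}

# Four sample sentences copied from the list above.
SAMPLE = {
    "D1": "Hoy miro el reloj con calma antes de salir.",
    "D29": "Ahí termina la calle y allí comienza otra.",
    "QW8": "¿Cómo numero los puntos que obtiene cada equipo?",
    "QW10": "¿Por qué quieres tirar ese número de teléfono?",
}


def tokens(sentence):
    """Lowercase a sentence and split it into letter-only tokens, keeping accents."""
    text = unicodedata.normalize("NFC", sentence).lower()
    return re.findall(r"[^\W\d_]+", text)


def list_items(sentence, wordlist=WORDLIST):
    """Return the tokens of a sentence that are word forms of the wordlist."""
    return [t for t in tokens(sentence) if t in wordlist]


if __name__ == "__main__":
    for sentence_id, sentence in SAMPLE.items():
        print(sentence_id, len(tokens(sentence)), list_items(sentence))
```

The match is deliberately accent-sensitive, so that número, numero, and numeró are kept apart. Because it counts every occurrence of a list form, whether or not it is an intended target, its output is best treated as a starting point for manual review.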

Interview

The interview is, finally, a project-wide extension of the design. It adds a reflective component to the controlled reading tasks: learners are not only recorded, but also asked about their own pronunciation, perceived difficulties, and striking phenomena observed during the recording process. This means that the corpus includes not only external observation, but also the learners’ own perspective. For the study of learner pronunciation, this is especially important because it documents not only realisations, but also metalinguistic judgments and subjective perceptions of difficulty. The interview component can also be linked to work that understands phonological awareness and metacognitive reflection as part of pronunciation work (cf. Blanco Canales 2024).

At the same time, the interview component shows that the Spanish corpus is part of a broader working structure in which research, data collection, and material development are conceived together from the outset. Information on the contributors to the overall project is gathered on the project page Team & contributors.

Scope and limitations of the protocol

The protocol developed here does not claim to be perfect for every possible research question. It was designed with learners in mind and attempts to elicit central pronunciation phenomena, especially in the area of consonantism, in a systematic and comparable way. Depending on the research interest, Pronunciation Matters therefore provides suitable research material or at least a starting point for further analyses.

If a single phenomenon is to be investigated in great depth, a separate, specifically designed data collection may still be necessary. The online corpus made available to the scholarly community is intended to allow many questions to be tested first on the basis of real learner data, to support pilot studies, and to prepare pretests. Like any more general corpus, it remains limited and is the result of necessary compromises between comparability, learner orientation, phenomenon coverage, and practical feasibility.

Footnotes

¹ For the overall project, its development, platform logic, methodological infrastructure, and collaborative setup, see the project pages What this project is about, Project structure, Data & methods, and Team & contributors.
² Examples of (I)FEC items that were not retained or proved problematic: estudiéis, cambiáis, bou, miau, guau, kétchup, iceberg, ñandú, yunque, ciempiés, rosbif, coñac, chalet, suntuoso, diurético.
³ Word forms retained from (I)FEC: reloj, viuda, tabúes, querría, caída, numeró, toros, flor, ríe, hoy, juzgar, signo, labio, deuda, queja, ladrón, club, vainilla, número, ángel, caza, logro, mismo, vino, admirar, sueño, álbum, chico, algo, ahí, enfermo, diablo, nadie, causa, llave, perro, cuidar, baile, drama, vienes, gracias, oído, casa, ración, muchacho, salud, quería, paz, champán, obtiene, oye, reír, lleno, Europa, allí, numero, otros, pero.
⁴ Newly added word forms: mesa, neutro, ciudad, lavar, avión, jamón, gente, regla, euforia, oír, usted, giro, cuidado, solo, vacío, traer, jefe, vida, ustedes, tirar, carro, tierra, tampoco, hambre, suave, barrio, verdad, caro, bien, ola, hola, bienes.

References

Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.) (2024): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins. DOI: https://doi.org/10.1075/ivitra.42
Benet, Ariadna / Pešková, Andrea (2017): “Cómo reducir el ‘acento extranjero’ en el ELE”. Der fremdsprachliche Unterricht Spanisch 58, 16–20.
Blanco Canales, Ana (2024): “La enseñanza de la pronunciación basada en el desarrollo de la conciencia fonológica y la metacognición”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 175–189.
Coloma, Germán (2015): “Una versión alternativa de ‘El viento norte y el sol’ en español”. Revista de Investigación Lingüística 18, 191–212.
Deterding, David (2006): “The North Wind versus a Wolf: Short Texts for the Description and Measurement of English Pronunciation”. Journal of the International Phonetic Association 36(2), 187–196.
Detey, Sylvain / Durand, Jacques / Laks, Bernard / Lyche, Chantal (2016): “The PFC programme and its methodological framework”. In: Detey, Sylvain / Durand, Jacques / Laks, Bernard / Lyche, Chantal (eds.): Varieties of Spoken French. Oxford: Oxford University Press, 13–23.
Díez Plaza, César L. (2024): “La ortografía en la enseñanza de ELE: De la grafía a la pronunciación”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 40–54.
Fullana, Natalia / Pujolà, Joan-Tomàs (2024): “L2 pronunciation in the spotlight: From phonological awareness to oral production in communicative tasks”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 88–101.
Gabriel, Christoph / Meisenburg, Trudel / Selig, Maria (2025): Spanisch: Phonetik und Phonologie. Eine Einführung. 2nd, rev. ed. Tübingen: Narr Francke Attempto. DOI: https://doi.org/10.24053/9783381100125
Gil Fernández, Juana (2007): Fonética para profesores de español: de la teoría a la práctica. Madrid: Arco Libros.
Hidalgo Gallardo, Matías / Pérez Serrano, Mercedes (2024): “Léxico y pronunciación: Un camino de ida y vuelta”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 12–24.
International Phonetic Association (1999): Handbook of the International Phonetic Association. Cambridge: Cambridge University Press.
Labov, William (1972): Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Marrero, Victoria (2024): “De la comprensión auditiva a la pronunciación”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 72–87.
Pešková, Andrea (n.d.): Archivo de los acentos en el ELE. Online: https://andrea-peskova.com/archivo-de-los-acentos-l2/
Pustka, Elissa (2021): Phonetik und Phonologie des Spanischen. Eine korpuslinguistische Einführung. Berlin: Erich Schmidt.
Pustka, Elissa / Gabriel, Christoph / Meisenburg, Trudel / Burkard, Monja / Dziallas, Kristina (2018): “(Inter-)Fonología del Español Contemporáneo (I)FEC: Metodología de un programa de investigación para la fonología de corpus”. Loquens 5(1), e046.
Racine, Isabelle / Zay, Françoise / Detey, Sylvain / Kawaguchi, Yuji (2012): “Des atouts d’un corpus multitâches pour l’étude de la phonologie en L2: l’exemple du projet ‘Interphonologie du français contemporain’ (IPFC)”. In: Kamiyama, Takeki / Kawaguchi, Yuji / Minegishi, Makoto (eds.): Corpus-based Analysis and Diachronic Linguistics. Amsterdam: John Benjamins, 1–19.
Santamaría Busto, Enrique (2024): “Hacia una evaluación comunicativa y eficaz de la pronunciación del español”. In: Bárkányi, Zsuzsanna / Galindo Merino, M. Mar / Pérez-Bernabeu, Aarón (eds.): La integración de la pronunciación en el aula de ELE. Amsterdam/Philadelphia: John Benjamins, 190–206.
Tacke, Felix (2023–2024): MAR.ELE – Corpus sobre la pronunciación del español por aprendientes de ELE en Marburg. Marburg: Philipps-Universität Marburg. Online: https://hispanistica.com/projects/marele/
