Table of Contents

International Journal of Language Testing
Volume 13, Issue 1, March 2023

  • Publication date: 1402/02/17
  • Number of articles: 12
  • Niloufar Shahmirzadi * Pages 1-17
    The documentation of test takers' achievements has typically been accomplished through large-scale assessments that yield general information about students' language ability. To reduce subjectivity, Cognitive Diagnostic Assessment (CDA) has recently played a crucial role in uncovering candidates' latent attribute patterns, providing multi-dimensional diagnostic information rather than a single proficiency classification. However, the literature contains gaps concerning the detailed investigation of test takers' listening comprehension ability in responding to the placement test items of a public English language center. The present study aims to validate an English placement test at a language center through a retrofitting process. In an exploratory mixed-methods design, 449 participants from the same language center, including 274 females and 175 males, were selected. The performance of randomly selected participants on the center's placement test was analyzed by applying the GDINA model, implemented in R packages, to detect Differential Item Functioning (DIF); a simplified illustration of DIF screening follows this entry. The results revealed DIF in some items, indicating bias in those items. The study's implication is to provide meaningful interpretations of respondents' attributes and to improve teaching and learning by identifying candidates' strengths and weaknesses. To this end, the findings can raise test developers' awareness of preparing unbiased placement test items and, at the same time, help test takers become more critical of their English language achievements. The findings can also help materials developers produce materials free from bias.
    Keywords: cognitive diagnostic assessment, differential item functioning, listening comprehension, placement test
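    The abstract above names the GDINA model in R as its DIF-detection machinery. As a rough, language-neutral stand-in (not the study's method), the sketch below screens one item for DIF with the simpler Mantel-Haenszel procedure in Python; the response matrix, group split, and item index are synthetic placeholders.

    ```python
    import numpy as np

    # Hedged sketch: a Mantel-Haenszel DIF screen for a single item,
    # stratifying examinees by their total score on the remaining items.
    # All data are synthetic; this stands in for, and is simpler than,
    # the GDINA-based analysis reported in the abstract.

    rng = np.random.default_rng(1)
    n_persons, n_items = 449, 20                     # sample size echoes the study
    responses = (rng.random((n_persons, n_items)) < 0.6).astype(int)
    focal = (rng.random(n_persons) < 274 / 449).astype(int)  # e.g., 1 = female

    item = 0
    matching = responses.sum(axis=1) - responses[:, item]    # score without the item

    num = den = 0.0
    for s in np.unique(matching):
        in_stratum = matching == s
        a = responses[in_stratum & (focal == 0), item].sum()   # reference, correct
        b = (in_stratum & (focal == 0)).sum() - a              # reference, incorrect
        c = responses[in_stratum & (focal == 1), item].sum()   # focal, correct
        d = (in_stratum & (focal == 1)).sum() - c              # focal, incorrect
        t = a + b + c + d
        if t > 1:
            num += a * d / t
            den += b * c / t

    odds_ratio = num / den if den else float("nan")
    print(f"MH common odds ratio for item {item}: {odds_ratio:.2f} (~1 = no DIF)")
    ```

    An odds ratio near 1 suggests the item behaves similarly for matched examinees in both groups; values far from 1 flag potential DIF and would warrant closer inspection, for instance with the Wald tests used with G-DINA models.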
  • Zeinab Azizi, Ehsan Namaziandost * Pages 18-43
    Though dynamic assessment (DA) has gained strong theoretical and empirical support over the last decades, second language (L2) practitioners have criticized its limited applicability in large classes. To address this limitation, peer-dynamic assessment (peer-DA), rooted in the conceptualization of the zone of proximal development (ZPD), can be introduced and practiced as an alternative approach. This study therefore investigated the effects of peer-DA on cultivating Iranian upper-intermediate EFL learners' interlanguage pragmatic (ILP) competence, and sought to disclose how peer-DA improves learners' ILP competence. To achieve these aims, a sample of 84 upper-intermediate EFL learners, including females, was selected through convenience sampling at the Iran Language Institute in Borujerd, Iran. A total of 37 learners whose scores fell around the mean were then selected and randomly assigned to an experimental group (n = 19) and a control group (n = 18). They went through a pre-test, interventions (16 one-hour sessions held twice a week), and a post-test, and the experimental group's interactions were meticulously recorded. The collected data were analyzed through two independent-samples t-tests and the microgenetic development approach. Findings documented a statistically significant difference between the experimental and control groups in ILP competence gains on the post-test. Furthermore, the microgenetic development analysis showed how gradual, contingent prompts led to noticeable improvements in the learning of ILP features. These findings may have pedagogical implications for different stakeholders.
    Keywords: zone of proximal development, interlanguage pragmatic competence, peer-dynamic assessment, microgenetic development approach, EFL learners
  • Behnaz Rastegar, Abbas Zarei * Pages 44-66
    Much research has addressed assessment literacy (AL) components and job demands-resources (JD-R). However, an interdisciplinary look at AL components as predictors of JD-R, and at the possible consequences for engagement and burnout in teachers' assessment performance, has been neglected. To fill this gap, the present study explored this issue in the context of Iran. To this end, 146 Iranian EFL teachers were selected through convenience sampling to answer questionnaires on AL, JD-R, burnout, and engagement. A series of multiple regression analyses were run on the collected data (a toy regression of this kind is sketched after this entry). The results showed that some AL components, namely constructing, administering, rating, and interpreting tests; psychometric properties of a test; using and interpreting statistics; and authenticity, were significant predictors of job demand. The results also revealed that alternative and digital-based assessment; recognizing test type, distinction, and function; and authenticity were significant predictors of job resource. Furthermore, constructing, administering, rating, and interpreting tests; psychometric properties of a test; and using and interpreting statistics significantly predicted teachers' burnout. In addition, alternative and digital-based assessment, giving feedback in assessment, and ethical and cultural considerations in assessment turned out to significantly predict teachers' engagement. These findings can have theoretical and practical implications for stakeholders.
    Keywords: assessment literacy, job demand, job resource, burnout, engagement
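    As a minimal illustration of the regression analyses mentioned above (not the study's actual models or instruments), the following Python sketch regresses a synthetic job-demand score on a few hypothetical AL component scores with ordinary least squares.

    ```python
    import numpy as np

    # Hedged sketch: ordinary least squares with hypothetical AL component
    # scores predicting a job-demand score. All data are synthetic; only
    # the sample size (146) echoes the study.

    rng = np.random.default_rng(2)
    n = 146
    predictors = ["test_construction", "psychometrics", "statistics", "authenticity"]
    X = rng.normal(0, 1, (n, len(predictors)))
    y = X @ np.array([0.4, 0.3, 0.2, 0.25]) + rng.normal(0, 1, n)  # toy outcome

    X1 = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares coefficients
    r2 = 1 - ((y - X1 @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()

    for name, coef in zip(["intercept"] + predictors, beta):
        print(f"{name:>18}: {coef:+.3f}")
    print(f"R^2 = {r2:.3f}")
    ```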
  • MohammadReza Anani Sarab, Simindokht Rahmani * Pages 67-103

    Language testing and assessment has grown in popularity and significance over the last few decades, and there is a rising need for assessment-literate stakeholders in language education. As teachers play a major role in assessing students, it is necessary to ensure they have the right level of assessment knowledge and skill to carry out their duties as assessors. The present study sought to develop and validate an assessment literacy test. To this end, a thirty-five-item, scenario-based Language Assessment Literacy Test (LALT) was developed based on the seven standards in the Standards for Teacher Competence in Educational Assessment of Students (1990). Construct validity was investigated with data collected from 168 Iranian EFL teachers: reliability, item difficulty, and item discrimination analyses were carried out (a minimal item-analysis sketch follows this entry), and the test was then subjected to exploratory factor analysis (EFA), whose results showed a seven-factor solution for the test items. The implications of the study for EFL teachers, language testers, teacher trainers, and curriculum developers are discussed.

    Keywords: assessment, exploratory factor analysis (EFA), language assessment literacy, teachers' language assessment literacy
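    The reliability, item difficulty, and item discrimination analyses mentioned above are standard classical test theory computations. A minimal sketch, with a synthetic 168 × 35 response matrix standing in for the LALT data, might look like this:

    ```python
    import numpy as np

    # Hedged sketch of classical item analysis: difficulty (proportion
    # correct), corrected point-biserial discrimination, and KR-20
    # reliability. The 168 x 35 response matrix is synthetic.

    rng = np.random.default_rng(3)
    n_teachers, n_items = 168, 35
    ability = rng.normal(0, 1, n_teachers)
    easiness = rng.uniform(-1.5, 1.5, n_items)
    p = 1 / (1 + np.exp(-(ability[:, None] + easiness)))
    scores = (rng.random((n_teachers, n_items)) < p).astype(int)

    difficulty = scores.mean(axis=0)                 # proportion correct per item
    total = scores.sum(axis=1)
    rest = total[:, None] - scores                   # total score minus each item
    discrimination = np.array(
        [np.corrcoef(scores[:, j], rest[:, j])[0, 1] for j in range(n_items)]
    )
    kr20 = n_items / (n_items - 1) * (
        1 - (difficulty * (1 - difficulty)).sum() / total.var(ddof=1)
    )

    print(f"difficulty:     {difficulty.min():.2f} to {difficulty.max():.2f}")
    print(f"discrimination: {discrimination.min():.2f} to {discrimination.max():.2f}")
    print(f"KR-20:          {kr20:.2f}")
    ```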
  • Mohammad Ahmadi Safa *, Hamidreza Sheikholmoloki Pages 104-132
    The Iranian National University Entrance Exam (INUEE), a nationwide high-stakes test, is held annually to screen Iranian high school graduates and admit them into higher education programs. This examination has a wide range of impacts on test takers as the primary stakeholders and on parents, teachers, and high school principals as secondary stakeholders. This study reports the impacts of the INUEE on high school teachers and principals. To this aim, 27 teachers and 18 principals from three western provinces of Iran sat for structured interviews, each lasting nearly 30 minutes. All interviews were audio-recorded and transcribed. Following Grounded Theory (Glaser & Strauss, 1967), the transcriptions were subjected to content analysis, codifying the data through an inductive process of moving back and forth through them to extract common patterns and recurring themes. After coding and 'quantitizing' the data (Dörnyei, 2007), the basic themes were identified, counted, and tabulated. The results indicated that, from the majority of participants' perspectives, the INUEE has detrimental consequences for students, teachers, school principals, and the educational curriculum. The findings underscore the consequential invalidity and unfairness of the test and its negative impacts on different aspects of the educational system, and they provide practical implications for educational policy-makers, school principals, and teachers, highlighting the need for awareness of the INUEE's negative consequences.
    Keywords: high-stakes test, impact, INUEE, Iran, test fairness
  • Sayed Hadi Sadeghi * Pages 133-138

    Practical Language Testing, authored by Glenn Fulcher (2010), offers comprehensive, detailed coverage of the techniques, knowledge, and conceptual frameworks required to interpret and develop language assessments. It is also a systematic, helpful reference for researchers and educators who deal with assessment selection and design, offering instructors guidance on choosing the proper test for a particular educational setting and assisting them in interpreting the scores such tests generate. The book lays the groundwork for instructors to develop a thoughtful awareness of testing scenarios so they can furnish the most effective language assessment for their learners. Its language is lucid and accessible to experts and novices alike, and it strikes an apt balance between practical application and theoretical underpinnings, empowering practitioners to understand the objectives of effective assessment and its role in facilitating good learning and teaching. Because the book can make a substantial contribution and has significant implications for teachers, test designers, and other stakeholders, the current article reviews it and highlights the critical concepts in each chapter.

    Keywords: Practical Language Testing, score interpretation, test administration, test construction, test purpose
  • Arturo Mendoza Ramos *, Joaquín Martinez Pages 139-165
    Language placement tests (LPTs) are used to assess students' proficiency in the target language so that, based on their performance, they can be assigned to stepped language courses. These tests are usually considered low stakes because they do not have significant consequences for students' lives, which is perhaps why studies of LPTs are scarce. Nevertheless, tests should be examined regularly, and statistical analyses should be conducted to assess their functioning, particularly when they have medium- or high-stakes impact. For LPTs administered on a large scale, the logistic and administrative consequences of an ill-defined test can create an economic burden and an unnecessary use of human resources, which can also affect students negatively. This study was undertaken at one of the largest public institutions in Latin America, where nearly 1,700 students sit an English LPT every academic semester. A diagnostic statistical analysis revealed a need for revision. To retrofit the test, a new test architecture and blueprints were designed in adherence to the new curriculum, and new items were developed and tried out gradually in several pilot studies. Item Response Theory (IRT) was used to examine the functioning of the new test items (a small IRT illustration follows this entry). The aim of this study is to show how the test was retrofitted and to compare the functioning of the retrofitted English LPT with the previous version. The results show that the quality of the new items was higher than that of the former English LPT.
    Keywords: English in higher education, item quality, item response theory, large-scale assessment, placement tests
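    To make the IRT step concrete, the sketch below evaluates a two-parameter logistic (2PL) item characteristic curve and its item information in Python; the "old" and "new" parameter values are invented purely to illustrate how a better-discriminating retrofitted item carries more information.

    ```python
    import numpy as np

    # Hedged sketch of 2PL IRT item functioning. Parameters are invented;
    # the study's actual calibration values are not reproduced here.

    def icc_2pl(theta, a, b):
        """Probability of a correct response given ability theta."""
        return 1 / (1 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        """Fisher information contributed by one 2PL item."""
        p = icc_2pl(theta, a, b)
        return a**2 * p * (1 - p)

    theta = np.linspace(-3, 3, 7)                # ability grid in logits
    items = {"old item": (0.5, 0.2),             # weak discrimination (toy)
             "new item": (1.4, 0.2)}             # sharper discrimination (toy)

    for name, (a, b) in items.items():
        info = item_information(theta, a, b)
        print(name, np.round(info, 2))           # higher peak = more informative
    ```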
  • Ahmad Yulianto *, Anastasya Pudjitriherwanti, Chevy Kusumah, Dies Dwi Astuti Pages 166-187
    The increasing use of computer-based modes in language testing raises concern over their similarities with and differences from the paper-based format. The present study aimed to delineate discrepancies between the TOEFL PBT and CBT. To that end, a quantitative method was employed to probe score equivalence, the performance of male and female participants, the relationship between completion time and test score, and the test mode's effects on participants' performance. In total, 124 undergraduates aged 19 to 21 (M = 20, SD = 0.66) took part in the research. To analyze the data, MANOVA, Pearson correlation, and regression tests were run (two of these analyses are sketched after this entry). The findings uncovered that: (1) PBT and CBT scores were equivalent; (2) male and female participants' scores were not significantly different; (3) there was a moderately negative correlation between completion time and score; and (4) computer familiarity, habits in using computers, and perceptions of CBT did not affect TOEFL performance. For researchers, the implication of this study concerns the interchangeability of the two test modes. For CBT designers, it concerns the appropriate inclusion of visuals, time-related measurement, and procedures for designing computer-based tests.
    Keywords: CBT, computer familiarity, discrepancies, equivalence, PBT
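    Two of the analyses named above, the time-score correlation and the male-female comparison, are easy to sketch in Python with synthetic data (the MANOVA is omitted); the means, spreads, and effect sizes below are invented.

    ```python
    import numpy as np
    from scipy import stats

    # Hedged sketch: Pearson correlation between completion time and score,
    # and a Welch t-test comparing two groups' scores. Data are synthetic;
    # only the sample size (124) echoes the study.

    rng = np.random.default_rng(4)
    n = 124
    minutes = rng.normal(90, 15, n)                            # toy completion times
    score = 550 - 0.8 * (minutes - 90) + rng.normal(0, 25, n)  # mild negative link
    male = rng.random(n) < 0.5

    r, p_r = stats.pearsonr(minutes, score)
    t, p_t = stats.ttest_ind(score[male], score[~male], equal_var=False)

    print(f"time-score correlation: r = {r:.2f}, p = {p_r:.3f}")
    print(f"male vs. female scores: t = {t:.2f}, p = {p_t:.3f}")
    ```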
  • Nia Kurniasih *, Emi Emilia, Eva Sujatna Pages 188-205
    This study aimed at evaluating a PISA-like reading test developed by teachers participating in training on teaching PISA-like reading. To this end, an experimental test was administered to 107 students aged 15-16, using a set of texts and questions constructed according to the criteria of the PISA Reading test, Level 1. Item analysis was then performed using Rasch measurement, which is essential for indexing test-item difficulty relative to students' ability to respond correctly (a compact Rasch sketch follows this entry). The calculation comprised reliability, separation, and standard error. The Rasch model was constructed manually in Microsoft Excel, and a Wright map was likewise drawn manually to illustrate the results. The item analysis indicated that the test and the items the teachers constructed met sound quality criteria, and the results revealed an even distribution of item difficulty at the targeted level. The students' correct responses, however, clustered around items of moderate difficulty; only a limited number of students performed well on the more difficult items.
    Keywords: teacher training, Rasch measurement, ability, difficulty
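    Since the abstract emphasizes that the Rasch calculations were done by hand in Excel, a compact programmatic equivalent may help orient the reader. The sketch below fits a dichotomous Rasch model by joint maximum likelihood on synthetic responses (107 students, a hypothetical 10 items) and derives the separation and reliability indices the abstract mentions; it is an illustration, not the authors' spreadsheet.

    ```python
    import numpy as np

    # Hedged sketch: dichotomous Rasch estimation via joint maximum
    # likelihood (JMLE) with alternating, step-capped Newton updates.
    # Responses are synthetic; 107 persons echo the study, 10 items are
    # a hypothetical test length.

    rng = np.random.default_rng(5)
    theta_true = rng.normal(0, 1, 107)
    b_true = np.linspace(-2, 2, 10)
    p_true = 1 / (1 + np.exp(-(theta_true[:, None] - b_true)))
    X = (rng.random(p_true.shape) < p_true).astype(int)
    X = X[(X.sum(1) > 0) & (X.sum(1) < X.shape[1])]  # drop zero/perfect scores

    theta = np.zeros(X.shape[0])        # person abilities (logits)
    b = np.zeros(X.shape[1])            # item difficulties (logits)
    for _ in range(100):
        p = 1 / (1 + np.exp(-(theta[:, None] - b)))
        info = p * (1 - p)
        theta += np.clip((X - p).sum(1) / info.sum(1), -1, 1)
        b -= np.clip((X - p).sum(0) / info.sum(0), -1, 1)
        b -= b.mean()                   # anchor: mean item difficulty = 0

    se = 1 / np.sqrt(info.sum(1))                    # person standard errors
    mse = (se ** 2).mean()
    sep = np.sqrt(max(theta.var() - mse, 0) / mse)   # person separation index
    rel = sep ** 2 / (1 + sep ** 2)                  # separation reliability

    print(f"item difficulties: {np.round(b, 2)}")
    print(f"person separation = {sep:.2f}, reliability = {rel:.2f}")
    ```

    A Wright map is then simply these two sets of logit estimates (theta and b) plotted on a shared vertical scale.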
  • Mojtaba Mohammadi *, Maryam Zarrabi, Jaber Kamali Pages 206-224
    With the incremental integration of technology into writing assessment, technology-generated feedback has moved steadily toward supplementing or even replacing human correction and rating. Yet further investigation is needed into its potential use as either a supplement to or a replacement for human feedback. This embedded mixed-methods study investigated three groups of Iranian intermediate IELTS candidates who received automated, teacher, or blended (automated + teacher) feedback on different aspects of their writing while practicing for the IELTS writing test. A structured written interview was also conducted to explore learners' perceptions (attitude, clarity, preference) of the feedback mode they received. Findings revealed that students who received teacher-only and blended feedback performed better in writing overall; the blended group outperformed the others in task response, the teacher feedback group in cohesion and coherence, and the automated feedback group in lexical resource. The interview analysis revealed that learners rated the clarity of all feedback modes highly and held positive attitudes toward them, but strongly preferred the blended mode. The findings suggest ways to facilitate learning and assessing writing, and recommend that teachers provide comprehensive, accurate, and continuous feedback as a means of formative assessment.
    Keywords: automated writing evaluation (AWE), blended feedback, formative assessment, IELTS writing, learners' perception
  • Alireza Manzari * Pages 225-235
    Modern teaching practices emphasize learner autonomy and learner-centered approaches to language learning, and such methods require corresponding assessment approaches. Self-assessment is viewed as an assessment mode that matches modern learner-centered teaching methodologies; however, the validity and reliability of self-assessments are not yet conclusively established. This study aimed to provide validity and reliability evidence for self-assessment among Iranian EFL university learners. The Common European Framework of Reference (CEFR) Self-Assessment Grid was translated into Persian and administered to a sample of Iranian undergraduate students of English, with a C-Test battery of four passages used as the criterion for concurrent validation. The self-assessments were examined for internal consistency and test-retest reliability (the corresponding calculations are sketched after this entry). Findings showed that while self-assessments are highly reliable, they lack validity, as evidenced by low correlations between the components of the self-assessment grid and the C-Test. The implications for the application of self-assessment in foreign language education are discussed.
    Keywords: learner-centered teaching, learner autonomy, self-assessment, reliability, validity
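    The reliability and validity evidence described above reduces to a few standard statistics. The sketch below computes Cronbach's alpha, a test-retest correlation, and a concurrent-validity correlation with a criterion score on synthetic data; the sample size, the five grid components, and the noise levels are all invented (the synthetic data are deliberately built to show the study's pattern of high reliability but weak validity).

    ```python
    import numpy as np
    from scipy import stats

    # Hedged sketch: internal consistency (Cronbach's alpha), test-retest
    # reliability, and concurrent validity against a criterion (C-Test)
    # score. All data are synthetic and scaled arbitrarily.

    rng = np.random.default_rng(6)
    n = 120                                                # hypothetical sample
    trait = rng.normal(0, 1, n)
    grid = trait[:, None] + rng.normal(0, 0.6, (n, 5))     # 5 grid components
    grid_retest = grid + rng.normal(0, 0.4, grid.shape)    # second administration
    c_test = 0.3 * trait + rng.normal(0, 1, n)             # weakly related criterion

    k = grid.shape[1]
    alpha = k / (k - 1) * (
        1 - grid.var(axis=0, ddof=1).sum() / grid.sum(axis=1).var(ddof=1)
    )
    retest_r, _ = stats.pearsonr(grid.sum(1), grid_retest.sum(1))
    criterion_r, _ = stats.pearsonr(grid.sum(1), c_test)

    print(f"Cronbach's alpha = {alpha:.2f}")       # high: internally consistent
    print(f"test-retest r    = {retest_r:.2f}")    # high: stable across sessions
    print(f"criterion r      = {criterion_r:.2f}") # low: weak concurrent validity
    ```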
  • Laura Naka * Pages 236-259

    Formative assessment has often been promoted recently as a pivotal component of evaluation methodology, but in some countries students' views of how it affects the improvement of language learning are largely disregarded. The present study therefore investigates the experiences of students (pre-service teachers) with formative assessment and the tools used to implement it. Teaching and evaluation are based on learning outcomes and on the activities students are expected to carry out in the English for Teachers course in the Department of Education. The study analyzed students' perceptions of, and approaches to, the use of formative assessment, which appears to be a valuable way to raise the quality of learning, helping students reach their goal of passing exams successfully and earning high grades. To capture their perspectives, 85 students took part in the research, in which both quantitative and qualitative methods were used: questionnaire data were analyzed in SPSS, while focus-group data are presented through descriptive analysis in the form of quotations. The questionnaire findings revealed the students' views on formative assessment, and the focus-group discussions corroborated what the students had reported in the questionnaire. The findings imply that, beyond the demanding work of continuous preparation while teaching, formative assessment enables EFL teachers to meet students' individual needs, given the differences among them.

    Keywords: pre-service teachers, assessment tools, teaching methodology, activities