- Psychometric Methodologies and Testing
- Advanced Statistical Modeling Techniques
- Reliability and Agreement in Measurement
- Multi-Criteria Decision Making
- Cognitive Abilities and Testing
- Educational and Psychological Assessments
- Cognitive Science and Mapping
- Advanced Statistical Methods and Models
- Natural Language Processing Techniques
- Education, Achievement, and Giftedness
- Speech Recognition and Synthesis
- Mental Health Research Topics
- Medical Education and Admissions
- Intelligent Tutoring Systems and Adaptive Learning
- Statistical Methods in Clinical Trials
- Conflict Management and Negotiation
- Energy Load and Power Forecasting
- Spreadsheets and End-User Computing
- Regional Economic and Spatial Analysis
- Speech and Dialogue Systems
- Resilience and Mental Health
- Student Assessment and Feedback
- Control Systems and Identification
- Linguistic Education and Pedagogy
- Optimal Experimental Design Methods
Educational Testing Service
2009-2018
University of Notre Dame
2008-2011
The purpose of this paper is to review some of the key literature on response time as it has played a role in cognitive ability measurement, providing a historical perspective as well as covering current research. We discuss the speed–level distinction, the dimensions of speed and level in abilities frameworks, the speed–accuracy tradeoff and approaches to addressing it, analysis methods, particularly item response theory–based models and models from cognitive psychology (the ex-Gaussian function, the diffusion model), and other uses of response time in testing besides ability measurement. Several...
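The ex-Gaussian function mentioned above is the density of a Gaussian convolved with an exponential, a common descriptive model for response-time distributions. A minimal sketch, with purely illustrative parameter values (`mu`, `sigma`, `tau` below are not taken from the paper):

```python
import math

def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density: convolution of N(mu, sigma^2) with Exp(mean tau)."""
    z = (x - mu) / sigma - sigma / tau
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return (1.0 / tau) * math.exp(mu / tau + sigma**2 / (2 * tau**2) - x / tau) * Phi

# Illustrative parameters (seconds): Gaussian part mu, sigma; exponential tail tau.
mu, sigma, tau = 0.4, 0.05, 0.3

# Crude numerical check on a grid: the density integrates to ~1,
# and the mean of an ex-Gaussian is mu + tau.
xs = [i * 0.001 for i in range(0, 5000)]
total = sum(exgauss_pdf(x, mu, sigma, tau) * 0.001 for x in xs)
mean = sum(x * exgauss_pdf(x, mu, sigma, tau) * 0.001 for x in xs)
print(round(total, 3), round(mean, 3))  # total ~ 1.0, mean ~ 0.7
```

The exponential component `tau` captures the long right tail typical of response-time data, which is what makes this form attractive for the analyses the paper reviews.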
Existing studies of mediation models have been limited to normal-theory maximum likelihood (ML). Because real data in the social and behavioral sciences are seldom normally distributed and often contain outliers, classical methods generally lead to inefficient or biased parameter estimates. Consequently, conclusions from a mediation analysis can be misleading. In this article, we propose 2 approaches to alleviate these problems. One is to identify cases that strongly affect testing using local influence and robust...
Dynamic factor analysis models with time-varying parameters offer a valuable tool for evaluating multivariate time series data with changing dynamics and/or measurement properties. We use the Model of Activation proposed by Zautra and colleagues (Zautra, Potter, & Reich, 1997) as a motivating example to construct a dynamic factor model with vector autoregressive relations and cross-regression at the latent level. Using techniques drawn from the state-space literature, the model was fitted to a set of daily affect data (over 71 days) from 10 participants who had been...
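The vector autoregressive relations at the latent level can be sketched as a VAR(1) process, eta_t = A @ eta_{t-1} + noise, where the off-diagonal entries of A carry the cross-regressions. The coefficient matrix and noise scale below are hypothetical, chosen only to show that structure over 71 simulated daily occasions:

```python
import random

random.seed(1)

# Hypothetical bivariate VAR(1) for two latent affect factors.
# Diagonal entries of A: autoregression; off-diagonal: cross-regression.
A = [[0.5, -0.2],
     [0.1,  0.6]]

def step(eta):
    """One day's transition: eta_t = A @ eta_{t-1} + Gaussian noise."""
    return [A[0][0] * eta[0] + A[0][1] * eta[1] + random.gauss(0, 0.1),
            A[1][0] * eta[0] + A[1][1] * eta[1] + random.gauss(0, 0.1)]

series = [[1.0, 0.0]]          # arbitrary starting state
for _ in range(70):            # 71 daily occasions in total, as in the abstract
    series.append(step(series[-1]))

print(len(series))  # -> 71
```

In a full dynamic factor model this latent series would additionally be linked to the observed daily affect items through a measurement (factor-loading) equation; that layer is omitted here.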
Equating of tests composed of both discrete and passage-based multiple-choice items using the nonequivalent groups with anchor test design is popular in practice. In this study, we compared the effect of discrete versus passage-based anchor items on observed score equating via simulation. Results suggested that an anchor with a larger proportion of passage-based items, more items per passage, and/or a higher degree of local dependence among items within one passage produces larger equating errors, especially when the groups taking the new form and the reference form differ in ability. Our findings challenge the common belief that the anchor should be a miniature...
In this study, we define the term screener test, elaborate key considerations in test design, and describe how to incorporate concepts of practicality and argument-based validation to drive an evaluation of screener tests for language assessment. A screener test is defined as a brief assessment designed to identify whether an examinee is a member of a particular population or subpopulation. Consequently, its focus of measurement is to provide information that distinguishes the targeted subpopulations. Although the trade-off between quality and practicality is an important consideration...
We evaluated the use of the nominal response model (NRM) to score multiple-choice (also known as “select the best option”) situational judgment tests (SJTs). Using data from two large studies, we compared the reliability and correlations of NRM scores with those of various classical test theory and item response theory (IRT) scoring methods. The SJTs measured emotional management (Study 1) and teamwork and collaboration (Study 2). In Study 1 the NRM method was shown to be superior in yielding higher correlations with external measures than three classical test theory–based and four other...
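The nominal response model scores each option category directly: the probability of choosing option k is a multinomial-logit function of the latent trait theta. A minimal sketch with hypothetical item parameters (real NRM estimation also imposes identification constraints, e.g. that slopes sum to zero; the values below are purely illustrative):

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """Nominal response model: P_k(theta) proportional to exp(a_k*theta + c_k)."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical parameters for a 4-option SJT item (not estimated from real data).
slopes = [0.8, 0.2, -0.3, -0.7]      # a_k: how option attractiveness changes with theta
intercepts = [0.5, 0.1, 0.2, -0.8]   # c_k: baseline attractiveness of each option

probs = nrm_probs(theta=1.0, slopes=slopes, intercepts=intercepts)
print([round(p, 3) for p in probs])  # four probabilities summing to 1
```

Because every option has its own slope, partially correct options contribute information to the score rather than being lumped together as "wrong", which is what can make NRM scoring attractive for SJTs.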
For a multiple-choice test under development or redesign, it is important to choose the optimal number of options per item so that the test possesses the desired psychometric properties. On the basis of available data for an assessment with 8 options per item, we evaluated the effects of changing the number of options on psychometric properties (difficulty, reliability, and score comparability) using simulation. Using 2 criteria (low frequency of selection and poor discrimination) to remove nonfunctioning options and 2 schemes (random and educated guessing) to model hypothetical response...
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal-theory estimator to be inconsistent. A general estimator, which does not rely on the normality assumption, would be preferred, because it is asymptotically accurate regardless of the distribution of the data. In this article, an analytical formula for the standard error of linear equating, which characterizes the effect...
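For context, linear observed-score equating maps a form-X score onto the form-Y scale by matching means and standard deviations: l_Y(x) = mu_Y + (sigma_Y / sigma_X)(x - mu_X). The sketch below uses toy single-group data rather than the NEAT design's synthetic-population weighting, so it only illustrates the basic transformation whose standard error the article studies:

```python
from statistics import mean, pstdev

def linear_equate(x, scores_x, scores_y):
    """Map a form-X score x onto the form-Y scale by matching means and SDs."""
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Toy illustrative score distributions (not real test data).
form_x = [10, 12, 14, 16, 18]
form_y = [20, 23, 26, 29, 32]

print(linear_equate(14, form_x, form_y))  # mean of X maps to mean of Y -> 26.0
```

Because the equated score is a function of sample means and variances, its standard error depends on the sampling distribution of those moments, which is where the normality assumption enters the classical estimator.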
Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed-score preequating method: the empirical item characteristic curve (EICC) method, which makes preequating without item response theory (IRT) possible. EICC results were compared with a criterion equating and with IRT true-score preequating conversions. Results suggested that the EICC method worked well under the conditions considered in this study. The difference between the EICC and the criterion conversion was smaller than .5 raw-score points (a practical criterion often...
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test facilitate the comparison. For a well-developed item with appropriate keys (i.e., correct answers), agreement among various item-scoring methods is expected, as reflected in item-option characteristic curves. In addition, when models based on item-response theory fit the data, reliability is greatly improved, particularly if...
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an SAT® critical reading anchor that contains proportionally more passage-based items, compared with the total tests to be equated, with another anchor that contains proportionally fewer. Both of these anchors were administered in the same administration. Each anchor type was evaluated with respect to equating bias. The results clearly reveal that the anchor with proportionally fewer passage-based items almost always leads to more accurate equating functions than does the anchor with more passage-based items.
Equating of tests composed of both discrete and passage-based items using the nonequivalent groups with anchor test (NEAT) design is popular in practice. This study investigated the impact of anchor composition on observed score equating via simulation. Results suggested that an anchor with a larger proportion of passage-based items and/or a higher degree of local dependence among items produces larger equating errors, especially when group ability differences are not minimal. Our findings challenge the common belief that the anchor should be a miniature version of the tests to be equated.
Mediation analysis investigates how certain variables mediate the effect of predictors on outcome variables. Existing studies of mediation models have been limited to normal-theory maximum likelihood (ML) or least squares with normally distributed data. Because real data in the social and behavioral sciences are seldom normally distributed and often contain outliers, classical methods can result in biased or inefficient estimates, which lead to inaccurate or unreliable tests of the mediated effect. The authors propose two approaches for...
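The classical point estimate that these robust approaches aim to safeguard is the product of two regression slopes: a, from regressing the mediator M on the predictor X, and b, the slope of the outcome Y on M (in a full model, the partial slope controlling for X). In the toy deterministic data below there is no direct X-to-Y path, so simple slopes suffice; all values and the helper function are illustrative:

```python
def ols_slope(x, y):
    """Simple OLS slope of y on x, computed from centered data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Toy deterministic data: M = 2*X and Y = 3*M, so the indirect effect a*b = 6.
X = [0, 1, 2, 3, 4]
M = [2 * x for x in X]
Y = [3 * m for m in M]

a = ols_slope(X, M)   # path X -> M
b = ols_slope(M, Y)   # path M -> Y (no direct X -> Y effect to partial out here)
print(a * b)          # indirect (mediated) effect -> 6.0
```

A single outlier in X, M, or Y can pull both OLS slopes and hence the product a*b, which is the motivation for the robust estimation and influence-diagnostic approaches the abstract describes.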
Synthetically generated speech (SGS) has become an integral part of our oral communication in a wide variety of contexts. It can be produced instantly at low cost and allows precise control over multiple aspects of the output, all of which is highly appealing to second language (L2) assessment developers, who have traditionally relied upon human voice actors for recording audio materials. Nevertheless, SGS is not widely used in L2 assessments. One major concern with this use case lies in its potential impact on test-taker...