Analyzing within-group change in an experimental context, where the same group of people is measured before and after some event, can be fraught with statistical problems and issues of causal inference. Still, these designs are common, from political science to developmental neuropsychology to economics. With cognitive data, it has long been known that a second administration, with no treatment or an ineffective manipulation between testings, leads to increased scores at time 2 without any increase in the underlying latent ability. We investigate several analytic approaches, involving both manifest and latent variable modeling, to see which methods can accurately model manifest score changes in the absence of latent change. Using data from 760 schoolchildren given an intelligence test twice, with no intervention in between, we show that analyzing manifest test scores, either directly or through a univariate latent change score model, falsely leads one to believe an underlying increase has occurred. Second-order latent change score models likewise show a spurious significant effect on the underlying latent ability. Longitudinal structural equation modeling correctly shows no change at the latent level once measurement invariance is tested and imposed and model fit is evaluated. When analyzing within-group change in an experiment, analyses must occur at the latent level, measurement invariance must be tested, and change parameters must be explicitly tested. Otherwise, one may see change where none exists.
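To make the contrast concrete, here is a sketch in generic notation (not reproduced from the article) of the kind of latent-level model the abstract recommends. Let $x_{jt}$ denote observed subtest $j$ for a person at occasion $t \in \{1, 2\}$. A two-occasion factor model with strong measurement invariance can be written as

$$
x_{jt} = \tau_j + \lambda_j \eta_t + \varepsilon_{jt}, \qquad j = 1, \dots, p,\ t = 1, 2,
$$

where the intercepts $\tau_j$ and loadings $\lambda_j$ carry no time subscript because they are constrained equal across occasions, and $\mathrm{E}[\eta_1] = 0$ is fixed for identification. The claim of interest is then a test of the latent change parameter,

$$
H_0\colon \mathrm{E}[\Delta\eta] = \mathrm{E}[\eta_2 - \eta_1] = 0,
$$

evaluated only after the invariance constraints have been shown to fit. Retest gains that occur at the level of the indicators rather than the ability then tend to surface as misfit of the equal-intercept constraints rather than as a nonzero $\mathrm{E}[\Delta\eta]$. By contrast, a paired t-test or a univariate latent change score model on the composite asks only whether observed means differ between occasions, a comparison that mixes indicator-level retest gains with genuine latent change, which is the spurious increase the abstract describes.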
Published in: Science Journal of Applied Mathematics and Statistics, Volume 13, Issue 2
DOI: 10.11648/j.sjams.20251302.12
Page(s): 34-44
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2025. Published by Science Publishing Group.
Keywords: Pre-post Change, Statistical Methods, Model Comparison, Latent Variable Modelling
APA Style
Protzko, J., Nijenhuis, J. T., Ziada, K. E., Metwaly, H. A. M., Bakhiet, S. F., & Maki, Y. B. B. (2025). Analyzing Within-Group Changes in an Experiment: To Deal with Retest Effects, You Have to Go Latent But Not All Latents Are Equal. Science Journal of Applied Mathematics and Statistics, 13(2), 34-44. https://doi.org/10.11648/j.sjams.20251302.12