Research Article | Peer-Reviewed

Analyzing Within-Group Changes in an Experiment: To Deal with Retest Effects, You Have to Go Latent But Not All Latents Are Equal

Received: 27 February 2025     Accepted: 10 March 2025     Published: 14 May 2025
Abstract

Analyzing within-group change in an experimental context, where the same group of people is measured before and after some event, can be fraught with statistical problems and issues of causal inference. Still, these designs are common in fields from political science to developmental neuropsychology to economics. With cognitive data, it has long been known that a second administration, with no treatment or an ineffective manipulation between testings, leads to increased scores at time 2 without an increase in the underlying latent ability. We investigate several analytic approaches, involving both manifest and latent variable modeling, to see which methods can accurately model manifest score changes in the absence of latent change. Using data from 760 schoolchildren given an intelligence test twice, with no intervention in between, we show that using manifest test scores, either directly or through univariate latent change score analysis, falsely leads one to believe an underlying increase has occurred. Second-order latent change score models also show a spurious significant effect on the underlying latent ability. Longitudinal structural equation modeling correctly shows no change at the latent level when measurement invariance is tested and imposed and model fit is evaluated. When analyzing within-group change in an experiment, analyses must occur at the latent level, measurement invariance must be tested, and change parameters must be explicitly tested. Otherwise, one may see change where none exists.
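The abstract's central point can be illustrated with a minimal numerical sketch (this is an illustration of the retest-effect phenomenon, not the authors' analysis; the sample size matches the study, but the effect size, error variance, and seed are assumptions): when a fixed retest bump shifts observed scores at time 2 while the latent ability is held constant by construction, a paired t-test on manifest scores nonetheless "detects" a gain.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n = 760  # sample size matching the study's schoolchildren

# Latent ability is identical at both occasions: no true change.
theta = rng.normal(100, 15, n)

# Observed scores = latent ability + measurement error, plus a
# hypothetical retest bump at time 2 that affects the test score,
# not the latent ability (bump and error SD are assumed values).
retest_bump = 3.0
t1 = theta + rng.normal(0, 5, n)
t2 = theta + retest_bump + rng.normal(0, 5, n)

# A paired t-test on the manifest scores reports a significant gain,
# even though the latent ability, by construction, did not change.
stat, p = ttest_rel(t2, t1)
print(f"mean gain = {np.mean(t2 - t1):.2f}, p = {p:.2g}")
```

Detecting that the gain lives in the test rather than the ability requires modeling the latent level with measurement invariance constraints, which is the paper's argument for longitudinal SEM over manifest-score analyses.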

Published in Science Journal of Applied Mathematics and Statistics (Volume 13, Issue 2)
DOI 10.11648/j.sjams.20251302.12
Page(s) 34-44
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Pre-post Change, Statistical Methods, Model Comparison, Latent Variable Modeling

Cite This Article
  • APA Style

    Protzko, J., te Nijenhuis, J., Ziada, K. E., Metwaly, H. A. M., Bakhiet, S. F., & Maki, Y. B. B. (2025). Analyzing Within-Group Changes in an Experiment: To Deal with Retest Effects, You Have to Go Latent But Not All Latents Are Equal. Science Journal of Applied Mathematics and Statistics, 13(2), 34-44. https://doi.org/10.11648/j.sjams.20251302.12
