Categorization of continuous covariates and complex regression models - friends or foes in intersectionality research.

OBJECTIVE: To reduce health inequities, it is important to identify intersections of individual characteristics associated with privilege or disadvantage. Several approaches for this have recently been proposed. One approach (1) considers models specified with all first- and second-order effects, and another (2) stratifies the data based on multiple covariates; both categorize continuous covariates. A simulation study was conducted to evaluate both methods with regard to the identification of intersections showing true differences, the rate of false positive results, and generalizability to independent data, in comparison with an established approach (3): backward variable elimination according to the Bayesian information criterion (BE-BIC) combined with splines.
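To make approach 3 more concrete, here is a minimal sketch of backward elimination by BIC for an ordinary least-squares model. It assumes a Gaussian linear model, uses the common form BIC = n·log(RSS/n) + k·log(n), and treats each term (for example a spline-expanded continuous covariate or an interaction) as one group of design columns that is kept or dropped as a whole. The truncated-power cubic spline basis, the knot positions, and all function names (`bic_ols`, `cubic_spline_basis`, `backward_elimination_bic`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bic_ols(y, X):
    """BIC of a Gaussian linear model fitted by ordinary least squares.

    Uses n*log(RSS/n) + k*log(n) with k = number of columns of X; constants
    shared by all candidate models are dropped because they cancel.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def cubic_spline_basis(x, knots):
    """Truncated-power cubic regression spline basis (no intercept column)."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def backward_elimination_bic(y, groups):
    """Backward elimination over groups of columns using BIC.

    `groups` maps a term name to its 2-D design columns. An intercept is
    always kept. At each step the term whose removal lowers BIC the most is
    dropped; the search stops when no removal lowers BIC.
    """
    n = len(y)
    intercept = np.ones((n, 1))

    def design(names):
        return np.hstack([intercept] + [groups[g] for g in names])

    selected = list(groups)
    current = bic_ols(y, design(selected))
    while selected:
        candidates = [
            (bic_ols(y, design([g for g in selected if g != drop])), drop)
            for drop in selected
        ]
        best_bic, best_drop = min(candidates)
        if best_bic >= current:
            break  # no single removal improves BIC
        selected.remove(best_drop)
        current = best_bic
    return selected, current


if __name__ == "__main__":
    # Hypothetical data: only older females have a shifted mean score.
    rng = np.random.default_rng(1)
    n = 500
    age = rng.uniform(30, 80, n)
    female = rng.integers(0, 2, n).astype(float)
    bmi = rng.normal(26, 4, n)
    y = 1.0 * female * (age > 55) + rng.normal(0, 1, n)

    age_basis = cubic_spline_basis(age, knots=[45, 55, 65])
    groups = {
        "age": age_basis,
        "sex": female[:, None],
        "bmi": cubic_spline_basis(bmi, knots=[22, 26, 30]),
        "age:sex": age_basis * female[:, None],  # spline-by-sex interaction
    }
    selected, bic = backward_elimination_bic(y, groups)
    print(selected, round(bic, 1))
```

Because the log(n) penalty is charged per column, a spline term with several columns must reduce the residual sum of squares substantially to survive, which is one reason such a procedure tends to resist spurious effects.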

STUDY DESIGN AND SETTING: The covariates age, sex, body mass index, education, and diabetes were simulated in R, and their association with a continuous frailty score for osteoporosis was examined using multiple linear regression. In setting 1, none of the covariates was associated with the frailty score, i.e., the data contained only noise. In setting 2, age, sex, and their interaction were associated with the frailty score such that only females older than 55 years formed an intersection with an increased frailty score. All approaches were compared across varying sample sizes (N = 200 to 3000) and signal-to-noise ratios (SNR, 0.5 to 4) in 1000 replications. Models were evaluated by bootstrap resampling: each model was fitted to the internal learning data and then used to predict outcomes in the internal validation data. The mean squared error (MSE) of these predictions was used for comparison, and the frequency of false positive findings was calculated.
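The design of setting 2 and the bootstrap evaluation can be sketched as follows. Several details are assumptions not stated in the abstract: age is drawn uniformly between 40 and 80, sex is a fair coin, the effect size is 1, SNR is taken as var(signal)/var(noise), and the "internal validation data" are taken to be the out-of-bag observations not drawn into each bootstrap learning sample. The function names `simulate_setting2` and `bootstrap_oob_mse` are hypothetical.

```python
import numpy as np


def simulate_setting2(n, snr, rng, effect=1.0):
    """Hypothetical data-generating process resembling setting 2.

    Only females older than 55 have a shifted mean frailty score. The noise
    standard deviation is chosen so that var(signal) / var(noise) == snr.
    Covariate distributions and `effect` are illustrative assumptions.
    """
    age = rng.uniform(40, 80, n)
    female = rng.integers(0, 2, n)
    signal = effect * ((female == 1) & (age > 55))
    sigma = np.sqrt(np.var(signal) / snr)
    y = signal + rng.normal(0.0, sigma, n)
    return age, female, y


def bootstrap_oob_mse(X, y, n_boot, rng):
    """Fit OLS on bootstrap learning samples and score on out-of-bag rows."""
    n = len(y)
    mses = []
    for _ in range(n_boot):
        learn = rng.integers(0, n, n)              # draw rows with replacement
        oob = np.setdiff1d(np.arange(n), learn)    # rows never drawn
        if oob.size == 0:
            continue
        beta, *_ = np.linalg.lstsq(X[learn], y[learn], rcond=None)
        mses.append(np.mean((y[oob] - X[oob] @ beta) ** 2))
    return float(np.mean(mses))


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    age, female, y = simulate_setting2(n=1000, snr=1.0, rng=rng)
    # Correctly specified model: intercept + indicator of the intersection.
    X_true = np.column_stack(
        [np.ones_like(y), (female == 1) & (age > 55)]
    ).astype(float)
    print(bootstrap_oob_mse(X_true, y, n_boot=200, rng=rng))
```

Under these assumptions, a correctly specified model should reach an out-of-bag MSE close to the noise variance, while overfitted or misspecified models score higher; this is the kind of comparison the MSE criterion captures.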

RESULTS: In setting 1, approaches 1 and 2 generated spurious effects in more than 90% of simulations across all sample sizes. Approach 3 (BE-BIC) selected the correct model in 36.5% of simulations with smaller sample sizes and in 89.8% with larger sample sizes, and it always produced fewer spurious effects. MSE in independent data was generally higher for approaches 1 and 2 than for approach 3. In setting 2, approach 1 most frequently selected the correct interaction but often showed spurious effects (>75%). Across all sample sizes and SNRs, approach 3 generated spurious results least often and had the lowest MSE in independent data.

CONCLUSION: Categorization of continuous covariates is detrimental to studies on intersectionality. Because of their high and unrestricted model complexity, such approaches are prone to spurious effects and often lack interpretability. Approach 3 (BE-BIC) is considerably more robust against spurious findings, showed better generalizability to independent data, and can be used with most statistical software. For intersectionality research, we consider it most important to describe relevant differences between intersections and to avoid non-reproducible and spurious findings.
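The complexity argument can be illustrated with simple counting. The sketch below assumes a hypothetical categorization of the five simulated covariates (age in 4 groups, BMI and education in 3, sex and diabetes in 2); the abstract does not state the categories actually used. With treatment coding, a covariate with L levels needs L - 1 dummy columns, a pairwise interaction needs the product of the two dummy counts, and full stratification creates one stratum per combination of levels. The function names are illustrative.

```python
import math
from itertools import combinations


def n_params_first_second_order(levels):
    """Coefficients (including intercept) for all main effects plus all
    pairwise interactions of categorical covariates, using L - 1 dummies
    per covariate with L levels."""
    dummies = [L - 1 for L in levels.values()]
    main = sum(dummies)
    pairwise = sum(a * b for a, b in combinations(dummies, 2))
    return 1 + main + pairwise


def n_strata(levels):
    """Number of cells when stratifying on every combination of levels."""
    return math.prod(levels.values())


# Hypothetical categorization of the five simulated covariates.
levels = {"age": 4, "sex": 2, "bmi": 3, "education": 3, "diabetes": 2}
print(n_params_first_second_order(levels))  # 41
print(n_strata(levels))                     # 144
```

For comparison, a main-effects model that keeps age and BMI continuous and linear would need only 7 coefficients under the same coding (intercept, age, sex, BMI, two education dummies, diabetes), which shows how quickly categorization plus interactions inflates the number of estimated effects.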
