Potential selection bias associated with using geocoded birth records for epidemiologic research

Sandie Ha, Hui Hu, Liang Mao, Dikea Roussos-Ross, Jeffrey Roth, Xiaohui Xu

Article first published online: 04 Feb 2016, Annals of Epidemiology

DOI: 10.1016/j.annepidem.2016.01.002

ABSTRACT:

Purpose

Geocoded birth registry data are increasingly used in environmental epidemiology research, and ungeocoded records are routinely excluded.

Methods

We used classification and regression tree analysis and logistic regression to investigate potential selection bias associated with this exclusion among all singleton Florida births in 2009 (n = 210,285).

Results

The rate of unsuccessful geocoding was 11.5% (n = 24,171) and ranged from 0% to 100% across zip codes. Living in a rural zip code was the strongest predictor of being ungeocoded. Other predictors of geocoding status varied with urbanity status. In urban areas, maternal race (adjusted odds ratio [aOR] ranging between 1.08 for Hispanic and 1.18 for black compared to white), maternal age [aOR: 1.16 (1.10–1.23) for ages 20–34 compared to <20], maternal nativity [aOR: 1.20 (1.15–1.25) for non-US versus US born], delivery at a birth center [aOR: 1.72 (1.49–2.00) compared to hospital delivery], multiparity [aOR: 0.91 (0.88–0.94)], maternal smoking [aOR: 0.82 (0.76–0.88)], and having nonprivate insurance [aOR: 1.25 (1.20–1.30) for Medicaid versus private insurance] were significantly associated with being ungeocoded. In rural areas, births delivered at a birth center [aOR: 2.91 (1.80–4.73)] or at home [aOR: 1.94 (1.28–2.95)] had increased odds of being ungeocoded compared to hospital births. The characteristics predictive of being ungeocoded were also significantly associated with adverse birth outcomes such as low birth weight and preterm delivery, and the association for maternal age differed depending on whether ungeocoded births were included or excluded.
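
As a rough illustration of the quantities reported above, the sketch below recomputes the ungeocoded rate from the counts given in the abstract and shows how an unadjusted odds ratio with a Wald 95% confidence interval can be obtained from a 2x2 table. This is not the authors' analysis: the study reports adjusted odds ratios from logistic regression, whereas this is the simpler unadjusted analogue. The function name `odds_ratio_wald_ci` and the 2x2 counts are hypothetical and used for illustration only.

```python
import math

# Counts reported in the abstract.
total_births = 210_285
ungeocoded = 24_171

# Proportion of births that could not be geocoded, as a percentage.
rate_pct = 100 * ungeocoded / total_births
print(f"Ungeocoded rate: {rate_pct:.1f}%")


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and ungeocoded      b: exposed and geocoded
    c: unexposed and ungeocoded    d: unexposed and geocoded
    (Hypothetical helper; all four cells must be positive.)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
or_est, ci_low, ci_high = odds_ratio_wald_ci(30, 70, 20, 80)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An adjusted odds ratio, as reported in the abstract, would instead come from exponentiating a coefficient of a multivariable logistic regression model, which accounts for the other covariates simultaneously.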

Conclusions

Geocoding status is not random. Women with certain exposure-outcome characteristics may be more likely to be ungeocoded and excluded, indicating potential selection bias.

Read the full publication at Annals of Epidemiology