
Compute Air Pollutant Exposure for Given Time Windows (County-level aggregation)
summarise_exposure.Rd
This function calculates the mean, median, and standard deviation of exposure to an air pollutant for each participant based on their recorded start and end dates, and county/state.
Usage
summarise_exposure(
participants_df,
air_quality_df,
date_col,
pollutant_col,
start_col,
end_col,
county_name = "county",
state_name = "state",
group_vars = NULL
)
Arguments
- participants_df
A dataframe containing participant information, including start and end dates.
- air_quality_df
A dataframe containing air quality measurements, with date and pollutant values.
- date_col
A string specifying the column name in
air_quality_df
that contains date values.- pollutant_col
A string specifying the column name in
air_quality_df
that contains pollutant values.- start_col
A string specifying the column name in
participants_df
that contains the start date.- end_col
A string specifying the column name in
participants_df
that contains the end date.- county_name
A string specifying the column name in
participants_df
andair_quality_df
that contains county names.- state_name
A string specifying the column name in
participants_df
andair_quality_df
that contains state names.- group_vars
A character vector of additional grouping variables from
participants_df
(e.g., participant ID, age, smoking status). Default isNULL
.
Value
A tibble containing the mean, median, and standard deviation of exposure for each participant during their study period, along with the number of valid exposure records.
Examples
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package to force all conflicts to become errors
# Example air quality data (PM2.5 levels)
air_quality_df <- tibble(
date = seq(as.Date("2000-01-01"), as.Date("2020-12-31"), by = "day"),
pm25_level = runif(length(date), min = 5, max = 80),
county_name = rep(c("CountyA", "CountyB", "CountyC"), length.out = length(date)),
state_name = rep(c("StateX", "StateY", "StateZ"), length.out = length(date))
)
# Example participants data
participants_df <- tibble(
participant_id = 1:5,
start_date = as.Date(c("2005-06-01", "2010-01-01", "2015-03-15", "2008-07-10", "2012-09-20")),
end_date = as.Date(c("2005-12-31", "2010-12-31", "2015-09-30", "2009-05-20", "2013-06-15")),
age = c(65, 72, 50, 60, 58),
county_name = c("CountyA", "CountyB", "CountyC", "CountyA", "CountyB"),
state_name = c("StateX", "StateY", "StateZ", "StateX", "StateY"),
smoking_status = c("Never", "Former", "Current", "Never", "Former")
)
# Compute exposure
exposure_results <- summarise_exposure(
participants_df = participants_df,
air_quality_df = air_quality_df,
date_col = "date",
pollutant_col = "pm25_level",
start_col = "start_date",
end_col = "end_date",
county_name = "county_name",
state_name = "state_name",
group_vars = c("participant_id", "age", "smoking_status")
)
print(exposure_results)
#> # A tibble: 5 × 7
#> participant_id age smoking_status mean_exposure median_exposure sd_exposure
#> <int> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 65 Never 43.1 44.3 22.1
#> 2 2 72 Former 38.3 33.1 21.0
#> 3 3 50 Current 40.6 42.0 20.4
#> 4 4 60 Never 43.5 44.3 22.4
#> 5 5 58 Former 44.8 47.1 20.4
#> # ℹ 1 more variable: n_exposure_records <int>