
This function downloads ZIP files from the given URLs (supplied either as a character vector or as the urls column of a data frame), extracts the CSV files they contain, and optionally reads and stacks them into a single tibble.
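The download-and-extract stage can be sketched in base R as follows. This is not the package's implementation. fetch_and_extract() is a hypothetical helper, and the sketch assumes that each ZIP is saved under the URL's basename in output_dir and that any error or warning counts as a failure, so the URL is skipped.

```r
# Hypothetical helper (not part of tidypollute): download one ZIP and
# extract it, returning the extracted CSV paths or NULL on any failure.
fetch_and_extract <- function(url, output_dir = tempdir()) {
  zip_path <- file.path(output_dir, basename(url))

  # Treat an error or warning from download.file() as a failed download
  status <- tryCatch(
    utils::download.file(url, zip_path, mode = "wb", quiet = TRUE),
    error = function(e) NULL,
    warning = function(w) NULL
  )
  if (is.null(status) || status != 0) {
    return(NULL)
  }

  # unzip() returns the paths of the files it extracted
  extracted <- tryCatch(
    utils::unzip(zip_path, exdir = output_dir),
    error = function(e) NULL,
    warning = function(w) NULL
  )
  if (length(extracted) == 0) {
    return(NULL)
  }

  # Keep only the CSV files
  extracted[grepl("\\.csv$", extracted, ignore.case = TRUE)]
}
```

Applied over all URLs, unlist(lapply(urls, fetch_and_extract, output_dir = output_dir)) collects the CSV paths, because unlist() drops the NULL returned for each failed URL.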

Usage

download_stack_epa_airdata(
  urls,
  output_dir = tempdir(),
  download = TRUE,
  unzip = TRUE,
  read_csvs = TRUE,
  stack = TRUE,
  clean_names = TRUE
)

Arguments

urls

A character vector of URLs or a data frame containing a column named urls.

output_dir

A directory path where downloaded and extracted files will be stored. Defaults to a temporary directory.

download

Logical. Whether to download the ZIP files. Defaults to TRUE.

unzip

Logical. Whether to extract the ZIP files. Requires download = TRUE or ZIP files already present in output_dir. Defaults to TRUE.

read_csvs

Logical. Whether to read the extracted CSVs into R. Requires unzip = TRUE or CSV files already extracted to output_dir. Defaults to TRUE.

stack

Logical. Whether to stack the tibbles read from the extracted CSVs into a single tibble. Has an effect only when read_csvs = TRUE. Defaults to TRUE.

clean_names

Logical. Whether to clean column names in the final tibble (using janitor::clean_names()). Defaults to TRUE.
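The argument rules above can be expressed as input checks. The sketch below is hypothetical: normalize_urls() and check_flags() are illustrative names, and the real function may handle these cases differently (for example by warning rather than stopping). It treats any .zip or .csv file already in output_dir as satisfying the "existing files" condition, and it does not check stack, which only matters when read_csvs = TRUE.

```r
# Hypothetical input checks (not the package's code) mirroring the
# documented argument rules.
normalize_urls <- function(urls) {
  # Accept a data frame by pulling out its `urls` column
  if (is.data.frame(urls)) {
    if (!"urls" %in% names(urls)) {
      stop("A data frame passed as `urls` must have a column named `urls`.")
    }
    urls <- urls[["urls"]]
  }
  if (!is.character(urls)) {
    stop("`urls` must be a character vector or a data frame with a `urls` column.")
  }
  urls
}

check_flags <- function(download, unzip, read_csvs, output_dir) {
  # Files already present in output_dir can stand in for earlier steps
  has_files <- function(ext) {
    length(list.files(output_dir, pattern = paste0("\\.", ext, "$"),
                      ignore.case = TRUE)) > 0
  }
  if (unzip && !download && !has_files("zip")) {
    stop("unzip = TRUE needs download = TRUE or ZIP files in output_dir.")
  }
  if (read_csvs && !unzip && !has_files("csv")) {
    stop("read_csvs = TRUE needs unzip = TRUE or CSV files in output_dir.")
  }
  invisible(TRUE)
}
```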

Value

If read_csvs = TRUE, returns either a single tibble combining all extracted CSV files (stack = TRUE) or a list of tibbles (stack = FALSE). If read_csvs = FALSE, returns a list of paths to the extracted CSV files. URLs that fail to download or extract are skipped.
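Once the CSV paths are known, the documented return shapes follow from read_csvs and stack. In this hypothetical sketch, shape_result() is an illustrative name, and base R's read.csv() and do.call(rbind, ...) stand in for whatever the package uses to build tibbles. rbind() requires every CSV to have the same columns, which is an assumption of this sketch.

```r
# Hypothetical sketch of the documented return shapes, given CSV paths
# that have already been extracted.
shape_result <- function(csv_paths, read_csvs = TRUE, stack = TRUE) {
  if (!read_csvs) {
    # Paths only; stack is ignored
    return(as.list(csv_paths))
  }
  # One data frame per CSV file
  tables <- lapply(csv_paths, utils::read.csv, stringsAsFactors = FALSE)
  if (!stack) {
    return(tables)
  }
  # Row-bind everything into a single data frame
  do.call(rbind, tables)
}
```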

Examples

if (FALSE) {
library(dplyr)

df <- tidypollute::scrape_epa_airdata_zip_links()

# Filter the table of links to 1991 wind data
filtered_df <- df %>% filter(year == 1991, analyte == "WIND")

# Pass the filtered data frame (it keeps its urls column)
download_stack_epa_airdata(filtered_df, download = TRUE, stack = TRUE, output_dir = "data/")

# Or pass just the urls column as a character vector
download_stack_epa_airdata(filtered_df$urls, download = TRUE, stack = TRUE, output_dir = "data/")
}