
Download, Extract, and Optionally Read or Combine CSV Files from EPA AirNow ZIP URLs
download_stack_epa_airdata.Rd
This function downloads ZIP files from provided URLs (either as a character vector or a data frame column), extracts CSV files, and optionally reads and combines them into a single tibble.
Usage
download_stack_epa_airdata(
urls,
output_dir = tempdir(),
download = TRUE,
unzip = TRUE,
read_csvs = TRUE,
stack = TRUE,
clean_names = TRUE
)
Arguments
- urls
A character vector of URLs, or a data frame containing a column named urls.
- output_dir
A directory path where downloaded and extracted files will be stored. Defaults to a temporary directory.
- download
Logical. Whether to download the ZIP files. Defaults to TRUE.
- unzip
Logical. Whether to extract the ZIP files. Requires download = TRUE or existing ZIPs. Defaults to TRUE.
- read_csvs
Logical. Whether to read extracted CSVs into R. Requires unzip = TRUE or existing extracted files. Defaults to TRUE.
- stack
Logical. Whether to combine extracted CSVs into a single tibble. Requires read_csvs = TRUE. Defaults to TRUE.
- clean_names
Logical. Whether to clean column names in the final tibble (using janitor::clean_names()). Defaults to TRUE.
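The urls argument accepts two input forms. A minimal sketch of how both could be reduced to a plain character vector is shown here; normalize_urls() is a hypothetical helper written for illustration, not part of the package, and the example.org URLs are placeholders.

```r
# Hypothetical helper (not part of the package): reduce the two accepted
# forms of `urls` to a plain character vector.
normalize_urls <- function(urls) {
  if (is.data.frame(urls)) {
    # Data frame input: the URLs are expected in a column named "urls".
    if (!"urls" %in% names(urls)) {
      stop("Data frame input must contain a column named 'urls'.")
    }
    urls <- urls[["urls"]]
  }
  if (!is.character(urls)) {
    stop("`urls` must be a character vector or a data frame with a 'urls' column.")
  }
  urls
}

# Placeholder URLs, used only to show that both forms give the same result.
url_df <- data.frame(
  urls = c("https://example.org/a.zip", "https://example.org/b.zip"),
  stringsAsFactors = FALSE
)
from_df  <- normalize_urls(url_df)
from_vec <- normalize_urls(url_df$urls)
identical(from_df, from_vec)
```

Under these assumptions, passing the data frame or its urls column leads to the same set of files being processed.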
Value
If read_csvs = TRUE, returns either a tibble combining all extracted CSV files (stack = TRUE) or a list of tibbles (stack = FALSE). If read_csvs = FALSE, returns a list of file paths to extracted CSVs.
URLs that fail to download or extract are skipped.
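The difference between the list and stacked return shapes, and the skip-on-failure behaviour, can be illustrated with base R. This is a sketch under stated assumptions, not the function's actual implementation: it assumes stacking amounts to a row-bind, uses plain data frames rather than tibbles, and applies the skip to unreadable files rather than to failed downloads. The CSV contents are made up and read_or_skip() is a hypothetical helper.

```r
# Two small CSV files stand in for extracted files (made-up values).
csv_dir <- file.path(tempdir(), "stack_sketch")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(site = "A", value = 1.5),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(site = "B", value = 2.5),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# The third path does not exist, to exercise the skip branch.
csv_paths <- file.path(csv_dir, c("a.csv", "b.csv", "missing.csv"))

# Hypothetical helper: read one file, returning NULL instead of stopping
# when the file cannot be read.
read_or_skip <- function(path) {
  tryCatch(
    suppressWarnings(read.csv(path, stringsAsFactors = FALSE)),
    error = function(e) {
      message("Skipping ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Drop the NULL entries left by skipped files.
tables <- Filter(Negate(is.null), lapply(csv_paths, read_or_skip))

as_list <- tables                   # analogous to stack = FALSE
stacked <- do.call(rbind, tables)   # analogous to stack = TRUE
```

Here as_list holds one data frame per readable file, while stacked holds all rows in a single data frame; the unreadable path contributes nothing to either.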
Examples
if (FALSE) {
library(dplyr) # provides %>% and filter()
# Scrape the table of available ZIP links
df <- tidypollute::scrape_epa_airdata_zip_links()
# Filter dataset and pass a column of URLs
filtered_df <- df %>% filter(year == 1991, analyte == "WIND")
# Use function with a data frame
download_stack_epa_airdata(filtered_df, download = TRUE, stack = TRUE, output_dir = "data/")
# Use function with a character vector
download_stack_epa_airdata(filtered_df$urls, download = TRUE, stack = TRUE, output_dir = "data/")
}