We will translate the English names published in the NABA newsletter to Latin names using the NABA taxonomy, which is described at https://www.naba.org/pubs/checklst.html. The full NABA English-to-Latin list, published in 2001, is found here: https://www.naba.org/pubs/enames2_5.html
The NABA (2001) AZ Checklist (English and Latin names) was downloaded on 2022-06-29 from https://www.webutterfly.org/beta/Species/TripChecklist
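The name translation described above amounts to a table lookup. The following is a minimal Python sketch of that idea, not the workflow actually used (the notes mention R). The three checklist entries are illustrative stand-ins for the downloaded NABA (2001) list, and the names `CHECKLIST`, `normalize`, and `translate` are hypothetical.

```python
# Illustrative excerpt only: the real lookup would be loaded from the
# downloaded NABA (2001) checklist, not typed in by hand.
CHECKLIST = {
    "monarch": "Danaus plexippus",
    "queen": "Danaus gilippus",
    "painted lady": "Vanessa cardui",
}


def normalize(name):
    """Lower-case and collapse whitespace so small formatting differences still match."""
    return " ".join(name.lower().split())


def translate(english_names, checklist=CHECKLIST):
    """Map English names to Latin names.

    Returns (matched, unmatched): a dict of English -> Latin, and a list of
    names that were not found and need manual review.
    """
    matched, unmatched = {}, []
    for name in english_names:
        latin = checklist.get(normalize(name))
        if latin is None:
            unmatched.append(name)
        else:
            matched[name] = latin
    return matched, unmatched


matched, unmatched = translate(["Monarch", "Painted Lady", "Mystery Skipper"])
print(matched)    # names found in the checklist
print(unmatched)  # names to resolve by hand
```

Returning the unmatched names separately, rather than dropping them, makes it easy to spot spelling variants in the newsletter data that need to be reconciled with the checklist.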
All data come from NABA, in two sets. The first was assembled by Matt Forister for the 2021 Science paper; it runs through 2008 and includes sites with 10 or more years of data. The second was assembled by the McDowell Sonoran Conservancy for the Arizona sites published in the NABA newsletters; it runs from 2004 (the first year of butterfly surveys in the Preserve) through 2021. In NABA_Rowe, "Common/white checkered skipper" was changed to "Common or white checkered skipper" to facilitate reading the file into R.
The two sources share the same fields: date, site, NABAEnglishName, ButterflyCount (number of individual butterflies), PartyHours (see below), and #Parties (number of different groups on the survey). The Conservancy (NABA_Rowe) data additionally include # observers (total observers across groups) and total distance miles (sum of the distance traveled by all parties on the survey). Both of these (# observers and total distance miles) are possible covariates that may help control for effort and area covered. PartyHours does not include # observers in its calculation and is defined by NABA as follows:
"A party, as defined above, that spends one hour in the field actively butterflying on foot is equivalent to one party-hour. For example, if you had three groups of butterfliers and group A, consisting of two people, counted butterflies for 3 hours; group B consisting of one person, counted butterflies for 5 hours; and group C, consisting of three people, counted butterflies for 4 hours, the total party-hours would be 3 + 5 + 4 = 12. Total party-hours cannot exceed the number of hours of the count x the number of parties. Parties that temporarily separate to count different butterflies become separate parties with separate party-hours during the time of separation. E.g., if a party of 3 counts for 3 hours as a single party, breaks up into 3 sub-parties for 2 hours to count separate butterflies, then counts together for 2 more hours, you should report 3 parties (the maximum number at one time) totaling 11 party-hours (1 party x 3 hours + 3 parties x 2 hours + 1 party x 2 hours). Exclude time when butterfly counting did not occur."
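The arithmetic in the NABA definition can be checked with a short Python sketch. It represents a count as a list of (number of parties, hours) segments; that representation and the function names are assumptions made here for illustration, not part of the NABA protocol or of the data files.

```python
def party_hours(segments):
    """Total party-hours for segments given as (number_of_parties, hours) pairs."""
    return sum(parties * hours for parties, hours in segments)


def parties_to_report(segments):
    """Number of parties to report: the maximum number active at one time."""
    return max(parties for parties, _ in segments)


# First NABA example: groups A, B, and C count for 3, 5, and 4 hours.
# Group size (2, 1, and 3 people) does not enter the calculation.
three_groups = [(1, 3), (1, 5), (1, 4)]

# Second NABA example: one party for 3 hours, three sub-parties for 2 hours,
# then one party again for 2 hours.
split_party = [(1, 3), (3, 2), (1, 2)]

print(party_hours(three_groups))       # 12
print(party_hours(split_party))        # 11
print(parties_to_report(split_party))  # 3
```

Because the number of people per party never appears in the calculation, the observer count has to be carried as a separate covariate if effort per person matters.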
Separate tables in the data folder give the lat/long coordinates for all of the sites and %urbanization for the sites in the Conservancy data. Percent urbanization is defined as the percent impervious surface (from NLCD) within the defined buffers around each center point. A diameter of 10 km was the largest that could be used without crossing the US/Mexico border at the Ramsey and Santa Rita sites (i.e., without including area in Mexico, where data are missing).
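To make the urbanization definition concrete, here is a minimal Python sketch that averages percent impervious values over the grid cells falling inside a circular buffer. It is not the procedure used to build the %urbanization table. It assumes a small in-memory grid of per-cell percent impervious values, with `None` marking cells without data (as would occur across the border), and the function name and arguments are hypothetical.

```python
import math


def percent_impervious(grid, cell_size_km, center_xy_km, radius_km):
    """Mean percent impervious over cells whose centers lie inside the buffer.

    grid[row][col] holds percent impervious (0-100), or None for missing data.
    Returns (mean_percent, n_missing_cells_inside_buffer).
    """
    cx, cy = center_xy_km
    values, missing = [], 0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Coordinates of the cell center, measured from the grid origin.
            x = (c + 0.5) * cell_size_km
            y = (r + 0.5) * cell_size_km
            if math.hypot(x - cx, y - cy) <= radius_km:
                if value is None:
                    missing += 1
                else:
                    values.append(value)
    mean = sum(values) / len(values) if values else float("nan")
    return mean, missing


# Toy 4 x 4 grid of 1 km cells; only the four central cells fall in the buffer.
toy_grid = [
    [100, 100, 100, 100],
    [100, 0, 20, 100],
    [100, 40, None, 100],
    [100, 100, 100, 100],
]
print(percent_impervious(toy_grid, 1.0, (2.0, 2.0), 1.0))  # (20.0, 1)
```

Reporting the number of missing cells alongside the mean shows why the buffer size matters: a buffer that reaches into an area without data is averaged over fewer cells than its nominal area implies.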
There were several large outliers from eruptive species at the Santa Rita site (counts of 134,917 and 133,490) and at Patagonia (20,054). Transforming these counts is not sufficient to normalize the data and run a model. To deal with this, we use winsorizing (available as a command in R): the highest values are replaced with the value at a chosen percentile, which allows us to keep the data rather than drop those observations. Choose a percentage that captures only the two highest values. More generally, winsorization is a process in which the highest and lowest α-fraction of the observations are replaced by their nearest neighbors in the remaining “central” (1−2α)-fraction of the data. For example, if there are 100 observations in increasing order and α=0.05, we replace each of the smallest five values with the sixth-lowest value and each of the highest five values with the 95th value (the sixth highest).
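The winsorization rule can be written out in a few lines. The Python sketch below is illustrative only and is not the R command referred to above. `winsorize` implements the two-sided α definition, and `winsorize_top` is a hypothetical one-sided variant that caps only the k highest values, as in the "two highest values" case. In the second example, the small counts are made up; only the two large values come from the text.

```python
def winsorize(values, alpha):
    """Two-sided winsorization.

    The k = floor(alpha * n) lowest and k highest values are replaced by their
    nearest neighbors in the remaining central part of the data.
    """
    n = len(values)
    k = int(alpha * n + 1e-9)  # small epsilon guards against float round-off
    if k == 0 or 2 * k >= n:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


def winsorize_top(values, k):
    """One-sided variant: cap the k highest values at the (k + 1)-th highest."""
    if k <= 0 or k >= len(values):
        return list(values)
    cap = sorted(values)[-(k + 1)]
    return [min(v, cap) for v in values]


# The 100-observation example with alpha = 0.05.
w = winsorize(list(range(1, 101)), 0.05)
print(w[:6], w[-6:])  # [6, 6, 6, 6, 6, 6] [95, 95, 95, 95, 95, 95]

# Capping only the two highest counts (small counts are made-up values).
counts = [12, 40, 7, 133490, 134917, 95]
print(winsorize_top(counts, 2))  # [12, 40, 7, 95, 95, 95]
```

Both functions return a new list in the original order, so the winsorized counts stay aligned with their site and date records.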