| Step | Function | Package | Brief description | Citation |
|---|---|---|---|---|
| Pre-filtering | bdc_scientificName_empty | bdc | Flag occurrences without a scientific name | (Ribeiro et al. 2022) |
| | bdc_coordinates_empty | bdc | Flag occurrences without latitude or longitude | (Ribeiro et al. 2022) |
| | bdc_coordinates_outOfRange | bdc | Flag occurrences that don't map on Earth | (Ribeiro et al. 2022) |
| | bdc_basisOfRecords_notStandard | bdc | Flag occurrences based on "basisOfRecord" | (Ribeiro et al. 2022) |
| | bdc_country_from_coordinates | bdc | Get country name using coordinates (where no country name is provided) | (Ribeiro et al. 2022) |
| | jbd_CfC_chunker | BeeBDC | A chunked and multi-threaded function similar to bdc_country_from_coordinates where datasets are otherwise too large | (Dorey et al. In review) |
| | bdc_country_standardized | bdc | Standardises country names and adds ISO2 codes, if needed | (Ribeiro et al. 2022) |
| | bdc_coordinates_transposed | bdc | Flags occurrences where latitude and longitude appear to be transposed | (Ribeiro et al. 2022) |
| | jbd_Ctrans_chunker | BeeBDC | A chunked and multi-threaded function similar to bdc_coordinates_transposed where datasets are otherwise too large | (Dorey et al. In review) |
| | bdc_coordinates_country_inconsistent & jbd_coordCountryInconsistent | bdc and BeeBDC | Flag occurrences where the country name does not match the coordinates. Both functions are very similar with slightly different implementations | (Ribeiro et al. 2022; Dorey et al. In review) |
| | bdc_coordinates_from_locality | bdc | Extracts occurrences with locality information that might allow geolocation (to fill gaps) | (Ribeiro et al. 2022) |
| | flagAbsent | BeeBDC | Flags occurrences that are marked as "absent" under occurrenceStatus | (Dorey et al. In review) |
| | flagLicense | BeeBDC | Flags occurrences that are restricted by usage licences | (Dorey et al. In review) |
| | GBIFissues | BeeBDC | Flags occurrences based on user-defined GBIF flags | (Dorey et al. In review) |
| | clean_fossils | CoordinateCleaner | Flags species that occur in common fossil databases | (Zizka et al. 2019) |
| Taxonomy | bdc_clean_names | bdc | Cleans scientific names (e.g., removes prepended family names, removes qualifiers, etc.) and flags occurrences with name qualifiers | (Ribeiro et al. 2022) |
| | bdc_query_names_taxadb | bdc | Harmonizes taxonomy based on the taxadb package | (Ribeiro et al. 2022) |
| | harmoniseR | BeeBDC | Harmonizes taxonomy based on custom synonym lists (updated bee taxonomy provided) | (Dorey et al. In review) |
| Space | bdc_coordinates_precision & jbd_coordinates_precision | bdc and BeeBDC | Flags occurrences based on coordinate precision (number of decimal places). The bdc function flags occurrences if either latitude or longitude are imprecise. The BeeBDC function only flags occurrences where both latitude and longitude are imprecise | (Ribeiro et al. 2022; Dorey et al. In review) |
| | coordUncerFlagR | BeeBDC | Flags occurrences that do not pass a threshold of "coordinateUncertaintyInMeters" | (Dorey et al. In review) |
| | clean_coordinates | CoordinateCleaner | Flags occurrences based on proximity to the centroids of capitals, countries, provinces, GBIF, and institutions; zeros for coordinates; points in seas; statistical outliers; urban areas; and coordinate validity | (Zizka et al. 2019) |
| | cd_ddmm | CoordinateCleaner | Flags occurrences where a significant fraction of records have been subjected to a common 'degree minute' to 'decimal degree' conversion error, where the degree sign is recognized as decimal delimiter | (Zizka et al. 2019) |
| | diagonAlley | BeeBDC | Flags records for fill-down errors in coordinates within datasets using a sliding window | (Dorey et al. In review) |
| | cd_round | CoordinateCleaner | Flags occurrences that might be from gridded datasets | (Zizka et al. 2019) |
| | countryOutlieRs | BeeBDC | Flags occurrences that do not match exact or adjacent country based on a custom checklist (bee country checklist is provided) | BeeBDC |
| | manualOutlierFindeR | BeeBDC | Flags occurrences (and their duplicates) based on expert identification | BeeBDC |
| Time | dateFindR | BeeBDC | Attempts to find and rescue dates hidden in other columns | (Dorey et al. In review) |
| | bdc_eventDate_empty | bdc | Flags occurrences without a date | (Ribeiro et al. 2022) |
| | bdc_year_outOfRange | bdc | Flags occurrences created prior to a user-specified year cut-off | (Ribeiro et al. 2022) |
| Duplicates | dupeSummary | BeeBDC | Iteratively searches out duplicates based on user-specified inputs. Duplicate clusters are merged and one occurrence is kept (based on user input) while the rest are flagged | (Dorey et al. In review) |
| | cc_dupl | CoordinateCleaner | Flags duplicates based on matching coordinates and species names | (Zizka et al. 2019) |
| Filtering | bdc_filter_out_flags | bdc | Filters occurrences using all present flag columns (starting with ".") | (Ribeiro et al. 2022) |
| | bdc_filter_out_names | bdc | Filters occurrences based only on the accepted status of taxonomy | (Ribeiro et al. 2022) |
| | summaryFun | BeeBDC | Filters occurrences based on flag columns and has additional inputs including: choosing filters to examine, only updating the ".summary" flag column, and removing the flag columns | (Dorey et al. In review) |
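
The table notes that the two coordinate-precision functions differ in how they combine latitude and longitude: the bdc function flags a record if either coordinate is imprecise, whereas the BeeBDC function flags it only if both are. The following is a minimal Python sketch of that difference in logic only. It is not the R implementation of either package; the helper names (`decimal_places`, `flag_either`, `flag_both`), the `min_decimals` threshold of 2, and the convention that `True` means "flagged" are all assumptions made for illustration.

```python
def decimal_places(value: str) -> int:
    """Count the digits after the decimal point in a coordinate string."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)


def is_imprecise(value: str, min_decimals: int) -> bool:
    """A coordinate is treated as imprecise if it has too few decimal places."""
    return decimal_places(value) < min_decimals


def flag_either(lat: str, lon: str, min_decimals: int = 2) -> bool:
    """Stricter rule (as described for bdc): flag if EITHER coordinate is imprecise."""
    return is_imprecise(lat, min_decimals) or is_imprecise(lon, min_decimals)


def flag_both(lat: str, lon: str, min_decimals: int = 2) -> bool:
    """More lenient rule (as described for BeeBDC): flag only if BOTH are imprecise."""
    return is_imprecise(lat, min_decimals) and is_imprecise(lon, min_decimals)


# Hypothetical records; coordinates are kept as strings so that the
# number of decimal places as written is preserved.
records = [
    ("-35.28", "149.13"),    # both coordinates precise
    ("-35.3", "149.1287"),   # only latitude imprecise
    ("-35", "149.1"),        # both coordinates imprecise
]

either_flags = [flag_either(lat, lon) for lat, lon in records]
both_flags = [flag_both(lat, lon) for lat, lon in records]

for (lat, lon), e, b in zip(records, either_flags, both_flags):
    print(f"lat={lat:>7} lon={lon:>9}  either-rule flagged={e}  both-rule flagged={b}")
```

The second record is where the two rules diverge: it is flagged under the "either" rule but retained under the "both" rule, so the more lenient rule keeps records in which at least one coordinate is reported at sufficient precision.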