Fishes of Texas Project Documentation

In an attempt to meet the need for normalized, high-quality historic fish occurrence data at various spatial scales, we've done the following:

  • find data: We’ve contacted well over 200 potential data providers and scoured the internet for more data. Data have come in many formats (spreadsheets, text files, and paper), each requiring different treatment. We sought out personal accounts of species observations from researchers and the public, and extracted records from literature (those in track 3 only). More here and here and here.
  • data entry: Data that were not digital had to be hand-entered into digital spreadsheets.
  • re-format (normalize) data: Data often came to us in various formats, meaning that a single data field could present essentially the same information in several ways. For example, species names (a single atomized data field for us) were combined with higher taxonomy, reduced to a simple species name, and/or accompanied by a common name. Dates came in various arrangements of day, month, and year, sometimes with Roman numeral months or with months written out in text. We had to adjust those, as well as other fields, into single unified formats.
  • compile data: We brought all of the disparate datasets into one relational database. The first version was in Microsoft Access and later versions in MySQL and PostgreSQL. More here.
  • georeference text locations: Since little of our data originally came to us with spatial coordinates, we manually assigned coordinates, along with an error estimate, to each record. We could then visualize the records on a map. More here.
  • synonymize taxa and collector/determiner names: Species names were often misspelled and provided under multiple historic names, and more rarely as common names, due to providers' use of historical taxonomies and irregular updating over a long period of time. We synonymized the multitude of names we found in the raw data to a standard taxonomy. Similarly, we synonymized the various versions of collector and determiner names. More here.
  • detect errors (usually via visualization on a map): Once records were georeferenced we could map them out, species by species, and see outliers. The other common method to detect errors was to group records into collecting events based on combinations of date, locality names, and collector names. More here.
  • verify/correct determinations: We looked at thousands of specimens, either because we flagged them as outliers, because they weren't identified to species level, or because they belonged to a species easily mistaken for another. More here.
  • verify data against documentation: Often outlier specimens, after verification that they were correctly determined, turned out to be erroneous locations that could be corrected based on examining ledgers, fieldnotes, original labels, and published manuscripts or maps. Often georeferences and dates that were imprecise or incorrect could be refined as well.
  • photograph specimens, field notes and jar labels: When field notes were available, we photographed them and provide the images. We also photographed specimens and jar labels, prioritizing those that represent edge-of-range records or that document unusual or rare populations. More here.
  • preserve original data: Original donor verbatim data are preserved and displayed alongside our edited version. More here.
  • publish data (including useful summaries): Data are all published on GBIF and our website. More here.
  • publish research products (models, native ranges, conservation areas): We publish data summary tools, species distribution models, and native fish conservation areas etc. on our stats tab. More here.
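The date normalization described in the re-format step above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names `parse_month` and `normalize_date`, the day-month-year ordering, the accepted separators, and the ISO 8601 output format are all hypothetical choices.

```python
import re
from datetime import date

# Roman numeral months, as sometimes written on labels (e.g. "12 VII 1953").
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]

# Accept full month names and their three-letter abbreviations.
TEXT_MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    TEXT_MONTHS[name] = number
    TEXT_MONTHS[name[:3]] = number


def parse_month(token):
    """Return a month number for a numeric, Roman numeral, or text token."""
    token = token.strip().rstrip(".")
    if token.isdigit():
        month = int(token)
    elif token.upper() in ROMAN_MONTHS:
        month = ROMAN_MONTHS[token.upper()]
    else:
        month = TEXT_MONTHS.get(token.lower())
    if month is None or not 1 <= month <= 12:
        raise ValueError(f"unrecognized month: {token!r}")
    return month


def normalize_date(raw):
    """Convert a day-month-year string to ISO 8601 (YYYY-MM-DD).

    Assumes day-month-year order (hypothetical); all-numeric dates in
    another order would need provider-specific handling.
    """
    parts = [p for p in re.split(r"[\s/.,-]+", raw.strip()) if p]
    if len(parts) != 3:
        raise ValueError(f"expected day, month, year: {raw!r}")
    day, month, year = int(parts[0]), parse_month(parts[1]), int(parts[2])
    # date() raises ValueError for impossible dates such as 30 February.
    return date(year, month, day).isoformat()
```

Under these assumptions, `normalize_date("12 VII 1953")`, `normalize_date("12 July 1953")`, and `normalize_date("12-Jul-1953")` all produce the same unified value. Inputs that cannot be parsed raise an error so they can be reviewed by hand.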
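The grouping of records into collecting events mentioned in the error-detection step can be sketched in a similar way. This is a minimal sketch, not the project's method: the record field names (`date`, `locality`, `collector`), the helper names, and the specific check for conflicting localities are assumptions chosen for illustration.

```python
from collections import defaultdict


def _norm(value):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value).lower().split())


def group_events(records, fields=("date", "locality", "collector")):
    """Group record dicts into collecting events keyed by the given fields."""
    events = defaultdict(list)
    for rec in records:
        key = tuple(_norm(rec[f]) for f in fields)
        events[key].append(rec)
    return dict(events)


def conflicting_localities(records):
    """Return (date, collector) groups whose records list more than one locality.

    Such groups are candidates for review, not confirmed errors: a collector
    may legitimately have sampled several sites on one day.
    """
    conflicts = {}
    grouped = group_events(records, fields=("date", "collector"))
    for key, recs in grouped.items():
        localities = sorted({_norm(r["locality"]) for r in recs})
        if len(localities) > 1:
            conflicts[key] = localities
    return conflicts
```

Grouping on all three fields gives the events themselves. Grouping on two fields and comparing the third is one possible way to surface records that disagree with the rest of their event, which can then be checked against ledgers, field notes, or labels.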