This document provides a structure for normalizing data. It is by no means an exhaustive treatise on the topic, nor is it an authority. It is simply this: if you want to get your data into a standard order, here is what NPL did, how we did it, and what I wish we had done. While it may not be the 'best' way, it is at the very least a way to get everything standard so that tweaking procedures becomes easier.

One of the strongest tools in the data-nitpicking tool kit is OpenRefine. This application takes a spreadsheet and massages it into a database-ready format.

See the 'Refine Help' page for instructions on how to apply the snippets of code.

 

Other standards:

File Naming Conventions:

  • Specimen Photographs: NPL_12345_1_a1.jpg
    • <Prefix>_<Catalog Number>_<Suffix>_<Qualifier>_<Component><index>.jpg

    • <Prefix> is the catalog prefix; prefixes include BEG, NPL, P, R, TX, UT and WSA.
    • <Catalog Number> is the numeric value of the identifier. These values may be anywhere in the range 1 – 99,999,999.
    • <Suffix> is usually the specimen number (or letter) in a collection of specimens with the same catalog number.
    • <Qualifier> may represent additional information (such as part and counterpart identifiers of a single specimen) or a secondary catalog number.
    • <Component> is a single lowercase letter that represents an unnumbered component of a cataloged specimen. The character ‘L’ (or ‘l’) is reserved for labels only. All other characters (‘a’ – ‘k’, ‘m’ – ‘z’) are reserved for specimens. Many cataloged specimens consist of multiple undesignated specimens; the component letter distinguishes one from another in photographic images. If there is only one specimen, its component is ‘a’. Capital letters (with the exception of L) are not used because the specimen number may contain a capital A, so a lowercase component is less confusing.
    • <index> is the label number (if there are 3 labels, the indices are 1, 2, 3) or the image sequence number among photographs of the same specimen.
  • Accession Records: 2014-004_AccessionRecord.pdf
    • Template file name: <donor or internal><"Accession Record">.pdf
    • Use SAVE AS and replace the donor or internal part of the name with the accession number followed by an underscore.
  • Project Documentation
    • to be determined
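
The file naming conventions above can be checked mechanically before files are saved. The following is a minimal Python sketch, not an NPL tool. The names parse_photo_name, PhotoName and accession_record_name are hypothetical. It assumes that <Qualifier> may be omitted (as in the example NPL_12345_1_a1.jpg), that <Suffix> is always present, that the listed prefixes are the only ones in use, and that the extension is always lowercase .jpg.

```python
import re
from typing import NamedTuple, Optional

# Prefixes listed in the naming convention (assumed here to be the full set).
PREFIXES = {"BEG", "NPL", "P", "R", "TX", "UT", "WSA"}

# <Component><index>: one lowercase letter (or 'L' for labels) plus a number.
COMPONENT_RE = re.compile(r"([a-zL])([0-9]+)")


class PhotoName(NamedTuple):
    prefix: str
    catalog_number: int
    suffix: str
    qualifier: Optional[str]  # None when the qualifier is omitted
    component: str
    index: int
    is_label: bool  # True when the component is 'L' or 'l'


def parse_photo_name(filename: str) -> PhotoName:
    """Split a specimen photograph file name into its parts, or raise ValueError."""
    if not filename.endswith(".jpg"):
        raise ValueError(f"expected a .jpg file: {filename!r}")
    parts = filename[: -len(".jpg")].split("_")

    if len(parts) == 4:  # assumption: qualifier omitted
        prefix, catalog, suffix, last = parts
        qualifier = None
    elif len(parts) == 5:
        prefix, catalog, suffix, qualifier, last = parts
    else:
        raise ValueError(f"expected 4 or 5 underscore-separated parts: {filename!r}")

    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not re.fullmatch(r"[0-9]{1,8}", catalog) or int(catalog) < 1:
        raise ValueError(f"catalog number must be 1 to 99,999,999: {catalog!r}")
    if not suffix or qualifier == "":
        raise ValueError(f"empty suffix or qualifier: {filename!r}")

    match = COMPONENT_RE.fullmatch(last)
    if match is None:
        raise ValueError(f"bad component/index: {last!r}")
    component, index = match.group(1), int(match.group(2))

    return PhotoName(prefix, int(catalog), suffix, qualifier,
                     component, index, component.lower() == "l")


def accession_record_name(accession_number: str) -> str:
    """Build an accession record file name, e.g. 2014-004_AccessionRecord.pdf."""
    return f"{accession_number}_AccessionRecord.pdf"
```

For example, parse_photo_name("NPL_12345_1_a1.jpg") yields prefix NPL, catalog number 12345, suffix 1, no qualifier, component a and index 1, while a name such as NPL_12345_1_A1.jpg is rejected because capital component letters other than L are not used. If suffixes turn out to be optional in practice, the length check is the place to relax.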