Mine For Specimens
Article: "Text Mining For Museum Specimen Identifiers"
- Mine For Specimens and its presentation
- Mine the Catalog Cards and its presentation
Currently there is no easy way to link AMNH science publications in our DSpace Digital Library (http://digitallibrary.amnh.org/handle/2246/5) to specimens (Latin name and catalog number) in our research collections. We would like a solution that dynamically creates a bibliography matching references in publications to specimens (Latin name and catalog number) in our collection database, which runs on KE EMu (http://www.kesoftware.com/).
There are also images of our specimens in our Scientific Publications. Could images that match specimen numbers be extracted and saved as jpegs? Finally, we have images of specimen catalog cards with meaningless filenames: the files are not named by specimen name or catalog number, so there is no way to find the card that pertains to a specific specimen. Our goal for this part of the project is to extract the scientific name and catalog number from each card and record them in a spreadsheet alongside the filename, essentially creating an index of all the card images. This would make it easier for Collections Managers to find specific cards and attach the jpeg of a card to the corresponding specimen record in the collection database.
- Extract specimen numbers/names from digitized Scientific Publications. Find known specimen numbers and names in the text of [scientific publications](http://digitallibrary.amnh.org/handle/2246/5), match them to the corresponding specimens (Latin name and catalog number) in our collections database, and output a bibliography in XML format that can be imported into the Bibliography module in KE EMu. The schema for the Bibliography module is in the documents folder for this project (https://github.com/amnh/HackTheStacks/tree/master/challenges/Mine_For_Specimens/documents). For a list of specimens (Latin name and catalog number) in our collections database, see the documents folder for this project (https://github.com/amnh/HackTheStacks/tree/master/challenges/Mine_For_Specimens/documents). Please note that for IZ (Invertebrate Zoology) we do not have a list of specimens. However, the barcode labels for specimen records include the following prefixes: AMNH_PBI 01234567, AMNH_ENT 01234567, PBI_OON 01234567, AMNH_IZC 01235467, AMNH_ARA 01235467, AMNH_HYM 01235467. Note for these: in the text portion of a publication, they are likely to be found once, with the range of numbers following.
  - AMNH_PBI 00000001 – 00369799
  - AMNH_ENT 00000001 – 00150000
  - PBI_OON 00000001 – 00020000
  - AMNH_ARA 00000001 – 0005000
  - AMNH_HYM 00000001 – 00002341
- Extract images from the Scientific Publications. Scientific publications that include our specimens (Latin name and catalog number) have images that we would like you to extract and save as jpegs. The filename of each jpeg should correspond to the catalog number of the specimen pictured in the image. For IZ images, the image is likely to be labeled with the specimen (catalog/database) number as specified above.
- Extract the scientific name and the catalog number from the jpegs of specimen catalog cards. For each card / catalog page, insert this data into a spreadsheet along with the name of the file in which it appears (e.g. p0005_74101.jpg). The images of the catalog cards will be provided to the team working on this challenge via an external drive.
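As a starting point for the first two tasks, here is a minimal sketch of how the IZ barcode prefixes and number ranges listed above could be turned into a search pattern. The function names `find_barcodes` and `jpeg_name`, the assumption that each number is exactly eight digits, and the assumption that the prefix and number are separated by a space or underscore are all illustrative guesses, not part of the challenge specification. AMNH_IZC has no range listed, so it is not range-checked, and the AMNH_ARA upper bound is taken exactly as written.

```python
import re

# Prefixes and number ranges copied from the challenge text.
# AMNH_IZC has no range listed, so it maps to None (no range check).
RANGES = {
    "AMNH_PBI": (1, 369799),
    "AMNH_ENT": (1, 150000),
    "PBI_OON": (1, 20000),
    "AMNH_IZC": None,
    "AMNH_ARA": (1, 5000),  # upper bound is written as "0005000" in the text
    "AMNH_HYM": (1, 2341),
}

# Assumption: a prefix, an optional space or underscore, then exactly 8 digits.
PATTERN = re.compile(r"\b(" + "|".join(RANGES) + r")[ _]?(\d{8})\b")


def find_barcodes(text):
    """Return (prefix, digits, in_range) for each barcode-like match in text."""
    hits = []
    for match in PATTERN.finditer(text):
        prefix, digits = match.group(1), match.group(2)
        bounds = RANGES[prefix]
        in_range = bounds is None or bounds[0] <= int(digits) <= bounds[1]
        hits.append((prefix, digits, in_range))
    return hits


def jpeg_name(catalog_number):
    """Build a filesystem-safe jpeg filename from a catalog number string."""
    # Replace each run of characters other than letters, digits, "_" and "-".
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", catalog_number.strip()).strip("_")
    if not safe:
        raise ValueError("empty catalog number")
    return safe + ".jpg"


sample = "Holotype AMNH_PBI 00012345; paratype AMNH_HYM 00009999."
hits = find_barcodes(sample)
```

This pattern only picks up single identifiers. A prefix followed by a range of numbers, or OCR noise inside the digits, would need a more forgiving pattern, and matches flagged as out of range are probably worth reviewing by hand rather than discarding.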
- All our Scientific Publications can be found in our DSpace Digital Library: http://digitallibrary.amnh.org/handle/2246/5
- The schema for the Bibliography module is in the documents folder for this project: https://github.com/amnh/HackTheStacks/tree/master/challenges/Mine_For_Specimens/documents
- For a list of specimens (Latin name and catalog number) in our collections database, see the data folder for this project: https://github.com/amnh/HackTheStacks/tree/master/challenges/Mine_For_Specimens/data
- The images of the catalog cards will be provided to the team working on this challenge via an external drive. Please ask organizers for it if you're working on this challenge!
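For the catalog card task, the index spreadsheet could be as simple as a CSV file with one row per card image. The sketch below assumes the scientific name and catalog number have already been read from each card (for example by OCR, which is not shown). The column names and the function name `write_card_index` are hypothetical, and the name and number in the sample row are placeholders rather than real data; only the filename comes from the challenge text.

```python
import csv
import io

# Hypothetical column names for the index spreadsheet.
HEADER = ["filename", "scientific_name", "catalog_number"]


def write_card_index(records, out):
    """Write one row per card image to an open text file-like object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for filename, scientific_name, catalog_number in records:
        writer.writerow([filename, scientific_name, catalog_number])


# Placeholder name and number; only the filename is from the challenge text.
buf = io.StringIO()
write_card_index([("p0005_74101.jpg", "Genus species", "00000")], buf)
index_lines = buf.getvalue().splitlines()
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.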