
Subgroups

Elizabeth Campolongo edited this page Mar 26, 2025 · 32 revisions

Project subgroups

Subgroup 1: Bucket of Bugs to BioClip

Goal: We want a tool that lets us take photographic collections of individual wild insects and iteratively pass them through something like BioCLIP in order to rapidly identify them and refine their taxonomic IDs.

Ideally the program also runs offline, as it will be used by field techs gathering data in places where internet access may be poor.
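A minimal sketch of the iterative refinement idea, assuming a hypothetical predict_at_rank function (for example a thin wrapper around a locally installed BioCLIP model, so it can run offline) that returns a (taxon, score) pair for one image at one rank. The ID is walked from kingdom downward and stops at the first rank where confidence falls below a threshold; the threshold value and the rank list are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed rank order, broadest to most specific (the seven ranks in BioCLIP's labels).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

Prediction = Tuple[str, str, float]  # (rank, taxon, score)


def refine_id(
    image_path: str,
    predict_at_rank: Callable[[str, str], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[Prediction]:
    """Walk down the ranks, keeping predictions until confidence drops below min_score.

    predict_at_rank is hypothetical: any function returning (taxon, score)
    for one image at one rank.
    """
    accepted: List[Prediction] = []
    for rank in RANKS:
        taxon, score = predict_at_rank(image_path, rank)
        if score < min_score:
            break  # finer ranks are left for a person to refine
        accepted.append((rank, taxon, score))
    return accepted


def refine_collection(
    image_paths: Iterable[str],
    predict_at_rank: Callable[[str, str], Tuple[str, float]],
    min_score: float = 0.5,
) -> Dict[str, List[Prediction]]:
    """Apply refine_id to every image in a photographic collection."""
    return {path: refine_id(path, predict_at_rank, min_score) for path in image_paths}
```

Raising min_score makes the tool stop earlier and hand more specimens to a person; lowering it accepts deeper but less certain IDs.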

Members:

  • Elizabeth Campolongo
  • Ernie Parke
  • Andy Quitmeyer
  • Matt Thompson

Project Code: https://github.com/Digital-Naturalism-Laboratories/bucket-o-bugs

Subgroup 2: SDMs using AI-generated data

Goal: The broad goal is to generate species data with computer vision and use those data for ecological applications. Specifically, we want to develop species distribution models and predict the suitable habitats of beetle species under various climate scenarios, using species data identified by computer vision.

Tasks: This goal requires two steps of work.

  • The first task is to train a Convolutional Neural Network (CNN) model on NEON beetle image data and use this model to identify the species of unknown beetle samples.

  • The second task is to obtain climate data associated with the identified species and build species distribution models that map each species' suitable habitat.

  • Members:

    • Khum Thapa-Magar, INSTAAR, University of Colorado
    • Sarwan Ali, Georgia State University
    • Hsunyi Hsieh, Michigan State University
    • Feel free to join the group if you like

Project Code: https://github.com/Imageomics/sdm-beetlepalooza
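The second task can be illustrated with the simplest kind of SDM, a BIOCLIM-style climate envelope: record the range of each climate variable across the sites where a species was identified, then call a grid cell suitable under a climate scenario if every variable falls inside that range. This is a swapped-in illustration, not necessarily the group's method; the variable names and the cell dictionaries are hypothetical.

```python
from typing import Dict, Mapping, Sequence, Tuple

Climate = Mapping[str, float]  # e.g. {"tmean": ..., "precip": ...}; names are hypothetical
Envelope = Dict[str, Tuple[float, float]]


def fit_envelope(presence_sites: Sequence[Climate]) -> Envelope:
    """Min and max of each climate variable over sites where the species was identified."""
    variables = presence_sites[0].keys()
    return {
        v: (min(s[v] for s in presence_sites), max(s[v] for s in presence_sites))
        for v in variables
    }


def is_suitable(envelope: Envelope, site: Climate) -> bool:
    """A site is suitable if every variable lies inside the fitted range."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())


def suitability_map(envelope: Envelope, cells: Mapping[str, Climate]) -> Dict[str, bool]:
    """Project the envelope onto grid cells, e.g. climate values under a future scenario."""
    return {cell_id: is_suitable(envelope, clim) for cell_id, clim in cells.items()}
```

Envelope models ignore absences and interactions between variables; methods such as MaxEnt or regression-based SDMs address this, at the cost of more setup.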

Subgroup 3: Beetle ID and Ten Simple Rules

Goal: We want to see how far taxonomically BioClip can get in identifying individual NEON beetles.

  • Members:
    • Sydne Record
    • Hilmar Lapp
    • Evan Waite
    • Laura Nagel
    • Kim Landsbergen
    • Isa Betancourt
    • Elizabeth Campolongo

  • Run BioClip: run 1 on segmented images, run 2 on unsegmented images, then compare the results.

  • Compare open classification versus classification against a list of known taxa.

  • Hilmar:

Run 1: BioClip on 6 known images, run individually per barcode sample (08984, 08914, 08980, 08976, 40688, 40713...).

Taxonomists' assessment: in one case (08914) the runs converged to the same tribe (a group of genera); not for 08984.

Run 2: to the genus rank ('bioclip predict --rank genus [range of images]'). Outcome: tribes not correct.

Evan: going from subfamily to tribe is a huge leap.

Evan: Can we train BioClip to assess each image and stop at the tribe level? Laura: a goal would be to get to genus; that would be a time-saver.

Imagined in-person tech workflow: AI sorts specimens to tribe, then a human tech works on identification below tribe (keys would be needed). Getting to tribe automatically would eliminate a lot of work.
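As far as we know, BioCLIP's built-in ranks run from kingdom to species with no tribe level, so one way to get an "AI sorts to tribe" step is to roll genus-level probabilities up to tribe. A minimal sketch, assuming a hypothetical genus_to_tribe lookup table; because genera are mutually exclusive, a tribe's probability is the sum of its genera's probabilities (a lower bound if only the top few genera were returned).

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple


def rollup_to_tribe(
    genus_scores: Mapping[str, float],
    genus_to_tribe: Mapping[str, str],
) -> Tuple[str, float, Dict[str, float]]:
    """Sum genus-level probabilities within each tribe and return the best tribe.

    genus_to_tribe is a hypothetical lookup (e.g. built from a checklist);
    genera missing from it are grouped under "unknown".
    """
    tribe_scores: Dict[str, float] = defaultdict(float)
    for genus, score in genus_scores.items():
        tribe_scores[genus_to_tribe.get(genus, "unknown")] += score
    best = max(tribe_scores, key=tribe_scores.__getitem__)
    return best, tribe_scores[best], dict(tribe_scores)
```

A tech could then accept the best tribe when its summed score clears a chosen threshold and key out the rest by hand.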

Samples are from Wisconsin: same domain (D05), 2 different sites (UNDE, STEI).

Laura provided a file with all species found within Domain D05; every unique species ID returned has been included in the list.

This list (filename D05_TaxaList.txt) is used in BioClip to limit identification to domain-specific taxa.

The NEON Domain 05 list represents species already found and expert-verified, but it is not the list of everything that could be there (which is a larger number).
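Restricting predictions to a domain list can be sketched as filter-and-renormalize. If the scores are softmax probabilities over the full label set, dividing the kept scores by their sum gives the same values as a softmax over the listed taxa alone. The one-name-per-line layout is an assumption about D05_TaxaList.txt, and, as noted above, a true species missing from the list can never be returned.

```python
from typing import Dict, Iterable, Mapping, Set


def parse_taxa_list(lines: Iterable[str]) -> Set[str]:
    """One taxon name per line (assumed layout); blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def load_taxa_list(path: str) -> Set[str]:
    with open(path, encoding="utf-8") as fh:
        return parse_taxa_list(fh)


def restrict_to_list(scores: Mapping[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Drop taxa not on the list and renormalize the rest to sum to 1, best first."""
    kept = {taxon: p for taxon, p in scores.items() if taxon in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing on the list was predicted
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return {taxon: p / total for taxon, p in ranked}
```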

Hilmar reran with the D05 list; Elizabeth helped with the table-formatting code.

All efforts below include the D05 list as part of the BioClip runs.

Evan: these two are different species, and the AI found them to be different, but they are in the vial as the same species: A00000008980-06, A00000008980-08

40688: ran 10 subsets from this vial; the correct ID is Synuchus impunctatus.

40713: the correct ID is Bembidion transparens. 40688 was also run using the full image, with all the beetles in the image.

Running it beetle by beetle, the ability to ID to the correct taxon is variable; running it as a full image with all beetles included, the correct ID is in the top 3.
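The "correct ID is in the top 3" comparison can be made explicit with a top-k check. A minimal sketch, assuming each run yields a taxon-to-score mapping; the runs could be the per-beetle subsets of a vial or the single full-image run.

```python
from typing import Mapping, Optional, Sequence


def rank_of_truth(scores: Mapping[str, float], true_taxon: str) -> Optional[int]:
    """1-based position of the true taxon when sorted by score, or None if absent."""
    ordered = sorted(scores, key=scores.__getitem__, reverse=True)
    return ordered.index(true_taxon) + 1 if true_taxon in ordered else None


def top_k_rate(runs: Sequence[Mapping[str, float]], true_taxon: str, k: int = 3) -> float:
    """Fraction of runs whose top k predictions contain the true taxon."""
    hits = 0
    for scores in runs:
        pos = rank_of_truth(scores, true_taxon)
        if pos is not None and pos <= k:
            hits += 1
    return hits / len(runs)
```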

Conversation about how to optimize photos. Now running on Evan's high-res images of their specimens: EWIC_00001460, EWIC_0000353, EWIC_0000799, EWIC_0000801, EWIC_0001164

Day 3 wrap-up

Sydne re-ran what we did yesterday, removed the subspecies data, and created summaries at the tribe and subfamily levels: what were the scores for each image?

Laura has been data wrangling to evaluate the cumulative scores at each of those taxonomic levels, assigning flags at each level for what was right or wrong.
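Assigning Right/Wrong flags per taxonomic level can be sketched as comparing the predicted and expert lineages rank by rank. The rank list and the one-dictionary-per-lineage layout are assumptions; a rank missing from either side counts as wrong.

```python
from typing import Dict, Mapping, Sequence, Tuple

# Ranks summarized in the wrap-up (assumed; tribe/subfamily come from a lookup).
SUMMARY_RANKS = ("subfamily", "tribe", "genus", "species")

Lineage = Mapping[str, str]  # rank -> taxon name


def flag_lineage(predicted: Lineage, truth: Lineage) -> Dict[str, bool]:
    """True if the predicted taxon matches the expert ID at that rank."""
    return {
        rank: truth.get(rank) is not None and predicted.get(rank) == truth[rank]
        for rank in SUMMARY_RANKS
    }


def accuracy_by_rank(records: Sequence[Tuple[Lineage, Lineage]]) -> Dict[str, float]:
    """Fraction of images flagged Right at each rank; records are (predicted, truth) pairs."""
    flags = [flag_lineage(p, t) for p, t in records]
    return {rank: sum(f[rank] for f in flags) / len(flags) for rank in SUMMARY_RANKS}
```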

Isa: it would be interesting to evaluate the number of training images against the Right/Wrong flag.

Elizabeth put together a script on CyVerse that summarized the training images for BioClip: for 36 genera, how many training images were used in the BioClip runs.
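Isa's question can be sketched by joining the genus-level flags to a genus-to-training-image-count table, such as the one produced by a per-genus summary script. The input shapes are hypothetical, and genera absent from the count table are treated as having zero training images.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def training_counts_by_flag(
    genus_flags: Sequence[Tuple[str, bool]],
    training_counts: Mapping[str, int],
) -> Dict[str, float]:
    """Mean number of training images per test image's genus, split by Right/Wrong.

    genus_flags: (true genus, was the genus-level ID right?) for each test image.
    training_counts: genus -> number of training images (hypothetical table).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for genus, right in genus_flags:
        groups["right" if right else "wrong"].append(training_counts.get(genus, 0))
    return {flag: sum(counts) / len(counts) for flag, counts in groups.items()}
```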

Hilmar has been battling to get the individually segmented images ready so that each image runs with its own reference domain list; the file structure needs shuffling and wrangling.
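One way to run each image against its own domain list is to bucket the segmented images by NEON domain first and then run BioCLIP once per bucket. A minimal sketch; the image-to-site and site-to-domain lookups and the taxa_list_path naming helper are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List, Mapping


def group_by_domain(
    image_paths: Iterable[str],
    site_of_image: Mapping[str, str],
    domain_of_site: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Bucket segmented images by NEON domain so each bucket uses one taxa list.

    site_of_image is keyed by file name (hypothetical metadata lookup).
    """
    buckets: Dict[str, List[str]] = defaultdict(list)
    for path in image_paths:
        site = site_of_image[PurePath(path).name]
        buckets[domain_of_site[site]].append(path)
    return dict(buckets)


def taxa_list_path(domain: str) -> str:
    # Hypothetical naming scheme modeled on D05_TaxaList.txt.
    return f"{domain}_TaxaList.txt"
```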

Goal for tomorrow: run BioClip on all of the segmented images with the newly wrangled dataset (thank you Hilmar!)


Members: Sydne Record, Isabelle Betancourt, Evan Waite, Laura Nagel, Kim Landsbergen, Hilmar Lapp, Elizabeth Campolongo

Group 3 code is in a group 3 folder in this repository.

Subgroup 4: EcoPalette: Integration of environmental data into species images to improve model accuracy

Members: Alyson East, Nicholas Gunner, Brennan Hays, Daniel Lopez, Isabella Viney

Subgroup goals:

  • Represent ecosystem metadata visually on beetle image
  • Improve the AI model's classification confidence for beetle species using visualized metadata
  • Assess the importance of image-encoded metadata to the model's accuracy
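A minimal sketch of one way to represent ecosystem metadata visually: append a band below the beetle image with one block per variable, its gray level set by the min-max scaled value. The pixel-list image representation, the band height and the gray-scale encoding are assumptions, not necessarily EcoPalette's scheme (see the project code).

```python
from typing import List, Mapping, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def value_to_gray(value: float, lo: float, hi: float) -> Pixel:
    """Min-max scale a metadata value into 0..255 and return it as a gray pixel."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    level = round(255 * min(max(t, 0.0), 1.0))
    return (level, level, level)


def add_metadata_band(
    image: Image,
    metadata: Mapping[str, float],
    ranges: Mapping[str, Tuple[float, float]],
    band_height: int = 8,
) -> Image:
    """Append a strip below the image with one equal-width block per variable."""
    width = len(image[0])
    names = sorted(metadata)  # fixed order so every image encodes variables in the same place
    block = width // len(names)
    row: List[Pixel] = []
    for i, name in enumerate(names):
        color = value_to_gray(metadata[name], *ranges[name])
        # the last block absorbs any leftover columns
        end = width if i == len(names) - 1 else (i + 1) * block
        row.extend([color] * (end - len(row)))
    return [list(r) for r in image] + [list(row) for _ in range(band_height)]
```

Using fixed, dataset-wide ranges for each variable keeps the same value mapped to the same gray level across all images.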

Workflow:

  • Segment NEON vial-level images of ground beetles into individual beetle images (thanks to Sarwan Ali and Michelle Ramirez)
  • Subset beetle image dataset to include only 5 beetle species for proof-of-concept simplicity
  • Identify abiotic and biotic ecosystem features of interest based on relevance to beetle niche
  • Extract NEON ecosystem data of interest for year 2018 and link to beetle images
  • Train and test AI models in identifying beetle species from (1) beetle image subset including image-encoded metadata and (2) beetle image subset NOT including image-encoded metadata
  • Compare AI model accuracy from (1) and (2) above
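Because models (1) and (2) can be evaluated on the same test images, their accuracies can be compared with a paired test. A sketch using the exact McNemar test (a swapped-in choice, not necessarily the group's), which looks only at images that exactly one of the two models gets right.

```python
from math import comb
from typing import Sequence


def accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def mcnemar_exact(pred_a: Sequence[str], pred_b: Sequence[str], truth: Sequence[str]) -> float:
    """Two-sided exact McNemar p-value for paired predictions on the same images.

    b = images only model A gets right, c = images only model B gets right;
    under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    b = sum(pa == t != pb for pa, pb, t in zip(pred_a, pred_b, truth))
    c = sum(pb == t != pa for pa, pb, t in zip(pred_a, pred_b, truth))
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```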

Project code: https://github.com/Imageomics/EcoPalette/tree/main

Subgroup 5: Easy Traits with ML

Members: Isadora Fluck, Michelle Ramirez, Jennifer Girón, S M Rayeed, Ekaterina Nepovinnykh, Dhanyapriya Somasundaram, Hojin Yoo, Sydne Record

Goal: automate trait measurements from images

Workflow: 577 beetle images + code (Python + R).

Code input: an image with multiple individuals. Output: a data table with columns:

  • pictureID (that is linked to speciesID, plotID, siteID, etc);
  • individualID (that can be linked to the individual images);
  • elytra area;
  • elytra width;
  • elytra length.
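A minimal sketch of the measurement step, assuming each individual has already been segmented into a binary elytra mask, that the beetle is oriented head-up so length runs along image rows, and that a millimetres-per-pixel scale is known. Width and length here are axis-aligned bounding-box extents, and the individualID scheme is hypothetical.

```python
from typing import Dict, List, Sequence, Union

Mask = Sequence[Sequence[int]]  # 1 = elytra pixel, 0 = background


def elytra_measurements(mask: Mask, mm_per_pixel: float) -> Dict[str, float]:
    """Area (mm^2), width and length (mm) of one elytra mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return {"elytra_area": 0.0, "elytra_width": 0.0, "elytra_length": 0.0}
    area_px = sum(1 for row in mask for v in row if v)
    return {
        "elytra_area": area_px * mm_per_pixel ** 2,
        "elytra_width": (max(cols) - min(cols) + 1) * mm_per_pixel,
        "elytra_length": (max(rows) - min(rows) + 1) * mm_per_pixel,
    }


def trait_table(
    picture_id: str, masks: Sequence[Mask], mm_per_pixel: float
) -> List[Dict[str, Union[str, float]]]:
    """One row per individual in the picture, with the columns listed above."""
    return [
        {"pictureID": picture_id, "individualID": f"{picture_id}_{i + 1}",
         **elytra_measurements(m, mm_per_pixel)}
        for i, m in enumerate(masks)
    ]
```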

Group5_b: Project Code: https://github.com/yoohj0416/predictbeetle

Subgroup 6: Gaps in Current Models and What Actually Matters

Members: Nathan, Blair, Alec, Parkash

Grad-CAM for BioCLIP: https://github.com/mirkab/BeetlePalooza_2024_Mirka

Grad-CAM for ResNet-50: https://github.com/parkash-ps/Imageomics-Beetlepalooza-2024

Goals:

  • Identify where and why current CV models misidentify beetle species.
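For reference, Grad-CAM (the method named in both repositories above) weights each feature map of a chosen layer by the spatial average of the class-score gradient, sums the weighted maps and keeps the positive part. A minimal pure-Python sketch of that formula for one image, assuming the activations and gradients have already been extracted from the model; for a ViT such as BioCLIP the patch tokens would first need reshaping into a 2D grid, which is not shown.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def grad_cam(activations: Sequence[Map2D], gradients: Sequence[Map2D]) -> Map2D:
    """Grad-CAM heatmap for one image and one target class.

    activations[k] is the k-th feature map of the chosen layer; gradients[k]
    is the gradient of the class score with respect to that map.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # 1. channel weight = global average of that channel's gradient
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # 2. weighted sum of feature maps, then ReLU to keep positive evidence only
    cam = [
        [max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations))) for j in range(w)]
        for i in range(h)
    ]
    # 3. scale to [0, 1] so the map can be upsampled and overlaid on the beetle image
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam
```

Regions that light up on the background or the vial rather than on the beetle are one signal of where a misidentification may come from.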