IDRE’s Early Career Research Day recognized Shawn Schwartz’s research, “High-Throughput Phenoscaping Using Deep Learning for Accurate Automatic Instance Segmentation of Fish Images,” as one of the top four posters presented. More than 80 researchers took part in the poster session on November 20, 2019, which featured 40 high-quality research posters.
Schwartz, a UCLA Ecology and Evolutionary Biology graduate student, was studying the diversity of fish color patterns when he noticed a limitation in his toolset. To process the data, he needed to segment each fish from its picture’s background, and there were too many photos to do it by hand. Schwartz and his research team turned to COCO (Common Objects in Context), a dataset used to train neural networks that can cut the background out of a large number of images.
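For readers unfamiliar with segmentation, the short Python sketch below shows what "cutting out the background" means once a mask is available: every pixel outside the mask is replaced with a fill color. It is an illustration only, not the team's code. The function name `remove_background`, the nested-list image layout, and the black fill color are all assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def remove_background(
    image: List[List[Pixel]],
    mask: List[List[bool]],
    fill: Pixel = (0, 0, 0),
) -> List[List[Pixel]]:
    """Keep pixels where the mask is True; replace all others with `fill`.

    Hypothetical helper: `image` is a list of rows of RGB tuples and
    `mask` is a same-shaped list of rows of booleans (True = fish).
    """
    same_shape = len(image) == len(mask) and all(
        len(row) == len(mask_row) for row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same shape")

    return [
        [pixel if keep else fill for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# A tiny 2x2 "photo" in which the mask marks two pixels as fish.
photo = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
fish_mask = [[True, False], [False, True]]
cutout = remove_background(photo, fish_mask)
```

Drawing the mask by hand for every photo is the slow step. A trained segmentation network produces the mask automatically.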
He ran into one issue with COCO: it had no data on fish — in the biological sense.
“We want to attack this problem that a lot of the databases and hierarchically structured pre-trained sets for model training for machine learning and deep learning are not biologically inspired,” Schwartz said.
As its name suggests, COCO is rich with images of objects that people encounter in their everyday lives. While this is useful for users who want to train neural networks to identify items such as people, tables, and cars, the dataset does not include a diverse collection of the organisms that biologists are interested in studying.
The dataset has an abundance of domesticated animals such as dogs, cats, and horses, but underrepresents organisms such as fish, monkeys, lizards, and other animals that are essential to studying life.
The current research aims to support biologists who work with big data by creating biologically inspired datasets that can better train neural networks to fit researchers’ needs.
Schwartz and his fellow researchers in the Alfaro Lab developed a custom dataset of morphologically diverse fish and used a GeForce RTX 2080 GPU to train their neural network to segment the fish images in a way that was well suited to their downstream color pattern analyses.
They compared the color pattern metrics extracted from their algorithmically segmented images with the results of their previous study, which used manually segmented images. Their algorithm performed just as accurately as manual segmentation, in a small fraction of the time.
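The kind of comparison described above can be sketched in a few lines of Python. The article does not specify which color pattern metrics the team extracted, so this example uses the average color of the masked pixels as a simple stand-in. The function names and the toy data are hypothetical, not taken from the team's work.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def mean_foreground_color(
    image: List[List[Pixel]], mask: List[List[bool]]
) -> Tuple[float, float, float]:
    """Average RGB color of the pixels that the mask marks as foreground."""
    pixels = [
        pixel
        for row, mask_row in zip(image, mask)
        for pixel, keep in zip(row, mask_row)
        if keep
    ]
    if not pixels:
        raise ValueError("mask selects no pixels")
    count = len(pixels)
    return (
        sum(p[0] for p in pixels) / count,
        sum(p[1] for p in pixels) / count,
        sum(p[2] for p in pixels) / count,
    )


def max_channel_difference(
    a: Tuple[float, float, float], b: Tuple[float, float, float]
) -> float:
    """Largest absolute per-channel gap between two color measurements."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy data: the same photo measured under a hand-drawn mask and an automatic one.
photo = [[(100, 0, 0), (200, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
manual_mask = [[True, True], [False, False]]
automatic_mask = [[True, False], [False, False]]

manual_metric = mean_foreground_color(photo, manual_mask)
automatic_metric = mean_foreground_color(photo, automatic_mask)
gap = max_channel_difference(manual_metric, automatic_metric)
```

A small gap across many photos would suggest that the automatic masks can stand in for manual ones in downstream analysis. A large gap would point to images where the network's outline differs in ways that matter.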
Now, they want to offer their methods to other biologists.
The team is developing a universal tool that has two components. First, it provides a web interface for manually segmenting images. Second, and more importantly, researchers can save data points as they work and download the data as custom training sets for developing more biologically inspired deep learning models.
Once the tool is published, academics across all fields of biology will be able to use it for any organism they wish to study.
The Alfaro Lab started researching high-throughput image preprocessing with deep learning at the phenomic scale in April 2019. The research team includes Shawn Schwartz, Liz Karan, Mark Juhn, Whitney Tsai Nakashima, Tyler McCraney, Ph.D., and PI Michael Alfaro, Ph.D.