Science Fair, 2.0
At the recent Massachusetts STEM summit, I was lucky to hear an inspiring presentation by Nathan Han, a sophomore at Boston Latin School, and winner of the 2014 Intel Science and Engineering Fair.
I found Nathan’s work remarkable because it was, essentially, a big data project. Using existing data from publicly available databases of the BRCA1 tumor suppressor gene, a gene implicated in the development of breast and ovarian cancer, Nathan developed a machine-learning algorithm that examined characteristics of multiple BRCA1 mutations and learned to differentiate between mutations that cause disease and those that do not. His tool achieves an impressive 81 percent accuracy and could identify cancer threats from BRCA1 mutations more accurately than currently available techniques.
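To give a flavor of the general approach, here is a minimal sketch of this kind of classification task. It is not Nathan’s actual method; the feature names and data are invented for illustration, and the classifier is a simple nearest-centroid rule rather than whatever model his project used.

```python
# Hedged sketch: classify gene variants as deleterious or benign from
# numeric features. All features and data below are synthetic.
import random

random.seed(0)

def make_variant(deleterious):
    # Hypothetical features, e.g. a conservation score and a
    # protein-impact score, shifted by class membership.
    base = 0.8 if deleterious else 0.3
    return ([base + random.uniform(-0.2, 0.2),
             base + random.uniform(-0.2, 0.2)], deleterious)

data = [make_variant(i % 2 == 0) for i in range(200)]
train, test = data[:150], data[150:]

def centroid(rows):
    # Average feature vector of a set of variants.
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

pos = centroid([x for x, y in train if y])
neg = centroid([x for x, y in train if not y])

def predict(x):
    # Assign the variant to the class with the nearer centroid.
    dist = lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return dist(pos) < dist(neg)

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"accuracy on held-out variants: {accuracy:.2f}")
```

The point of the sketch is the workflow, not the model: gather labeled examples from existing databases, extract numeric features, fit a classifier on part of the data, and measure accuracy on the held-out remainder.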
I spoke with one of Nathan’s teachers, Kathleen Bateman, after his presentation. She confirmed that there was no bench science in his project; it was purely computational. I asked whether her students work with large data sets or bioinformatics in their biology coursework. She said no, but they have a proactive statistics teacher who involves students in using RStudio and modern approaches to statistical computing.
Years ago, when I was a judge at a regional science fair, Nathan’s work would probably not have made it past the first round. The judging criteria absolutely required student-collected data and an experimental design with a controlled variable and a manipulated variable.
Data science has matured since then, and big data has transformed how professional scientists work. It is refreshing to see that this evolution is also being reflected in the science fair world, allowing students to develop the skills needed to ask questions of our data-heavy world, and—as in Nathan’s case—make important discoveries.
Kim Kastens, Principal Scientist
The EDC Oceans of Data Institute