The data science of the brain

Abstract: The brain is, in a sense, a computer. Unlike modern computers with von Neumann architecture, the brain is made of neurons, which together serve as a processing unit, a memory unit, and a storage unit. Though the precise mechanisms by which this big set of switches works are not fully understood, it is clear that the brain possesses three properties: modularity, connectivity, and plasticity. These three properties underlie the current data science of the brain.

The brain is, in a sense, big data. The human brain is said to have 10^11 (100G) neurons, with an estimated 10^15 (1P) connections among them. However, the resolution of our scientific measurement is not high enough to capture all of these neurons or connections at once. At this point, we understand neurons and their connections individually, as well as their connections in association with other neurons, but a whole human brain MRI scan can be stored in 600 MB (or 350 MB after gzip), which is far lower resolution than the actual brain. Nonetheless, we still have a lot to mine at this level.
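
To make the gap in scale concrete, the figures above can be put side by side in a rough back-of-the-envelope sketch. It uses only the numbers quoted in this abstract and assumes 1 MB = 10^6 bytes; the variable names are illustrative, and the ratios are order-of-magnitude comparisons rather than a statement about how MRI encodes neural tissue.

```python
# Figures quoted in the abstract (orders of magnitude, not exact counts).
NEURONS = 10**11              # ~100G neurons
CONNECTIONS = 10**15          # ~1P connections
SCAN_BYTES = 600 * 10**6      # whole-brain MRI scan; assumes 1 MB = 10^6 bytes
SCAN_GZIP_BYTES = 350 * 10**6 # the same scan after gzip

# Average number of connections per neuron implied by the two estimates.
connections_per_neuron = CONNECTIONS // NEURONS

# How many neurons and connections each stored byte would have to stand for.
neurons_per_byte = NEURONS / SCAN_BYTES
connections_per_byte = CONNECTIONS / SCAN_BYTES

# Fraction of the original size that remains after gzip.
gzip_ratio = SCAN_GZIP_BYTES / SCAN_BYTES

print(f"connections per neuron: {connections_per_neuron:,}")
print(f"neurons per scan byte: {neurons_per_byte:,.0f}")
print(f"connections per scan byte: {connections_per_byte:,.0f}")
print(f"size after gzip: {gzip_ratio:.0%} of original")
```

Under these assumptions, each byte of a scan would have to summarize on the order of a hundred neurons and over a million connections.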

The brain is diverse across people, just like the rest of the body. Although our current resolution of brain scanning is limited, the variability across individuals can help amplify the understanding of each data point. By using brain scans from hundreds, or even thousands, of people, we try to learn more than we could from a single scan.

In addition to brain data, other sources of data can help us understand the brain. These include demographic, behavioral, perceptual, motor, cognitive, and genetic data. Human genomes are quite diverse, and are currently known to have roughly 10 million variable data points. Given that the genome is the blueprint for the body, including the brain, we can use genetic information to understand the brain, or vice versa. We do not simply apply 10 MB of genetic data to 600 MB of brain data in each person; instead, the two are brought together more strategically.

In this talk, I first show the properties of the data and then explore these strategies for analyzing the big data of the brain and genes.


Bio: An Assistant Professor of Communication Sciences and Disorders, Dr. Ikuta received his Ph.D. in Neuroscience and Linguistics from Indiana University in 2008, as well as an M.A. in Computational Linguistics in 2005. His research interests include the neural mechanisms of hearing and language, studied using neuroimaging and computational techniques. He has published papers on numerous topics, including neuroimaging, neuropsychopharmacology, neurodevelopment, psychiatric and/or neurological disorders, dopaminergic modulation, computational linguistics, generative linguistics, auditory hallucination, and genetics.