Abstracts of Talks and Panel Discussions
Dr. Ralph-Axel Müller (Psychology and Center for Clinical and Cognitive Neuroscience)
Title: MR imaging in autism: Mining for biomarkers.
Multimodal magnetic resonance imaging can generate huge amounts of data. Not all of it is informative of brain anatomy or function, but in conventional hypothesis-driven studies data reduction may result in critical loss of information. This is an issue in the study of autism spectrum disorders (ASD) because there is little consensus on what brain features or functional systems may constitute the ‘core’ of the disorder. I will present examples of highly multivariate MRI datasets in ASD and show how narrowly focused hypothesis-driven approaches can miss the ‘big picture’. I will then turn to some exploratory data-driven studies, which aim to uncover imaging biomarkers of ASD. Trade-offs between sample size, data quality, and coverage of anatomical and functional brain features through multimodal imaging remain a challenge. Nonetheless, data mining in neuroimaging provides a promising approach to identifying currently unknown ASD subtypes, which may be linked to specific genetic (or epigenetic) risk factors and may respond to specifically tailored treatments. Challenges in ASD research discussed here are exemplary of those encountered in the study of other disorders (e.g., fetal alcohol syndrome, dyslexia, Alzheimer’s disease) by other members of the Center for Clinical and Cognitive Neuroscience.
Dr. Bowen Shen (Math and Statistics, C2S2)
Title: Advanced Supercomputing Technology for Big Data Science.
It is well known that the substantial increase in data volume, which may be produced by high‐resolution Earth modeling systems or derived from satellite observations, poses a great challenge for staging, handling, and managing these data and for extracting scientific insights from them. We believe that efficiently handling and analyzing these massive data sets, from terabytes for short‐term runs to petabytes for long‐term runs, requires innovative thought processes and approaches. To achieve these goals, I will introduce (1) scalable Concurrent Visualization (CV) technology and (2) the multi‐level Parallel Ensemble Empirical Mode Decomposition (PEEMD) method, both of which have been developed at NASA. In CV, a simulation code is instrumented such that its data can be extracted for analysis while the simulation is running, without having to write the data to disk. By avoiding file system I/O, CV provides much higher temporal resolution than is possible with traditional post‐processing. The original Empirical Mode Decomposition (EMD, Huang et al., 1998) and ensemble EMD (Wu et al., 2009) were developed for multiscale analysis of processes that are non‐stationary and nonlinear. To efficiently analyze high‐resolution, global, multi‐dimensional data sets, we implement multi‐level parallelism in the ensemble EMD and obtain a parallel speedup of 720 using 200 eight‐core processors.
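As a rough check on the scaling figure quoted above, the reported speedup can be expressed as a parallel efficiency. The sketch below assumes that the speedup of 720 is measured against a single core and that all 200 × 8 cores were in use; the abstract states neither, and the function name is illustrative only.

```python
def parallel_efficiency(speedup: float, processors: int, cores_per_processor: int) -> float:
    """Return speedup divided by the total core count (1.0 would be ideal scaling)."""
    total_cores = processors * cores_per_processor
    return speedup / total_cores


# Figures from the abstract: speedup of 720 on 200 eight-core processors.
# Assumption: the baseline is one core and every core participates.
total_cores = 200 * 8
efficiency = parallel_efficiency(720, 200, 8)
print(f"{total_cores} cores, efficiency {efficiency:.2f}")
```

Under these assumptions the run uses 1600 cores at an efficiency of about 0.45. A different baseline (for example, one eight-core processor) would change the figure.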
Dr. Mel Hovell (Public Health, Director of CBEACH)
Title: Real Time Measures & Interventions: Setting the stage for behavioral science and engineering human behavior?
Co-authors: Suzanne Hughes, Vincent Berardi, John Bellettiere, Neil Klepeis, Saori Obayashi, Jennifer Jones, Sandy Liles and Marie Boman-Davis
Principles of Behavior have been scientifically vetted for more than 100 years. Pavlov showed that reflex behavior is elicited by novel stimuli; Skinner showed that “voluntary” or operant behavior could be selected as a consequence of consequences. Yet basic science has not offered a reliable means of engineering human behavior. This is likely due to the lack of necessary tools and to incompletely specified theory. Real-time and continuous measures and interventions guided by our Behavioral Ecological Model (BEM) may set the stage for engineering and sustaining human behavior. Our investigators are conducting research to alter smoking; separately, accelerometers are being used to shape increased step counts per day to enhance fitness in high-risk populations, with repeated measures that range from 1-2 million per family over about 4 months, and billions for family samples as large as 300. These studies serve as models for future multi-disciplinary research that may culminate in technology that can shape and sustain healthy behavior in individuals and populations. This lightning presentation will introduce the BEM, show how its use generates big data, and demonstrate preliminary results of real-time interventions.
Dr. Atsushi Nara (Geography, HDMA)
Title: Behavior mining from large moving object data.
Discovering new insights about spatial, temporal, and behavioral patterns from large moving object data has been a major challenge in the data science community, particularly since the rapid advancement and deployment of location-aware technologies and services. The challenge stems from the high volume, high velocity, and high variety of the data, which make it difficult to process and analyze moving objects’ behavior efficiently and effectively. In this talk, I will present a suite of analytic methods to reveal human behavioral patterns in space and time using GPS tracking data. Our data analytics approach involves classifications of movement patterns based on both geometric and semantic properties of movement data and investigations of associations among movement patterns, interaction behaviors, geographic contexts, and individual characteristics.
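One geometric property that can be derived from GPS tracking data is the speed between consecutive fixes. The following is a minimal sketch, not the method presented in the talk. It assumes that a track is a list of (timestamp in seconds, latitude, longitude) tuples, uses the haversine great-circle distance on a spherical Earth, and labels each segment with a hypothetical speed threshold of 0.5 m/s.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def label_segments(track, stop_speed_mps=0.5):
    """Label each consecutive pair of fixes as 'stop' or 'move' by average speed."""
    labels = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(lat0, lon0, lat1, lon1) / dt
        labels.append("stop" if speed < stop_speed_mps else "move")
    return labels


# Hypothetical three-fix track: stationary for 10 s, then moving north.
track = [
    (0, 32.7750, -117.0710),
    (10, 32.7750, -117.0710),
    (20, 32.7760, -117.0710),
]
labels = label_segments(track)
print(labels)
```

Semantic properties, such as the type of place where a stop occurs, would require joining these segments with additional geographic context data, which this sketch does not attempt.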
Dr. Rob Malouf (Linguistics)
Title: Mining cultural insights from online texts.
Public interactions that a generation ago were ephemeral are now mediated by the internet and leave a permanent and accessible record. Combined with large-scale natural language processing and text analysis, this has created unprecedented opportunities for scientists studying human linguistic behavior. This talk will present an instructive example, using conversations gathered from an online forum frequented by people who are planning to relocate. Their discussions typically center around the pros and cons of neighborhoods in various US cities. Using topic modeling, we mined this corpus to construct conceptual city maps in which distances are derived from neither physical proximity nor demographic similarity, but rather from the subjective role that neighborhoods play in the popular imagination as reflected in the text. Such a map of a city's cultural landscape can be useful both to residents and to researchers. For example, areas which are likely targets of gentrification may look different in the cultural space than other areas which objectively have very similar attributes. Or, by overlaying cultural maps of different cities, we can match up equivalent regions, allowing someone to find a neighborhood in an unfamiliar city which plays a similar role to a neighborhood in a more familiar one.
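The abstract does not say how distances between neighborhoods are computed from the topic model. One common choice, shown here purely as an assumption, is the Jensen-Shannon distance between per-neighborhood topic distributions. The neighborhood names and topic weights below are made up for illustration.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Hypothetical topic mixtures over three topics (each row sums to 1).
topics = {
    "Neighborhood A": [0.7, 0.2, 0.1],
    "Neighborhood B": [0.6, 0.3, 0.1],
    "Neighborhood C": [0.1, 0.1, 0.8],
}

names = list(topics)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {js_distance(topics[a], topics[b]):.3f}")
```

A matrix of such pairwise distances could then be passed to a layout method such as multidimensional scaling to draw the conceptual map; that step is omitted here.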
Dr. Sam Shen (Math and Statistics, Co-Director of Center for Climate and Sustainability Studies)
Title: Big climate data at SDSU and the world.
The SDSU Climate Informatics Lab (CIL) was established in 2006 and has been training students in climate data analysis skills through Master’s thesis research and independent studies. The lab emphasizes error estimation, optimization, and big data. The lab’s students have been hired by various kinds of consulting firms, SPAWAR, financial companies, the IT industry, and others. Some students choose to teach or to continue studying for a PhD at SDSU or other top institutions. SDSU-CIL has many national and international collaborations on big data, such as with the NASA Jet Propulsion Lab on deep ocean data, the NOAA National Climatic Data Center on data errors, and the Third Pole Environment program on Tibetan Plateau data. The rapid increase in modeled and observed climate data requires efficient big data technologies to facilitate various applications. The Coupled Model Intercomparison Project for the Intergovernmental Panel on Climate Change (IPCC) is an example of exponential data increase: 1 GB in 1995, 500 GB in 2001, 35 TB in 2007, and 3.5 PB in 2013. SDSU-CIL is developing cutting-edge tools to meet the challenges of big data analysis and visualization. The CIL’s latest release is the 2014 SOGP 1.0 software package: a Weather History Time Machine.
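To put the growth figures quoted above in perspective, the average annual growth factor can be estimated from the first and last data points. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and a constant growth rate between 1995 and 2013; the variable and function names are illustrative.

```python
import math

# Archive sizes quoted in the abstract, converted to gigabytes
# (assumption: decimal units, 1 TB = 1e3 GB and 1 PB = 1e6 GB).
cmip_archive_gb = {1995: 1.0, 2001: 500.0, 2007: 35e3, 2013: 3.5e6}


def annual_growth_factor(sizes):
    """Geometric-mean yearly growth factor between the earliest and latest year."""
    first_year, last_year = min(sizes), max(sizes)
    ratio = sizes[last_year] / sizes[first_year]
    return ratio ** (1.0 / (last_year - first_year))


factor = annual_growth_factor(cmip_archive_gb)
doubling_time_years = math.log(2) / math.log(factor)
print(f"growth per year: x{factor:.2f}, doubling time: {doubling_time_years:.2f} years")
```

Under these assumptions the archive grows by a bit more than a factor of two per year on average, which corresponds to a doubling time of under one year.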
Dr. Andre Skupin (Geography, CICS)
Title: Knowledge Visualization: From Cartographic Inspiration to Societal Impact.
There is untapped potential for geographic visualization to expand its reach and impact beyond traditional geocentered applications. In this presentation, an expanded vision of visualization is presented that is foremost informed by recognition of "space" as a useful construct for supporting all kinds of pattern discovery and decision-making. This applies in particular to the artifacts that are produced and consumed by domain actors in the course of various knowledge‐based activities. When such geographic concepts as distance, scale, or region are combined with high‐dimensional approaches, novel techniques and applications can emerge that are applicable to vast collections of structured and unstructured data. This will be demonstrated with examples ranging from thousands of medical records to millions of research publications and social media artifacts. The lightning talk will also touch upon the question of how creative research involving big data and compute‐intensive methods can be translated into meaningful and sustainable innovation via strategic partnerships and product development.
Vincent Berardi (CBEACH)
Title: Data in Practice: A Health-Based Intervention
Co-authors: Melbourne Hovell, Suzanne Hughes, John Bellettiere, Neil Klepeis, Saori Obayashi, Jennifer Jones, Sandy Liles and Marie Boman-Davis
Project Fresh Air is a clinical trial aiming to reduce secondhand smoke (SHS) exposure via feedback from air particle monitors that measure air quality every ten seconds over the course of several months. Each of the approximately 300 homes is fitted with two such monitors, which results in roughly 1-2 million data points per home over the course of the study. The data are transmitted in near real-time to web servers where they are converted to time-series graphs, allowing study personnel to expeditiously appraise participant progress and to identify and react to equipment failures. Once a home has completed the intervention, substantial processing is required to convert the data to a format that is appropriate for analysis. Many options exist when interpreting the data, including whether to assess the study as a whole or focus on single-home units. In either case, effects can be gauged graphically or through statistical procedures. The data time-scale can range from the order of seconds to weeks, which significantly affects analysis. Additionally, missing and anomalous data must be taken into account. This presentation will provide an overview of the management and interpretation complications that occur when big data moves from the abstract to the tangible. Ultimately, such data should be converted “automatically” and in real time into analytical products that serve practical and scientific outcomes.
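The quoted data volume can be checked with simple arithmetic. The sketch below assumes uninterrupted sampling every ten seconds by both monitors. The abstract gives the study length only as several months, so the 60-day and 120-day durations used here are illustrative assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_home(days: int, monitors: int = 2, interval_s: int = 10) -> int:
    """Number of readings if every monitor samples once per interval without gaps."""
    readings_per_monitor_per_day = SECONDS_PER_DAY // interval_s
    return monitors * readings_per_monitor_per_day * days


# Assumed study durations in days; the abstract gives no exact figure.
for days in (60, 120):
    print(f"{days} days: {readings_per_home(days):,} readings per home")
```

Under these assumptions, two to four months of monitoring yields roughly 1.0 to 2.1 million readings per home, in line with the range stated above. Gaps from missing data or equipment failures would lower the count.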
Akshay Pottathil (Geography, CICS)
Title: Avoiding Systematic Breakdowns: A Network Analytics Approach to Continuity of Operations.
Co-authors: O'Connell, C., et al.
The purpose of this research was to examine the web of global security concerns from a network perspective and map salient relationships among constituent elements. An action research process was utilized to identify relationships between the seemingly disparate elements of crime, corruption, proliferation, sustainability, and health and disaster management currently plaguing the Middle East and North Africa, and to illustrate how these elements connect, overlap, and potentially fuel each other. By visualizing the connections between actors within the arena of global security using social network analysis software, a holistic understanding of influence within criminal networks was achieved. Visually coding connections between elements helped researchers determine the strength and validity of current research in multiple domains, exposed non-obvious relationships between elements, and revealed knowledge gaps with regard to geographic and attribute spaces.