Inference and Robotic Path Planning over High Dimensional Categorical Observations
John San Soucie, Ph.D., 2024
Yogesh Girdhar, Advisor
Heidi M. Sosik, Advisor
Advances in marine autonomy, deep learning, and in situ marine sensing technology have enabled oceanographers to collect vast amounts of spatiotemporally distributed, sparse, high-dimensional categorical data. Statistical models, particularly in streaming and computationally constrained settings, have lagged behind this growth in data collection. Recent developments in topic modeling for robotics have highlighted the potential to efficiently extract meaningful relationships from categorical data and to adjust robotic path planning based on real-time inference. This dissertation seeks to fill the gap in streaming statistical models for sparse, high-dimensional categorical data, in the context of open-ocean phytoplankton community ecology.
We begin by exploring the use of existing topic modeling approaches for plankton community characterization, comparing topic models to standard ecological techniques for dimensionality reduction. The increased fidelity and expressiveness of topic models allow greater resolution of plankton co-occurrence relationships. By analyzing these relationships together with ocean physics in and around a retentive eddy, we trace the source of phytoplankton variability to storm-driven advection at the ocean surface. We conclude that topic models offer unique insights into the causal mechanisms underlying plankton community variability.
Next, we turn to the development of a streaming belief model for categorical path planning. Such a model must be able to predict in regions without data, and it must process streaming data in a computationally efficient manner. We introduce the Gaussian Dirichlet Random Field model, a novel topic model with spatially continuous latent log-probabilities. In addition to being more accurate than the state of the art at locations with data, the Gaussian Dirichlet Random Field model can interpolate and extrapolate to locations without data. The model is initially presented with a batch hybrid Markov chain Monte Carlo (MCMC) inference procedure.
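To make "spatially continuous latent log-probabilities" concrete, the following is a minimal generative sketch of a topic model of this general kind. It assumes one Gaussian process per topic over 1-D locations, a softmax that maps the GP values to topic proportions at each location, Dirichlet-distributed topic-word distributions, and multinomial observation counts. The function names (`rbf_kernel`, `sample_gdrf_like`) and every hyperparameter are hypothetical illustrations; this is not the dissertation's exact formulation, and it does not implement its MCMC or variational inference.

```python
import numpy as np

# Hypothetical illustration of a GP-based topic model; not the dissertation's exact model.


def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between 1-D locations x
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_gdrf_like(locations, n_topics=3, vocab_size=10, words_per_loc=50,
                     beta=0.1, lengthscale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(locations)
    # Small diagonal jitter keeps the covariance numerically positive definite
    K = rbf_kernel(locations, lengthscale) + 1e-6 * np.eye(n)
    # One independent GP draw per topic: latent log-probabilities, shape (n_topics, n)
    log_p = rng.multivariate_normal(np.zeros(n), K, size=n_topics)
    # Softmax over topics at each location gives spatially smooth topic proportions
    theta = np.exp(log_p - log_p.max(axis=0))
    theta /= theta.sum(axis=0)
    # Topic-word distributions drawn from a symmetric Dirichlet prior, shape (n_topics, vocab_size)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # Per-location word distribution is the topic mixture, shape (n, vocab_size)
    word_probs = theta.T @ phi
    # Categorical observations at each location, summarized as word counts
    counts = np.stack([rng.multinomial(words_per_loc, p) for p in word_probs])
    return theta, phi, counts


locs = np.linspace(0.0, 10.0, 25)
theta, phi, counts = sample_gdrf_like(locs)
```

Because nearby locations share correlated GP draws, their topic proportions vary smoothly in space, which is what allows a model of this form to make predictions at locations where no observations were taken.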
We then develop a streaming, fully variational inference approach, called Streaming Gaussian Dirichlet Random Fields, which satisfies both the prediction and efficiency requirements for path-planning belief models. In silico experiments demonstrate the ability of this model to accurately map latent co-occurrence patterns. Comparisons with a standard Gaussian process on both path-planning and observation-mapping tasks show that the ability of Streaming Gaussian Dirichlet Random Fields to leverage additional categorical observations yields superior performance.