Texas Robotics faculty present 36 papers at ICRA 2026

Texas Robotics core faculty are co-authors on 36 papers that will be presented at the 2026 IEEE International Conference on Robotics and Automation (ICRA), taking place June 1–5 in Vienna, Austria. 

ICRA is the flagship conference of the IEEE Robotics and Automation Society and the largest and most impactful robotics research gathering in the world.

Notably, three of those papers have been named ICRA 2026 Best Paper Award Finalists, a highly competitive distinction recognizing the most outstanding contributions to the field. 

The papers reflect the breadth of work underway across Texas Robotics labs, spanning topics including humanoid locomotion, robot manipulation, autonomous navigation, surgical and medical robotics, tactile sensing, and the integration of large language models into robot planning and control. Several papers address robotics in service of human accessibility, including guide dog robots and navigation assistance for blind and low-vision individuals. The research represented here is the product of collaboration across 25+ partner organizations in the United States and around the world, reflecting the global reach of the Texas Robotics research community.


The 36 papers are listed below, with the names of Texas Robotics core faculty in bold:

 

Accepted Papers

 

Best Paper Award Finalists

 

  1. Geometry-Aware Visual Odometry for Bronchoscopic Navigation Via High-Gain Observer Fusion 

    Mohammadreza Kasaei, Francis Xiatian Zhang, Feng Li, Farshid Alambeigi, Kev Dhaliwal, Mohsen Khadem 

    In collaboration with: University of Edinburgh

    Abstract: Navigational bronchoscopy is critical for pulmonary interventions, yet current platforms depend heavily on pre-operative CT or external sensors, limiting their use in critical care and resource-constrained settings. Vision-only navigation offers a scalable alternative, but conventional visual odometry (VO) struggles with texture-poor airway images, specularities, and the vanishing-point singularities of tubular anatomy, leading to frequent tracking failures and drift. We present a geometry-aware VO framework that explicitly leverages vanishing-point cues from airway lumens. Detected lumens are back-projected to 3D rays, whose weighted fusion yields a stable forward heading even when parallax cues are absent. This heading, together with looming-based velocity estimates, is fused with noisy VO outputs using a bespoke high-gain observer that enforces airway-following priors and rejects drift. We validate the method on ex-vivo mechanically ventilated human lungs using electromagnetic tracking as ground truth. Compared to state-of-the-art pipelines (ORB-SLAM2, LoFTR-VO, DPVO), our approach reduces absolute trajectory error by more than 50% and achieves the lowest relative pose error across all test sequences.

 

  2. GuideTWSI: A Diverse Tactile Walking Surface Indicator Dataset from Synthetic and Real-World Images for Blind and Low-Vision Navigation

    Hochul Hwang, Soowan Yang, Nhat Hong Anh Nguyen, Parth Goel, Krisha Adhikari, Sunghoon Ivan Lee, Joydeep Biswas, Nicholas Giudice, Donghyun Kim 

    In collaboration with: University of Massachusetts Amherst

    Abstract: Tactile Walking Surface Indicators (TWSIs) are safety-critical landmarks that blind and low-vision (BLV) pedestrians use to locate crossings and hazard zones. From our observation sessions with BLV guide dog handlers, trainers, and an O&M specialist, we confirmed the critical importance of reliable and accurate TWSI segmentation for navigation assistance of BLV individuals. Achieving such reliability requires large-scale annotated data. However, TWSIs are severely underrepresented in existing urban perception datasets, and even existing dedicated paving datasets are limited: they lack robot-relevant viewpoints (e.g., egocentric or top-down) and are geographically biased toward East Asian directional bars - raised parallel strips used for continuous guidance along sidewalks. This narrow focus overlooks truncated domes - rows of round bumps used primarily in North America and Europe as detectable warnings at curbs, crossings, and platform edges. As a result, models trained only on bar-centric data struggle to generalize to dome-based warnings, leading to missed detections and false stops in safety-critical environments. We introduce GuideTWSI, the largest and most diverse TWSI dataset, which combines a photorealistic synthetic dataset, carefully curated open-source tactile data, and quadruped real-world data collected and annotated by the authors. Notably, we developed an Unreal Engine–based synthetic data generation pipeline to obtain segmented, labeled data across diverse materials, lighting conditions, weather, and robot-relevant viewpoints. Extensive evaluations show that synthetic augmentation improves truncated dome segmentation across diverse state-of-the-art models, with gains of up to +29 mIoU points, and enhances cross-domain robustness. Moreover, real-robot experiments demonstrate accurate stopping at truncated domes, with high repeatability and stop success rates (96.15%). The GuideTWSI dataset, model weights, and code will be publicly released.

     
  3. OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction 

    Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, Karen Liu, Yan Duan, Guanya Shi 

    In collaboration with: Amazon, Carnegie Mellon University, Massachusetts Institute of Technology, Stanford University, University of California Berkeley

    Abstract: A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum. All code, retargeted datasets, and result videos can be found at https://omniretarget.github.io.
     
     

Additional Accepted Papers

 

  1. A Champion-Level Vision-Based Reinforcement Learning Agent for Competitive Racing in Gran Turismo 7 

    Hojoon Lee, Takuma Seno, Jun Jet Tai, Kaushik Subramanian, Kenta Kawamoto, Peter Stone, Peter Wurman  

    In collaboration with: KAIST, Sony

    Abstract: Deep reinforcement learning has achieved super-human racing performance in high-fidelity simulators like Gran Turismo 7 (GT7). It typically utilizes global features that require instrumentation external to a car, such as precise localization of agents and opponents, limiting real-world applicability. To address this limitation, we introduce a vision-based autonomous racing agent that relies solely on ego-centric camera views and onboard sensor data, eliminating the need for precise localization during inference. This agent employs an asymmetric actor-critic framework: the actor uses a recurrent neural network with the sensor data local to the car to retain track layouts and opponent positions, while the critic accesses the global features during training. Evaluated in GT7, our agent consistently outperforms model predictive control drivers. To our knowledge, this work presents the first vision-based autonomous racing agent to demonstrate champion-level performance in competitive racing scenarios.

     
  2. A Closed-Chain Approach to Generating Affordance Joint Trajectories for Robotic Manipulators 

    Janak Panthi, Farshid Alambeigi, Mitchell Pryor

    Abstract: Robots operating in unpredictable environments require versatile, hardware-agnostic frameworks capable of adapting to various tasks. While a recent screw-based affordance approach shows promise, it faces challenges in avoiding undesirable configurations, singularity navigation, and task success prediction. To address these limitations, we propose a novel framework that incorporates gripper orientation control and generates complete joint trajectories in real time for screw-based task-affordance execution. Our method models the affordance and manipulator as a closed-chain mechanism, introducing an innovative approach to solving closed-chain inverse kinematics. It encapsulates task constraints and simplifies task definitions, while remaining hardware and robot agnostic, robust to errors, and invariant to the initial grasp. We validate our framework with simulations on a UR5 robot and real-world implementation on a Boston Dynamics Spot robot. Our experiments demonstrate rapid joint trajectory generation (0.0077 - 0.098s) for various tasks, including a 420-degree valve turn with consideration of the gripper orientation. Comparison with the state-of-the-art methods shows a 4x improvement in planning time, reduced joint movement, and achievement of greater task goals. Video demonstrations and the open-source code for this project are available online.

     
  3. A Single-Fiber Optical Frequency Domain Reflectometry (OFDR)-Based Shape Sensing of Concentric Tube Steerable Drilling Robots 

    Yash Kulkarni, Mobina Tavangarifard, Daniyal Maroufi, Mohsen Khadem, Justin E. Bird, Jeff Siewerdsen, Farshid Alambeigi

    In collaboration with: Johns Hopkins University, University of Edinburgh, University of Texas M.D. Anderson Cancer Center

    Abstract: This paper introduces a novel shape-sensing approach for Concentric Tube Steerable Drilling Robots (CT-SDRs) based on Optical Frequency Domain Reflectometry (OFDR). Unlike traditional FBG-based methods, OFDR enables continuous strain measurement along the entire fiber length with enhanced spatial resolution. In the proposed method, a Shape Sensing Assembly (SSA) is first fabricated by integrating a single OFDR fiber with a flat NiTi wire. The calibrated SSA is then routed through and housed within the internal channel of a flexible drilling instrument, which is guided by the pre-shaped NiTi tube of the CT-SDR. In this configuration, the drilling instrument serves as a protective sheath for the SSA during drilling, eliminating the need for integration or adhesion to the instrument surface that is typical of conventional optical sensor approaches. The performance of the proposed SSA, integrated within the cannulated CT-SDR, was thoroughly evaluated under free-bending conditions and during drilling along multiple J-shaped trajectories in synthetic Sawbones phantoms. Results demonstrate accurate and reliable shape-sensing capability, confirming the feasibility and robustness of this integration strategy.

     
  4. BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization 

    Dongmyeong Lee, Jesse Quattrociocchi, Christian Ellis, Rwik Rana, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas 

    In collaboration with: U.S. Army Research Laboratory

    Abstract: Localizing ground robots against aerial imagery provides a critical capability for autonomous navigation, especially in environments where GPS is unreliable or unavailable. This task is challenging due to large viewpoint differences and substantial environmental variability. Most prior methods localize each frame independently, using either global-descriptor retrieval or spatial feature alignment, which leaves them vulnerable to ambiguity and multi-modal pose hypotheses. While sequential reasoning can mitigate this uncertainty, adapting existing per-frame pipelines for sequential use introduces unfavorable trade-offs among accuracy, memory, and computation that limit their practical deployment. We propose BEV-Patch-PF, a vision-only, GPS-free sequential geo-localization system that integrates particle filtering with learned bird’s-eye-view (BEV) and aerial feature maps. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image centered on the predicted pose. The resulting BEV–aerial feature match defines a per-particle log-likelihood for particle-filter updates. In addition, we learn a frame-level uncertainty estimate that adaptively flattens the observation likelihood for unreliable observations, preventing overconfident particle collapse in ambiguous regions. On two real-world off-road datasets, our method achieves 9.7 lower absolute trajectory error (ATE) on seen routes and 6.6 lower ATE on unseen routes than a retrieval-based baseline, while remaining robust under partial canopy cover and shadowing. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.

     
  5. CAVER: Curious AudioVisual Exploring Robot 

    Luca Macesanu, Boueny Folefack, Ruchira Ray, Ben Abbatematteo, Roberto Martín-Martín

    Abstract: Multimodal audiovisual perception can enable new avenues for robotic manipulation, from better material classification to the imitation of demonstrations for which only audio signals are available (e.g., playing a tune by ear). However, to unlock such multimodal potential, robots need to learn the correlations between an object’s visual appearance and the sound it generates when they interact with it. Such an active sensorimotor experience requires new interaction capabilities, representations, and exploration methods to guide the robot in efficiently building increasingly rich audiovisual knowledge. In this work, we present CAVER, a novel robot that builds and utilizes rich audiovisual representations of objects. CAVER includes three novel contributions: 1) a novel 3D-printed end-effector, attachable to parallel grippers, that excites objects’ audio responses, 2) an audiovisual representation that combines local and global appearance information with sound features, and 3) an exploration algorithm that uses and builds the audiovisual representation in a curiosity-driven manner that prioritizes interacting with high-uncertainty objects to obtain good coverage of surprising audio with fewer interactions. We demonstrate that CAVER builds rich representations in different scenarios more efficiently than several exploration baselines, and that the learned audiovisual representation leads to significant improvements in material classification and the imitation of audio-only human demonstrations.

     
  6. C-Free-Uniform: A Map-Conditioned Trajectory Sampler for Model Predictive Path Integral Control 

    Yukang Cao, Rahul Moorthy Mahesh, Oguzhan Goktug Poyrazoglu, Volkan Isler

    In collaboration with: University of Minnesota

    Abstract: Trajectory sampling is a key component of sampling-based control mechanisms. Trajectory samplers rely on control input samplers, which generate control inputs u from a distribution p(u | x) where x is the current state. We introduce the notion of Free Configuration Space Uniformity (C-Free-Uniform for short) which has two key features: (i) the generated control input can be used to uniformly sample the free configuration space, and (ii) in contrast to previously introduced trajectory sampling mechanisms where the distribution p(u | x) is independent of the environment, C-Free-Uniform is explicitly conditioned on the current local map. Next, we integrate this sampler into a new Model Predictive Path Integral (MPPI) Controller, CFU-MPPI. Experiments show that CFU-MPPI outperforms existing methods in terms of success rate in challenging navigation tasks in cluttered polygonal environments while requiring a much smaller sampling budget. Code: https://github.com/ogpoyrazoglu/cuniform_sampling.

     
  7. CLOVER: Context-Aware Long-Term Object Viewpoint and Environment Invariant Representation Learning 

    Amanda Adkins, Dongmyeong Lee, Joydeep Biswas 

    Abstract: Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as occlusions, outdoor scenes, and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.

     
  8. CoDex: Learning Compositional Dexterous Functional Manipulation without Demonstrations 

    Bowen Jiang, William Reger, Roberto Martín-Martín 

    Abstract: In this work, we study Compositional Dexterous Functional Object Manipulation (CD-FOM): tasks such as aiming and actuating a spray bottle on a plant or a glue gun on wood, which require both actuating an object's internal mechanism and controlling its pose to apply the object's function to the environment. These tasks pose significant challenges for robots due to the demanding integration of semantic understanding—of the object's function, actuation mode, and application area—with intricate physical dexterity—to manage grasp stability, movement trajectory, and actuation. We introduce CoDex, a zero-demonstration framework that autonomously discovers CD-FOM manipulation strategies. CoDex uses vision–language models (VLMs) to infer semantic constraints from the task and scene. These constraints guide analytic constrained optimization to generate a short list of functional grasp candidates that can be efficiently refined with reinforcement learning to generate full grasp–move–actuate policies transferrable from simulation to the real world. We evaluate CoDex on a 7-DoF robot arm with a 16-DoF multi-fingered hand across six CD-FOM tasks involving previously unseen objects with internal mechanisms (spray bottles, hot glue guns, air dusters, flashlights, pepper grinders) and their application to unseen target objects, showcasing its ability to autonomously discover and execute complex, physically viable dexterous behaviors without human demonstrations. More information at https://robin-lab.cs.utexas.edu/CoDex/.

     
  9. COMPASS: Cross-embOdiment Mobility Policy Via ResiduAl RL and Skill Synthesis 

    Wei Liu, Huihua Zhao, Chenran Li, Yuchen Deng, Joydeep Biswas, Yan Chang, Soha Pouya

    In collaboration with: Nvidia

    Abstract: As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but face significant limitations on the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy, and further demonstrates zero-shot sim-to-real transfer.

     
  10. FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation 

    Siqi Shang, Mingyo Seo, Yuke Zhu, Lillian Chin

    Abstract: Handling fragile objects remains a major challenge for robotic manipulation. Tactile sensing and soft robotics can improve delicate object handling, but typically involve high integration complexity or slow response times. We address these issues through FORTE, an easy-to-fabricate tactile sensing system. FORTE uses 3D-printed fin-ray grippers with internal air channels to provide low-latency force and slip feedback. This feedback allows us to apply just enough force to grasp objects without damaging them. We can accurately estimate grasping forces from 0–8 N with an average error of 0.2 N, and detect slip events within 100 ms of occurring. FORTE can grasp a wide range of slippery, fragile, and deformable objects, including raspberries and potato chips, with 92% success, and achieves 93% accuracy in detecting slip events. These results highlight FORTE’s potential as a robust solution for delicate robotic manipulation. Project page: https://merge-lab.github.io/FORTE/

 

  11. Human-Centered Development of Guide Dog Robots: Quiet and Stable Locomotion Control 

    Shangqun Yu, Hochul Hwang, Trung Dang, Joydeep Biswas, Nicholas Giudice, Sunghoon Ivan Lee, Donghyun Kim 

    In collaboration with: University of Maine, University of Massachusetts Amherst

    Abstract: A quadruped robot is a promising system that can offer assistance comparable to that of guide dogs due to its similar form factor. However, various challenges remain in making these robots a reliable option for blind and low-vision (BLV) individuals. Among these challenges, noise and jerky motion during walking are critical drawbacks of existing quadruped robots. While these issues have largely been overlooked in guide dog robot research, our interviews with guide dog handlers and trainers revealed that acoustic and physical disturbances can be particularly disruptive for BLV individuals, who rely heavily on environmental sounds for navigation. To address these issues, we developed a novel walking controller for slow stepping and smooth foot swing/contact while maintaining human walking speed, as well as robust and stable balance control. The controller integrates with a perception system to facilitate locomotion over non-flat terrains, such as stairs. Our controller was extensively tested on the Unitree Go1 robot and, when compared with other control methods, demonstrated significant noise reduction -- half of the default locomotion controller. To evaluate the usability, workload, and perceived noise of the developed system from a user’s perspective, we conducted indoor walking experiments. In these tests, participants compared our controller with the robot’s default controller. The results demonstrated higher user acceptance of our controller, highlighting its potential to improve the overall user experience of robotic guide dogs.

     
  12. INTACT-GRIP: An Inflatable Tactile Gripper for Soft Manipulation and High-Resolution Texture Mapping

    Ozdemir Can Kara, Mohammad Rafiee Javazm, Omid Rezayof, Farshid Alambeigi 

    Abstract: Robotic manipulation, especially of fragile and irregularly shaped objects, remains a significant challenge due to the need for both adaptability and precise tactile feedback. In this work, we introduce INTACT-GRIP, a robotic gripper that combines soft manipulation and high-resolution tactile sensing for inflation-based soft grasping. INTACT-GRIP integrates inflatable balloons with vision-based tactile feedback, enabling fingertip stiffness modulation for stable and damage-free manipulation of fragile and irregularly shaped objects. To evaluate its performance, we conducted a series of qualitative and quantitative experiments. In these experiments, inflation pressure was manually controlled by a human operator, who adjusted and stopped the pressure based on real-time visual feedback of the captured texture features. The results demonstrate the system’s ability to safely conform to fragile and irregularly shaped objects with varying stiffness, enabling pressure-controlled grasping and high-resolution tactile imaging during contact. Furthermore, a case study with a robotic arm highlighted the system’s potential as a versatile solution for precise and soft manipulation of delicate objects, supported by pressure-adjustable fingertips and real-time visual–tactile feedback.

     
  13. Inferring Foresightedness in Dynamic Noncooperative Games 

    Cade Armstrong, Ryan Park, Xinjie Liu, Kushagra Gupta, David Fridovich-Keil 

    Abstract: Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others’ actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent’s foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents’ foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents’ foresightedness enables game-theoretic models to make 33% more accurate models for agents’ behavior.

     
  14. LAD-VF: LLM-Automatic Differentiation Enables Fine-Tuning-Free Robot Planning from Formal Methods Feedback 

    Yunhao Yang, Junyuan Hong, Gabriel Jacob Perin, Zhiwen Fan, Li Yin, Zhangyang (Atlas) Wang, Ufuk Topcu 

    In collaboration with: SylphAI, University of São Paulo

    Abstract: Large language models (LLMs) can translate natural language instructions into executable action plans for robotics, autonomous driving, and other domains. Yet, deploying LLM-driven planning in the physical world demands strict adherence to safety and regulatory constraints, which current models often violate due to hallucination or weak alignment. Traditional data-driven alignment methods, such as Direct Preference Optimization (DPO), require costly human labeling, while recent formal-feedback approaches still depend on resource-intensive fine-tuning. In this paper, we propose LAD-VF, a fine-tuning-free framework that leverages formal verification feedback for automated prompt engineering. By introducing a formal-verification-informed text loss integrated with LLM-AutoDiff, LAD-VF iteratively refines prompts rather than model parameters. This yields three key benefits: (i) scalable adaptation without fine-tuning; (ii) compatibility with modular LLM architectures; and (iii) interpretable refinement via auditable prompts. Experiments in robot navigation and manipulation tasks demonstrate that LAD-VF substantially enhances specification compliance, improving success rates from 60% to over 90%. Our method thus presents a scalable and interpretable pathway toward trustworthy, formally-verified LLM-driven control systems.

     
  15. Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning 

    Yoonwoo Kim, Raghav Arora, Roberto Martín-Martín, Peter Stone, Ben Abbatematteo, Yoonchang Sung 

    In collaboration with: Nanyang Technological University

    Abstract: Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

     
  16. Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input 

    Zifan Xu, Myoungkyu Seo, Dongmyeong Lee, Hao Fu, Jiaheng Hu, Jiaxun Cui, Yuqian Jiang, Zhihan Wang, Anastasiia Brund, Joydeep Biswas, Peter Stone 

    Abstract: Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)–based training pipeline that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The pipeline extends a typical teacher-student training framework--in which a teacher policy is trained with ground truth state information and the student learns to mimic it with noisy, imperfect sensing--by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student), and (4) student adaptation and refinement (student). Key design elements--including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement--are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball–goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a training pipeline for robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.

     
  7. Mash, Spread, Slice! Learning to Manipulate Object States Via Visual Spatial Progress 

    Priyanka Mandikal, Jiaheng Hu, Shivin Dass, Sagnik Majumder, Roberto Martín-Martín, Kristen Grauman 

    Abstract: Most robot manipulation focuses on changing the kinematic state of objects: picking, placing, opening, or rotating them. However, a wide range of real-world manipulation tasks involve a different class of object state change—such as mashing, spreading, or slicing—where the object’s physical and visual state evolve progressively without necessarily changing its position. We present SPARTA, the first unified framework for the family of object state change manipulation tasks. Our key insight is that these tasks share a common structural pattern: they involve spatially-progressing, object-centric changes that can be represented as regions transitioning from an actionable to a transformed state. Building on this insight, SPARTA integrates spatially progressing object change segmentation maps, a visual skill to perceive actionable vs. transformed regions for specific object state change tasks, to generate a) structured policy observations that strip away appearance variability, and b) dense rewards that capture incremental progress over time. These are leveraged in two SPARTA policy variants: reinforcement learning for fine-grained control without demonstrations or simulation; and greedy control for fast, lightweight deployment. We validate SPARTA on a real robot for three challenging tasks across 10 diverse real-world objects, achieving significant improvements in training time and accuracy over sparse rewards and visual goal-conditioned baselines. Our results highlight progress-aware visual representations as a versatile foundation for the broader family of object state manipulation tasks. More information at https://vision.cs.utexas.edu/projects/sparta-robot

     
  8. MimicDroid: In-Context Learning for Humanoid Robot Manipulation from Human Play Videos 

    Rutav Shah, Shuijing Liu, Qi Wang, Zhenyu Jiang, Sateesh Kumar, Mingyo Seo, Roberto Martín-Martín, Yuke Zhu 

    Abstract: We aim to enable humanoid robots to efficiently solve new manipulation tasks from a few video examples. In-context learning (ICL) is a promising framework for achieving this goal due to its test-time data efficiency and rapid adaptability. However, current ICL methods rely on labor-intensive teleoperated data for training, which restricts scalability. We propose using human play videos—continuous, unlabeled videos of people interacting freely with their environment—as a scalable and diverse training data source. We introduce MimicDroid, which enables humanoids to perform ICL using human play videos as the only training data. MimicDroid extracts trajectory pairs with similar manipulation behaviors and trains the policy to predict the actions of one trajectory conditioned on the other. Through this process, the model acquired ICL capabilities for adapting to novel objects and environments at test time. To bridge the embodiment gap, MimicDroid first retargets human wrist poses estimated from RGB videos to the humanoid, leveraging kinematic similarity. It also applies random patch masking during training to reduce overfitting to human-specific cues and improve robustness to visual differences. To evaluate few-shot learning for humanoids, we introduce an open-source simulation benchmark with increasing levels of generalization difficulty. MimicDroid outperformed state-of-the-art methods and achieved nearly a twofold higher success rate in the real world. Additional materials can be found on: ut-austin-rpl.github.io/MimicDroid

     
  9. MINT: A Vision-Based Soft Sensor for Mutual Integration of Normal Interaction Force and Texture Perception  

    Mohammad Rafiee Javazm, Siddhartha Kapuria, Ozdemir Can Kara, Sonika Kiehler, Rami Hamada, Joga Ivatury, Farshid Alambeigi 

    Abstract: Inspired by the design of Vision-based Tactile Sensors (VTSs) and soft resistive strain sensors, in this paper, we introduce MINT: a vision-based soft sensor for Mutual Integration of Normal interaction force and Texture perception. MINT is a hybrid vision-based tactile sensor that simultaneously integrates normal force measurement with high-resolution texture perception. This unique sensor utilizes a soft resistive strain sensor between the Gel Layer and Mirror Layer of a typical VTS. By combining electrical and visual sensing modalities, MINT overcomes the limitations of existing resistive sensors and VTSs, offering a robust, efficient, and scalable solution for direct measurement of force and texture capture. To evaluate MINT’s functionality, we first propose a unique design and fabrication procedure. Next, we conduct a series of experiments, evaluating its force and texture sensing capabilities through interactions with various rigid objects.

     
  10. Mixed-Initiative Dialog for Human-Robot Collaborative Manipulation 

    Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Raymond Mooney, Roberto Martín-Martín 

    In collaboration with: Stanford University

    Abstract: Effective robotic systems for long-horizon human-robot collaboration must adapt to a wide range of human partners, whose physical behavior, willingness to assist, and understanding of the robot's capabilities may change over time. This demands a tightly coupled communication loop that grants both agents the flexibility to propose, accept, or decline requests as they coordinate toward completing the task effectively. We propose MICoBot, a system that enables the human and robot, both using natural language, to take initiative in formulating, accepting, or rejecting proposals on who can best complete different steps of a task. To handle diverse, task-directed dialog, and find successful collaborative strategies that minimize human effort, MICoBot makes decisions at three levels: (1) a meta-planner considers human dialog to formulate and code a high-level collaboration strategy, (2) a planner optimally allocates the remaining steps to either agent based on the robot's capabilities (measured by a simulation-pretrained affordance model) and the estimated human's willingness to help, and (3) an action executor decides the low-level actions to perform or words to say to the human. In physical robot trials with 18 unique human participants, MICoBot significantly improves task success and user experience over a pure LLM baseline and standard agent allocation models.

     
  11. OVerSeeC: Open-Vocabulary Costmap Generation from Satellite Images and Natural Language 

    Rwik Rana, Jesse Quattrociocchi, Dongmyeong Lee, Christian Ellis, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas 

    In collaboration with: U.S. Army Research Laboratory

    Abstract: Aerial imagery provides essential global context for autonomous navigation, enabling route planning at scales inaccessible to onboard sensing. We address the problem of generating global costmaps for long-range planning directly from satellite imagery when entities and mission-specific traversal rules are expressed in natural language at test time. This setting is challenging since mission requirements vary, terrain entities may be unknown at deployment, and user prompts often encode compositional traversal logic. Existing approaches relying on fixed ontologies and static cost mappings cannot accommodate such flexibility. While foundation models excel at language interpretation and open-vocabulary perception, no single model can simultaneously parse nuanced mission directives, locate arbitrary entities in large-scale imagery, and synthesize them into an executable cost function for planners. We therefore propose OVerSeeC, a zero-shot modular framework that decomposes the problem into Interpret–Locate–Synthesize: (i) an LLM extracts entities and ranked preferences, (ii) an open-vocabulary segmentation pipeline identifies these entities from high-resolution imagery, and (iii) the LLM uses the user's natural language preferences and the resulting masks to synthesize executable costmap code. Empirically, OVerSeeC handles novel entities, respects ranked and compositional preferences, and produces routes consistent with human-drawn trajectories across diverse regions, demonstrating robustness to distribution shifts. This shows that modular composition of foundation models enables open-vocabulary, preference-aligned costmap generation for scalable, mission-adaptive global planning.

     
  12. PACER: Preference-Conditioned All-Terrain Costmap Generation  

    Luisa Mao, Garrett Warnell, Peter Stone, Joydeep Biswas 

    In collaboration with: U.S. Army Research Laboratory

    Abstract: In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost. While this approach is rapidly adaptable to changing user preferences, only preferences over the types of terrain that are already known by the semantic classifier can be expressed. In this paper, we hypothesize that a machine-learning-based alternative to the semantics-based paradigm above will allow for rapid cost assignment adaptation to preferences expressed over new terrains at deployment time without the need for additional training. To investigate this hypothesis, we introduce and study PACER, a novel approach to costmap generation that accepts as input a single birds-eye view (BEV) image of the surrounding area along with a user-specified preference context and generates a corresponding BEV costmap that aligns with the preference context. Using both real and synthetic data along with a combination of proposed training tasks, we find that PACER is able to adapt quickly to new user preferences while also exhibiting better generalization to novel terrains compared to both semantics-based and representation-learning approaches.

     
  13. Rapid Adaptation of Particle Dynamics for Generalized Deformable Object Mobile Manipulation 

    Bohan Wu, Roberto Martín-Martín, Li Fei-Fei 

    In collaboration with: Stanford University

    Abstract: We address the challenge of learning to manipulate deformable objects with unknown dynamics. In non-rigid objects, the dynamics parameters define how they react to interactions (how they stretch, bend, compress, and move), and they are critical to determining the optimal actions to perform a manipulation task successfully. In other robotic domains, such as legged locomotion and in-hand rigid object manipulation, state-of-the-art approaches can handle unknown dynamics using Rapid Motor Adaptation (RMA). Through a supervised procedure in simulation that encodes each rigid object's dynamics, such as mass and position, these approaches learn a policy that conditions actions on a vector of latent dynamic parameters inferred from sequences of state-actions. However, in deformable object manipulation, the object's dynamics include not only its mass and position, but also how the shape of the object changes. Our key insight is that the recent ground-truth particle positions of a deformable object in simulation capture changes in the object's shape, making it possible to extend RMA to deformable object manipulation. This key insight allows us to develop RAPiD, a two-phase method that learns to perform real-robot deformable object mobile manipulation by: 1) learning a visuomotor policy conditioned on the object's dynamics embedding, which is encoded from the object's privileged information in simulation, such as its mass and ground-truth particle positions, and 2) learning to infer this embedding using non-privileged information instead, such as robot visual observations and actions, so that the learned policy can transfer to the real world. On a mobile manipulator with 22 degrees of freedom, RAPiD achieves success rates above 80% across two vision-based deformable object mobile manipulation tasks in the real world, under unseen object dynamics, categories, and instances.

     
  14. Real-Time Decoding of Movement Onset and Offset for Brain-Controlled Rehabilitation Exoskeleton  

    Kanishka Mitra, Satyam Kumar, Frigyes Samuel Racz, Deland Hu Liu, Ashish Deshpande, José del R. Millán 

    In collaboration with: Massachusetts Institute of Technology, Texas Instruments

    Abstract: Robot-assisted therapy can deliver high-dose, task-specific training after neurologic injury, but most systems act primarily at the limb level - engaging the impaired neural circuits only indirectly - which remains a key barrier to truly contingent, neuroplasticity-targeted rehabilitation. We address this gap by implementing online, dual-state motor imagery control of an upper-limb exoskeleton, enabling goal-directed reaches to be both initiated and terminated directly from noninvasive EEG. Eight participants used EEG to initiate assistance and then volitionally halt the robot mid-trajectory. Across two online sessions, group-mean hit rates were 61.5% for onset and 64.5% for offset, demonstrating reliable start–stop command delivery despite instrumental noise and passive arm motion. Methodologically, we reveal a systematic, class-driven bias induced by common task-based recentering using an asymmetric margin diagnostic, and we introduce a class-agnostic fixation-based recentering method that tracks drift without sampling command classes while preserving class geometry. This substantially improves threshold-free separability (AUC gains: onset +56%, p=0.0117; offset +34%, p=0.0251) and reduces bias within and across days. Together, these results help bridge offline decoding and practical, intention-driven start–stop control of a rehabilitation exoskeleton, enabling precisely timed, contingent assistance aligned with neuroplasticity goals while supporting future clinical translation.

 

  15. Searching in Space and Time: Unified Memory-Action Loops for Open-World Object Retrieval 

    Taijing Chen, Sateesh Kumar, Junhong Xu, Georgios Pavlakos, Joydeep Biswas, Roberto Martín-Martín 

    Abstract: Service robots must retrieve objects in dynamic, open-world settings where requests may reference attributes (“the red mug”), spatial context (“the mug on the table”), or past states (“the mug that was here yesterday”). Existing approaches capture only parts of this problem: scene graphs capture spatial relations but ignore temporal grounding, temporal reasoning methods model dynamics but do not support embodied interaction, and dynamic scene graphs handle both but remain closed-world with fixed vocabularies. We present STAR (SpatioTemporal Active Retrieval), a framework that unifies memory queries and embodied actions within a single decision loop. STAR leverages a non-parametric long-term memory and a working memory to support efficient recall, and uses a vision-language model to select either temporal or spatial actions at each step. We introduce STARBench, a benchmark of spatiotemporal object search tasks across simulated and real environments. Experiments on STARBench and on a Tiago robot show that STAR consistently outperforms scene-graph and memory-only baselines, demonstrating the benefits of treating search in time and search in space as a unified problem.

 

  16. SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning  

    Yu Zhang, Yuqi Xie, Huihan Liu, Rutav Shah, Michael Wan, Linxi Fan, Yuke Zhu 

    In collaboration with: Nvidia

    Abstract: Imitation learning advances robot capabilities by enabling the acquisition of diverse behaviors from human demonstrations. However, large-scale datasets used for policy training often introduce substantial variability in quality, which can negatively impact performance. As a result, automatically curating datasets by filtering low-quality samples to improve quality becomes essential. Existing robotic curation approaches rely on costly manual annotations and perform curation at a coarse granularity, such as the dataset or trajectory level, failing to account for the quality of individual state-action pairs. To address this, we introduce SCIZOR, the first self-supervised transition-level curation framework that requires no annotations and scales to large-scale datasets to improve the performance of imitation learning policies and modern Vision-Language-Action (VLA) models. SCIZOR targets two complementary sources of low-quality data: suboptimal data, which hinders learning with undesirable actions, and redundant data, which dilutes training with repetitive patterns. SCIZOR leverages a self-supervised task progress predictor for suboptimal data to remove samples lacking task progression, and a deduplication module operating on joint state-action representation for samples with redundant patterns. Empirically, we show that SCIZOR enables imitation learning policies and modern VLA models to achieve higher performance with less data, yielding an average improvement of 15.4% across multiple benchmarks. More information is available at: https://scizor-icra2026.github.io

     
  17. Soft Vortex Gripper for Dexterous Manipulation Using Hand-Like Robots 

    Martin Kojouharov, Dong Ho Kang, Drake Rowland, Roman Mykhailyshyn, Luis Sentis, Ann Majewicz Fey 

    In collaboration with: National Institute of Advanced Industrial Science and Technology (AIST)

    Abstract: Dexterous manipulation remains a constant challenge in robotics, particularly in achieving precise in-hand manipulation, force modulation, and spatial positioning. There have been many attempts at solving these issues, with varying degrees of success. These attempts include friction-enhancing surfaces, gecko-inspired adhesives, electrostatic grippers, and suction-based mechanisms which are limited by surface dependency and inadequate adaptability. We propose integrating a soft vortex gripper with rigid nozzles into the fingertips of a hand-like robotic manipulator. This design combines the malleability of soft silicone materials for delicate grasping tasks with the strength of rigid components to maintain consistent vortex formation under pressure load. The integrated gripper enhances surface friction, enables adhesion to irregular geometries, and provides more precise pressure control. We programmed and mounted the soft vortex gripper onto the fingertip of a robotic hand, which was then installed on a Roboligent OPTIMO 7-DOF robotic arm. We tested square, tapered, and rounded gripping surfaces and found that the square-faced design achieved the highest gripping force of 0.59N at 300 kPa, outperforming others by over 31%. Using the hand-like robotic arm, we tested the embedded soft vortex gripper by extracting Jenga blocks from a fully constructed tower without pre-loosening, and pulling individual playing cards from a deck. The gripper consistently succeeded in removing singular playing cards and was able to both push and pull Jenga blocks from the tower with control and precision. The experimental results support its potential as a tool for enhancing robotic dexterity, delivering consistent results across diverse manipulation tasks.

     
  18. SoftMimicGen: A Data Generation System for Scalable Robot Learning in Deformable Object Manipulation  

    Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, Ajay Uday Mandlekar 

    In collaboration with: Georgia Institute of Technology, Nvidia, University of Toronto

    Abstract: Large-scale robot datasets have facilitated the learning of a wide range of robot manipulation skills, but these datasets remain difficult to collect and scale further, owing to the intractable amount of human time, effort, and cost required. Simulation and synthetic data generation have proven to be an effective alternative to fuel this need for data, especially with the advent of recent work showing that such synthetic datasets can dramatically reduce real-world data requirements and facilitate generalization to novel scenarios unseen in real-world demonstrations. However, this paradigm has been limited to rigid-body tasks, which are easy to simulate. Deformable object manipulation encompasses a large portion of real-world manipulation and remains a crucial gap to address towards increasing adoption of the synthetic simulation data paradigm. In this paper, we introduce SoftMimicGen, an automated data generation pipeline for deformable object manipulation tasks. We introduce a suite of high-fidelity simulation environments that encompasses a wide range of deformable objects (stuffed animal, rope, tissue, towel) and manipulation behaviors (high-precision threading, dynamic whipping, folding, pick-and-place), across four robot embodiments: a single-arm manipulator, bimanual arms, a humanoid, and a surgical robot. We apply SoftMimicGen to generate datasets across the task suite, train high-performing policies from the data, and systematically analyze the data generation system. Project website: softmimicgen.github.io

     
  19. Systematic Characterization of Drilling Parameters in Concentric Tube Steerable Drilling Robots: A Comparative Study  

    Daniyal Maroufi, Yash Kulkarni, Vibhu Kanna Rajesh Kanna, Jordan P. Amadio, Mohsen Khadem, Justin E. Bird, Jeff Siewerdsen, Farshid Alambeigi 

    In collaboration with: Johns Hopkins University, University of Edinburgh, University of Texas M.D. Anderson Cancer Center

    Abstract: To establish a foundational understanding for creating J-shaped trajectories with Concentric Tube Steerable Drilling Robots (CT-SDRs), this paper presents a systematic characterization of two operational factors: drill feed rate and rotational speed. We developed and compared a custom High-Speed Drill (HSD) and a Low-Speed Drill (LSD) to analyze how these parameters affect performance in flexible robotic drills versus conventional systems utilizing rigid instruments. By integrating the CT-SDRs with a seven degree-of-freedom robotic manipulator, we conducted experiments in synthetic bone phantoms of varying densities, assessing metrics such as motor current, hole diameter, radius of curvature, and drilling time. The results reveal critical performance trade-offs, demonstrating that high-speed drilling in CT-SDRs is essential for successfully penetrating dense bone. Further, we found that while slower feed rates improve trajectory accuracy and reduce hole enlargement, they significantly increase procedural time. These findings offer a quantitative guideline for design choices, component selection, and operational control of CT-SDRs tailored to patient-specific bone quality.

     
  20. Uncertainty Guided Exploratory Trajectory Optimization for Sampling-Based Model Predictive Control  

    Oguzhan Goktug Poyrazoglu, Yukang Cao, Rahul Moorthy Mahesh, Volkan Isler 

    In collaboration with: University of Minnesota

    Abstract: Trajectory optimization depends heavily on initialization. In particular, sampling-based approaches are highly sensitive to initial solutions, and limited exploration frequently leads them to converge to local minima in complex environments. We present Uncertainty Guided Exploratory Trajectory Optimization (UGE-TO), a trajectory optimization algorithm that generates well-separated samples to achieve a better coverage of the configuration space. UGE-TO represents trajectories as probability distributions induced by uncertainty ellipsoids. Unlike sampling-based approaches that explore only in the action space, this representation captures the effects of both system dynamics and action selection. By incorporating the impact of dynamics, in addition to the action space, into our distributions, our method enhances trajectory diversity by enforcing distributional separation via the Hellinger distance between them. It enables a systematic exploration of the configuration space and improves robustness against local minima. Further, we present UGE-MPC, which integrates UGE-TO into sampling-based model predictive controller methods. Experiments demonstrate that UGE-MPC achieves higher exploration and faster convergence in trajectory optimization compared to baselines under the same sampling budget, achieving 72.1% faster convergence in obstacle-free environments and 66% faster convergence with a 6.7% higher success rate in the cluttered environment compared to the best-performing baseline. Additionally, we validate the approach through a range of simulation scenarios and real-world experiments. Our results indicate that UGE-MPC has higher success rates and faster convergence, especially in environments that demand significant deviations from nominal trajectories to avoid failures. The project and code are available at https://ogpoyrazoglu.github.io/cuniform_sampling/.

     
  21. Ventura: Adapting Image Diffusion Models for Unified Task Conditioned Navigation 

    Arthur Zhang, Xiangyun Meng, Luca Callari, Dong Ki Kim, Shayegan Omidshafiei, Joydeep Biswas, Ali-akbar Agha-mohammadi, Amirreza Shaban 

    In collaboration with: Field AI, Massachusetts Institute of Technology, University of Washington

    Abstract: Robots must adapt to diverse human instructions and operate safely in unstructured, open-world environments. Recent Vision–Language models (VLMs) offer strong priors for grounding language and perception, but remain difficult to steer for navigation due to differences in action spaces and pretraining objectives that hamper transferability to robotics tasks. Towards addressing this, we introduce Ventura, a vision–language navigation system that finetunes internet-pretrained image diffusion models for path planning. Instead of directly predicting low-level actions, Ventura generates a path mask (i.e. a visual plan) in image space that captures fine-grained, context-aware navigation behaviors. A lightweight behavior-cloning policy grounds these visual plans into executable trajectories, yielding an interface that follows natural language instructions to generate diverse robot behaviors. To scale training, we supervise on path masks derived from self-supervised tracking models paired with VLM-augmented captions, avoiding manual pixel-level annotation or highly engineered data collection setups. In extensive real-world evaluations, Ventura outperforms state-of-the-art foundation model baselines on object reaching, obstacle avoidance, and terrain preference tasks, improving success rates by 33% and reducing collisions by 54% across both seen and unseen scenarios. Notably, we find that Ventura generalizes to unseen combinations of distinct tasks, revealing emergent compositional capabilities. Videos, code, and additional materials: https://venturapath.github.io.

     
  22. Why Cognitive Robotics Matters: Lessons from OntoAgent and LLM Deployment in HARMONIC for Safety-Critical Robot Teaming 

    Sanjay Oruganti, Sergei Nirenburg, Marjorie McShane, Jesse English, Michael Roberts, Christian Arndt, Ramviyas Parasuraman, Luis Sentis 

    In collaboration with: Rensselaer Polytechnic Institute, University of Georgia

    Abstract: Robots operating alongside humans must recognize what they do not know before acting, diagnose problems from domain knowledge, and reason about action consequences. These capabilities are operational requirements, not optimization targets, and their absence produces silent and unrecoverable failures. We present a first-of-its-kind controlled comparison between OntoAgent, our content-centric cognitive architecture, and six LLMs spanning frontier and efficient tiers as drop-in replacements at the strategic layer of the same robotic system in HARMONIC. LLMs fail to verify their knowledge state before acting, even when given equivalent procedural knowledge. The deficit is architectural, not knowledge-based. Knowledge-grounded architectures must retain decision authority; LLMs contribute where their strengths apply.

     
  23. You Can't Always Get What You Want: Games of Ordered Preference  

    Dong Ho Lee, Lasse Peters, David Fridovich-Keil 

    In collaboration with: Delft University of Technology

    Abstract: We study noncooperative games in which each player's objective is composed of a sequence of ordered, and potentially conflicting, preferences. Problems of this type naturally model a wide variety of scenarios: for example, drivers at a busy intersection must balance the desire to make forward progress with the risk of collision. Mathematically, these problems possess a nested structure, and to behave properly players must prioritize their most important preference, and only consider less important preferences to the extent that they do not compromise performance on more important ones. We consider multi-agent, noncooperative variants of these problems, and seek generalized Nash equilibria in which each player's decision reflects both its hierarchy of preferences and other players' actions. We make two key contributions. First, we develop a recursive approach for deriving the first-order optimality conditions of each player's nested problem. Second, we propose a sequence of increasingly tight relaxations, each of which can be transcribed as a mixed complementarity problem and solved via existing methods. Experimental results demonstrate that our approach reliably converges to equilibrium solutions that strictly reflect players' individual ordered preferences.
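
To make the idea of ordered preferences in the abstract above concrete, here is a minimal single-player sketch in Python. It is not the paper's method: the paper treats multi-player games and solves relaxed mixed complementarity problems, whereas this toy example only filters a finite set of candidate decisions, one preference at a time. The function and cost names (`lexicographic_choice`, `collision_risk`, `lost_progress`), the candidate speeds, and the tolerance are all illustrative assumptions.

```python
from typing import Callable, List, Sequence

Cost = Callable[[float], float]


def lexicographic_choice(
    candidates: Sequence[float],
    preferences: Sequence[Cost],
    tol: float = 1e-9,
) -> List[float]:
    """Keep only candidates that are optimal for each preference in order.

    Preferences are cost functions listed from most to least important.
    A later preference only breaks ties left by the earlier ones.
    """
    remaining = list(candidates)
    for cost in preferences:
        best = min(cost(c) for c in remaining)
        remaining = [c for c in remaining if cost(c) <= best + tol]
    return remaining


# Hypothetical driver example: speeds up to 4.0 are assumed collision-free.
def collision_risk(speed: float) -> float:
    return max(0.0, speed - 4.0)


# Lower cost means more forward progress.
def lost_progress(speed: float) -> float:
    return -speed


speeds = [0.0, 2.0, 4.0, 6.0]
print(lexicographic_choice(speeds, [collision_risk, lost_progress]))  # [4.0]
```

With safety ranked first, the driver picks the fastest speed that adds no collision risk. Swapping the order of the two preferences would select 6.0 instead, which shows why the ordering matters.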