Survey
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [Paper]
graph LR;
Start["Robotic System"]-->n1["Robot Perception"];
n1-->n1.1("Passive Perception");
n1-->n1.2("State Estimation (SLAM)");
n1.2-->n1.2.1("geometry-based solutions (traditional)");
n1.2-->n1.2.2("supervised/self-supervised methods");
n1-->n1.3("Active Perception");
Start-->n2["Robot Decision-making and Planning"];
n2-->n2.1("Classical Planning");
n2.1-->n2.1.1("Search-based Planning");
n2.1-->n2.1.2("Sampling-based Planning");
n2-->n2.2("Learning-based Planning");
n2.2-->n2.2.2("Reinforcement Learning")
Start-->n3["Robot Action Generation"];
n3-->n3.1("Classical Control");
n3.1-->n3.1.1("Proportional-Integral-Derivative (PID)");
n3.1-->n3.1.2("Model Predictive Control(MPC)");
n3.1-->n3.1.3("Model Predictive Path Integral (MPPI)");
n3-->n3.2("Learning-based Control");
n3.2-->n3.2.1("Imitation Learning");
n3.2-->n3.2.2("Reinforcement Learning");
A Survey on Multimodal Large Language Models for Autonomous Driving [Paper]
DriveLLM
DriveLLM: Charting the Path Toward Full Autonomous Driving With Large Language Models [Paper]
Observation
graph TD;
Observation-->n1["Decision-making ActionSpace"];
Observation-->MapInformation;
Observation-->PerceptionResults;
Observation-->VehicleStates;
Observation-->n2["Real-time Data"];
n2-->WeatherCondition;
n2-->n3["Long-term memory (previous learned mistakes)"];
Observation-->PassengerRequests;
Reasoning
graph TD;
Reasoning-->ActionGeneration;
ActionGeneration-->slowing-down;
ActionGeneration-->speeding-up;
ActionGeneration-->pulling-over;
ActionGeneration-->...
Decision-making
graph LR;
Decision-making-->n1["Time-based decision-making rule"];
n1-->|InTime|n2["LLM-based Action"];
n1-->|TimeOut|n3["Rule-based Action"];
Self-reflection
graph LR;
n1["Simulation Check System"]-->n2["self-reflection LLM 1"];
n3["Physical Execution Result"]-->n4["self-reflection LLM 2"];
n2-->LSTM;
n4-->LSTM;
Attack Scenario
graph LR;
n1["Adversarial Attack via Human-Machine Interactions"]-->n2["Prompt Injection"]
n2-->n3["override safety check"]
n2-->n4["override foundational programming"]
n2-->n5["leverage emergency situations"]
Evaluation
Real-time Performance:
- decision-making time
- Token-per-minute (TPM)
- Decision-per-second (DPS)
Spatial-Temporal Reasoning in Dynamic Environments:
- existing LLMs process inputs independently without temporal aspects
- less effective when interacting with multiple dynamic objects (conservative)
- Mitigation:
- Responsibility-Sensitive Safety (RSS) model
- convert temporal object information (position, velocity…) into textual data
- Responsibility-Sensitive Safety (RSS) model
Proactive Decision-making:
- model can anticipate potential challenges and react accordingly
- integrating observations with commonsense reasoning
Safety
On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities [Paper]
How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments? [Paper]
Semantic anomaly detection with large language models [Paper]