Literature Review Notes

Survey

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [Paper]

graph LR;
Start["Robotic System"]-->n1["Robot Perception"];
n1-->n1.1("Passive Perception");

n1-->n1.2("State Estimation (SLAM)");
n1.2-->n1.2.1("geometry-based solutions (traditional)");
n1.2-->n1.2.2("supervised/self-supervised methods");


n1-->n1.3("Active Perception");


Start-->n2["Robot Decision-making and Planning"];
n2-->n2.1("Classical Planning");
n2.1-->n2.1.1("Search-based Planning");
n2.1-->n2.1.2("Sampling-based Planning");

n2-->n2.2("Learning-based Planning");
n2.2-->n2.2.2("Reinforcement Learning")


Start-->n3["Robot Action Generation"];
n3-->n3.1("Classical Control");
n3.1-->n3.1.1("Proportional-Integral-Derivative (PID)");
n3.1-->n3.1.2("Model Predictive Control(MPC)");
n3.1-->n3.1.3("Model Predictive Path Integral (MPPI)");

n3-->n3.2("Learning-based Control");
n3.2-->n3.2.1("Imitation Learning");
n3.2-->n3.2.2("Reinforcement Learning");

A Survey on Multimodal Large Language Models for Autonomous Driving [Paper]

DriveLLM

DriveLLM: Charting the Path Toward Full Autonomous Driving With Large Language Models [Paper]

Observation


graph TD; 
    Observation-->n1["Decision-making ActionSpace"];
    Observation-->MapInformation;
    Observation-->PerceptionResults;
    Observation-->VehicleStates;
    Observation-->n2["Real-time Data"];
    n2-->WeatherCondition;
    n2-->n3["Long-term memory (previous learned mistakes)"];
    Observation-->PassengerRequests;

Reasoning


graph TD;
    Reasoning-->ActionGeneration;
    ActionGeneration-->slowing-down;
    ActionGeneration-->speeding-up;
    ActionGeneration-->pulling-over;
    ActionGeneration-->...

Decision-making


graph LR;
    Decision-making-->n1["Time-based decision-making rule"];
    n1-->|InTime|n2["LLM-based Action"];
    n1-->|TimeOut|n3["Rule-based Action"];

Self-reflection


graph LR;
    n1["Simulation Check System"]-->n2["self-reflection LLM 1"];
    n3["Physical Execution Result"]-->n4["self-reflection LLM 2"];
    n2-->LSTM;
    n4-->LSTM;

Attack Scenario


graph LR;
    n1["Adversarial Attack via Human-Machine Interactions"]-->n2["Prompt Injection"]
    n2-->n3["override safety check"]
    n2-->n4["override foundational programming"]
    n2-->n5["leverage emergency situations"]

Evaluation

Real-time Performance:

decision-making time
- Token-per-minute (TPM)
- Decision-per-second (DPS)

Spatial-Temporal Reasoning in Dynamic Environments:

existing LLMs process inputs independently without temporal aspects
- less effective when interacting with multiple dynamic objects (conservative)
- Mitigation:
  - Responsibility-Sensitive Safety (RSS) model
    - convert temporal object information (position, velocity…) into textual data

Proactive Decision-making:

model can anticipate potential challenges and react accordingly
- integrating observations with commonsense reasoning

Safety

On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities [Paper]

How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments? [Paper]

Semantic anomaly detection with large language models [Paper]

Share on

Twitter Facebook LinkedIn

Literature Review Notes

MoeBuTa

Survey

DriveLLM

Observation

Reasoning

Decision-making

Self-reflection

Attack Scenario

Evaluation

Safety

Share on

You may also enjoy

LLM for Robotics Videos

Mini Seminar

LLM for Robotics Review

Progress Meeting Notes 05/21