A one-armed research robot working in the Intelligent Cognitive Ergonomics Lab.

Human Cognition Models to Inspire AVs in Interaction Scenes

Autonomous vehicles struggle to predict pedestrian behavior due to uncertainty, sudden changes, and longer prediction time frames.

Last Updated: 01/21/2026

Issues with Pedestrian Behavior Prediction Models

  • Limited generalizability to inherent behavioral uncertainty and contingency
    • Lower accuracy in predicting sudden behavior changes
    • Reduced performance for longer prediction horizons (2-6 seconds)
  • AV-pedestrian negotiation places higher demands on prediction than safety functionalities do
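To illustrate why sudden behavior changes and longer horizons hurt prediction, here is a minimal sketch using a naive constant-velocity baseline. The scenario (a pedestrian who walks at 1.4 m/s for 1 s and then stops), the 0.1 s time step, and all function names are hypothetical and not drawn from any dataset; final displacement error (FDE) is the distance between the predicted and true position at the end of the horizon.

```python
import math

DT = 0.1  # seconds per observation step (assumed)


def true_position(t: float) -> tuple[float, float]:
    """Hypothetical ground truth: walk along +x at 1.4 m/s until t = 1.0 s, then stand still."""
    return (1.4 * min(t, 1.0), 0.0)


def constant_velocity_forecast(t0: float, horizon: float) -> tuple[float, float]:
    """Extrapolate the velocity observed just before t0 (a common naive baseline)."""
    x_prev, y_prev = true_position(t0 - DT)
    x_now, y_now = true_position(t0)
    vx, vy = (x_now - x_prev) / DT, (y_now - y_prev) / DT
    return (x_now + vx * horizon, y_now + vy * horizon)


def final_displacement_error(t0: float, horizon: float) -> float:
    """Euclidean distance between forecast and ground truth at t0 + horizon."""
    px, py = constant_velocity_forecast(t0, horizon)
    gx, gy = true_position(t0 + horizon)
    return math.hypot(px - gx, py - gy)


# Forecast from t0 = 0.5 s, i.e. before the (unobserved) stop at t = 1.0 s.
for h in (2.0, 4.0, 6.0):
    print(f"horizon {h:.0f} s: FDE = {final_displacement_error(0.5, h):.2f} m")
```

Because the baseline cannot anticipate the stop, its error grows linearly with the horizon (1.4 m per extra second in this toy setup).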

Possible Solutions

  • Generative models for pedestrian trajectories
    • Generating multiple trajectories or trajectory heatmaps
  • Rethinking how human drivers negotiate with pedestrians
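A minimal sketch of the generative idea: sample many plausible future trajectories and summarize their endpoints as a heatmap. This is a toy stochastic motion model swapped in for illustration, not a learned generative network; the Gaussian heading noise, the walk/stop switching probability, the grid cell size, and all function names are hypothetical assumptions.

```python
import math
import random


def sample_trajectories(pos, vel, horizon_s=4.0, dt=0.5, n_samples=200, seed=0):
    """Draw n_samples future trajectories from a toy stochastic motion model.

    Assumed model: at each step the heading gets Gaussian noise, and with a
    small probability the pedestrian switches between walking and stopping.
    """
    rng = random.Random(seed)
    speed0 = math.hypot(*vel)
    heading0 = math.atan2(vel[1], vel[0])
    steps = int(round(horizon_s / dt))
    trajectories = []
    for _ in range(n_samples):
        x, y = pos
        heading, walking = heading0, True
        traj = []
        for _ in range(steps):
            heading += rng.gauss(0.0, 0.15)  # heading noise in radians (assumed)
            if rng.random() < 0.05:  # walk/stop switch probability (assumed)
                walking = not walking
            speed = speed0 if walking else 0.0
            x += speed * math.cos(heading) * dt
            y += speed * math.sin(heading) * dt
            traj.append((x, y))
        trajectories.append(traj)
    return trajectories


def endpoint_heatmap(trajectories, x_range, y_range, cell=0.5):
    """Rasterize trajectory endpoints into a normalized grid (rows = y, cols = x).

    Endpoints outside the grid are dropped; the remaining cells sum to 1.
    """
    nx = int(math.ceil((x_range[1] - x_range[0]) / cell))
    ny = int(math.ceil((y_range[1] - y_range[0]) / cell))
    grid = [[0.0] * nx for _ in range(ny)]
    counted = 0
    for traj in trajectories:
        x, y = traj[-1]
        i = int((y - y_range[0]) // cell)
        j = int((x - x_range[0]) // cell)
        if 0 <= i < ny and 0 <= j < nx:
            grid[i][j] += 1.0
            counted += 1
    if counted:
        grid = [[v / counted for v in row] for row in grid]
    return grid


trajs = sample_trajectories(pos=(0.0, 0.0), vel=(1.4, 0.0))
heat = endpoint_heatmap(trajs, x_range=(-2.0, 8.0), y_range=(-4.0, 4.0))
```

Keeping every sample, rather than a single best guess, is what lets a downstream planner reason about the spread of possible pedestrian outcomes.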

Driver Scene Understanding Model

We propose an event-segmentation-based scene understanding model, grounded in Theory of Mind, to explain driver cognition during pedestrian interactions.

Main assumption: the driver and the pedestrian negotiate crossing intentions

  • Intention is a commitment to certain actions within a time boundary
  • Pedestrians have present-oriented (low-level) and future-oriented (high-level) intentions
  • Pedestrian Situated Intent (PSI) is the pedestrian’s intention to cross the conflicting area before the ego-vehicle in dynamically changing situations involving the car, pedestrian, and contextual environment.
Event-segmentation-based scene understanding model
  • Step 1: The driver automatically segments perceptual inputs at a coarse level (pedestrian intention).
  • Step 2: Within each segment, the driver can predict fine-level events (i.e., pedestrian actions) more accurately by comparing working memory with long-term memory.
  • Step 3: A coarse-level segment boundary is identified when the prediction of fine-level events is no longer accurate, meaning that the estimated pedestrian intention has changed.
  • Step 4: Working memory is updated to rebuild the coarse-level segment (pedestrian intention) boundaries and loop back to step
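The four steps above can be sketched as a toy computational loop over one-dimensional lateral pedestrian positions. This is an illustrative interpretation, not the authors' implementation: "long-term memory" is assumed to be a small table of intention schemas with expected velocities, "working memory" is the currently active intention, and the schema values, time step, and error threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical long-term memory: intention schemas mapped to the lateral
# velocity (m/s, positive = toward the ego lane) a driver would expect.
LONG_TERM_MEMORY = {
    "not cross": 0.0,  # standing or waiting
    "cross": 1.4,  # walking toward the ego lane
    "retreat": -1.4,  # walking away from the ego lane
}


@dataclass
class Segment:
    intention: str
    start: int  # index of the first frame in the segment
    end: int  # index of the last frame (inclusive)


def best_matching_intention(velocity: float) -> str:
    """Pick the schema whose expected velocity best explains the observed one."""
    return min(LONG_TERM_MEMORY, key=lambda k: abs(LONG_TERM_MEMORY[k] - velocity))


def segment_events(positions, dt=0.1, threshold=0.05):
    """Toy event segmentation (assumed parameters, at least two positions).

    Step 1: open a coarse segment from the first observed motion.
    Step 2: predict each next position from the active intention's schema.
    Step 3: place a boundary when prediction error exceeds `threshold` metres.
    Step 4: rebuild working memory from the latest motion and continue.
    """
    working_memory = best_matching_intention((positions[1] - positions[0]) / dt)
    segments, start = [], 0
    for t in range(1, len(positions)):
        predicted = positions[t - 1] + LONG_TERM_MEMORY[working_memory] * dt
        if abs(positions[t] - predicted) > threshold:
            segments.append(Segment(working_memory, start, t - 1))
            observed_velocity = (positions[t] - positions[t - 1]) / dt
            working_memory = best_matching_intention(observed_velocity)
            start = t
    segments.append(Segment(working_memory, start, len(positions) - 1))
    return segments


# A pedestrian waits for five frames, then walks toward the ego lane.
positions = [0.0] * 5 + [0.14 * k for k in range(1, 6)]
print(segment_events(positions))
```

Fine-level prediction stays cheap and accurate inside a segment; only a prediction failure triggers the more expensive re-estimation of intention, which mirrors the boundary logic in Steps 3 and 4.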

Video Experiment Process

  1. Ask a group of representative human drivers to estimate changes in pedestrian situated intent while watching prerecorded pedestrian-encounter videos from the driver’s view.
  2. From the first frame to the last frame of the pedestrian encounter, each human driver needs to:
    • Estimate the pedestrian’s intent to cross in front of the car
    • Describe their reasoning whenever the intent estimation changes
    • Provide driving decisions
Time: 0.099s, first frame
Estimation: Not sure
The pedestrian is standing between the two lanes and obeying traffic. The car ahead is slowing down. It is a busy road with fast-moving traffic.
Time: 5.244s, 2nd pause
Estimation: Not cross
The pedestrian looks like a child. He is still standing between the two lanes and obeying traffic. He has been looking behind at the other side, his body facing diagonally and his feet pointed in the same direction. The car ahead has already stopped. There are still cars in his way.
Time: 7.042s, 3rd pause
Estimation: Cross
The pedestrian is still standing between the two lanes and making back-and-forth movements. He may be looking for an opportunity to cross. Someone in the car ahead might be calling him as well. Cars have slowed down, so he may jump to the side.
Time: 8.685s, final estimation
Estimation: Not cross
The pedestrian has been looking to cross to the other side. Now that the opposite lanes are empty, he has started to run to the other side and will not be in front of this car, even though this side was closer.
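The annotated example above can be represented as a timeline of per-pause estimates, from which intent changes (candidate coarse-level boundaries in the scene understanding model) fall out directly. This is a minimal sketch; the class and field names are hypothetical and are not the PSI dataset's actual schema, and the reasoning strings are abridged from the example.

```python
from dataclasses import dataclass
from enum import Enum


class IntentEstimate(Enum):
    NOT_SURE = "Not sure"
    NOT_CROSS = "Not cross"
    CROSS = "Cross"


@dataclass(frozen=True)
class Annotation:
    time_s: float  # video timestamp of the pause, in seconds
    estimate: IntentEstimate  # annotator's intent estimate at this pause
    reasoning: str = ""  # free-text explanation (abridged here)


def intent_changes(annotations):
    """Return (time, previous, new) for every pause where the estimate changed."""
    ordered = sorted(annotations, key=lambda a: a.time_s)
    return [
        (cur.time_s, prev.estimate, cur.estimate)
        for prev, cur in zip(ordered, ordered[1:])
        if cur.estimate != prev.estimate
    ]


example = [
    Annotation(0.099, IntentEstimate.NOT_SURE, "Standing between the two lanes; busy road."),
    Annotation(5.244, IntentEstimate.NOT_CROSS, "Still standing; cars still in his way."),
    Annotation(7.042, IntentEstimate.CROSS, "Back-and-forth movements; cars have slowed down."),
    Annotation(8.685, IntentEstimate.NOT_CROSS, "Runs to the other side, away from this car."),
]

for time_s, before, after in intent_changes(example):
    print(f"{time_s:6.3f}s  {before.value} -> {after.value}")
```

In this example every pause after the first records a change, which is why the annotator was asked to explain their reasoning at each one.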

Experiment and Data Analysis Process

Flexible and Scalable Annotation Tool for Micro-Level Behaviors and Reasonings → NLP-based Human Reasoning Cue Extraction Algorithm
  1. Elahi, M.F., Luo, X. and Tian, R., 2020, July. A framework for modeling knowledge graphs via processing natural descriptions of vehicle-pedestrian interactions. In International Conference on Human-Computer Interaction (pp. 40-50). Cham: Springer International Publishing.
  2. Elahi, M.F., Sreeram, J.G., Luo, X. and Tian, R., 2021, September. A Novel Adaptation of Information Extraction Algorithm to Process Natural Text Descriptions of Pedestrian Encounters. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) (pp. 1906-1912). IEEE.
  3. Sreeram, J.G., Luo, X. and Tian, R., 2021, September. Contextual and Behavior Factors Extraction from Pedestrian Encounter Scenes Using Deep Language Models. In Big Data Analytics and Knowledge Discovery: 23rd International Conference, DaWaK 2021, Virtual Event, September 27–30, 2021, Proceedings (pp. 131-136). Springer International Publishing.
  4. Elahi, M., Tian, R. and Luo, X., 2022, June. Flexible and Scalable Annotation Tool to Develop Scene Understanding Datasets. In Workshop on Human-in-the-Loop Data Analytics (HILDA 2022), ACM SIGMOD/PODS Conference, June 12-17, Philadelphia, PA.
  5. Elahi, M., Jing, T., Ding, Z. and Tian, R. MinDReaD: Mining Decision-Making Reasoning Data at Micro Level. International Journal of Human-Computer Interaction (under revision).
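The cue-extraction stage can be illustrated with a deliberately simple keyword matcher over annotators' free-text reasoning. This is a swapped-in technique for illustration only: the referenced work uses information-extraction algorithms and deep language models, and the cue categories and phrase lexicon below are hypothetical.

```python
import re

# Hypothetical cue lexicon: category -> phrases that signal that cue.
CUE_LEXICON = {
    "pedestrian_attribute": ["child", "elderly"],
    "pedestrian_motion": ["standing", "running", "started to run", "back-and-forth"],
    "gaze_and_posture": ["looking", "body facing", "feet pointed"],
    "traffic_context": ["slowing down", "slowed down", "stopped", "busy road", "empty"],
}


def extract_cues(description: str) -> dict[str, list[str]]:
    """Return, per category, the lexicon phrases found in a description.

    Matching is case-insensitive and restricted to whole words/phrases;
    categories with no hits are omitted.
    """
    text = description.lower()
    found = {}
    for category, phrases in CUE_LEXICON.items():
        hits = [p for p in phrases if re.search(r"\b" + re.escape(p) + r"\b", text)]
        if hits:
            found[category] = hits
    return found


description = (
    "The pedestrian looks like a child. He is still standing between the two lanes "
    "and obeying traffic. He has been looking behind at the other side, his body "
    "facing diagonally and his feet pointed in the same direction. The car ahead "
    "has already stopped."
)
print(extract_cues(description))
```

A fixed lexicon misses paraphrases and negations, which is one reason learned language models are preferable for this task at scale.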

Demo of Experiment Results

Benchmark Dataset

  • Pedestrian Situated Intent (PSI) Benchmark Dataset (http://situated-intent.net/pedestrian_dataset/)
  • 210 videos were randomly sampled from a naturalistic driving dataset
  • 75 subjects
    • Ages range from 19 to 77
    • Personality traits and driving styles were recorded for all subjects
    • Each subject completed 1.5 hours of training and a 15-hour video annotation experiment