Author: Hee-Seung Moon, Antti Oulasvirta, Byungjoo Lee
Conference/Journal: CHI
Year: 2023

Summary

a. Simulate the behavior of a user operating an interface, and
b. train a density estimator on the simulated data, so that user-specific parameters/behavior can be inferred from observed real user behavior alone.

A. Background

“Almost every HCI study ends up with a human experiment.”

Problem Definition ➡️ System/Interface Modeling ➡️ Human Validation (how does the system affect humans?)

Human validation is expensive and unavoidable, yet its results hardly generalize, because:

1. we do not fully understand the mechanisms of human perception and control, and
2. people differ from one another.

So the goal of user simulation/modeling is:

When we build an interface (hardware/software), can we make even the user validation step computable?

➡️ For a problem like this (unknown mechanism), there are broadly two approaches:

1. Black-box modeling

Data-driven methods are effective,
but human data cannot be collected at large scale.

2. White-box modeling

This requires

plausible models of how people perceive (perception), act (motor control), and do both at once (sensorimotor control).

➡️ Models already exist for quite a few tasks (not a concern).

+ To optimize for differences between people, we also need to know each user's specific characteristics/parameters.

B. Key Idea

Q. “Given a specific user's behavior data (trajectory, performance metrics), how do we recover the user-specific parameters?”

1. In the end, data-driven methods are powerful.

2. Collecting human data that spans all variability is practically impossible.

3. If we have a user simulation model that takes user-specific parameters as input, we can generate synthetic data covering a wide range of those parameters.

$\text{Model parameter } (= θ) \to \text{User simulation} \to \text{Behavior } (= x)$

$D_{\text{synthetic}} = \{(θ^{(i)}, x^{(i)})\}_{i = 1}^{N}$

4. The paper therefore proposes using synthetic data generated by simulation.

With such a simulation model, the problem can still be approached in a data-driven way, using synthetic data in place of human data.
If a model trained this way is robust to real human behavior, it effectively mitigates the data scarcity problem.
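
As a minimal sketch of how $D_{synthetic}$ could be assembled, the snippet below samples a parameter, runs a simulator, and stores the pair. The simulator `simulate_user`, its single "motor noise" parameter, the parameter range, and the base movement time are all made-up stand-ins for illustration; a real user simulator (e.g. a biomechanical model driven by RL) would be far richer.

```python
import random


def simulate_user(theta, n_trials=20, rng=random):
    """Hypothetical stand-in for a user simulator.

    theta is a single made-up 'motor noise' parameter. Returns one
    movement time (seconds) per trial.
    """
    base_time = 0.5  # assumed noise-free movement time
    return [base_time + abs(rng.gauss(0.0, theta)) for _ in range(n_trials)]


def make_synthetic_dataset(n_users, theta_range=(0.05, 0.5), seed=0):
    """Build D_synthetic = {(theta_i, x_i)}: sample theta, then simulate behavior x."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_users):
        theta = rng.uniform(*theta_range)  # one simulated user's parameter
        x = simulate_user(theta, rng=rng)  # that user's simulated behavior
        dataset.append((theta, x))
    return dataset


dataset = make_synthetic_dataset(n_users=1000)
```

Because the parameter is sampled before the simulation runs, every behavior sample comes with its ground-truth label for free, which is what human data cannot offer.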

C. Research Question

A. Can simulation produce synthetic data that spans the variability?

➡️ Usable models already exist (biomechanically plausible model + RL simulation).

B. Can we learn the mapping between behavior data and model parameters from such synthetic data?

➡️ Normalizing flow with a conditional INN (invertible neural network)

D. Model Overview

“Given user behavior data, find the most probable model parameter”

E. Core Methodology

User Behavior Encoder

Converts the observed user behavior data (trajectory, performance metrics, ...) into a fixed-size feature vector.
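
To make "fixed-size" concrete, here is a small sketch that maps sessions of different lengths to a vector of the same length. It uses hand-picked summary statistics purely for illustration; the encoder in the paper is a learned network, and its architecture is not described in these notes. The trial format `(movement_time, hit)` is an assumption.

```python
import statistics


def encode_behavior(trials):
    """Map a variable-length list of (movement_time, hit) trials to 3 features."""
    times = [t for t, _ in trials]
    hits = [1.0 if hit else 0.0 for _, hit in trials]
    return [
        statistics.fmean(times),       # mean movement time
        statistics.pstdev(times),      # spread of movement times
        1.0 - statistics.fmean(hits),  # error rate
    ]


short_session = [(0.61, True), (0.72, False), (0.58, True)]
long_session = [(0.55, True)] * 10 + [(0.90, False)] * 2

features_short = encode_behavior(short_session)
features_long = encode_behavior(long_session)
```

Both sessions end up as three numbers regardless of how many trials were observed, which is the property the downstream network needs from its conditioning input.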

Conditional INN with Normalizing Flow

Goal of network
- Given user behavior data $y$ (the behavior written as $x$ in Section B), the goal is to obtain the distribution of plausible model parameters, $p(θ \mid y)$.
- Obtaining $p(θ \mid y)$ goes beyond a single point estimate: it yields multiple modes (local maxima) together with their confidence.
Mechanism
- It is hard to locate points of a complex $p(θ \mid y)$ directly, so a neural network is trained as a normalizing flow that transforms a known point of a known distribution (unit normal) into a point of the posterior distribution we are after.
- Pushing many points through the trained normalizing flow then approximates the posterior distribution.
- From the approximate posterior, we can compute the most probable model parameter (mode) for the currently observed behavior, along with its confidence.
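
The mechanism rests on the standard change-of-variables identity for invertible maps. As a sketch in this note's notation, write $f_ϕ$ for the forward map from latent space to parameter space and $h$ for the encoder's feature vector of $y$ (the symbols $h$ and $q_ϕ$ are introduced here for illustration only):

$$
θ = f_ϕ(z;\, h), \quad z \sim N(0, I)
\quad\Longrightarrow\quad
q_ϕ(θ \mid y) = N\!\left(f_ϕ^{-1}(θ;\, h);\; 0, I\right) \left|\det \frac{\partial f_ϕ^{-1}(θ;\, h)}{\partial θ}\right|
$$

Sampling only needs the forward map, while evaluating the density of a given $θ$ needs the inverse map and its Jacobian determinant, which is why the network has to be invertible.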

Training

Training Procedure

1. Model Parameter Sampling:

Sample a model parameter $θ$ from a predefined space of plausible model parameters.

2. User Simulation model:

Simulate behavior $\hat{y}$ with the sampled model parameter $θ$.

3. Encoder Network:

Feed the behavior $\hat{y}$ into the encoder to extract a feature vector.

4. Conditional INN:

(a) Apply the inverse pass $f^{-1}$ of the normalizing flow to the sampled model parameter $θ \sim p(θ \mid \hat{y})$, mapping it to a latent variable $z \sim N(0, I)$.
(b) Concatenate the feature vector to the input of each inverse-pass layer (input/hidden layers) to apply the condition.
(c) Compute the loss and update the weights $ϕ$. The loss has two terms:
- (1) a term that pulls the latent variable $z$ obtained from the model parameter toward 0 (the ground-truth model parameter should land at the mean, the highest-density point of the standard normal), and
- (2) a Jacobian term that keeps the latent distribution normal-shaped (if every point collapsed to 0, the latent distribution would no longer be unit normal).

Optimizing (1) and (2) together is mathematically equivalent to minimizing the KL divergence between the predicted posterior and the true posterior (the one the sampled points come from).
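
The two loss terms can be tried out on a one-dimensional toy problem. The sketch below uses the standard maximum-likelihood loss for a normalizing flow, $\frac{1}{2} z^2 - \log \left| \partial z / \partial θ \right|$; whether the paper uses exactly this form is not stated in these notes. The simulator ($θ \sim N(0, 1)$, $y = θ + \text{noise}$), the affine flow $z = (θ - a y - b)\, e^{-s}$, and the raw $y$ standing in for the encoder output are all assumptions chosen so that the true posterior is known in closed form.

```python
import math
import random


def train_conditional_affine_flow(pairs, lr=0.05, steps=800):
    """Fit z = (theta - a*y - b) * exp(-s) by full-batch gradient descent.

    Per-sample loss: 0.5 * z**2 + s. The first part is term (1); the
    second is the negative log-Jacobian of theta -> z, which equals -(-s) = s
    for this affine map, i.e. term (2).
    """
    a, b, s = 0.0, 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_a = grad_b = grad_s = 0.0
        inv_scale = math.exp(-s)
        for theta, y in pairs:
            z = (theta - a * y - b) * inv_scale
            grad_a += -z * y * inv_scale  # d(loss)/da
            grad_b += -z * inv_scale      # d(loss)/db
            grad_s += 1.0 - z * z         # d(loss)/ds
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        s -= lr * grad_s / n
    return a, b, s


# Toy simulator (illustrative assumption): theta ~ N(0, 1), y = theta + noise.
rng = random.Random(0)
noise_std = 0.5
pairs = []
for _ in range(1000):
    theta = rng.gauss(0.0, 1.0)
    pairs.append((theta, theta + rng.gauss(0.0, noise_std)))

a, b, s = train_conditional_affine_flow(pairs)
posterior_std = math.exp(s)
# Analytic posterior for this toy model, for comparison:
#   mean = y / (1 + noise_std**2), std = noise_std / sqrt(1 + noise_std**2)
```

Without the `+ s` term the optimizer could drive the loss down simply by shrinking every $z$ toward 0 (making $s$ very large); the Jacobian term penalizes exactly that collapse.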

Inference

1. Sample points from the unit normal distribution in latent space ($n = 1000$).

2. Apply the forward transformation to each latent point.

3. Concatenate the behavior feature vector to the input of the transformation layers to apply the condition.

4. The transformed points in model-parameter space form the posterior distribution.
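
The four inference steps can be sketched with a one-dimensional conditional affine map standing in for the trained network. The parameters `(a, b, s)` and the scalar feature `y_feature` below are hypothetical values chosen for illustration, not results from the paper.

```python
import math
import random
import statistics


def sample_posterior(y_feature, flow_params, n=1000, seed=0):
    """Draw n latent points from N(0, 1) and push them through the forward map.

    flow_params = (a, b, s) describes a hypothetical, already-trained
    conditional affine flow: theta = a * y_feature + b + exp(s) * z.
    """
    a, b, s = flow_params
    rng = random.Random(seed)
    scale = math.exp(s)
    return [a * y_feature + b + scale * rng.gauss(0.0, 1.0) for _ in range(n)]


samples = sample_posterior(y_feature=1.5, flow_params=(0.8, 0.0, math.log(0.45)))

posterior_mean = statistics.fmean(samples)
posterior_std = statistics.stdev(samples)
ordered = sorted(samples)
interval_95 = (ordered[24], ordered[974])  # roughly the 2.5th and 97.5th percentiles
```

Each inference is one batch of forward passes, so no simulation or optimization runs at inference time; that cost was paid once during training, which is what "amortized" refers to.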

F. Validation

Task: 3D Target Selection task(Moon2024)
User Modeling & Simulation
Inter/Intra-user Variability
Amortized Inference of user-specific parameter/behavior
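Is the synthetic data plausible enough to apply to the real world?
Is a model trained on synthetic data as good as, or better than, existing data-driven/rule-based methods?
- 1. Data acquisition cost (↔️ human-data-based)
  - Training on synthetic data (2 hours of pretraining) matched the performance of training on human data (16 participants, 1020 target selection trials, about 1 hour per user).
  - Q. Isn't training on human data better in the end?
  - A. The model trained on human data is better treated as an upper bound than as a competitor.
    - The synthetic data comes from a model that only approximates the mechanisms of human behavior.
    - So the human-data model is trained in-distribution (training and test data share a distribution), while the simulation model is trained out-of-distribution (trained on a simulated distribution and applied to the real one, i.e. sim-to-real).
    - The point is not for the simulation model to beat the human-data model, but to show **how close it gets to the human-data model without any human data (zero-shot fidelity)**.
    - The performance gap is small: according to Study 3, users' target selection speed and error rate showed no statistically significant difference between the human-data model and the simulation model.
  - Q. If data from just 16 people already does this well on 4 unseen users, is amortized inference even needed?
  - A. But could Meta deploy a CD gain optimization patch based on target selection inferences drawn from a 16-person experiment?
    - Running an experiment with 16 people (1020 target selection trials, about 1 hour per user) vs. 2 hours of pretraining ➡️ at similar performance, the latter clearly wins.
    - Even if Meta spent enormous money and time collecting population-scale data:
      - What if the target size differs from the experiment? It has to be tested again.
      - What if the feedback (UI, visual effect, gain) differs from the experiment? It has to be tested again.
      - What if a different task is needed? It has to be tested again.
    - Overfitting (the data is not large-scale)
      - What about people who differ from the experiment participants (university students) in age (neural processing speed, sensory sensitivity, ...), arm length, or clinical condition?
      - How large would N have to be to span the huge parameter space of human motor mechanisms?
      - The results are consistent with the known finding that finger motor capability declines with age.
- 2. Scalability (↔️ human-data-based, proximity-based)
  - Applicable to a new task/environment as long as a simulation model exists.
- 3. Confidence (↔️ proximity-based)
  - Instead of inferring deterministically, the method outputs a confidence alongside each estimate, so optimization/intervention can be limited to situations where confidence is high.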
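
The confidence point can be sketched as a simple gate on the spread of the posterior samples. The threshold `max_std`, the helper `gain_for`, and its gain rule are hypothetical choices for illustration; the notes do not specify how the paper decides when to intervene.

```python
import statistics


def maybe_adapt(posterior_samples, max_std, apply_adaptation):
    """Apply an adaptation only if the posterior is narrow enough.

    max_std is a hypothetical confidence threshold chosen by the designer.
    Returns the result of apply_adaptation, or None when the step is skipped.
    """
    estimate = statistics.fmean(posterior_samples)
    spread = statistics.stdev(posterior_samples)
    if spread > max_std:
        return None  # too uncertain: leave the interface unchanged
    return apply_adaptation(estimate)


def gain_for(motor_noise):
    """Made-up rule for illustration: lower gain for noisier users."""
    return round(1.0 / (1.0 + motor_noise), 3)


confident = [0.24, 0.25, 0.26, 0.25, 0.25]
uncertain = [0.05, 0.45, 0.10, 0.40, 0.25]

gain_confident = maybe_adapt(confident, max_std=0.05, apply_adaptation=gain_for)
gain_uncertain = maybe_adapt(uncertain, max_std=0.05, apply_adaptation=gain_for)
```

Both sample sets have the same mean, so a point-estimate method would treat them identically; only the spread separates the case where adapting is safe from the one where it is not.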

G. Contribution&Limitation

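That a model trained in the synthetic-data domain works robustly on real human observations is significant: it opens up a new paradigm.
Between-user variability is a missing link across fields, so being able to infer it matters a great deal (e.g. optimizing a new interface to a person's characteristics, expert demonstrations in imitation learning differing from person to person, separating human noise from the core trajectory, and preference-based optimization of generated images).
Because the method tackles a very large problem and tries to model unknown internal human mechanisms:
- 1. User modeling/simulation still rests on many assumptions.
  
  ➡️ Detailed, realistic modeling of the complex mechanisms in user simulation is needed, especially perceptual ones (e.g. how is sensory information processed into the information needed for motor control?).
- 2. When the task changes, the simulation environment must be rebuilt and training redone.
The method also needs to show more clearly how effective (scalable) it is at the application level.
Note
- By the probability integral transform, any continuous random variable can be mapped to a uniform distribution through its cumulative distribution function (CDF), and that uniform variable can then be mapped to a normal distribution through the inverse normal CDF.
- Model parameter inference methods existed before (ABC, approximate Bayesian computation), but each inference took hours to days (e.g. 30 hours for point-and-click).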
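
The probability integral transform in the first note can be checked numerically. The sketch below uses an exponential distribution as the source, an arbitrary choice for illustration: samples go through their own CDF to become uniform, then through the inverse standard normal CDF to become normal.

```python
import math
import random
import statistics


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution, used here as the source distribution."""
    return 1.0 - math.exp(-rate * x)


rng = random.Random(0)
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)

samples = [rng.expovariate(1.0) for _ in range(5000)]

# Step 1: apply the source CDF, giving values distributed as Uniform(0, 1).
uniform = [exponential_cdf(x) for x in samples]
# Clamp away from 0 and 1 because inv_cdf is undefined at the endpoints.
uniform = [min(max(u, 1e-12), 1.0 - 1e-12) for u in uniform]

# Step 2: apply the inverse standard normal CDF.
normal = [standard_normal.inv_cdf(u) for u in uniform]

normal_mean = statistics.fmean(normal)
normal_std = statistics.stdev(normal)
```

This works in one dimension because the CDF is known; a normalizing flow can be read as learning a comparable invertible map when the CDF is unknown and the variable is multi-dimensional.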

Quote

“If you can’t explain it to a six-year-old, you don’t understand it yourself.” - Richard Feynman

(Moon2023)Amortized Inference with User Simulations
