One Vision-Language-Action Model for GUI Agent
Qinghong (Kevin) Lin
KevinQHLin
AI & ML interests
Vision-Language Model, Video Understanding, Human-AI Interaction
Recent Activity
authored a paper about 11 hours ago
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
upvoted a paper about 12 hours ago
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
authored a paper 1 day ago
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands