Vermosh's Exploration Log
Category:
Multimodal
Notes on multimodal papers
VLMo – Vision-Language Pre-trained Model
2025-10-11
ALBEF – Align Before Fuse
2025-10-09
ViLT – Vision-and-Language Transformer Without Convolution or Region Supervision
2025-10-08
HGNN – Hypergraph Neural Networks
2025-10-07
CLIP – Learning Transferable Visual Models From Natural Language Supervision
2025-10-06