Abstract: We propose a novel method that combines the strengths of two popular class activation mapping techniques, GradCAM++ and ScoreCAM, to improve the interpretability and localization of ...
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Abstract: Solving complex visual tasks such as “Who invented the musical instrument on the right?” involves a composition of skills: understanding space, recognizing instruments, and also retrieving ...