Stream: Machine Learning for Theorem Proving

Topic: Internet Explorer: Targeted Repr Learning on the Open Web

Junyan Xu (Mar 04 2023 at 02:07):

we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next.
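For concreteness, here is a rough sketch of the cycle described in the quote above (search, score what came back, reprioritize). It is not the paper's implementation: `explore`, `toy_search`, `toy_reward`, and the concept names are all hypothetical, the reward is a stand-in for whatever relevance measure one would actually use, and the self-supervised training step is left as a comment.

```python
import random
from collections import defaultdict

def explore(concepts, search, reward, rounds=20, per_round=3, seed=0):
    """Toy search / score / reprioritize loop (illustrative only)."""
    rng = random.Random(seed)
    estimates = defaultdict(float)  # running mean reward per query concept
    counts = defaultdict(int)       # number of scored items per concept
    kept = []                       # items judged useful so far
    for _ in range(rounds):
        # Prioritize: sample queries in proportion to estimated usefulness,
        # with a small floor so unexplored concepts still get tried.
        weights = [estimates[c] + 0.1 for c in concepts]
        for q in rng.choices(concepts, weights=weights, k=per_round):
            for item in search(q):
                r = reward(item)
                counts[q] += 1
                estimates[q] += (r - estimates[q]) / counts[q]
                if r > 0.5:
                    kept.append(item)
        # A real system would run a self-supervised training step on `kept` here.
    return dict(estimates), kept

# Hypothetical stand-ins: "search" returns two fake documents per query, and
# "reward" pretends that only topology-related results help the target task.
def toy_search(query):
    return [(query, i) for i in range(2)]

def toy_reward(item):
    return 1.0 if item[0] == "topology" else 0.0

estimates, kept = explore(["topology", "cooking", "sports"], toy_search, toy_reward)
```

In the literature-search variant, `search` would presumably query a paper index and `reward` would measure similarity to mathlib content, but both are assumptions here.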


Does anyone want to try replacing image search with literature search and the vision model with a language (or multimodal?) model, using mathlib as a starting point?

