Zulip Chat Archive

Stream: general

Topic: user_attribute, mk_cache


Scott Morrison (Mar 17 2019 at 10:12):

@Simon Hudon, is there any chance you could quickly explain how user_attribute X Y works?

I'm struggling to work out what's going on in attribute caching. In particular, in cache_cfg, the mk_cache field is meant to be a list name -> tactic X. When is this function invoked? What is the list of names passed to it?

Scott Morrison (Mar 17 2019 at 10:13):

It seems that the type X is some global cache object associated with the attribute, and the type Y is a "decorator" on each name that has been tagged with the attribute, which you can recover using attr.get_param name (a tactic Y).
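
Concretely, the shape is something like this (a sketch, not code from core; my_attr and everything about it is made up for illustration):

open tactic

-- user_attribute X Y: X is the global cache type, Y is the per-declaration
-- parameter type
@[user_attribute]
meta def my_attr : user_attribute nat nat :=
{ name := `my_attr,
  descr := "illustration of cache type vs parameter type",
  parser := lean.parser.small_nat,    -- reads the Y written at the use site
  cache_cfg :=
  { mk_cache := λ ls, pure ls.length, -- builds the X from all tagged names
    dependencies := [] } }

@[my_attr 3] def foo := 0

run_cmd my_attr.get_param `foo >>= trace -- the Y stored on foo: 3
run_cmd my_attr.get_cache >>= trace      -- the X: here, 1 tagged name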

Scott Morrison (Mar 17 2019 at 10:14):

But how do you interact with the X?

Scott Morrison (Mar 17 2019 at 10:33):

... Okay, I think I understand.

Scott Morrison (Mar 17 2019 at 10:34):

The function in mk_cache is called on some subset of the occasions on which attr.get_cache is called.

Scott Morrison (Mar 17 2019 at 10:35):

Certainly if attr.get_cache hasn't been called before, mk_cache runs. If any declarations have had the attribute added since the last call to attr.get_cache, mk_cache runs.

Scott Morrison (Mar 17 2019 at 10:35):

When nothing has changed since the last call to attr.get_cache, mk_cache may or may not run...! I'm guessing this nondeterministic behaviour might be a race condition.

Scott Morrison (Mar 17 2019 at 10:39):

This seems... not very useful. :-) In particular, when you add the attribute to a new declaration, there's no way to update the old value of the cache, because mk_cache... oh, can mk_cache call get_cache? But still, it seems there's no way to know which declarations have newly received the attribute since the last call to mk_cache: mk_cache is just passed the entire list of names, so you still have to look at all the names that carry the attribute.

Scott Morrison (Mar 17 2019 at 10:40):

And in any case: no, mk_cache cannot call get_cache (unless your goal is SIGBUS).

Simon Hudon (Mar 17 2019 at 13:12):

Interesting. I didn't know the last bit.

Simon Hudon (Mar 17 2019 at 13:15):

I haven't confirmed it, but I believe this works on cache-invalidation logic. Every time you add the attribute, you invalidate the cache. Then, when you call get_cache, you rebuild the cache. The idea is that you can add the attribute to a few lemmas, and as long as you don't query the cache in between, that should be fairly fast. After that, you can use the cache many times. The first query is going to be the most expensive, but after that, every invocation will be very fast.
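
If that's right, the pattern would be (reusing the hypothetical my_attr from the sketch above):

def foo1 := 0
def foo2 := 0

attribute [my_attr 1] foo1 -- invalidates the cache (cheap)
attribute [my_attr 2] foo2 -- invalidates it again (cheap)

run_cmd my_attr.get_cache >>= trace -- the one expensive step: mk_cache runs here
run_cmd my_attr.get_cache >>= trace -- should hit the cache (but see below)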

Simon Hudon (Mar 17 2019 at 13:16):

And I suspect that not building the cache incrementally is done to make the cache more reliable: mk_cache expresses a simple relation between the set of declarations that have the attribute and what the cache should be. If you make it incremental, it's much harder to use
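
Concretely, the non-incremental style is just (another made-up attribute, for illustration):

-- the cache is the set of tagged declarations, recomputed from scratch
-- from the full list every time the cache has been invalidated
@[user_attribute]
meta def tagged_attr : user_attribute name_set :=
{ name := `tagged,
  descr := "cache the set of tagged declarations",
  cache_cfg :=
  { mk_cache := λ ls, pure (ls.foldl name_set.insert mk_name_set),
    dependencies := [] } }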

Simon Hudon (Mar 17 2019 at 13:17):

And very easy to get it wrong

Simon Hudon (Mar 17 2019 at 13:20):

The slowest thing you can do from that perspective (which I do not think is any slower than not having a cache) is to have a long sequence where you set an attribute, then query the cache, then set an attribute, and so on. If you want to make those faster, you can do the following:

lemma foo1 ...
lemma foo2 ...
lemma foo3 ...
lemma foo4 ...
lemma foo5 ...

attribute [my_attribute] foo1 foo2 foo3 foo4 foo5

I don't think it will be much of a gain unless you have a lot of lemmas that don't depend on each other through your tactic.

Koundinya Vajjha (Mar 17 2019 at 19:51):

I would venture a guess that mk_cache is run: 1) as soon as the tagged declaration is parsed, 2) when you call get_cache and it can't find a cache, or 3) when you call get_cache and more declarations have been marked with the user attribute since the last time the cache was computed.

Koundinya Vajjha (Mar 17 2019 at 20:09):

I was more interested in the following scenario: If I have two declarations in two files which do not import each other, both marked with the user attribute, is there any way that running get_cache in one file will also include the cached declaration in the other file?

Koundinya Vajjha (Mar 17 2019 at 20:09):

Or do I have to have a master file which imports both files? And the cache has to be recomputed in that file?

Scott Morrison (Mar 17 2019 at 20:57):

Or do I have to have a master file which imports both files? And the cache has to be recomputed in that file?

I'm pretty certain you need a master file, and the cache will be recomputed there. Anything else would require global mutable state in the environment, and that seems unlikely.
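
That is, something like this (hypothetical file layout, with my_attr as in the sketch above):

-- a.lean
@[my_attr 1] lemma foo : true := trivial

-- b.lean (does not import a.lean)
@[my_attr 2] lemma bar : true := trivial

-- master.lean
import a b
-- get_cache here sees both foo and bar; mk_cache runs afresh in this
-- file, rather than either file's cache value being reused
run_cmd my_attr.get_cache >>= trace -- prints 2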

Scott Morrison (Mar 17 2019 at 20:58):

The fact that there is a race condition, and that the computation of the cache is nondeterministic, is pretty easy to verify. Run the following in VSCode, and read the trace messages:

-- open native
open tactic

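-- Cache type and parameter type are both unit here, so the only observable
-- behaviour is when mk_cache / after_set / before_unset fire.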
@[user_attribute]
meta def example_attribute : user_attribute unit unit := {
  name := `example,
  descr := "An example attribute",
  cache_cfg :=
   { mk_cache := λ ls, (do trace format!"names: {ls}"),
     dependencies := [] },
  after_set := some $ λ name priority persistent, (do trace format!"set {name}"),
  before_unset := some $ λ name persistent, (do trace format!"unset {name}")
}

def f := 3
def g := 4

section
local attribute [example] f
local attribute [-example] f
local attribute [example] f
end -- Notice no "unset" at the end of the section?

section
run_cmd success_if_fail (example_attribute.get_param `f >>= trace)

local attribute [example] f
run_cmd example_attribute.get_param `f >>= trace
end

set_option trace.user_attributes_cache true
attribute [example] f

run_cmd example_attribute.get_cache >>= trace

-- it seems non-deterministic whether this next line finds a cached value
run_cmd example_attribute.get_cache >>= trace

attribute [example] g

run_cmd example_attribute.get_cache >>= trace
run_cmd example_attribute.get_cache >>= trace

attribute [example] g

run_cmd example_attribute.get_cache >>= trace
run_cmd example_attribute.get_cache >>= trace

Scott Morrison (Mar 17 2019 at 21:04):

In any case... The reason I'm currently interested in this is my library_search tactic, which requires looking over every declaration in the library. It actually seems to be (incredibly?) fast already, but I was hoping to see if I could use attribute caching to make it even faster. My idea had been to make an auxiliary attribute, has_been_indexed or something, and to progressively update the map of "lemmas arranged by head symbol" that I need every time someone runs library_search. But for this to help, we'd need to be able to reuse the previous map and just add the lemmas that have been added to the environment since the last call to library_search. That isn't possible, it seems. Given that I'd expect there to be a new declaration roughly every time library_search is called, the cache wouldn't be hit often: only for multiple invocations of library_search within the same proof block.
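
For reference, the sort of thing I had in mind was roughly this (a sketch only, not the real library_search code; head_symbol and the attribute are made up). The point is that mk_cache always receives the full list, so the fold below necessarily revisits every tagged name:

open tactic

-- crude head-symbol extraction, for illustration only
meta def head_symbol : expr → name
| (expr.pi _ _ _ b) := head_symbol b
| (expr.app f _) := head_symbol f
| (expr.const n _) := n
| _ := `unknown

-- cache: a multimap from head symbols to names of tagged lemmas
@[user_attribute]
meta def indexed_attr : user_attribute (native.rb_lmap name name) :=
{ name := `has_been_indexed,
  descr := "lemmas indexed by the head symbol of their statement",
  cache_cfg :=
  { mk_cache := λ ls, ls.mfoldl
      (λ m n, do d ← get_decl n,
                 pure (m.insert (head_symbol d.type) n))
      (native.rb_lmap.mk name name),
    dependencies := [] } }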

Scott Morrison (Mar 17 2019 at 21:04):

In any case, this isn't a roadblock. It looks like library_search may be fast enough to be usable.

Keeley Hoek (Mar 19 2019 at 09:54):

Hi @Scott Morrison: essentially, just because get_options and set_options are tactic_state-local, I have a way to have the lemma cache be generated only the first time library_search is used in a particular tactic block.
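
The mechanism is roughly this (a sketch only, not the code in the repo below; once and the flag handling are made up):

open tactic

-- options live on the tactic_state, so a flag set here survives between
-- tactic invocations inside one begin/end block, but not across blocks
meta def once (flag : name) (expensive : tactic unit) : tactic unit :=
do o ← get_options,
   if o.get_bool flag ff
   then skip -- already ran earlier in this block
   else expensive >> set_options (o.set_bool flag tt)

(Storing actual data rather than a flag means encoding it into an option value, which is the fiddly part.)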

Keeley Hoek (Mar 20 2019 at 07:01):

Here's a working demo: https://github.com/khoek/lean-local-cache

Johan Commelin (Mar 20 2019 at 07:13):

Does that require modifying core?

Keeley Hoek (Mar 20 2019 at 07:54):

No, all above board

Johan Commelin (Mar 20 2019 at 08:04):

Ok, cool!

Scott Morrison (Mar 20 2019 at 08:31):

And in a second begin ... end block of the same file, it is regenerated again, right?

Keeley Hoek (Mar 20 2019 at 08:35):

Yes

Keeley Hoek (Mar 20 2019 at 08:36):

https://github.com/leanprover-community/mathlib/pull/837
I'm rocking some tests


Last updated: Dec 20 2023 at 11:08 UTC