Zulip Chat Archive

Stream: lean4 dev

Topic: Weak symbols


Tom (Sep 28 2022 at 23:46):

Hi @Leonardo de Moura et al.,

would you mind posting a few more details about the weak symbol project? I'd like to start looking at it and having a few more details would help me.
Thanks,

Tom

Leonardo de Moura (Sep 28 2022 at 23:54):

Hi Tom,

Thanks for asking. I see the following steps:

  • Find out how to "tell" clang and gcc that we want to use a weak symbol for a function name. @Mario Carneiro posted a few links about it, but it would be great to have a solution that works for both clang and gcc in the platforms we care about.
  • Find out which linker options (if any) we need to use to enable weak symbols.
  • Create small experiment to ensure the approach works. Examples: a small program with two different .c files that define the same weak symbol, but the final executable contains only one copy of the function.
  • Write a function in Lean that computes a strong hash for Expr. Recall that each code specialization in Lean is uniquely identified by an Expr that does not contain free variables. We will use the hash code to name the new function produced by our code specializer.

Best,
Leo

Mario Carneiro (Sep 28 2022 at 23:54):

The task is to confirm the concrete syntax for enabling weak symbols on the compilers / architectures we target, and make sure that it has the desired effect, such that we can feel safe emitting it in the generated C code.

Tom (Sep 29 2022 at 02:38):

@Leonardo de Moura and @Mario Carneiro

Find out how to "tell" clang and gcc that we want to use a weak symbol for a function name.

That's pretty easy! Can you just tell me whether we compile the generated code as C or as C++? My understanding is we emit "C"-like code, but C can often be compiled using a C++ compiler, and it seems a lot of parts of Lean (including things like the runtime) are actually C++.
Depending on which one, I may pick a different approach.

Find out which linker options (if any) we need to use to enable weak symbols

No linker options are required on most POSIX OSs (Linux/Solaris etc.) with up-to-date gcc/clang linkers. I would have to double-check if something is needed for MSVC, but I'm not even sure if we're targeting it right now...?

Create small experiment to ensure the approach works

Sure, I can do that

Write a function in Lean that computes a strong hash for Expr

Are you thinking of something like a recursive sha256 digest of an LCNF function? That sounds good, but this one will probably take me a bit longer than the rest, as I'm not sure the lean library actually has a sha256 (or similar) hash?

However, I do want to bring up a few possible issues with the last suggestion:

1) Since the hash is very strong and depends on the entire expression, is there a chance we would end up with two different versions of the same function in two different modules if they, for example, had different compilation options/optimizations enabled?

2) This approach will not work for SOs/DLLs; or in general, will break for any form of separate compilation.
I'm sure you're familiar with it but just for completeness: Imagine that a lean program says "I want to import a compiled function from an SO"; the whole point is that you don't know the implementation, you only know the signature but without the implementation, you won't be able to know which symbol to bind to. Typically, one doesn't look for weak symbols in DLLs but I can see it happening for specializations of polymorphic functions.

3) This is related to 2) but a slightly different approach: Most modern Unix systems support things like "LD_PRELOAD". This allows the OS to load another library before loading the main executable image. Since the dynamic linker works "left to right", it will bind to the first symbol it finds. This allows easy experimentation and "injecting" of e.g. testing or other analysis code into the binary.
Libraries can use this for things like replacing malloc/free with faster/tracing versions; and e.g. clang/gcc use this mechanism for things like the address, memory and thread sanitizer frameworks.
The whole point of this mechanism is to replace a function with a different implementation; so clearly the SHA would be different.
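
To make 3) concrete, here is a minimal sketch of such an interposer (the function name, file names and commands are made up for illustration; note that interposition only applies to symbols the dynamic linker actually resolves, e.g. functions imported from a shared library, not calls the static linker already resolved inside the executable):

preload.c

// The dynamic linker resolves symbols left to right, so this definition
// of greeting() is bound before the one in any library loaded later,
// effectively replacing the original implementation.
char const *greeting(void)
{
    return "Hello from the preloaded library";
}

Built and used roughly like this:

   gcc -shared -fPIC preload.c -o preload.so
   LD_PRELOAD=./preload.so ./main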

I don't know if these use cases are important to you, but I would like you to at least tell me a bit more about your view on the above. However, my understanding is that the Lean compiler already has the ability to import compiled Lean code as a shared object, and this could lead to some breakage.

As such, perhaps we should consider two classes of functions:

"weak", which should be deduplicated (and in which case, their name should be just the SHA, not a "name + SHA + something else, because you are really looking for "content addressability"/functional identity

"strong"/exported symbols, whose names could perhaps be some mangling of the name, the input and return types?

Anyway, we can discuss this more; once you let me know about the compilation mode and whether we want to support MSVC, I'll do the first part quickly.

Thanks,

Tom

Mario Carneiro (Sep 29 2022 at 02:39):

That's pretty easy! Can you just tell me whether we compile the generated code as C or as C++? My understanding is we emit "C"-like code, but C can often be compiled using a C++ compiler, and it seems a lot of parts of Lean (including things like the runtime) are actually C++.
Depending on which one, I may pick a different approach.

Both. The generated code is usually treated as C, but it has ifdefs in it so it can be compiled in C++ mode as well

Mario Carneiro (Sep 29 2022 at 02:40):

It might help to take a look at the generated .c files to get a sense of how it works

Mario Carneiro (Sep 29 2022 at 02:40):

https://github.com/leanprover/lean4/blob/master/stage0/stdlib/Lean/Parser/Basic.c

Mario Carneiro (Sep 29 2022 at 02:42):

the lean.h header is probably where we would set up some macros for the weak attribute that we can apply to generated functions
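
Something along these lines, perhaps - a rough sketch assuming gcc/clang only for now; the macro name and the specialization name are invented, not anything lean.h actually defines:

// hypothetical addition to lean.h (lean_object is already declared there)
#if defined(__GNUC__) || defined(__clang__)
#define LEAN_WEAK_SPEC __attribute__((weak))
#else
#define LEAN_WEAK_SPEC
#endif

// ...which the code generator would then emit on each specialization:
LEAN_WEAK_SPEC lean_object * l_List_map_spec_1a2b3c(lean_object * f, lean_object * xs);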

Mario Carneiro (Sep 29 2022 at 02:45):

1) Since the hash is very strong and depends on the entire expression, is there a chance we would end up with two different versions of the same function in two different modules if they, for example, had different compilation options/optimizations enabled?

Two different versions of the same function is okay. We would like to avoid it but it's fine if this happens sometimes, and we should not and cannot rely on deduplication for soundness.

Two versions of the function with different compilation options sounds like a nightmare. I don't think this is a supported configuration.

Mario Carneiro (Sep 29 2022 at 02:46):

A specific example of an optimization that would do bad things if we applied it inconsistently is removing unused arguments from a function. If we have already produced .o files for the upstream files then this would probably not be caught at compile time and we would just get UB
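
A minimal sketch of that failure mode, with made-up file and symbol names: two translation units emit the "same" weak specialization, but one was compiled after unused-argument removal. The linker just picks one of the weak definitions, so nothing catches the mismatch, and every call compiled against the other signature gets the wrong ABI.

old_spec.c

// compiled earlier: the unused first argument is still present
__attribute__((weak)) int l_spec_f(int unused, int x) { return x + 1; }

int call_old(void) { return l_spec_f(0, 42); }   // passes two arguments

new_spec.c

// recompiled with unused-argument removal: same symbol name, one argument
__attribute__((weak)) int l_spec_f(int x) { return x + 1; }

int call_new(void) { return l_spec_f(42); }      // passes one argument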

Mario Carneiro (Sep 29 2022 at 02:48):

2) This approach will not work for SOs/DLLs; or in general, will break for any form of separate compilation.
I'm sure you're familiar with it but just for completeness: Imagine that a lean program says "I want to import a compiled function from an SO"; the whole point is that you don't know the implementation, you only know the signature but without the implementation, you won't be able to know which symbol to bind to. Typically, one doesn't look for weak symbols in DLLs but I can see it happening for specializations of polymorphic functions.

I don't think this is an issue. We would only do this for specializations, and this requires that the body of the function be known to lean anyway.

Mario Carneiro (Sep 29 2022 at 02:50):

If you use LD_PRELOAD, any and all breakage is on you. You can do arbitrary damage to a library using this mechanism and it should never have existed

Mario Carneiro (Sep 29 2022 at 02:52):

"weak", which should be deduplicated (and in which case, their name should be just the SHA, not a "name + SHA + something else, because you are really looking for "content addressability"/functional identity

One reason to use the function's name in the mangling is so that it's not completely opaque to people looking at it in a debugger. I am fairly sure this is also why rust uses name + SHA in their mangling scheme

Mario Carneiro (Sep 29 2022 at 02:55):

I don't think we care that much about providing pure content addressability. It's fine if definition names are mixed in the SHA, since we're just trying to avoid @List.map Nat from being generated twice
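
To make the trade-off concrete, the two schemes might look something like this for that specialization (both names and the signature are invented for illustration, not Lean's actual mangling):

#include <lean/lean.h>

// content-only: purely content-addressed, but opaque in a debugger
lean_object * l_spec_9f2b6c41d0e8(lean_object * f, lean_object * xs);

// name + hash: deduplicates just as well, and a backtrace still tells you
// which source-level function the specialization came from
lean_object * l_List_map_Nat_spec_9f2b6c41d0e8(lean_object * f, lean_object * xs);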

Tom (Sep 29 2022 at 04:45):

Two versions of the function with different compilation options sounds like a nightmare.

This happens in very large projects when you need to e.g. compile some parts of your code in debug and others in optimized. I realize that Lean doesn't have a debugger right now but probably not worth precluding that. Anyway, the functions would have different hashes so it should be fine.

We would only do this for specializations

But presumably specializations could be generated inside of SOs... right?

If you use LD_PRELOAD, any and all breakage is on you.

That's understood. My point is that this mechanism will basically preclude it. And as I described above, there are some valid uses for it (e.g. developer/debug tools)

One reason to use the function's name in the mangling is so that it's not completely opaque to people

I expected that to be the case but wanted to mention it anyway. There are some possible advantages to this but probably not worth it without better tooling support.

I don't think we care that much about providing pure content addressability.

There are some possible nice consequences to this design choice but I agree it's unlikely to be the right fit for Lean atm.

Thanks!

Mario Carneiro (Sep 29 2022 at 04:48):

This happens in very large projects when you need to e.g. compile some parts of your code in debug and others in optimized. I realize that Lean doesn't have a debugger right now but probably not worth precluding that. Anyway, the functions would have different hashes so it should be fine.

Oh I'm aware some people do this, but most compilers explicitly disclaim responsibility for what happens if you do.

Mario Carneiro (Sep 29 2022 at 04:52):

If you use LD_PRELOAD, any and all breakage is on you.

That's understood. My point is that this mechanism will basically preclude it. And as I described above, there are some valid uses for it (e.g. developer/debug tools)

I don't see how. At the end of the day you will get a pile of functions, and if some have been deduplicated and you LD_PRELOAD replace it you will also be replacing all the deduplicated copies. If that's not what you wanted, too bad. This isn't supported.

Mario Carneiro (Sep 29 2022 at 04:53):

Treating that kind of debugger-like interaction with the code as observable behavior would basically make all compiler optimizations invalid

Mario Carneiro (Sep 29 2022 at 04:56):

We would only do this for specializations

But presumably specializations could be generated inside of SOs... right?

That's one of the examples which explains why we can't guarantee deduplication even if we wanted to. If a specialization has been compiled into two independent dynamically linked libraries, there is nothing we can do to make them have the same address anymore. Same thing if you load an SO into a static executable.

Tom (Sep 29 2022 at 05:26):

Oh I'm aware some people do this, but most compilers explicitly disclaim responsibility for what happens if you do.

Do you have any source/link for these disclaimers for any of the main C/C++ toolchains (gcc/clang)? I'd like to read up on it.
I'm aware that MSVC has potentially multiple incompatible runtimes but that's less of a compiler issue, per se.
I understand care has to be taken when mixing such .o files but have never seen any docs explicitly prohibiting/disclaiming this.
If this were the case, even loading debug SOs into e.g. a production Lean executable (or, say, into a Python executable built with optimizations on) would cause issues.

I don't see how. At the end of the day you will get a pile of functions

I still feel like we're not understanding each other / I'm not explaining myself properly.
It's not that important right now so perhaps we can discuss this a bit in person on the call on Friday.

That's one of the examples which explains why we can't guarantee deduplication even if we wanted to.

A couple of thoughts:

I am sure you are aware of this so I mention it just for clarity - the "deduplication" still happens. The (non-hidden) symbols will resolve to the same address: that's why globals/statics/function statics will have the same address irrespective of which dynamic module you're in. What doesn't go away is the code itself - a duplicate copy of the function will still exist in (potentially several) binaries, but that's less of an issue; the code will stay cold and will likely just be paged out by the OS.

But I'm not worried about deduplication in the SO case - I am worried about how to find the symbol in the first place. I don't know/think this is currently possible in Lean but consider the following scenario:

1) In my SO, I have a polymorphic function foo
2) The function gets monomorphized and then specialized for Nat during the build of the SO (and as such, will be weak and will have a hash in its name)

In an ideal scenario, in order to use the shared library, I would just need the equivalent "declaration" for foo (i.e. without its source) in my executable.

1) I build my executable against this interface (perhaps a module/interface file that Leo alluded to before?)
2) I call foo 3. But because I don't have the source, I would not be able to find the symbol because I won't be able to generate the hash.

But also, forget about Lean. It should, ostensibly, be possible to write an SO in lean and then perhaps call it from another language altogether. Unless we consider the external interface, a change in implementation of the function would break the external API/ABI.

Again, maybe we don't care. Or, maybe we don't care right now. Still, sounds like we are effectively discussing at least an informal ABI and so I want to bring these things up.

Hope that makes sense.

Cheers!

Tom

Mario Carneiro (Sep 29 2022 at 06:35):

If this were the case, even loading debug SOs into e.g. a production Lean executable (or, say, into a Python executable built with optimizations on) would cause issues.

Most linux toolchains don't explicitly say that these kinds of things are not allowed, but they underspecify what will happen and generally don't put too much emphasis on correctness in the face of such things. In particular, it is a false equivalence to say that just because the behavior is undefined there must be issues. It's quite possible for there not to be issues in the majority of plausible use cases while still not being something the compiler has explicit support for.

Mario Carneiro (Sep 29 2022 at 06:45):

I am sure you are aware of this so I mention it just for clarity - the "deduplication" still happens. The (non-hidden) symbols will resolve to the same address: that's why globals/statics/function statics will have the same address irrespective of which dynamic module you're in. What doesn't go away is the code itself - a duplicate copy of the function will still exist in (potentially several) binaries, but that's less of an issue; the code will stay cold and will likely just be paged out by the OS.

This is only true if the call to the function is exposed to the linker. If the compiler inlines the function and/or does not give them external linkage then the linker may not be able to find the relocations in order to make them point to the same place.

Mario Carneiro (Sep 29 2022 at 06:49):

1) In my SO, I have a polymorphic function foo

This is not possible. You would need a custom SO format in order to encode these kinds of properties, and lean already has that - it's the olean file. Loading SOs of unknown provenance is only ever going to work with C FFI style linkage; it will not participate in specialization or other lean-level optimizations.
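
To illustrate what "C FFI style linkage" amounts to (the export name and the Lean declaration are made up): a consumer of such an SO only ever sees a plain C symbol operating on boxed lean_object values, with no specialization or Lean-level type information attached.

// C-side view of a hypothetical Lean function exported from the SO via
//   @[export my_lib_add] def add (a b : Nat) : Nat := a + b
#include <lean/lean.h>

extern lean_object * my_lib_add(lean_object * a, lean_object * b);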

Mario Carneiro (Sep 29 2022 at 06:51):

But also, forget about Lean. It should, ostensibly, be possible to write an SO in lean and then perhaps call it from another language altogether. Unless we consider the external interface, a change in implementation of the function would break the external API/ABI.

Again, maybe we don't care. Or, maybe we don't care right now. Still, sounds like we are effectively discussing at least an informal ABI and so I want to bring these things up.

A stable ABI for lean suitable for FFI sounds way out of scope. The lingua franca of FFI is C, and C doesn't have template functions or specialization.

Sebastian Ullrich (Sep 29 2022 at 07:31):

So has there been any discussion about using the linker's Identical Code Folding instead...? None of the issues discussed above would even come up with that approach.

Leonardo de Moura (Sep 29 2022 at 13:28):

@Sebastian Ullrich It would be great to evaluate the linker's Identical Code Folding: robustness, performance, etc too. BTW, I was expecting the "weak symbol" solution to be simple. If it is not too complicated, I prefer to have our own solution that we fully control.

Sebastian Ullrich (Sep 29 2022 at 14:12):

We should definitely evaluate it first, though for weak symbols we are also dependent on the linker behavior and possibly on the platform behavior as mentioned above. For example, the text I shared in the other thread seemed to imply that weak symbols may not actually lead to deduplication, though I don't know if that is true.

Sebastian Ullrich (Sep 29 2022 at 14:15):

A slight upside of ICF would be that we can continue to use nice symbol names without hashes, and in fact make them even nicer than right now since every specialization site would use its own symbol name without reuse (or at least reuse limited to the current file). If I understand it correctly, the linker will preserve these symbol names and simply have them point at the same code location.

Sebastian Ullrich (Sep 29 2022 at 14:15):

A good thing in any case is that at least for users we can assume that all Lean code will be linked by the same linker, so if an approach looks good with current lld, it seems improbable that that will change for the worse in the future.

Sebastian Ullrich (Sep 29 2022 at 14:18):

Also interesting part from the mold readme, which is the only other linker I can see us potentially switching to in the future:

Similarly, BFD, gold and lld support Identical Comdat Folding (ICF) as yet another size optimization. ICF merges two or more read-only sections that happen to have the same contents and relocations. To do that, we have to find isomorphic subgraphs from larger graphs. I implemented a new algorithm for mold, which is 5x faster than lld to do ICF for Chromium (from 5 seconds to 1 second).

Sebastian Ullrich (Sep 29 2022 at 14:20):

For reference, for Chromium we are talking about 3GB of linker inputs

Tom (Sep 29 2022 at 16:48):

@Mario Carneiro

I think we can continue some of this discussion tomorrow, it feels like the text bandwidth is a bit low for such an in-depth discussion.

@Sebastian Ullrich

linker's Identical Code Folding

That's a good idea! One thought though - if we fold two different specializations which happen to generate the same code, could this cause any issues with anything to do with "addresses"? I think lean has an *Ref type, and also allows C/C++ FFI, so I wonder if that could come into play. (I've not had a chance to read the FFI section of the docs yet, so please forgive my ignorance in this area)

Sebastian Ullrich (Sep 29 2022 at 16:53):

I believe this is the difference between --icf=safe and --icf=all. But I can't think of any reason why we should guarantee that distinct symbols cannot be observed with the same address.

Tom (Sep 29 2022 at 16:54):

A slight upside of ICF would be that we can continue to use nice symbol names without hashes

I actually like the hashing approach because it would guard us against (certain) types of ABI breaks and compatibility issues (i.e. what C++ defines as the ODR). Violations of the ODR are really difficult to catch (and debug) and just lead to undefined behavior. My discussion with @Mario Carneiro is mostly about trying to nail down more of the requirements and get a better understanding of the team's thinking and a potential roadmap in this area.

Also, ICF will only apply when compiling an executable, so we still need a weak symbol story for when linking against SOs.

Finally, if we're concerned about size/linker performance, it's worth considering split DWARF. For large programs, not constantly loading the massive debug symbol tables from disk cuts down on IO significantly, and that's a major performance boost for such large programs.

Tom (Sep 29 2022 at 16:56):

@Leonardo de Moura

How do you want the "example code" posted? As a GH repo or just inline here? Unfortunately, I don't have easy access to a Windows box (and not sure if you want a MSVC version) but the g++/clang examples are working.

Tom (Sep 29 2022 at 17:56):

Ok, NVM - going to post here

Tom (Sep 29 2022 at 17:58):

main.cpp

#include <stdio.h>

char const *greeting();

int main()
{
    printf("%s\n", greeting());
}

a.cpp

#include "config.h"

// This is an example of using a weak symbol
// on a definition
char const *WEAK_SYMBOL
greeting()
{
#ifdef __cplusplus
    return "Hello from C++";
#else
    return "Hello from C";
#endif
}

b.cpp

#include "config.h"

// This is an example of using the attribute
// on a declaration
char const *greeting() WEAK_SYMBOL;

char const *greeting()
{
#ifdef __cplusplus
    return "Hello from C++";
#else
    return "Hello from C";
#endif
}

config.h

#if defined(__GNUC__) || defined(__clang__)

// NOTE: We could also decide to put our weak symbols
// into a separate linker section in order to facilitate
// introspection/debugging
#define WEAK_SYMBOL __attribute__((weak))

#elif defined(_MSC_VER)
#error MSVC is not supported
// but we could use __declspec(selectany)

#endif

Makefile

gcc_cpp:
    g++ main.cpp a.cpp b.cpp -o $@

gcc_c:
    gcc -x c main.cpp a.cpp b.cpp -o $@

clang_cpp:
    clang main.cpp a.cpp b.cpp -o $@

clang_c:
    clang -x c main.cpp a.cpp b.cpp -o $@

.DEFAULT_GOAL := test
.PHONY: test
test: gcc_cpp gcc_c clang_cpp clang_c
    @echo "** gcc / c++ **"
    @./gcc_cpp
    @nm -C gcc_cpp | grep greeting

    @echo "** gcc / c **"
    @./gcc_c
    @nm -C gcc_c | grep greeting

    @echo "** clang / c++ **"
    @./clang_cpp
    @nm -C clang_cpp | grep greeting

    @echo "** clang / c **"
    @./clang_c
    @nm -C clang_c | grep greeting

clean:
    rm -f gcc_cpp gcc_c clang_cpp clang_c

Tom (Sep 29 2022 at 18:00):

Just stick these into the same directory and then type make.

It will compile four programs, [c, c++] x [clang, gcc]; run all of them and grep out the greeting symbol to show it's weak.

Please LMK if that's what you had in mind.

Tom (Sep 29 2022 at 18:01):

This is the expected output

   make test
** gcc / c++ **
Hello from C++
0000000000000652 W greeting()
** gcc / c **
Hello from C
0000000000000657 W greeting
** clang / c++ **
Hello from C++
00000000004011d0 W greeting()
** clang / c **
Hello from C
00000000004011d0 W greeting

Tom (Sep 29 2022 at 18:06):

Also note that the linker has discarded the duplicates, as expected.

Sebastian Ullrich (Sep 29 2022 at 19:22):

It has discarded the duplicate symbol, but not the code

0000000000401143 <greeting>:
  401143:       55                      push   %rbp
  401144:       48 89 e5                mov    %rsp,%rbp
  401147:       b8 04 20 40 00          mov    $0x402004,%eax
  40114c:       5d                      pop    %rbp
  40114d:       c3                      ret
  40114e:       55                      push   %rbp
  40114f:       48 89 e5                mov    %rsp,%rbp
  401152:       b8 11 20 40 00          mov    $0x402011,%eax
  401157:       5d                      pop    %rbp
  401158:       c3                      ret

You need -ffunction-sections -Wl,--gc-sections to get rid of it

Sebastian Ullrich (Sep 29 2022 at 20:03):

Tom said:

I actually like the hashing approach because it would guard us against (certain) types of ABI breaks and compatibility issues (i.e. what C++ defines as the ODR).

I'm not sure what you mean by that. As Mario said, there are no plans for a stable ABI.

Also, ICF will only apply when compiling an executable, so we still need a weak symbol story for when linking against SOs.

ICF applies to any kind of object. While it will not get rid of duplicates across dynamic linking boundaries, some duplication seems acceptable compared to the overhead of going through the GOT.

Sebastian Ullrich (Sep 29 2022 at 20:12):

With weak symbols do we have to pay for indirection even within the same shared library? That may be significant for libleanshared.so. With ICF we would keep specializations static like today.

Mario Carneiro (Sep 29 2022 at 20:21):

I don't see any reason not to enable ICF. It sounds like an LTO-style feature that requires no code changes, so if we can afford the link time cost then why not?

Mario Carneiro (Sep 29 2022 at 20:22):

even if we implement all this weak symbol specialization stuff ICF will still be able to find other examples that just happen to be identical unrelated functions

Tom (Sep 29 2022 at 20:26):

You need -ffunction-sections -Wl,--gc-sections to get rid of it

Thanks for catching that - I forgot to re-add it when I rewrote my CMakeLists.txt as the simplified Makefile to post here.

Tom (Sep 30 2022 at 00:35):

I actually like the hashing approach because it would guard us against (certain) types of ABI breaks and compatibility issues (i.e. what C++ defines as the ODR).

I'm not sure what you mean by that. As Mario said, there are no plans for a stable ABI.

I just mean that, inadvertently, someone will compile two binaries and link them to Lean and because the symbols will have a hash, they will not be found, rather than being "accidentally" linked. I think that's good because it stops people from chasing phantom problems.

With weak symbols do we have to pay for indirection even within the same shared library?
With ICF we would keep specializations static like today.

Weak symbols cannot be static (they need external linkage, which makes sense). However, the default symbol visibility is set to "hidden", so it doesn't matter! "weakness" is orthogonal to the GOT/PLT lookup. See the following example:

Tom (Sep 30 2022 at 00:41):

a.cpp

char const *greeting();

char const *greeting2() __attribute__((visibility("default")));
char const *greeting2()
{
    return greeting();
}

b.cpp

#include "config.h"

char const *greeting() WEAK_SYMBOL;
char const *greeting()
{
#ifdef __cplusplus
    return "Hello from C++";
#else
    return "Hello from C";
#endif
}

Now run:

   g++ a.cpp b.cpp -shared -o a.so -fvisibility=hidden -fPIC

Looking at nm -DC a.so, you should see that greeting is not in the dynamic section

<snip>
000000000000057a T greeting2()

and using objdump -d a.so you see that

000000000000057a <_Z9greeting2v>:
 57a:   55                      push   %rbp
 57b:   48 89 e5                mov    %rsp,%rbp
 57e:   e8 02 00 00 00          callq  585 <_Z8greetingv>
 583:   5d                      pop    %rbp
 584:   c3                      retq

0000000000000585 <_Z8greetingv>:
 585:   55                      push   %rbp
 586:   48 89 e5                mov    %rsp,%rbp
 589:   48 8d 05 0d 00 00 00    lea    0xd(%rip),%rax        # 59d <_fini+0x9>
 590:   5d                      pop    %rbp
 591:   c3                      retq

So, as expected, the call to greeting goes through a relative jump rather than through an indirect jump.

Sebastian Ullrich (Sep 30 2022 at 07:41):

Ah, that's good! It's especially important on Windows where we can only have a very limited number of exported symbols.

Sebastian Ullrich (Sep 30 2022 at 07:43):

Tom said:

I just mean that, inadvertently, someone will compile two binaries and link them to Lean and because the symbols will have a hash, they will not be found, rather than being "accidentally" linked. I think that's good because it stops people from chasing phantom problems.

static prevents accidental linking just as well.

With weak symbols do we have to pay for indirection even within the same shared library?
With ICF we would keep specializations static like today.

Weak symbols cannot be static (they need external linkage, which makes sense)

Sure, but what I meant is that with ICF I wouldn't use weak symbols at all.

Tom (Sep 30 2022 at 16:03):

All good points, and ultimately it's not my call - happy to defer to others. For the sake of completeness (and something we can perhaps discuss in the meeting), here are a few other thoughts:

  • ICF may not be as portable as weak symbols because not all target platform linkers may support it (especially with LLVM on the way)
  • gcc documents that its icf is behaviorally different from gold's. Not sure if that could lead to issues (see here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)
  • I'm not 100% sure of the impact of ICF on debug symbols (and perhaps stack traces), but I've heard of some issues (which may have been fixed now). I also cannot say anything about ICF behavior on MSVC, I have no experience there.
  • (This now enters speculation area on my part so please treat with a pinch of salt.) You suggested icf=safe (so I assume you're referring to gold). However, the docs say that it will only fold functions whose address is not taken. My understanding of instances is that they are effectively just dictionaries of function pointers; could this lead to issues with functions not getting folded? (See the sketch after this list.)
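
This is roughly the pattern I have in mind (names are illustrative; I believe lean_alloc_closure is how the generated code builds closures):

#include <lean/lean.h>

lean_object * l_Foo_spec_boxed(lean_object * a, lean_object * b);

lean_object * mk_closure(void)
{
    // The specialization's address escapes here, so under --icf=safe the
    // linker would treat its section as address-significant and decline
    // to fold it with an otherwise identical function.
    return lean_alloc_closure((void *)l_Foo_spec_boxed, 2, 0);
}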

Cheers!

Mario Carneiro (Sep 30 2022 at 16:30):

@Tom Do you think you could make an example using ICF and comparing results to the weak symbol example above?

Sebastian Ullrich (Sep 30 2022 at 16:41):

I won't make it to today's meeting unfortunately

Tom (Sep 30 2022 at 23:15):

Mario Carneiro said:

Tom Do you think you could make an example using ICF and comparing results to the weak symbol example above?

Sure, and let me create an example where both are used :smile:

But in summary of the discussion today: it seemed we were lean-ing (ha!) toward my suggestion of using both weak symbols and then also enabling folding, because it gives the compiler more control and is not completely reliant on the implementation details of the various linkers. Please LMK if I misunderstood.

Tom (Oct 01 2022 at 04:56):

Here's the extended example, as promised. Note that I have created two structurally equivalent types. By including the function reduceX in both translation units, I create multiple (weak) definitions.

test.h

struct X
{
    int a = 0;
    int b = 0;
};

// create the same weak symbol in two translation units
int reduceX(X const &x) __attribute__((weak));

int reduceX(X const &x)
{
    return x.a + x.b;
}

struct Y
{
    int w = 0;
    int z = 0;
    float r = 0;
};

a.cpp

#include "test.h"

#include <cstdio>

int reduceY(Y const &y);

int main()
{
    X x{1, 2};
    Y y{3, 4};

    std::printf("reduceX result = %d\n", reduceX(x));
    std::printf("reduceY result = %d\n", reduceY(y));
#ifdef WITH_ADDR
    //if defined, the address of the two functions will be taken
    std::printf("folded = %d\n", (void *)reduceX == (void *)reduceY);
#endif
}

b.cpp

#include "test.h"

int reduceY(Y const &y);

int reduceY(Y const &y)
{
    return y.w + y.z;
}

Makefile

weak_only: a.cpp b.cpp
    g++ a.cpp b.cpp -o $@

weak_only_addr: a.cpp b.cpp
    g++ -DWITH_ADDR a.cpp b.cpp -o $@

ld_icf: a.cpp b.cpp
    g++ a.cpp b.cpp -ffunction-sections -fipa-icf -o $@

gold_icf_safe: a.cpp b.cpp
    g++ a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=safe -o $@

gold_icf_safe_addr: a.cpp b.cpp
    g++ -DWITH_ADDR a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=safe -o $@

gold_icf_all: a.cpp b.cpp
    g++ a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=all -o $@

gold_icf_all_addr: a.cpp b.cpp
    g++ -DWITH_ADDR a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=all -o $@

TGTS := weak_only weak_only_addr ld_icf \
        gold_icf_safe gold_icf_safe_addr gold_icf_all gold_icf_all_addr

define helper
    @echo $(1)
    @./$(2)
    @nm -C $(2) | grep reduce
    @objdump -C --disassemble=main $(2) | grep call | grep reduce
    @echo
endef

.DEFAULT_GOAL := test
.PHONY: test
test: $(TGTS)
    $(call helper,"** gcc/ld standard **","weak_only")
    $(call helper,"** gcc/ld standard - address taken **","weak_only_addr")
    $(call helper,"** gcc/ld icf **","ld_icf")
    $(call helper,"** gcc/gold icf safe**","gold_icf_safe")
    $(call helper,"** gcc/gold icf safe - address taken **","gold_icf_safe_addr")
    $(call helper,"** gcc/gold icf all **","gold_icf_all")
    $(call helper,"** gcc/gold icf all - address taken **","gold_icf_all_addr")


clean:
    rm -f $(TGTS)

The Makefile dumps the (C++-demangled) symbols, and greps out the function calls from the main function to verify their assembly call addresses.

Tom (Oct 01 2022 at 05:02):

This is the output

   make test
g++ a.cpp b.cpp -o weak_only
g++ -DWITH_ADDR a.cpp b.cpp -o weak_only_addr
g++ a.cpp b.cpp -ffunction-sections -fipa-icf -o ld_icf
g++ a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=safe -o gold_icf_safe
g++ -DWITH_ADDR a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=safe -o gold_icf_safe_addr
g++ a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=all -o gold_icf_all
g++ -DWITH_ADDR a.cpp b.cpp -ffunction-sections -fuse-ld=gold -Wl,--icf=all -o gold_icf_all_addr
** gcc/ld standard **
reduceX result = 3
reduceY result = 7
0000000000001121 W reduceX(X const&)
00000000000011cb T reduceY(Y const&)
    116e:       e8 ae ff ff ff          call   1121 <reduceX(X const&)>
    1190:       e8 36 00 00 00          call   11cb <reduceY(Y const&)>

** gcc/ld standard - address taken **
reduceX result = 3
reduceY result = 7
folded = 0
0000000000001121 W reduceX(X const&)
0000000000001201 T reduceY(Y const&)
    116e:       e8 ae ff ff ff          call   1121 <reduceX(X const&)>
    1190:       e8 6c 00 00 00          call   1201 <reduceY(Y const&)>

** gcc/ld icf **
reduceX result = 3
reduceY result = 7
0000000000001121 W reduceX(X const&)
00000000000011cb T reduceY(Y const&)
    116e:       e8 ae ff ff ff          call   1121 <reduceX(X const&)>
    1190:       e8 36 00 00 00          call   11cb <reduceY(Y const&)>

** gcc/gold icf safe**
reduceX result = 3
reduceY result = 7
00000000000006c1 W reduceX(X const&)
00000000000006c1 T reduceY(Y const&)
 70e:   e8 ae ff ff ff          call   6c1 <reduceY(Y const&)>
 730:   e8 8c ff ff ff          call   6c1 <reduceY(Y const&)>

** gcc/gold icf safe - address taken **
reduceX result = 3
reduceY result = 7
folded = 0
00000000000006c1 W reduceX(X const&)
00000000000007a1 T reduceY(Y const&)
 70e:   e8 ae ff ff ff          call   6c1 <reduceX(X const&)>
 730:   e8 6c 00 00 00          call   7a1 <reduceY(Y const&)>

** gcc/gold icf all **
reduceX result = 3
reduceY result = 7
00000000000006c1 W reduceX(X const&)
00000000000006c1 T reduceY(Y const&)
 70e:   e8 ae ff ff ff          call   6c1 <reduceY(Y const&)>
 730:   e8 8c ff ff ff          call   6c1 <reduceY(Y const&)>

** gcc/gold icf all - address taken **
reduceX result = 3
reduceY result = 7
folded = 1
00000000000006c1 W reduceX(X const&)
00000000000006c1 T reduceY(Y const&)
 70e:   e8 ae ff ff ff          call   6c1 <reduceY(Y const&)>
 730:   e8 8c ff ff ff          call   6c1 <reduceY(Y const&)>

Summary:

I deliberately designed the test so you get both equality (in the case of weak symbols) and structural equivalence (two structurally equivalent - but nominally different - functions/types).

1) Weak symbols work in all cases
2) gcc's code folding doesn't do anything in this case (even if the symbols are both weak or if the structures are actually structurally equivalent - I tried both)
3) gcc with gold + safe will fold the code, but switches off in the case when the addresses are used (as per the docs)
4) gcc with gold + all works as expected - the code is always folded

Tom (Oct 02 2022 at 23:45):

@Mario Carneiro - any thoughts on the example above? Does that satisfy what you were looking for? It'll be easier for me to make any adjustments on the weekend rather than during the week so if there's anything that's not clear, please LMK and I can try to have it ready at least prior to the meeting tomorrow. Thanks!

Mario Carneiro (Oct 02 2022 at 23:46):

Does that work on all the platforms we test in CI?

Mario Carneiro (Oct 02 2022 at 23:48):

if gcc icf all doesn't require any code changes, I would suggest turning it on in a PR and benchmarking it

Tom (Oct 02 2022 at 23:49):

It should, assuming we only build with gcc/clang and a recent version of binutils.

However, my understanding of the point of the example you were looking for was to contrast weak symbols and linker folding, so in this case it didn't seem to matter whether it's portable or not.

Mario Carneiro (Oct 02 2022 at 23:54):

the question is whether it is worth enabling, or if it's too brittle and/or causes issues

Tom (Oct 02 2022 at 23:54):

and benchmarking it

What would that mean here? Just looking at the resulting Lean binary code size?

Or is there a collection of benchmarks I can run? I don't see how this would meaningfully affect runtime performance unless we're looking at some very minor i-cache behaviors which I find unlikely.

I would suggest turning it on in a PR

What do you mean? "Turn it on for code generated by lean"? Turn it on for the lean compiler itself (I assume you mean for stage 1)? Stage 0 seems to have a lot of C++ code and I think enabling it there would lead to problems.

Mario Carneiro (Oct 02 2022 at 23:55):

benchmarking it meaning !bench which collects a bunch of metrics including code size as well as performance

Mario Carneiro (Oct 02 2022 at 23:56):

Stage 0 seems to have a lot of C++ code and I think enabling it there would lead to problems.

Yes, that's exactly why we should enable it and make this more than just speculation

Tom (Oct 02 2022 at 23:56):

Would you be able to please point me at either some docs/READMEs about bench, or at least give an example how to run it?
Sorry for stupid questions, I appreciate the help.

Mario Carneiro (Oct 02 2022 at 23:57):

my guess is that this won't cause problems because we don't write the kind of code you are worried about, but if something breaks during deployment this will make everything more concrete

Mario Carneiro (Oct 02 2022 at 23:58):

I think you have to be leo to actually kick the benchmark bot, but you can do the PR

Mario Carneiro (Oct 03 2022 at 00:00):

Stage 1 and stage 0 are ~the same (stage 0 is stage 1 from a few commits prior)

Tom (Oct 03 2022 at 00:01):

Yes, that's exactly why we should enable it and make this more than just speculation

I'm confused by that statement.

If stage 0 contains a lot of C++ code (and it does - it seems to be e.g. using the C++ standard library; std::string etc) I would strongly consider avoiding linker folding, as I've mentioned in previous conversations.

If C code generated by the Lean codegen only creates "C" functions without relying on e.g. C++ templates and the standard library, then I think we are potentially in safe territory (modulo any address comparisons but you and others have said not to worry about it) and we should feel ok about turning this flag on for such binaries.

Which one are we talking about here?

Mario Carneiro (Oct 03 2022 at 00:01):

I'm talking about enabling aggressive linker folding in the C++ code

Mario Carneiro (Oct 03 2022 at 00:02):

yes, the kind that you are concerned about

Mario Carneiro (Oct 03 2022 at 00:02):

how are we supposed to know if there is actually a problem otherwise

Tom (Oct 03 2022 at 00:37):

In which case I would just eschew the experiment.

There is a documented history of problems in C++ when enabling linker folding. MSVC, for one, did it a while back and had to roll back the changes because it broke people's code. This is what I've been concerned about all along.

Since C++ can always assume that pointer and object identities are preserved between different instantiations of types and functions - and parts of the compiler depend on C++ - I don't see how we can enable it.

Even though it may work today, when you e.g. update the compiler (or its standard library) something might break. Even more so, now that LLVM is being pulled in, I don't think we can make any claims about the validity of any linker folding as the LLVM implementation could change in arbitrary ways; and part of the problem is that the breakage may well be silent, which seems highly undesirable.

Mario Carneiro (Oct 03 2022 at 04:22):

@Tom Unlike MSVC, we're not trying to defend against issues in arbitrary code, only in the lean implementation. I assert that the issues you are worrying about don't occur in that code because you need to write contrived code to hit the issue, and the lean implementation doesn't do that. This isn't an issue of depending on UB that could break our code in the future - as long as we don't use pointer identity of template functions (which, again, why would one ever think that is okay?) then we're fine AFAICT.

The only way to find out if it's an issue in practice is to do the experiment, and audit the code to be sure that none of those issues might pop up in the future. It's not a black box or something users can extend, so we can just double check that the crazy code you are talking about doesn't happen.

Siddharth Bhat (Oct 03 2022 at 18:25):

@Mario Carneiro AFAIU, Tom is arguing that though Lean4 contains no contrived code, some dependency of Lean4 (e.g. GMP, LLVM) very well may contain contrived code. Thus, turning on linker folding could cause bugs in a dependency of the Lean compiler (e.g. LLVM). Furthermore, this bug could be silent (are we confident that our tests will uncover such a bug?)

Mario Carneiro (Oct 03 2022 at 18:26):

This is still pure speculation though. It's extremely non-actionable

Sebastian Ullrich (Oct 03 2022 at 18:27):

It also only matters for LLVM if we do switch it to static linking

Mario Carneiro (Oct 03 2022 at 18:29):

if we actually uncovered such an issue in a dependency, I would report it as a bug (or get it to be prominently labeled as not compatible with ICF)

Tom (Oct 04 2022 at 01:37):

This is still pure speculation though.

Seems a bit harsh, TBH (and not the first time in this thread).

I've gone ahead and implemented every follow-up you've requested, and taken care to explain why it's a concern and why it could become one in the future, and laid out examples where it has happened in practice.

As you know, I am inexperienced in the codebase, and don't have an oracle so am erring on the side of caution. Perhaps our respective risk profiles/concerns are different but I would appreciate your help/guidance instead of just classifying the above information as "speculation" - it seems a little dismissive.

As I have mentioned before, I am happy to enable and test the linker folding. I'm not afraid of a good technical back-and-forth but would appreciate a toning down of the rhetoric. :thank_you:

Mario Carneiro (Oct 04 2022 at 02:40):

I don't mean that rhetorically. Unless and until we get data on whether and how big a problem it is, it is speculation. The point of the PR is to get data, I don't know how else to express that. I understand that you are worried about bugs, but that's why we have a test suite. If there's an issue, let's find out what it looks like and what we can do to resolve it. Being afraid to do a test because something could break makes no sense to me at all.

Mario Carneiro (Oct 04 2022 at 02:43):

To be clear, what I consider speculation is not that there is a potential for issues when crazy C++ code meets linker optimizations, but rather that crazy C++ code and linker optimizations will combine into a bug in the lean codebase.

Mario Carneiro (Oct 04 2022 at 02:49):

If we grant your premise that there might be a latent bug caused by linker folding, please tell me what path you see to eliminating this possibility. What is the review protocol such that we can be confident that the risks are sufficiently mitigated? (This is what I mean when I talk about it being "actionable". A vague threat which can't be falsified is difficult to work with.)

Tom (Oct 04 2022 at 05:10):

I don't mean that rhetorically

Fair enough, let's move on. No problem, sometimes text is hard.

For completeness, I am trying to get a build of the compiler going before Friday's meeting, if not sooner, to test as requested.

Please see below the other answers:

but that's why we have a test suite.

I am still confused by this and completely open to the fact that I'm misunderstanding you - please hear me out.

I grant you that we may be able to show that the Lean compiler doesn't suffer from this problem. The part I don't get is: How does the test suite help with showing that some arbitrary program written in a mix of Lean and C or C++ doesn't exhibit it?
The initial problem, as I understood from @Leonardo de Moura , was to apply weak symbols to the generated code in order to reduce code size/duplication. As such, I understood the subsequent discussion of linker optimizations has been predominantly aimed at the results of Lean compilation, not just the compiler itself.

Hence I thought the point people were interested in is turning on linker folding for Lean-compiled outputs (i.e. the binaries the compiler produces), and to the best of my understanding lake could build mixed binaries of Lean and C++ code, no? (including static linking?).

How does the test suite help here?

but rather that crazy C++ code

This is the part I am still unclear on. For example, if I write

int x();
int y();

std::vector<int (*)()> fns = {x, y};

in C++, I can assume that this container contains two distinct elements and build further invariants around that premise. It doesn't seem that "crazy" to me.

please tell me what path you see to eliminating this possibility

I assumed we would still pursue @Leonardo de Moura 's original idea of using weak symbols. As I summarized above after our last meeting, he seemed in favor of giving more control to the Lean compiler, which weak symbols would give it. I then viewed linker folding just as a "cherry on top" and after the first part, I wouldn't expect it to have an appreciable effect; and if it's still needed we could look at using icf=safe rather than icf=all.

What is the review protocol such that we can be confident that the risks are sufficiently mitigated?

Not using icf=all for now would not create the problem in the first place, right?

A vague threat which can't be falsified

I have not used any "vague threats" but believe we're going back to a somewhat un-constructive territory here.
I've done my best to support everything I've said with examples and feel I've taken the time to explain and clarify any misunderstandings.

If you could provide your guidance regarding my questions or possible confusion above, I would appreciate your and @Leonardo de Moura 's feedback (and perhaps @Sebastian Ullrich, since he was the first to suggest linker folding)

Many thanks.

Mario Carneiro (Oct 04 2022 at 06:13):

I grant you that we may be able to show that the Lean compiler doesn't suffer from this problem. The part I don't get is: How does the test suite help with showing that some arbitrary program written in a mix of Lean and C or C++ doesn't exhibit it?

We aren't trying to support arbitrary C++ in lean programs. We're trying to support the C++ in the lean core, plus an open-ended amount of lean-generated C code using very particular patterns. For FFI, there will always be a bunch of gotchas and I don't mind this being one more thing to tell people to keep in mind.

Mario Carneiro (Oct 04 2022 at 06:17):

Hence I thought the point people were interested in is turning on linker folding for Lean-compiled outputs (i.e. the binaries the compiler produces), and to the best of my understanding lake could build mixed binaries of Lean and C++ code, no? (including static linking?).

You could say the same thing about linking assembly code - you could link assembly, and there are a ton of observable things that the compiler will stomp all over in that code. If you do that the consequences are on you. I don't see that as a reason to scale back optimizations as long as lean code avoids the issue by construction.

Mario Carneiro (Oct 04 2022 at 06:18):

in C++, I can assume that this container contains two distinct elements and build further invariants around that premise. It doesn't seem that "crazy" to me.

That does seem crazy to me. Comparing function pointers for equality is a fundamentally suspicious operation. The only direction that makes sense to me is that if two function pointers compare equal then calling one is equivalent to calling the other, although even this is suspicious because of pointer provenance. You have to be writing some really sketchy code for this question to even arise.

Mario Carneiro (Oct 04 2022 at 06:23):

I assumed we would still pursue Leonardo de Moura 's original idea of using weak symbols. As I summarized above after our last meeting, he seemed in favor of giving more control to the Lean compiler, which weak symbols would give it. I then viewed linker folding just as a "cherry on top" and after the first part, I wouldn't expect it to have an appreciable effect; and if it's still needed we could look at using icf=safe rather than icf=all.

I'm interested to see ICF first because by all accounts it's a much simpler thing to do, just flipping a switch somewhere. I don't think that the two approaches are in opposition. As I and others said at the meeting, we should start an ICF experiment with the most aggressive setting and scale back in response to issues, not the other way around.

Mario Carneiro (Oct 04 2022 at 06:34):

A vague threat which can't be falsified

I have not used any "vague threats" but believe we're going back to a somewhat un-constructive territory here.
I've done my best to support everything I've said with examples and feel I've taken the time to explain and clarify any misunderstandings.

I think you misunderstood what I mean by "threat" here. There is some threat / danger that we will have written a bug and that the bug will go undetected. We are trying to assess the risk here, and my point is that the first step is to gather data by doing the most aggressive version of everything in a test and seeing what that unearths. If nothing shows up and the numbers look good then it will already be within our usual margin of error. We do a fair bit of testing on users anyway so it isn't the end of the world if an issue comes up in one of the dependent projects and we have to re-evaluate. And if nothing shows up even then, I think we're golden.

The not-useful direction is to say that because the possibility of a bug persists, we had best avoid the path altogether, because more testing, more auditing, more anything isn't going to eliminate that possibility completely. There is always the possibility of bugs as long as lean remains formally unverified, which is why we have procedures to mitigate the risk and proceed anyway, and backpedal later if it turned out not to be a good idea.

Mario Carneiro (Oct 04 2022 at 06:40):

Tom said:

For example, if I write

int x();
int y();

std::vector<int (*)()> fns = {x, y};

in C++, I can assume that this container contains two distinct elements and build further invariants around that premise. It doesn't seem that "crazy" to me.

By the way, could you quote where the standard promises that x and y will compare unequal? This sounds suspicious to me - I would guess that the standard says it's implementation defined.

Mario Carneiro (Oct 04 2022 at 06:59):

It's worse than I thought - the C++ standard is worded ambiguously on this point. Here's an SO post where people argue over what it means. My reading is that it's legal to put different functions in the same place, and have == do pointer equality, since the wording says that function pointers should compare equal if they have the same address, but this is satisfied in this case. However, this is a change to the observable behavior of the program, and that's what the icf=safe vs icf=all thing is about.

Tom (Oct 04 2022 at 16:03):

We aren't trying to support arbitrary C++ in lean programs.

I'm just being practical. Seems that code in the near- to medium-term future will likely include mixed binaries.

As I and others said at the meeting, we should start an ICF experiment with the most aggressive setting and scale back in response to issues, not the other way around.

I have to admit I don't remember anyone else following this line except you but in general, the statement does seem to illustrate an interesting difference in our world views/risk tolerance.

I think you misunderstood what I mean by "threat" here.

No, I don't think so - I understand your points, I was asking for a change in rhetoric. I have for example not characterized your ideas as being e.g. "cavalier" because you want to turn everything up to 11 and then see what breaks. It's just a difference in world view.
It would make it easier for me to stay engaged in a positive way.

By the way, could you quote where the standard promises that x and y will compare unequal?

Ok, I don't want this to stray too far from the topic at hand but my reading of 7.6.10 point 3 implies this:

- Otherwise, if the pointers are both null, both point to the same function, or both represent the same address (6.8.2), they compare equal.

— Otherwise, the pointers compare unequal.

I don't want to go down the road of giving the subsequent statements of what "same function" etc. means. I have also reached out to a few of my friends on the committee to get some additional input. I don't think this blocks the progress though; I'll let you know what they say.

Cheers!

Mario Carneiro (Oct 04 2022 at 16:08):

If x() and y() are functions at the same address, then it would seem the third clause is satisfied

Sebastian Ullrich (Oct 04 2022 at 16:10):

Fwiw, I don't mind being conservative and choosing --icf=safe if there isn't a large binary size difference to all. And I don't mind the weak symbols approach either, in which case ICF savings might be small enough to not justify the additional (linker) effort.

Mario Carneiro (Oct 04 2022 at 16:11):

My expectation is that the difference between icf=safe and icf=all will be huge, because almost every lean function is address-taken by its boxed version

Mario Carneiro (Oct 04 2022 at 16:12):

I have seen some indications that it might be possible to mark specific functions as ICF candidates, in which case we could just do this on all lean functions

Sebastian Ullrich (Oct 04 2022 at 16:13):

The boxed version should not take an address, though using it in a partial application of course does

Mario Carneiro (Oct 04 2022 at 16:15):

ah yes you're right, it's actually lean_alloc_closure calls that need it

Sebastian Ullrich (Oct 04 2022 at 16:20):

Mario Carneiro said:

I have seen some indications that it might be possible to mark specific functions as ICF candidates, in which case we could just do this on all lean functions

Oh nice, this should close any gap between safe and all at least for the LLVM backend https://llvm.org/docs/Extensions.html#sht-llvm-addrsig-section-address-significance-table

Sebastian Ullrich (Oct 04 2022 at 16:22):

ICF can cause problems when the program depends on functions having a unique address; both gold and LLD have an --icf=safe mode that limits the scope of the optimization to avoid folding sections that are "address-significant". The implementation in gold uses linker heuristics such as relocation type to determine address significance. The implementation in LLD uses information generated by the compiler, which is placed in .addrsig.

Tom (Oct 04 2022 at 16:35):

Hi @Sebastian Ullrich, I just got a build of lean and am planning to run the experiments with both safe and all and report back.

After running the tests on a clean checkout, I am getting a number of test failures (without any modifications). Is that expected on master?

(It's a fairly small number:

96% tests passed, 64 tests failed out of 1567

I'm not sure what the expectations on HEAD are)

Mario Carneiro (Oct 04 2022 at 16:40):

no, all tests pass on master

Sebastian Ullrich (Oct 04 2022 at 16:43):

Yeah, that's not expected to happen

Tom (Oct 04 2022 at 16:47):

Ok, thanks. I wanted to check first if perhaps just certain releases/tags had to be "stable" while HEAD might be a little more "WIP".
I'll dig, thanks for the confirmation.

Mario Carneiro (Oct 04 2022 at 16:50):

There are links to CI runs for each commit which you can investigate if you want to track down a divergence

Mario Carneiro (Oct 04 2022 at 16:50):

like this > https://github.com/leanprover/lean4/actions/runs/3182701601/jobs/5188989980

Tom (Oct 04 2022 at 23:50):

As an aside, I've heard back from some of the people I know who are on the committee or otherwise deep in the C++ world, and the consensus is that the intent is that distinct functions do, in fact, keep distinct addresses.
I have a couple of additional "gems", just for general interest:

1) If you enable -flto, then in cases where the compiler previously answered "true" for the function-identity comparison, it will actually go back and "fix it up" so that the two functions no longer compare equal!
2) There is a defect report (DR1400) that is supposed to specifically address this issue, and I feel the wording does clarify it.
3) This also seems to have been enough of an issue that some compilers will generate/fold the two functions and just pad one of them with a bunch of nops to give them distinct addresses, despite the code being "the same". You can test it using the code/makefile I provided above (see the sketch below), but you may also need to mark reduceX/Y as [[gnu::noinline]] to ensure LTO doesn't just inline everything :smile:
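
Something along these lines (a standalone sketch; reduceX/reduceY stand in for the functions from the earlier makefile example):

#include <stdio.h>

// Two structurally identical functions; noinline keeps LTO from inlining
// the calls away, so the address comparison below stays meaningful.
__attribute__((noinline)) int reduceX(int n) { return n * 2; }
__attribute__((noinline)) int reduceY(int n) { return n * 2; }

int main(void)
{
    // A plain build should print 0; the experiment is to see what
    // -flto and/or -Wl,--icf=all do to this comparison.
    printf("reduceX == reduceY: %d\n", reduceX == reduceY);
    return 0;
}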

Sebastian Ullrich (Oct 05 2022 at 08:55):

I opened https://github.com/leanprover/lean4/pull/1689 for some quick benchmarking experiments, first enabling fine-grained, GC-ed sections and then safe ICF in a second run (unfortunately I've already made a mess of the history). Happy to push more simple experiments.

Leonardo de Moura (Oct 05 2022 at 12:16):

@Sebastian Ullrich This is great. It would be great to disable the specialization reuse between different compilation units in the old code generator, and then test whether ICF can merge the different copies. To disable specialization reuse between different compilation units, we have to move
https://github.com/leanprover/lean4/blob/85c468c8532d023f733c583fc23774c1733bdf0e/src/Lean/Compiler/Specialize.lean#L87
to a non-persistent environment extension.
That is, we need to split
https://github.com/leanprover/lean4/blob/85c468c8532d023f733c583fc23774c1733bdf0e/src/Lean/Compiler/Specialize.lean#L107
into two environment extensions.

  • A persistent one containing specInfo : SMap Name SpecInfo
  • A non-persistent one containing cache : SMap Expr Name

Sebastian Ullrich (Oct 05 2022 at 13:56):

Yes, if someone could assemble that and ideally test if stage2 passes, I'd be happy to cherry-pick that! As it is right now, I'm already struggling to get the much simpler changes to work correctly, hehe...

Sebastian Ullrich (Oct 05 2022 at 15:11):

I pushed a much simpler version, though in the whole PR there is something weird going on where only the test/ binaries are affected, not leanshared. Even though we do benchmark stage 2.

Sebastian Ullrich (Oct 05 2022 at 15:21):

For -ffunction-sections --gc-sections alone I guess it makes sense that leanshared is not affected as all functions should be reachable from an exported symbol

Tom (Oct 05 2022 at 16:20):

Hi, I managed to get my changes to build on Linux last night (still very hacky, as I'm not that familiar with the current build and leanc), and my first result was that stage1 libleanshared.so's size didn't change at all with --icf=all. Did you observe the same thing?

Sebastian Ullrich (Oct 05 2022 at 16:22):

We should know soon when the bot replies to the PR :)

Sebastian Ullrich (Oct 05 2022 at 16:26):

Huh, looks like the binary actually got smaller, i.e. --icf=all was able to remove more code than deactivating the specialization cache added

Tom (Oct 05 2022 at 20:04):

I didn't try pushing a PR before testing locally, largely because I'm still trying to get to grips with the build system. There seem to be multiple layers (I read the bootstrapping doc), but leanc also appears to be involved. After a few failed attempts, this is what I came up with:

cmake -DLEAN_EXTRA_CXX_FLAGS="-ffunction-sections" -DLEAN_EXTRA_LINKER_FLAGS="-fuse-ld=gold -Wl,--icf=all" -DLEANSHARED_LINKER_FLAGS="-fuse-ld=gold -Wl,--icf=all" ../..

I looked at the linking step for libleanshared.so and I saw the options seemed correct.

I can look at your PR later to see what you've done but if you have any comments on the above/could tell me what I'm doing wrong, that would also help me learn a bit more about the build process.

Thanks!

Sebastian Ullrich (Oct 05 2022 at 20:07):

LEAN_EXTRA_LINKER_FLAGS should subsume LEANSHARED_LINKER_FLAGS. And LEAN_EXTRA_CXX_FLAGS is not really relevant since the C++ code doesn't matter here; you'll want LEANC_EXTRA_FLAGS like in the PR

Tom (Oct 05 2022 at 20:11):

I had originally tried that and couldn't get them to appear on the leanc.sh command line for stage1 until I added LEANSHARED_LINKER_FLAGS. It's possible I made a mistake (at that point I had already been at this for a long time) but that wasn't my experience.

Sebastian Ullrich (Oct 05 2022 at 20:11):

You'll find the LEANC_* flags inside leanc.sh itself, not on the cmdline

Tom (Oct 05 2022 at 22:29):

I had tried that, unsuccessfully :smile:

Tom (Oct 05 2022 at 22:29):

I'll check out your PR later, see what I did wrong.

Sebastian Ullrich (Oct 06 2022 at 09:40):

So here's my preliminary analysis of these numbers:

  • Just adding -fdata-sections -ffunction-sections -Wl,--gc-sections reduces tests/compiler binaries by 37% on average, so we should just make it the default immediately imo. No effect on Lean itself as explained above.
  • --icf=safe/all reduces libleanshared.so by 0%/10% and tests/compiler by another 6%/7%. safe not affecting leanshared at all may be explained by the fact that folding exported symbols in a shared library is unsafe, and we export all functions except for initializers. A semi-conservative choice might be to use safe for linking Lean user binaries, as we don't know what else they might be statically linking against, but all for linking leanshared as we assume there's no sufficiently crazy C++ code in there, and we want to reduce it in the future anyway.
  • Finally, disabling the specialization cache cross-module incurs an unavoidable overhead of 12% compilation time, 4% total build time, and 13% C lines. Which is not great, but might be tolerable in exchange for better separate compilation. The leanshared binary size increases by 14%, which icf=all manages to get down to -2%, i.e. it was able to negate some but not all of the increase compared to the 10% savings in the previous bullet point.

Sebastian Ullrich (Oct 06 2022 at 10:36):

Frankly, I'm surprised ICF is that effective, because it definitely can't fold any function that had a closed term extracted: that leads to a unique global mutable variable the linker definitely can't fold away. What I don't know is whether this would improve with https://github.com/leanprover/lean4/issues/467 - is the linker able to fold identical .rodata sections?
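
To illustrate (a generic sketch with made-up names, not actual generated code):

// Two otherwise identical functions, each referring to its own cached value
// (standing in for the extracted closed terms of two different modules;
// shown in one file and with plain ints for brevity).
static int cached_a;
static int cached_b;

int get_a(void) { if (cached_a == 0) cached_a = 42; return cached_a; }
int get_b(void) { if (cached_b == 0) cached_b = 42; return cached_b; }

// The bodies differ only in which global they reference, i.e. in their
// relocations, so ICF cannot treat the two sections as identical.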

Sebastian Ullrich (Oct 06 2022 at 11:41):

And the test segfaults on macOS are certainly concerning

Tom (Oct 06 2022 at 17:24):

Wow, thanks for trying that and the write up! Very interesting numbers.

A semi-conservative choice might be to use safe for linking Lean user binaries, as we don't know what else they might be statically linking against, but all for linking leanshared as we assume there's no sufficiently crazy C++ code in there

That's exactly what I had in mind in the ongoing discussion above

folding exported symbols in a shared library is unsafe

I think this boils down to the argument about function identity

I want to revisit another idea I threw out in the beginning: what if we used weak functions but named the implementation only by its content-addressable hash? That way we could actually control what gets folded and would not be beholden to the linker doing a "good job", and it's also not going to fall afoul of any C++/linking rules. One of the objections was the introduction of non-human-readable symbols, but the Posix toolchains actually provide a capability for symbol aliases. So perhaps we could have two symbols: in the code we would invoke the "named" ones, while giving the linker the ability to fully fold the "hashed" weak ones.

Sebastian Ullrich (Oct 06 2022 at 18:11):

Sounds interesting, can we do that in C?

Sebastian Ullrich (Oct 06 2022 at 18:16):

I'm not sure anymore if the symbol name is a big issue - we most often see them in stack traces, in which case any kind of folding will basically show a random representative. So as long as the specialized function itself is still included in the symbol name, it's probably fine.

Tom (Oct 07 2022 at 00:21):

Sebastian Ullrich said:

Sounds interesting, can we do that in C?

Yes, to the best of my knowledge: https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html
I can't link directly to the section but it's the first one on the page.

Tom (Oct 07 2022 at 00:22):

Ok this is what I get

main.c

#include <stdio.h>
#include <stdlib.h>

// sha256 of "my function"
char const *f_12350a3b39aaba8c7514ea42a83129d968ecec3aedd9fe8590cf7b820b9242b8() __attribute__((weak));

char const *f_12350a3b39aaba8c7514ea42a83129d968ecec3aedd9fe8590cf7b820b9242b8()
{
    char const *res = "Hello world";
    abort();
    return res;
}

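// Human-readable alias that resolves to the hashed weak definition above.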
char const *greeting() __attribute__((weak, alias("f_12350a3b39aaba8c7514ea42a83129d968ecec3aedd9fe8590cf7b820b9242b8")));

int main()
{
    printf("%s\n", greeting());
}

gcc main.c

Tom (Oct 07 2022 at 00:23):

   nm -C a.out | grep -E "greeting|f_"
0000000000001169 W f_12350a3b39aaba8c7514ea42a83129d968ecec3aedd9fe8590cf7b820b9242b8
0000000000001169 W greeting

Both weak, same address, so it's a valid alias.

Tom (Oct 07 2022 at 00:24):

BUT when I run it in gdb, this is what I see after abort gets called

(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737351509824) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737351509824) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737351509824, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7dc3476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7da97f3 in __GI_abort () at ./stdlib/abort.c:79
**#5  0x0000555555555185 in greeting ()**
#6  0x0000555555555197 in main ()

Note it has the aliased name, not the hash.

Tom (Oct 07 2022 at 00:29):

Note I compiled this as C (as per your question); in C++ it would be necessary to mark the hashed function as extern "C", because the alias attribute refers to the target by its assembler-level (i.e. mangled) symbol name

Sebastian Ullrich (Oct 07 2022 at 08:14):

Nice, looks good. So can we rely on gdb/backtrace()/... showing the alias symbol name instead of the original? And folding of weak symbols still works I assume?
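
One way to check might be a variant of the example above that prints a glibc backtrace instead of aborting (a sketch; link with -rdynamic so backtrace_symbols can resolve the names):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

// Same weak/alias pattern as in the main.c above, but dumping a backtrace
// to see which of the two names the symbolizer reports.
void f_hash_0123(void) __attribute__((weak));

void f_hash_0123(void)
{
    void *frames[16];
    int n = backtrace(frames, 16);
    char **names = backtrace_symbols(frames, n);
    for (int i = 0; i < n; i++)
        puts(names[i]);
    free(names);
}

void greeting(void) __attribute__((weak, alias("f_hash_0123")));

int main(void)
{
    greeting();
    return 0;
}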

Sebastian Ullrich (Oct 07 2022 at 14:13):

Just adding -fdata-sections -ffunction-sections -Wl,--gc-sections reduces tests/compiler binaries by 37% on average, so we should just make it the default immediately imo. No effect on Lean itself as explained above.

done: https://github.com/leanprover/lean4/pull/1700

Sebastian Ullrich (Oct 07 2022 at 19:31):

For the record, at the end of today's meeting we decided that while these experiments were extremely informative, especially for any future work in this direction, at this point we want to keep the deduplication in the compiler, not move it to the linker. We can revisit the topic when it seems likely that we would save more time from improved separate compilation than we lose on redundant compilation.

Sebastian Ullrich (Oct 07 2022 at 19:36):

@Tom You had more remarks in the chat that we didn't manage to get back to in the meeting; do you want to continue the discussion here so it doesn't get lost? I think you mentioned the "key" that we would use for deduplication. My assumption was that we would keep the current key, which is the full application of the specialized function (just possibly hashed), regardless of where deduplication happens.

Sebastian Ullrich (Oct 07 2022 at 19:37):

Ah, as also mentioned in Leo's first message above

Tom (Oct 08 2022 at 00:39):

Hi @Sebastian Ullrich ,

Thanks for the follow up. I am happy to leave things as they are for now; the weak symbol project seemed like a good place for me because it would allow me to contribute (and learn more about LCNF e.g. to generate the hashes). However, if the current decision is to move on, I am not married to it and I don't think I'll forget what we've discussed here. So, it's really up to you and the other more experienced devs here to help set the direction because I don't know enough yet to be able to decide one way or another.

It seems some of the discussion around separate compilation and modules is related - and quite interesting - so perhaps this will come up again sooner rather than later. It might be interesting to look at the C++ module system design, because I see a lot of similarities coming up, so the experience and practical aspects may be enlightening: separate compilation vs graph-based dependencies, specializations/weak symbols and templates being defined and exported by modules, etc.

If you're interested in carrying on, or finishing off some of the details, LMK. In the meantime, I'm happy to move on. I have plenty to occupy myself with in terms of simultaneously trying to learn the language, the compiler internals etc etc. If something comes to mind, please don't hesitate to reach out.

Tom

Tom (Oct 08 2022 at 00:40):

Ah, as also mentioned in Leo's first message above

I'm not sure which one you're referring to, sorry

Sebastian Ullrich (Oct 08 2022 at 08:22):

Ah, I just meant this part:
Leonardo de Moura said:

Recall that each code specialization in Lean is uniquely identified by a Expr that does not contain free variables. We will use the hashcode to name the new function produced by our code specializer.

Sebastian Ullrich (Oct 08 2022 at 08:23):

Leo has now reimplemented cross-module specialization reuse in the new compiler: https://github.com/leanprover/lean4/commit/9eb641e7da4145c0f5fc14390e0222f255e2b659

Sebastian Ullrich (Oct 08 2022 at 08:25):

@Tom I haven't looked at C++ modules at all yet (except for a few dunks on it on Twitter, I believe...), do you know of a good write-up of it with regard to separate compilation?


Last updated: Dec 20 2023 at 11:08 UTC