Zulip Chat Archive

Stream: general

Topic: memory leak in server?


Kevin Buzzard (Feb 26 2018 at 23:00):

invalid import: data.set.basic
excessive memory consumption detected at 'replace' (potential solution: increase memory consumption threshold)

Kevin Buzzard (Feb 26 2018 at 23:01):

Why do those happen?

Chris Hughes (Feb 26 2018 at 23:01):

restart lean. That sort of thing happend to me all the time lately

Kevin Buzzard (Feb 26 2018 at 23:02):

I feel like it's happening more and more often to me too at the minute. Is Lean a bit less stable than it was or something?

Kevin Buzzard (Feb 26 2018 at 23:02):

I am trying to define a structure sheaf, I could do without random errors :-)

Chris Hughes (Feb 26 2018 at 23:05):

Also simp randomly fails, and then succeeds without me doing anything.

Kevin Buzzard (Feb 27 2018 at 00:00):

Yeah we got it working in the end.

Kenny Lau (Feb 27 2018 at 00:01):

nice

Kevin Buzzard (Feb 27 2018 at 08:05):

I wanted to knock up a quick "proof by induction" example for pedagogical reasons and I choose to prove that the sum of the first n integers (or first n+1 integers for you CS guys) equalled n(n+1)/2.

Kevin Buzzard (Feb 27 2018 at 08:06):

I had to decide on my infrastructure so I went for list.sum (list.range (succ n))

Kevin Buzzard (Feb 27 2018 at 08:07):

but then I realised that to apply the inductive step I needed some result of the form "the sum from 0 to n+1 of f(i) equals (the sum from 0 to n of f(i)) + f(n+1)"

Kevin Buzzard (Feb 27 2018 at 08:07):

and I couldn't spot this immediately in list, which made me wonder whether I was using the wrong inductive type to do this.

Gabriel Ebner (Feb 27 2018 at 08:12):

I also get the feeling that this issue got worse. It seems like there is a new memory leak in server mode. @Sebastian Ullrich Any ideas?

Simon Hudon (Feb 27 2018 at 08:13):

(moving to (no topic))

Sebastian Ullrich (Feb 27 2018 at 08:35):

It's certainly annoying. Not sure how to debug something like this - first find a test case like a script doing trivial edits, confirm the issue, and then...?

Gabriel Ebner (Feb 27 2018 at 08:40):

The "human test script" that I used was: open default.lean, change something in init/core.lean, go back to default.lean, repeat. Yes, debugging this is not easy. Since this seems to be a regression, we can try to bisect it.

Gabriel Ebner (Feb 27 2018 at 08:41):

Last time we had such an issue, I spent hours in valgrind+gdb, chasing pointers to see which data is alive at the moment. Unfortunately none of the memory issues were real leaks in the past, the memory was always reachable via caches/etc.

Mario Carneiro (Feb 27 2018 at 08:43):

What about opportunistic cache reclaiming?

Gabriel Ebner (Feb 27 2018 at 08:44):

We kill our child threads if they are idle for a few milliseconds, and their caches die with them. This was sufficient in the past.

Sebastian Ullrich (Feb 27 2018 at 08:55):

@Gabriel Ebner Have you ever used massif before? Maybe I should give it a try.

Gabriel Ebner (Feb 27 2018 at 09:29):

I don't think so. I mainly used the memcheck monitor commands.

Sebastian Ullrich (Mar 07 2018 at 19:28):

@Gabriel Ebner There seems to be a shared_ptr cycle, assuming valgrind isn't lying:

==13508== 4,260,044 (272 direct, 4,259,772 indirect) bytes in 1 blocks are definitely lost in loss record 13,808 of 13,808
==13508==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13508==    by 0xC0FFA3: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<lean::module_info, std::allocator<lean::module_info>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) (new_allocator.h:104)
...
==13508==    by 0xDBE2A8: std::shared_ptr<lean::module_info> std::make_shared<lean::module_info, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, lean::module_src, long&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, lean::module_src&&, long&) (shared_ptr.h:636)
==13508==    by 0xDBA2BD: lean::fs_module_vfs::load_module(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (module_mgr.cpp:446)
==13508==    by 0xBF4BEE: lean::server::load_module(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (server.cpp:699)
==13508==    by 0xDB70AB: lean::module_mgr::build_module(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, lean::rb_tree<lean::name, lean::name_quick_cmp>) (module_mgr.cpp:157)

Gabriel Ebner (Mar 08 2018 at 07:16):

@Sebastian Ullrich I think the module_mgr refactoring accidentally created a cyclic reference: https://github.com/leanprover/lean/commit/e8c057f1de65917ebe3501b81b8e09a198af25ff#diff-958f55d9f5b994ecd756301253778733R279

Gabriel Ebner (Mar 08 2018 at 07:16):

diff --git a/src/library/module_mgr.cpp b/src/library/module_mgr.cpp
index c0a0cc8ec..0c961b1a2 100644
--- a/src/library/module_mgr.cpp
+++ b/src/library/module_mgr.cpp
@@ -269,6 +269,7 @@ void module_mgr::build_lean(std::shared_ptr<module_info> const & mod, name_set c
     }
     auto initial_env = m_initial_env;
+    auto id = mod->m_id;
     mod->m_result = map<module_info::parse_result>(
         get_end(snapshots),
         [=](module_parser_result const & res) {
@@ -276,7 +277,7 @@ void module_mgr::build_lean(std::shared_ptr<module_info> const & mod, name_set c
             lean_always_assert(res.m_snapshot_at_end);
             parse_res.m_loaded_module = cache_preimported_env(
-                    export_module(res.m_snapshot_at_end->m_env, mod->m_id),
+                    export_module(res.m_snapshot_at_end->m_env, id),
                     initial_env, [=] { return ldr; });
             parse_res.m_opts = res.m_snapshot_at_end->m_options;
diff --git a/src/library/tactic/tactic_state.cpp b/src/library/tactic/tactic_state.cpp
index d134ba745..5b9676729 100644
--- a/src/library/tactic/tactic_state.cpp
+++ b/src/library/tactic/tactic_state.cpp
@@ -842,11 +842,11 @@ vm_obj io_run_tactic(vm_obj const &, vm_obj const & tac, vm_obj const &) {
     tactic_state s = mk_tactic_state_for(vm.env(), vm.get_options(), "_io_run_tactic",
                                          metavar_context(), local_context(), mk_true());
     vm_obj r = invoke(tac, to_obj(s));
-    if (tactic::is_result_success(r)) {
-        return mk_io_result(tactic::get_result_value(r));
-    } else {
+    // if (tactic::is_result_success(r)) {
+    //     return mk_io_result(tactic::get_result_value(r));
+    // } else {
         return mk_io_failure("tactic failed"); // TODO(Leo): improve exception message
-    }
+    // }
 }
 unsigned tactic_user_state::alloc(vm_obj const & v) {

Gabriel Ebner (Mar 08 2018 at 07:18):

Although I still get leaks when working on the core library. Do you have an automated test for the leak?

Sebastian Ullrich (Mar 08 2018 at 08:52):

@Gabriel Ebner Nice, now memcheck reports 0 lost! How did you even find that?

Sebastian Ullrich (Mar 08 2018 at 08:56):

I'm using a --server file with the following content on a clean-oleaned tree

{"seq_num": 0, "command": "roi", "mode": "open-files", "files": [{"file_name":"library/init/core.lean", "ranges":[]}]}
{"seq_num": 0, "command": "sync", "file_name": "library/init/core.lean"}
{"seq_num": 0, "command": "sync_output"}

Sebastian Ullrich (Mar 08 2018 at 09:49):

Doing 2 trivial changes to core.lean, before vs after fix

Sebastian Ullrich (Mar 08 2018 at 09:49):

The humps should signify the edits

Sebastian Ullrich (Mar 09 2018 at 10:20):

@Gabriel Ebner By the way, I found a "leak" in that we don't reset the memory pools of the main thread until the very end. But that should only contribute a few MB in the long run I suppose.


Last updated: Dec 20 2023 at 11:08 UTC