Zulip Chat Archive
Stream: new members
Topic: Is the Zulip chat data available for noncommercial use?
Yicheng Tao (Apr 08 2024 at 15:36):
I think the Zulip chat contains valuable information for Lean users, and it turns out to be a great source of data for AI training. I noticed the morph prover already used Zulip data when training their language model. I wonder if I could use this data to train a model for semantic search, since Zulip's search utility doesn't always meet my needs.
This model would only be used for personal or educational purposes. I'm not sure whether there is a privacy policy that restricts the use of this data. Do I need to crawl the data myself, or does Zulip offer archived data like arXiv does?
Jon Eugster (Apr 10 2024 at 08:42):
I know that https://leanprover-community.github.io/archive/ is publicly available, which might be easier for extracting data. I don't know how often that gets updated; apparently the last update was Dec 2023.
Yicheng Tao (Apr 11 2024 at 01:02):
Jon Eugster said:
I know that https://leanprover-community.github.io/archive/ is publicly available, which might be easier for extracting data. I don't know how often that gets updated; apparently the last update was Dec 2023.
Thanks for your answer. It seems this still needs some web crawling, but that may be easier than extracting data directly from Zulip (though Zulip seems to offer an official API for doing so; I haven't tried it). I hope this page is still maintained. I'll use the archived data first.
Yicheng Tao (Apr 11 2024 at 01:40):
Oh, I found the data at https://github.com/leanprover-community/archive/tree/main/zulip_json. I don't know whether it was generated using https://github.com/zulip/zulip-archive.
Eric Wieser (Apr 11 2024 at 07:38):
It was not; it was generated with my fork of that repo.
Yicheng Tao (May 25 2024 at 08:12):
Eric Wieser said:
It was not; it was generated with my fork of that repo.
I noticed that the data from before 2024 contains a lot about Lean 3. Apparently we need more about Lean 4. Can you update the archive? It seems that without admin privileges I can do little with the Zulip API. @Eric Wieser
Last updated: May 02 2025 at 03:31 UTC