Zulip Chat Archive

Stream: general

Topic: zulip archive bug


ZHAO Jiecheng (Dec 20 2023 at 03:08):

The bug: topics that start with non-ASCII characters, such as "✔ ∑' n, n * r ^ n (11 messages, latest: Jul 28 2023 at 13:58)", have broken links in the archive (they return a 404 error).

I've found it helpful to discuss Zulip posts with ChatGPT when I'm confused. ChatGPT can't read Zulip directly, but it can read the Zulip archive. Unfortunately, the archive has this bug and is also slightly outdated. I tried to troubleshoot by checking https://github.com/leanprover-community/leanprover-community.github.io but couldn't resolve it. (I cannot find the code that generates the archive.) Can anyone assist?

Scott Morrison (Dec 20 2023 at 04:04):

Unfortunately I think now that zulip has web-accessible streams, no one is maintaining the archive anymore.

ZHAO Jiecheng (Dec 20 2023 at 06:15):

Scott Morrison said:

Unfortunately I think now that zulip has web-accessible streams, no one is maintaining the archive anymore.

Unfortunately, even with public access, ChatGPT still cannot render the page and read the content...

ZHAO Jiecheng (Dec 20 2023 at 07:50):

@Eric Wieser could you please start the archive job? And maybe give me some pointers on fixing the bug: "Topics that start with non-ASCII characters like "✔ ∑' n, n * r ^ n (11 messages, latest: Jul 28 2023 at 13:58)" have inaccessible links in the archive (resulting in a 404 error)"

ZHAO Jiecheng (Dec 20 2023 at 08:23):

Another question: why is the last commit of https://github.com/leanprover-community/archive from Feb when the last archive time is in Aug? Shouldn't the contents on github.io match the contents of the repo?

Eric Wieser (Dec 20 2023 at 08:58):

ZHAO Jiecheng said:

Another question: why is the last commit of https://github.com/leanprover-community/archive from Feb when the last archive time is in Aug? Shouldn't the contents on github.io match the contents of the repo?

No, this is deliberate as it avoids using more git storage

Eric Wieser (Dec 20 2023 at 08:59):

The CI job uses the repo as a starting point and downloads messages that are later than it

Eric Wieser (Dec 20 2023 at 08:59):

There is a checkbox when running CI for whether to update the source repo afterwards
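
The incremental update described in the messages above can be pictured roughly as follows. This is only a minimal sketch, not the actual CI job: `incremental_update` and `fetch_newer` are hypothetical names, and the assumption that each message carries an increasing integer `id` is made purely for illustration.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def incremental_update(
    stored: List[Message],
    fetch_newer: Callable[[int], List[Message]],
) -> List[Message]:
    """Return the stored messages plus any newer ones.

    `stored` stands in for the messages already committed to the repo.
    `fetch_newer(last_id)` is a hypothetical callback that returns only
    messages whose id is greater than `last_id`.
    """
    # The newest message already on disk marks where downloading resumes.
    last_id = max((m["id"] for m in stored), default=0)
    new_messages = fetch_newer(last_id)
    # Keep the combined list in id order.
    return stored + sorted(new_messages, key=lambda m: m["id"])


# Usage with a fake in-memory "server" holding four messages.
server = [{"id": i, "content": f"message {i}"} for i in range(1, 5)]
repo_snapshot = server[:2]  # the repo only has the first two


def fake_fetch(last_id: int) -> List[Message]:
    return [m for m in server if m["id"] > last_id]


updated = incremental_update(repo_snapshot, fake_fetch)
```

Under this picture, the published site can be newer than the repo: the repo only needs a fresh commit when someone chooses to save the downloaded messages back to it.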

Eric Wieser (Dec 20 2023 at 09:00):

I think there is a known bug with url encoding in the archive, but changing the encoding scheme would break any existing links to the archive
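
To make the encoding issue concrete, here is a minimal sketch of standard percent-encoding applied to the topic title from the bug report. It does not reproduce the scheme the archive actually uses, which is not shown in this thread; `topic_to_path_segment` is a hypothetical helper built on Python's `urllib.parse.quote`.

```python
from urllib.parse import quote, unquote


def topic_to_path_segment(topic: str) -> str:
    """Hypothetical helper: percent-encode a topic title as one URL path segment.

    safe="" means "/" is escaped too, so a title can never be mistaken
    for several path segments.
    """
    return quote(topic, safe="")


topic = "✔ ∑' n, n * r ^ n"
encoded = topic_to_path_segment(topic)

# The encoded form is pure ASCII and decodes back to the original title.
print(encoded)
print(unquote(encoded) == topic)
```

A link only resolves when the name used in the link and the name of the generated file agree after encoding and decoding. That is also why switching schemes changes every URL, including those that work today.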

ZHAO Jiecheng (Dec 20 2023 at 09:42):

Eric Wieser said:

I think there is a known bug with url encoding in the archive, but changing the encoding scheme would break any existing links to the archive

For pages with broken links, it's typically safe to update their URL encoding. The main concern is to ensure that the functional parts of the links remain the same while correcting the encoding problems.

Eric Wieser (Dec 20 2023 at 09:43):

Right, but this rules out updating to a new version of the archive software upstream, which both fixes the bug and changes the scheme

ZHAO Jiecheng (Dec 20 2023 at 09:46):

Eric Wieser said:

Right, but this rules out updating to a new version of the archive software upstream, which both fixes the bug and changes the scheme

Do you mean the update of https://github.com/zulip/zulip-archive ?

Eric Wieser (Dec 20 2023 at 09:47):

Yes, we are using a much older version hosted at my fork

ZHAO Jiecheng (Dec 20 2023 at 09:49):

Eric Wieser said:

Yes, we are using a much older version hosted at my fork

Do you mean this one https://github.com/eric-wieser/zulip-archive ? Maybe I can upgrade it.

Eric Wieser (Dec 20 2023 at 09:56):

Upgrading it breaks all the links

Utensil Song (Dec 20 2023 at 10:53):

Maybe it's viable to change the existing links in the archive to the new scheme?
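
One way to picture this suggestion is a pass over the generated HTML that swaps old URLs for new ones using a lookup table. This is a rough sketch under stated assumptions: `rewrite_links` and the `old_to_new` mapping are hypothetical, and the example paths are invented rather than taken from either version of the archive software.

```python
import re
from typing import Dict


def rewrite_links(html: str, old_to_new: Dict[str, str]) -> str:
    """Replace each href value found in `old_to_new`; leave all others alone."""

    def swap(match: "re.Match[str]") -> str:
        url = match.group(2)
        return match.group(1) + old_to_new.get(url, url) + match.group(3)

    # Only handles double-quoted href attributes, which is enough for a sketch.
    return re.sub(r'(href=")([^"]*)(")', swap, html)


# Invented example paths, for illustration only.
mapping = {"old/topic-a.html": "new/topic-a.html"}
page = '<a href="old/topic-a.html">A</a> <a href="other.html">B</a>'
rewritten = rewrite_links(page, mapping)
```

A rewrite like this only covers links inside the archive itself. Links pointing into the archive from other sites would still need redirects from the old URLs.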

Eric Wieser (Dec 20 2023 at 11:07):

I've kicked off a CI job to update the archive data


Last updated: Dec 20 2023 at 11:08 UTC