Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: OpenAI "make your own GPT"


Kevin Buzzard (Nov 07 2023 at 22:21):

Lots of people on Twitter going on about this new "make your own language model" feature which OpenAI just demoed a day or so ago. What happens if you attempt to make an algebraic geometry chatbot, e.g. by uploading a ton of stuff from the Stacks Project? Presumably these things are still hallucinating left, right and centre? Is there really any breakthrough here, or is it just about OpenAI enabling users to fine-tune a model?

Adam Topaz (Nov 07 2023 at 22:32):

I don't know whether OpenAI is actually fine-tuning anything with this feature. As far as I understand, it just asks GPT to make a system prompt, and maybe provides some integration with a vector database for retrieval-augmented generation (based on the documents you upload). OTOH, I think they did just introduce the ability to fine-tune GPT-4.
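
For anyone unfamiliar with the retrieval step, here's a minimal sketch of what retrieval-augmented generation looks like; the documents, query, and embedding model below are illustrative placeholders, not anything OpenAI has confirmed about how the feature works:

```python
# Minimal RAG sketch, assuming the sentence-transformers package is installed.
# The documents and query are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "A scheme is a locally ringed space locally isomorphic to an affine scheme.",
    "The Proj construction associates a scheme to a graded ring.",
]
query = "What is a scheme?"

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs)          # one embedding vector per document chunk
query_vec = model.encode([query])[0]

# Cosine similarity against every stored vector; a real system would query
# a vector database instead of doing this brute-force scan.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best = docs[int(np.argmax(scores))]

# The retrieved passage is then prepended to the prompt sent to the LLM.
prompt = f"Context:\n{best}\n\nQuestion: {query}"
print(prompt)
```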

Adam Topaz (Nov 07 2023 at 22:35):

Indeed:
Screenshot-2023-11-07-at-15-35-00-OpenAI-Platform.png

Kevin Buzzard (Nov 07 2023 at 22:40):

Can you upload the entirety of the Stacks Project?

Adam Topaz (Nov 07 2023 at 22:41):

I don't know. I did hear that they charge quite a lot for storage with this feature...

Kevin Buzzard (Nov 07 2023 at 22:44):

I only ask because I'm giving a talk to a bunch of undergrads in Edinburgh on Thursday and it's always nice to talk about recent developments.

Adam Topaz (Nov 07 2023 at 22:45):

I don't know whether they've even rolled out this feature to the public yet.

Adam Topaz (Nov 07 2023 at 22:46):

But at least it seems that I have access to the "assistant" variant at the API level.

Junyan Xu (Nov 08 2023 at 00:55):

Here's what I get by pasting the source of the Schemes chapter (173,208 characters) of the Stacks project into Anthropic's Claude 2, which has a 100k-token context length, without asking any questions (takes 30s-1min):
image.png
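
As a rough sanity check that this fits: a common rule of thumb is about 4 characters per token for English-heavy text, so 173,208 characters should be on the order of 43K tokens, comfortably within a 100K window. A quick way to count, using OpenAI's tiktoken tokenizer as a stand-in (Claude's own tokenizer would give somewhat different numbers):

```python
# Rough token count for the pasted chapter; tiktoken's cl100k_base encoding
# is used as a proxy since it's easy to run locally.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
with open("schemes.tex") as f:  # placeholder path for the chapter source
    text = f.read()

print(len(text))              # 173208 characters
print(len(enc.encode(text)))  # ~4 chars/token predicts roughly 43K tokens
```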

Adam Topaz (Nov 08 2023 at 01:00):

FWIW one of OpenAI's other announcements yesterday was an increased context window for GPT-4. I think over 100K tokens?

Junyan Xu (Nov 08 2023 at 01:00):

Yeah! 128K iirc

Junyan Xu (Nov 08 2023 at 01:03):

You could also try https://www.reddit.com/r/LocalLLaMA/comments/166je92/llama2_with_128k_context_length_thanks_to_yarn/

Junyan Xu (Nov 08 2023 at 01:05):

I was told the Morph people trained on some textbooks; maybe Stacks is in there?

Junyan Xu (Nov 08 2023 at 01:07):

There's a 128K version of Mistral 7B as well: https://github.com/jquesnelle/yarn#mistral
and Code Llama natively supports 100K.

Min-Hsien Weng (Nov 14 2023 at 09:48):

Junyan Xu said:

Here's what I get by pasting the source of the Schemes chapter (173,208 characters) of the Stacks project into Anthropic's Claude 2, which has a 100k-token context length, without asking any questions (takes 30s-1min):
image.png

Input length is a common pain point for transformer-based models. The original BERT model accepts at most 512 tokens of input, but the Longformer model modifies the attention mechanism (replacing full self-attention with a sliding-window pattern) to handle longer inputs.
https://huggingface.co/docs/transformers/model_doc/longformer
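
For reference, here is a minimal sketch of encoding a long document with Longformer via the transformers library (the input text is a placeholder):

```python
# Minimal sketch: encoding a long document with Longformer, whose
# sliding-window attention scales linearly in sequence length.
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = "word " * 5000  # placeholder long document
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=4096)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```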

However, my personal experience with Longformer was not as impressive as with the Llama-2-70B model, although both have the same input length (4,096 tokens).
https://huggingface.co/blog/llama2

Perhaps the capabilities of LLMs depend on more than just context length, e.g. on how the model is fine-tuned (reinforcement learning from human feedback). Note that further experiments would be needed to confirm this.

Abhishek Anand (Nov 14 2023 at 23:15):

is there some technical documentation going somewhat deep (e.g. 1-2 hour read) on how LLMs are fine tuned using reinforcement learning? does that tweak the parameters of the underlying model that predicts the next word, or is that another neural network or other computational layer on top?

Min-Hsien Weng (Nov 16 2023 at 04:17):

Abhishek Anand said:

is there some technical documentation going somewhat deep (e.g. 1-2 hour read) on how LLMs are fine tuned using reinforcement learning? does that tweak the parameters of the underlying model that predicts the next word, or is that another neural network or other computational layer on top?

This blog post about reinforcement learning from human feedback (RLHF) is very useful. It explains how to improve large language models (LLMs) using RL without too much technical jargon. Highly recommended.
https://huggingface.co/blog/rlhf

RLHF constructs a reward model that accurately reflects human preferences and utilizes this model to fine-tune LLMs, enabling them to generate text that closely aligns with human feedback.

RLHF's fine-tuning procedure, based on reinforcement learning, is quite different from other fine-tuning methods. It keeps the original LLM frozen while making a copy, and fine-tunes and updates only the copy. LLMs have a large number of parameters, so often only some of them are updated during fine-tuning, for efficiency. The update rule adjusts the parameters to maximize the score from the reward model, with a penalty (a KL divergence term) for drifting too far from the frozen original.

So RLHF fine-tuning modifies the parameters of the LLM itself, unlike methods that add a separate network or extra layers on top of the model.
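
To make that update rule concrete, here is an illustrative sketch of the objective described in the blog post: the reward-model score is shaped by a KL penalty against the frozen original, and the copy's parameters are pushed to maximize the result. All the names below are hypothetical placeholders, not any particular library's API:

```python
# Illustrative sketch (not production code) of the RLHF objective: maximize
# the reward-model score minus a KL penalty that keeps the tuned copy close
# to the frozen original LLM. All names are hypothetical placeholders.
import torch

def rlhf_objective(policy_logprobs: torch.Tensor,  # per-token log-probs, tuned copy
                   ref_logprobs: torch.Tensor,     # per-token log-probs, frozen original
                   reward: torch.Tensor,           # scalar reward-model score
                   kl_coef: float = 0.1) -> torch.Tensor:
    # Approximate per-sequence KL between the copy and the original.
    kl_penalty = kl_coef * (policy_logprobs - ref_logprobs).sum()
    shaped_reward = reward - kl_penalty
    # REINFORCE-style surrogate: the gradient raises the log-probability of
    # the sampled response in proportion to its shaped reward.
    return policy_logprobs.sum() * shaped_reward.detach()

# A training step would then maximize the objective (minimize its negative):
#   loss = -rlhf_objective(policy_lp, ref_lp, reward_model_score)
#   loss.backward(); optimizer.step()
```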

Zhangir Azerbayev (Nov 16 2023 at 18:08):

Abhishek Anand said:

is there some technical documentation going somewhat deep (e.g. 1-2 hour read) on how LLMs are fine tuned using reinforcement learning? does that tweak the parameters of the underlying model that predicts the next word, or is that another neural network or other computational layer on top?

I highly recommend this talk by John Schulman.


Last updated: Dec 20 2023 at 11:08 UTC