
> Given a set of instructions, an instruction fine-tuned/aligned LLM is able (conditional on size and training quality) to reason through a set of steps to produce a desired output.

This is plainly wrong. The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.

It's trivial to prove this as well, as LLMs will still fail miserably at (larger) math problems that even basic computer algebra systems will handle with ease.



> The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.

If there's no observable difference between the behaviours, why not call it as the post did?

> LLMs will still fail miserably at (larger) math problems

They're neither trained on such problems, nor is that a goal for LLMs. They can however tell you how to convert that problem into steps that can be run in an algebra system.


You're missing the point: there is a difference. The answers are often wrong, and more wrong the more complex the question gets.

They're only able to answer simple (relative-to-the-model's-size) straightforward reasoning questions. Which is a nice party trick, but not broadly useful.

> They can however tell you how to convert that problem into steps that can be run in an algebra system.

Usually they can't do that very well either. Converting a problem from one description to another is algebraic reasoning, subject to the issues already mentioned.

What they can do is summarize general instructions and documentation, provided adequate training data was available.

> They're neither trained on such problems, nor is that a goal for LLMs

Yes. But LLMs keep being pushed for tasks that heavily involve abstract reasoning, which is dangerous as they're unsuited for it (e.g. any code generation that isn't mere empty boilerplate).


OpenAI is experimenting with process supervision, which corrects many of these issues:

Problem: Let $x^8 + 3x^4 - 4 = p_1(x) p_2(x) \cdots p_k(x)$, where each non-constant polynomial $p_i(x)$ is monic with integer coefficients, and cannot be factored further over the integers. Compute $p_1(1) + p_2(1) + \cdots + p_k(1)$.

Model attempt

Here, GPT-4 successfully performs a complex series of polynomial factorizations. The use of the Sophie-Germain identity in step 5 is an important step that could be considered insightful.

1. I notice that the given polynomial has even degree and only even powers of $x$, so I can try to make a substitution to simplify it.

2. Let $y = x^4$, then the polynomial becomes $y^2 + 3y - 4$, which is a quadratic equation.

3. I can factor this quadratic equation as $(y + 4)(y - 1)$, so the original polynomial is $(x^4 + 4)(x^4 - 1)$.

4. Now I need to factor each of these factors further, if possible, over the integers.

5. For the first factor, $x^4 + 4$, I recall the Sophie Germain identity, which says that $a^4 + 4b^4 = (a^2 + 2b^2 + 2ab)(a^2 + 2b^2 - 2ab)$.

6. Applying this identity with $a = x$ and $b = 1$, I get $x^4 + 4 = (x^2 + 2x + 2)(x^2 - 2x + 2)$.

7. For the second factor, $x^4 - 1$, I can use the difference of squares formula, which says that $a^2 - b^2 = (a + b)(a - b)$.

8. Applying this formula with $a = x^2$ and $b = 1$, I get $x^4 - 1 = (x^2 + 1)(x^2 - 1)$.

9. Then I can apply the difference of squares formula again with $a = x$ and $b = 1$, to get $x^2 - 1 = (x + 1)(x - 1)$.

10. So the original polynomial is $(x^4 + 4)(x^4 - 1) = (x^2 + 2x + 2)(x^2 - 2x + 2)(x^2 + 1)(x + 1)(x - 1)$.

11. These are all the irreducible factors over the integers, so $p_1(x) = x^2 + 2x + 2$, $p_2(x) = x^2 - 2x + 2$, $p_3(x) = x^2 + 1$, $p_4(x) = x + 1$, and $p_5(x) = x - 1$.

12. To find the sum of their values at $x = 1$, I just need to plug in $x = 1$ into each factor and add them up.

13. I get $p_1(1) = 1^2 + 2 \cdot 1 + 2 = 5$, $p_2(1) = 1^2 - 2 \cdot 1 + 2 = 1$, $p_3(1) = 1^2 + 1 = 2$, $p_4(1) = 1 + 1 = 2$, and $p_5(1) = 1 - 1 = 0$.

14. So the final answer is $5 + 1 + 2 + 2 + 0 = 10$.

15. Answer: 10
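
The factorization quoted above can be checked mechanically. Here is a minimal sketch in plain Python, with polynomials stored as coefficient lists (lowest degree first). The helper names `poly_mul` and `poly_eval` are made up for this illustration. The check confirms only that the factors multiply back to the original polynomial and that the sum is right, not that each factor is irreducible.

```python
from functools import reduce

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out

def poly_eval(p, x):
    """Evaluate a polynomial at x using Horner's rule."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result

# Factors from step 11, lowest degree first.
factors = [
    [2, 2, 1],   # x^2 + 2x + 2
    [2, -2, 1],  # x^2 - 2x + 2
    [1, 0, 1],   # x^2 + 1
    [1, 1],      # x + 1
    [-1, 1],     # x - 1
]

product = reduce(poly_mul, factors)
target = [-4, 0, 0, 0, 3, 0, 0, 0, 1]  # x^8 + 3x^4 - 4

factorization_ok = product == target
answer = sum(poly_eval(p, 1) for p in factors)
print(factorization_ok, answer)  # prints: True 10
```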


It's an impressive result, but shouldn't be seen as "correction". Framing it as a (drastic) reduction in mistakes is more useful here.

If the model is productionized (read: dumbed down so it isn't as expensive to run), the reasoning abilities drastically decline again.

And these reasoning abilities are still built around a language model, rather than around abstract models.

This is a very effective party trick for general math, whose language quite directly maps onto these abstract concepts, but there are some holes. Information about e.g. which values may be zero isn't encoded in the language, and so this approach is liable to blundering around division-by-zero issues.

If you want a particular example to toy around with, LLMs are not fond of quaternions and their conversion to other representations.
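
For reference when checking a model's output on this, here is a minimal sketch of one such conversion, quaternion to rotation matrix. It assumes a scalar-first `(w, x, y, z)` Hamilton quaternion and a matrix that rotates column vectors; other conventions exist and change the signs. The function name `quat_to_matrix` and the `eps` threshold are choices made for this illustration. Note the explicit guard for the zero quaternion, which is exactly the kind of division-by-zero case mentioned above.

```python
import math

def quat_to_matrix(w, x, y, z, eps=1e-12):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm < eps:
        # The zero quaternion is not a valid rotation; normalizing it would divide by zero.
        raise ValueError("cannot normalize a zero-length quaternion")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# A 90 degree rotation about the z axis.
half = math.radians(90) / 2
R = quat_to_matrix(math.cos(half), 0.0, 0.0, math.sin(half))
```

Up to floating-point rounding, `R` here maps the x axis onto the y axis, which is a quick sanity check to run against whatever an LLM produces for the same input.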


Which means they're this close to being able to reach out to an algebra system, run the steps, and return you the result. I was just talking about this problem with someone the other day: how can it recognize that it doesn't have the answer but knows where it can get data so that it can form an answer? This seems to be the path Google is taking.
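
A minimal sketch of what the receiving end of such a hand-off could look like, assuming the model emits a plain arithmetic expression as a string. This is not how any particular vendor does it; the `evaluate` function is hypothetical and only covers `+`, `-`, `*`, `/` on integer literals, computed exactly with `fractions.Fraction`. Anything else is rejected rather than guessed at.

```python
import ast
import operator
from fractions import Fraction

# Binary operators the evaluator is willing to run.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr):
    """Exactly evaluate +, -, *, / over integer literals; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

result = evaluate("(1/3 + 1/6) * 12")
print(result)  # prints: 6
```

Division by zero surfaces as a `ZeroDivisionError` instead of a plausible-looking number, so the caller can report the failure back to the model.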


There's some argument to be made that a form of reasoning happens in a roundabout way when the AI is told to explain its reasoning.

For example if you tell it "Do <thing>" and then open a new context and say "Do <thing>, explain your reasoning beforehand." you will often get a more accurate response.

Granted, it's not that any "Hmm, let me think about that." Deep Thought reasoning occurs, but simply that predicting what the reasoning would look like and then predicting what comes after that reasoning results in a more accurate - and ironically, reasoned - response.

Kinda funny actually, it's a bit like how in Hitchhiker's Guide they just had to tell the probability machine to calculate the odds of an improbability drive in order to create it.


This is where the terminology becomes a bit annoying, but there is a key difference in the kinds of reasoning at work here.

When you ask LLMs to provide a reasoning, the actual reasoning performed is linguistic; the LLM has (is) a model about language and performs some (limited) reasoning on that model to get an output.

But that is explicitly different from reasoning about the abstract question at hand, thus the answer is mostly a guess.

The key difference to observe is that "semantic reasoners" like computer algebra systems or Prolog always maintain correctness within the axioms provided. They may slow down significantly as questions get more complex, but they do not start providing wrong answers. Computers are flawless mathematicians, provided they are programmed correctly.
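
A small illustration of that property, as a sketch rather than a real computer algebra system: exact rational arithmetic with Python's `fractions.Fraction`, checked against the known closed form of a telescoping sum, 1/(1·2) + 1/(2·3) + ... + 1/(n(n+1)) = n/(n+1). The function name `telescoping_sum` is made up for this example. The work grows with n, but the result stays exact.

```python
from fractions import Fraction

def telescoping_sum(n):
    """Exact value of sum_{k=1}^{n} 1/(k(k+1)) using rational arithmetic."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

# Compare against the closed form n/(n+1) at increasing problem sizes.
checks = {n: telescoping_sum(n) == Fraction(n, n + 1) for n in (10, 100, 1000)}
print(checks)
```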

LLMs do provide increasingly more-wrong answers as the question gets more complex. Thus we can observe that LLMs do not abstractly reason about the question and its model.


>Thus we can observe that LLMs do not abstractly reason about the question and its model.

Your conclusion makes no sense. Humans provide increasingly wrong answers as questions get more complex too. Jumping from that to "incapable of abstract reasoning" is silly. You have not "trivially proven" anything at all.

>The LLM has (is) a model about language and performs some (limited) reasoning on that model to get an output.

LLMs generalize to non-linguistic patterns.

https://general-pattern-machines.github.io/


> Humans provide increasingly wrong answers as questions get more complex too.

Human this, Human that. LLMs aren't humans. "My model is crap but the human brain isn't very good at this either" is irrelevant when we have machines that are not only very good at these tasks but almost perfect at them.

Humans make such mistakes precisely because they are not perfect reasoning machines. To compare LLMs to humans is not only disingenuous, but proves my point.

(And no, I will not humour you with an argument about how the amount of wrong answers is drastically lower for human mathematicians)

> Jumping from that to "incapable of abstract reasoning" is silly.

They are language models. It is explicitly what they are designed to do.

If these LLMs are not, as I claim, reasoning on language rather than the abstract model of the query, then how come they fail miserably in exactly the ways you would expect were that the case?

> LLMs generalize to non-linguistic patterns.

Yes, congratulations, if you turn a problem into a linguistic one LLMs can deal with it. This does not in any way go against what I said about the capabilities of LLMs.

The same levels of actual abstract reasoning can be achieved on a graphing calculator running off literal potatoes.


>Human this, Human that. LLMs aren't humans.

You said you trivially proved something and made up nonsensical lines of reasoning to justify it. If your "proof" can't port to Humans then it's not proof. You are just rambling.

>Humans make such mistakes precisely because they are not perfect reasoning machines.

Nobody is calling LLMs perfect reasoning machines. Your "point" was that they don't reason at all, which none of your ramblings has been able to "prove".

>If these LLMs are not, as I claim, reasoning on language rather than the abstract model of the query, then how come they fail miserably in exactly the ways you would expect were that the case?

They don't. The idea that you must make no mistake reasoning before you can be considered to be reasoning has no ground.

>LLMs generalize to non-linguistic patterns. Yes, congratulations, if you turn a problem into a linguistic one LLMs can deal with them.

Can you read? Did you even bother looking at the link? LLMs don't need patterns to be linguistic to reason over them lol. None of those patterns are turned linguistic. Some of them are arbitrary numbers that look nothing like the data they've been trained on.


> If your "proof" can't port to Humans then it's not proof

Learn to take a hint. I'm not going to argue this on human terms because you're playing a dumb um-akshually game.

Computer reasoning systems can solve vastly more complex problems perfectly. Expert mathematicians can solve vastly more complex problems with only minimally increased errors. The ability of LLMs to solve reasoning problems completely disintegrates when the problems get more complex.

Trying to argue that LLMs are like humans because you can put these three into the buckets of "No mistakes" and "Some mistakes" is ridiculous.

> Nobody is calling LLMs perfect reasoning machines.

Yes.

You said humans make mistakes. My point is that humans make mistakes precisely because they stop reasoning and start doing blind pattern-matching estimation of the answer.

> The idea that you must make no mistake reasoning before you can be considered to be reasoning has no ground.

Reading comprehension.

I did not say no mistakes. I said that the failure pattern follows that of estimated guesses: rapidly increasing errors as the size of the problem increases.

Whereas with computer reasoning, the rate of errors does not increase at all. And with (expert) humans the rate only goes up a little.

> Did you even bother looking at the link?

You are missing the point.

I am not referring to literally English or any other language. I'm referring to the structure of language problems, which is vastly simpler than any moderately complex math or programming problem.

To more explicitly spell out the reason for my unimpressed-ness: they trained a pattern-repeating machine and found that it will repeat some of their patterns, some of which were patterns it was trained on.

This does not demonstrate the ability to reason abstractly about new models, so I do not care.


Seems like a blurry line between "reason" and "guessing."

Kind of like how an educated guess by a professional is often more accurate than a well-reasoned opinion of a layman.

The professional may not have reasoned it so much as intuited, but within that intuition is a lot of wisdom.

I suppose "predicting" is a more precise word than guessing or reasoning.

Guessing implies an arbitrary nature, reasoning implies understanding the concepts at some level.



