21 Answers
TechnikTim
1 year ago

42 is right.

Please do not use LLMs for mathematics….

LLMs and math don't mix. There are plenty of calculators on the Internet for that.

ChatGPT and other LLMs should be used only for text, programming and the like, not for science, facts or mathematics. The tool was not developed for that. It works about as well as driving a screw into the wall with a hammer.

It can work and hold, but it doesn't have to.

CSANecromancer
1 year ago
Reply to  TechnikTim

It works about as well as driving a screw into the wall with a hammer.

"The master marvels and can hardly believe it: you can also screw with a hammer."

diskutant5
3 months ago
Reply to  TechnikTim

And what do you say now, 10 months later? 😉

TechnikTim
3 months ago
Reply to  diskutant5

?
LLMs still can't do math. And that won't change. It makes no sense to use LLMs for math…

In math you don't need to predict the next token, you need logical sequences. And where do you find logical sequences? In normal code. That's why ChatGPT also uses a kind of "pocket calculator add-on". Teaching an LLM math would go like this:

What is 1+1?

13

No, 2… What is 1+2?

2

Based on what it has learned, an LLM estimates which token (simply put: which word) could come next. That's why smaller LLMs (4B and below) don't produce reasonably high-quality sentences; their output isn't really usable.
You can also take the autocorrect on your phone and keep tapping the first suggested word. That is exactly what an LLM does, except that its suggestions are better than autocorrect's.
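The autocorrect analogy can be sketched as a toy next-word predictor. This is a minimal bigram model for illustration only (the corpus and names are invented); a real LLM predicts tokens with a neural network trained on vastly more text, but the "pick a likely next word" principle is the same:

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real LLM learns from vastly more text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Like tapping the first autocorrect suggestion:
    # return the most frequent follower seen in training.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> cat ("cat" followed "the" twice, "mat"/"fish" once)
```

Nothing in this model understands arithmetic; it only reproduces statistical patterns of word order, which is the commenter's point about why pure next-token prediction struggles with math.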

diskutant5
3 months ago

And why can o3 (and o1) suddenly do math and logic, which, as you just explained, supposedly doesn't work?

-> Because of test-time compute. Here the models "think" before they respond. That is the difference from 3.5, 4 and 4o (in ChatGPT). The model has a buffer between question and answer that allows a reasoning process: it starts generating intermediate steps and similar problems (chain-of-thought prompting).

o1 already uses this principle, which is why it is much better at math etc. In o3 the principle was heavily optimized. Now o3 really is a mathematics, physics and coding expert.

Please inform yourself before spreading your outdated misinformation :)

diskutant5
3 months ago

OpenAI promotes the o3 model as a step towards AGI. On ARC-AGI, a test that assesses how efficiently an AI system can acquire new skills outside the data it was trained on, o1 reached a score between 25 and 32 out of 100 percent; 85 percent counts as "human level". According to OpenAI, o3 already reached 87.5 percent of the points.

[…]

The o3 model reportedly achieves a score of 96.7 percent on the AIME 2024 mathematics test; the model gives a wrong answer only once per test. On scientific questions at PhD level, o3 reached 87.7 percent on the GPQA Diamond test.

https://www.faz.net/pro/digitalwirtschaft/kuenstliche-intelligenz/liveticker-zu-12-days-of-open-ai-sam-altman-stell-open-ais-neues-sprachmodell-o3-vor-faz-110155698.html

I think o3 will have its price, but an LLM is an LLM. Over time, LLMs become not only better but also more efficient. We will have to wait and see, but as I said: an LLM is an LLM, and this one achieves more than you thought. It is at PhD level -> in math too!

Your information is long outdated.

diskutant5
3 months ago

Have you tried o1? Ever heard of o3?

Then go to Google and do some research. Please look especially at o3!

The model seems particularly strong in math and programming. With a Codeforces rating of 2727 points, it plays in the top league. It also scored 96.7 percent on the American Invitational Mathematics Examination (AIME) and even set a new record on Epoch AI's FrontierMath benchmark.

https://www.golem.de/news/argumentationsfaehigkeiten-openai-stellung-neue-chatgpt-modellfamilie-o3-vor-2412-191924.html

o3 solves math problems that take professional mathematicians hours or even weeks.

So times have changed, and LLMs have reached the point where they master mathematics, physics and coding better than humans. o3 can also solve logic tasks.

Gehilfling
1 year ago

42 is right.

Proof that you can't trust AI in everything.

52 = 5 x 8^1 + 2 x 8^0 = 40 + 2 = 42.
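The expansion above can be checked in a couple of lines of Python (a minimal sketch of the same positional arithmetic):

```python
# The octal numeral "52", expanded positionally: 5*8^1 + 2*8^0.
digits = "52"
value = sum(int(d) * 8 ** i for i, d in enumerate(reversed(digits)))
print(value)  # -> 42

# Python's built-in base conversion agrees.
print(int(digits, 8))  # -> 42
```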

TheQ86
1 year ago
Reply to  Gehilfling

ChatGPT is a language model. It cannot compute on its own, so its mathematical results should not be trusted without a cross-check.

Gehilfling
1 year ago
Reply to  YaHobby

That’s it.

geheim007b
1 year ago

42 is correct, and this is a great example of how well GPT communicates: language and structure so well constructed that you could believe it… and that is exactly GPT's job (not computing).

W00dp3ckr
1 year ago

No, it doesn't get it right, but then it isn't good at that either.

Learner724
1 year ago

42 is correct. Fun fact: 42 is supposed to be the answer to everything, according to a certain source.

Learner724
1 year ago
Reply to  YaHobby

I get 52.

Learner724
1 year ago
Reply to  YaHobby

What is the quinary system?

Learner724
1 year ago

Ah, I see.. I put the 5^2 (caret) right at the second 12