[D66] STaR: 'Bootstrapping Reasoning With Reasoning'

René Oudeweg roudeweg at gmail.com
Wed Mar 6 09:22:51 CET 2024


L.S.

A Kantian critique of pure (rational) reason will presumably be lost on 
Zelikman and the Google folks/OpenAI...

It is no coincidence that "treason" and "reason" differ by only one 
letter in English.

The more pre-AGI systems can teach themselves to reason, the less we 
can rely on reason. Regulating through legislation is impossible, 
because you simply cannot regulate our radiant AI sun in the sky, any 
more than you can regulate the Gods...

R.O.

--


https://arxiv.org/pdf/2203.14465.pdf

STaR: Self-Taught Reasoner
Bootstrapping Reasoning With Reasoning
Eric Zelikman∗1, Yuhuai Wu∗1,2, Jesse Mu1, Noah D. Goodman1
1 Department of Computer Science, Stanford University
2 Google Research
{ezelikman, yuhuai, muj, ngoodman}@stanford.edu

Abstract
Generating step-by-step "chain-of-thought" rationales improves language 
model performance on complex reasoning tasks like mathematics or 
commonsense question-answering. However, inducing language model 
rationale generation currently requires either constructing massive 
rationale datasets or sacrificing accuracy by using only few-shot 
inference. We propose a technique to iteratively leverage a small number 
of rationale examples and a large dataset without rationales, to 
bootstrap the ability to perform successively more complex reasoning. This 
technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: 
generate rationales to answer many questions, prompted with a few 
rationale examples; if the generated answers are wrong, try again to 
generate a rationale given the correct answer; fine-tune on all the 
rationales that ultimately yielded correct answers; repeat. We show
that STaR significantly improves performance on multiple datasets 
compared to a model fine-tuned to directly predict final answers, and 
performs comparably to fine-tuning a 30× larger state-of-the-art 
language model on CommonsenseQA. Thus, STaR lets a model improve itself 
by learning from its own generated reasoning.
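
As a concrete illustration of the loop sketched in the abstract, here is a 
minimal Python-style sketch based only on that description. The model 
interface (generate_rationale, fine_tune), the hint mechanism, and the 
number of iterations are illustrative assumptions, not the authors' actual 
implementation.

# Hypothetical sketch of the STaR loop described above.
def star(base_model, dataset, few_shot_examples, n_iterations=5):
    model = base_model
    for _ in range(n_iterations):
        train_set = []
        for question, answer in dataset:
            # Generate a rationale and answer, prompted with a few rationale examples.
            rationale, predicted = model.generate_rationale(few_shot_examples, question)
            if predicted != answer:
                # If wrong, try again with the correct answer given as a hint
                # ("rationalization") and keep the result only if the model now
                # reproduces that answer.
                rationale, predicted = model.generate_rationale(
                    few_shot_examples, question, hint=answer)
            if predicted == answer:
                train_set.append((question, rationale, answer))
        # Fine-tune on all rationales that ultimately yielded correct answers,
        # then repeat with the improved model. (The paper restarts fine-tuning
        # from the original pre-trained model at each iteration.)
        model = base_model.fine_tune(train_set)
    return model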

--

digitaltrends.com
What is Project Q*? The mysterious AI breakthrough, explained
By Luke Larsen November 24, 2023



What is OpenAI Q*? The mysterious breakthrough that could ‘threaten 
humanity’

Among the whirlwind of speculation around the sudden firing and 
reinstatement of OpenAI CEO Sam Altman, there’s been one central 
question mark at the heart of the controversy. Why was Altman fired by 
the board to begin with?

Contents

     What is Project Q*?
     Was Q* really why Sam Altman was fired?
     Is it really the beginning of AGI?

We may finally have part of the answer, and it has to do with the 
handling of a mysterious OpenAI project with the internal codename, “Q*” 
— or Q Star. Information is limited, but here’s everything we know about 
the potentially game-changing developments so far.

Before moving forward, it should be noted that all the details about 
Project Q*, including its existence, come from fresh reports following 
the drama around Altman's firing. Reuters reported on November 22 that 
it had been given the information by "two people familiar with the 
matter," providing a peek behind the curtain of what was happening 
internally in the weeks leading up to the firing.

According to the article, Project Q* was a new model that excelled in 
learning and performing mathematics. It was reportedly still only at the 
level of solving grade-school mathematics, but as a starting point, it 
struck the researchers involved as a promising demonstration of 
previously unseen intelligence.

Seems harmless enough, right? Well, not so fast. The existence of Q* was 
reportedly scary enough to prompt several staff researchers to write a 
letter to the board to raise the alarm about the project, claiming it 
could “threaten humanity.”

On the other hand, other attempts at explaining Q* aren’t quite as novel 
— and certainly aren’t so earth-shattering. The Chief AI scientist at 
Meta, Yann LeCun, tweeted that Q* has to do with replacing 
“auto-regressive token prediction with planning” as a way of improving 
LLM (large language model) reliability. LeCun says this is something 
all of OpenAI's competitors have been working on, and that OpenAI made 
a specific hire to address the problem.


LeCun’s point doesn’t seem to be that such a development isn’t important, 
but that it isn’t some secret advance that other AI researchers aren’t 
already discussing. Then again, in the replies to this tweet, LeCun is 
dismissive of Altman, saying he has a “long history of self-delusion” 
and suggesting that the reporting around Q* doesn’t convince him that a 
significant advance on the problem of planning in learned models has 
been made.
Was Q* really why Sam Altman was fired?

From the very beginning of the speculation around the firing of Sam 
Altman, one of the chief suspects was his approach to safetyism. Altman 
was the one who pushed OpenAI to turn away from its roots as a 
non-profit and move toward commercialization. This started with the 
public launch of ChatGPT and the eventual roll-out of ChatGPT Plus, both 
of which kickstarted this new era of generative AI, causing companies 
like Google to go public with their technology as well.

The ethical and safety concerns around this technology being publicly 
available have always been present, despite all the excitement behind 
how it has already changed the world. Larger concerns about how fast the 
technology was developing have been well-documented as well, especially 
with the jump from GPT-3.5 to GPT-4. Some think the technology is moving 
too fast without enough regulation or oversight, and according to the 
Reuters report, “commercializing advances before understanding the 
consequences” was listed as one of the reasons for Altman’s initial firing.

Although we don’t know if Altman was specifically mentioned in the 
letter about Q* mentioned above, it’s also being cited as one of the 
reasons for the board’s decision to fire Altman — which has since been 
reversed.

It’s worth mentioning that just days before he was fired, Altman 
mentioned at an AI summit that he was “in the room” a couple of weeks 
earlier when a major “frontier of discovery” was pushed forward. The 
timing suggests this may have been a reference to a breakthrough in Q*, 
and if so, it would confirm Altman’s intimate involvement in the 
project.

Putting the pieces together, it seems like concerns about 
commercialization have been present since the beginning, and his 
treatment of Q* was merely the final straw. The fact that the board was 
so concerned about the rapid development (and perhaps Altman’s own 
attitude toward it) that it would fire its all-star CEO is shocking.

To douse some of the speculation, The Verge was reportedly told by “a 
person familiar with the matter” that the supposed letter about Q* was 
never received by the board, and that the “company’s research progress” 
wasn’t a reason for Altman’s firing.

We’ll need to wait for some additional reporting to come to the surface 
before we ever have a proper explanation for all the drama.
Is it really the beginning of AGI?

AGI, which stands for artificial general intelligence, is where OpenAI 
has been headed from the beginning. Though the term means different 
things to different people, OpenAI has always defined AGI as “autonomous 
systems that surpass humans in most economically valuable tasks,” as the 
Reuters report says. Nothing in that definition refers to “self-aware 
systems,” which is often what people presume AGI means.

Still, on the surface, advances in AI mathematics might not seem like a 
big step in that direction. After all, we’ve had computers helping us 
with math for many decades now. But the capabilities attributed to Q* go 
beyond those of a calculator. Learning mathematical literacy requires 
humanlike logic and reasoning, and researchers seem to think that’s a 
big deal. With writing 
and language, an LLM is allowed to be more fluid in its answers and 
responses, often giving a wide range of answers to questions and 
prompts. But math is the exact opposite, where often there is just a 
single correct answer to a problem. The Reuters report suggests that AI 
researchers believe this kind of intelligence could even be “applied to 
novel scientific research.”

Obviously, Q* seems to still be in the beginnings of development, but it 
does appear to be the biggest advancement we’ve seen since GPT-4. If the 
hype is to be believed, it should certainly be considered a major step 
in the road toward AGI, at least as it’s defined by OpenAI. Depending on 
your perspective, that’s either cause for optimistic excitement or 
existential dread.

But again, let’s not forget the remarks from LeCun mentioned above. 
Whatever Q* is, it’s probably safe to assume that OpenAI isn’t the only 
research lab attempting such a development. And if it ends up not 
actually being the reason for Altman’s firing, as The Verge’s report 
insists, maybe it’s not as big of a deal as the Reuters report claims.

Images generated by artificial intelligence (AI) have been causing 
plenty of consternation in recent months, with people understandably 
worried that they could be used to spread misinformation and deceive the 
public. Now, ChatGPT maker OpenAI is apparently working on a tool that 
can detect AI-generated images with 99% accuracy.

According to Bloomberg, OpenAI’s tool is designed to root out user-made 
pictures created by its own Dall-E 3 image generator. Speaking at the 
Wall Street Journal’s Tech Live event, Mira Murati, chief technology 
officer at OpenAI, claimed the tool is “99% reliable.” While the tech is 
being tested internally, there’s no release date yet.

Most people distrust AI and want regulation, says new survey


Most American adults do not trust artificial intelligence (AI) tools 
like ChatGPT and worry about their potential misuse, a new survey has 
found. It suggests that the frequent scandals surrounding AI-created 
malware and disinformation are taking their toll and that the public 
might be increasingly receptive to ideas of AI regulation.

The survey from the MITRE Corporation and the Harris Poll claims that 
just 39% of 2,063 U.S. adults polled believe that today’s AI tech is 
“safe and secure,” a drop of 9% from when the two firms conducted their 
last survey in November 2022.


GPT-4: how to use the AI chatbot that puts ChatGPT to shame


People were in awe when ChatGPT came out, impressed by its natural 
language abilities as an AI chatbot. But when the highly anticipated 
GPT-4 large language model came out, it blew the lid off what we thought 
was possible with AI, with some calling it an early glimpse of AGI 
(artificial general intelligence).

The creator of the model, OpenAI, calls it the company's "most advanced 
system, producing safer and more useful responses." Here's everything 
you need to know about it, including how to use it and what it can do.

What is GPT-4?
GPT-4 is a new language model created by OpenAI that can generate text 
that is similar to human speech. It advances the technology used by 
ChatGPT, which is currently based on GPT-3.5. GPT is the acronym for 
Generative Pre-trained Transformer, a deep learning technology that uses 
artificial neural networks to write like a human.

--

technologyreview.com
Unpacking the hype around OpenAI’s rumored new Q* model
Melissa Heikkilä


Ever since last week’s dramatic events at OpenAI, the rumor mill has 
been in overdrive about why the company’s chief scientific officer, Ilya 
Sutskever, and its board decided to oust CEO Sam Altman.

While we still don’t know all the details, there have been reports that 
researchers at OpenAI had made a “breakthrough” in AI that had alarmed 
staff members. Reuters and The Information both report that researchers 
had come up with a new way to make powerful AI systems and had created a 
new model, called Q* (pronounced Q star), that was able to perform 
grade-school-level math. According to the people who spoke to Reuters, 
some at OpenAI believe this could be a milestone in the company’s quest 
to build artificial general intelligence, a much-hyped concept referring 
to an AI system that is smarter than humans. The company declined to 
comment on Q*.

Social media is full of speculation and excessive hype, so I called some 
experts to find out how big a deal any breakthrough in math and AI would 
really be.

Researchers have for years tried to get AI models to solve math 
problems. Language models like ChatGPT and GPT-4 can do some math, but 
not very well or reliably. We currently don’t have the algorithms or 
even the right architectures to be able to solve math problems reliably 
using AI, says Wenda Li, an AI lecturer at the University of Edinburgh. 
Deep learning and transformers (a kind of neural network), which are 
what language models use, are excellent at recognizing patterns, but 
that alone is likely not enough, Li adds.

Math is a benchmark for reasoning, Li says. A machine that is able to 
reason about mathematics could, in theory, be able to learn to do other 
tasks that build on existing information, such as writing computer code 
or drawing conclusions from a news article. Math is a particularly hard 
challenge because it requires AI models to have the capacity to reason 
and to really understand what they are dealing with.

A generative AI system that could reliably do math would need to have a 
really firm grasp on concrete definitions of particular concepts that 
can get very abstract. A lot of math problems also require some level of 
planning over multiple steps, says Katie Collins, a PhD researcher at 
the University of Cambridge, who specializes in math and AI. Indeed, 
Yann LeCun, chief AI scientist at Meta, posted on X and LinkedIn over 
the weekend that he thinks Q* is likely to be “OpenAI attempts at planning.”

People who worry about whether AI poses an existential risk to humans, 
one of OpenAI's founding concerns, fear that such capabilities might 
lead to rogue AI. Safety concerns might arise if such AI systems are 
allowed to set their own goals and start to interface with a real 
physical or digital world in some ways, says Collins.

But while math capability might take us a step closer to more powerful 
AI systems, solving these sorts of math problems doesn’t signal the 
birth of a superintelligence.

“I don’t think it immediately gets us to AGI or scary situations,” says 
Collins.  It’s also very important to underline what kind of math 
problems AI is solving, she adds.

“Solving elementary-school math problems is very, very different from 
pushing the boundaries of mathematics at the level of something a Fields 
medalist can do,” says Collins, referring to a top prize in mathematics.

Machine-learning research has focused on solving elementary-school 
problems, but state-of-the-art AI systems haven’t fully cracked this 
challenge yet. Some AI models fail on really simple math problems, but 
then they can excel at really hard problems, Collins says. OpenAI has, 
for example, developed dedicated tools that can solve challenging 
problems posed in competitions for top math students in high school, but 
these systems outperform humans only occasionally.

Nevertheless, building an AI system that can solve math equations is a 
cool development, if that is indeed what Q* can do. A deeper 
understanding of mathematics could open up applications to help 
scientific research and engineering, for example. The ability to 
generate mathematical responses could help us develop better 
personalized tutoring, or help mathematicians do algebra faster or solve 
more complicated problems.

This is also not the first time a new model has sparked AGI hype. Just 
last year, tech folks were saying the same things about Google 
DeepMind’s Gato, a “generalist” AI model that can play Atari video 
games, caption images, chat, and stack blocks with a real robot arm. 
Back then, some AI researchers claimed that DeepMind was “on the verge” 
of AGI because of Gato’s ability to do so many different things pretty 
well. Same hype machine, different AI lab.

And while it might be great PR, these hype cycles do more harm than good 
for the entire field by distracting people from the real, tangible 
problems around AI. Rumors about a powerful new AI model might also be a 
massive own goal for the regulation-averse tech sector. The EU, for 
example, is very close to finalizing its sweeping AI Act. One of the 
biggest fights right now among lawmakers is whether to give tech 
companies more power to regulate cutting-edge AI models on their own.

OpenAI’s board was designed as the company’s internal kill switch and 
governance mechanism to prevent the launch of harmful technologies. The 
past week’s boardroom drama has shown that the bottom line will always 
prevail at these companies. It will also make it harder to make a case 
for why they should be trusted with self-regulation. Lawmakers, take note.


More information about the D66 mailing list