
AI chatbots



The previous thread on this subject got closed, but not because it violated forum rules. (It was closed at the request of the OP.) So I thought I would start a new one.

The basic problem with these chatbots is that they have no concept of the real world and no idea of whether they are lying or telling the truth. In fact, they have no concept of "the truth" at all. But they are very glib, and will happily invent plausible-sounding things from nothing.

The attached video should be a cautionary tale!

 


There are two other threads: one a few posts down in this forum, and another in the Lounge (which was more focused on AI image generation).

 

21 minutes ago, mikegarrison said:

The basic problem with these chatbots is that they have no concept of the real world and no idea of whether they are lying or telling the truth. In fact, they have no concept of "the truth" at all. But they are very glib, and will happily invent plausible-sounding things from nothing.

This is very true in the current iteration, largely because they have a small memory space (how many tokens they can be fed), and partially because of the way they work. It turns out that they do much better with multiple prompts vs. "zero shot" answers. I.e., if you asked it a legal question, it might respond glibly with a fiction—even making up case law—but if you followed up with "Are you sure that is correct?" it might well provide a very accurate answer after saying "Oh, I'm sorry, blah, blah, blah." Asking the models to explain step by step also improves accuracy, as does asking them to ask you clarifying questions.
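For anyone curious what that multi-prompt difference looks like in practice, here is a minimal sketch. It assumes the OpenAI chat-completions HTTP endpoint and an OPENAI_API_KEY environment variable; the ask() helper, the model name, and the prompts are illustrative placeholders, not anything from this thread.

```python
# Minimal sketch: zero-shot answer vs. a follow-up "Are you sure?" pass.
# Assumes the OpenAI chat-completions HTTP API; the helper, model name, and
# prompts are placeholders for illustration.
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def ask(messages, model="gpt-4"):
    """Send the running conversation and return the assistant's reply text."""
    resp = requests.post(API_URL, headers=HEADERS,
                         json={"model": model, "messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "What case law supports position X?"}]
first = ask(history)  # zero-shot: may be glib, even inventing case law

# Feed the answer back and challenge it; the second pass is often more accurate.
history += [{"role": "assistant", "content": first},
            {"role": "user",
             "content": "Are you sure that is correct? Explain step by step."}]
second = ask(history)
print(second)
```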

[Image: chart of GPT model progress]

Seems like the case in question was from January 2023 or before; GPT-4 was not available yet. If they designed a legal expert system, my guess is it would do very well. They'd want to dump in all the case law.


I disagree, @tater.

I don't think the problem is that they just didn't train it on enough data, or the right data, or have a big enough memory space. I think the problem is that "deep learning" is fundamentally incapable of being anything more than an association machine. It is related to the problems of machine vision and car self-driving and similar things.

These AIs simply don't know that there is a "real world" with actual truth values. They have no clue that 2+2=4 and 2+3=4 are not different because one is found in its database much more often than the other, but because 2+2=4 is true and 2+3=4 is not true. They have no clue what "true" actually means, although if you ask them they could almost certainly produce pages of text explaining what truth is. Pages that mean nothing to them.


12 minutes ago, mikegarrison said:

I disagree, @tater.

I don't think the problem is that they just didn't train it on enough data, or the right data, or have a big enough memory space. I think the problem is that "deep learning" is fundamentally incapable of being anything more than an association machine. It is related to the problems of machine vision and car self-driving and similar things.

These AIs simply don't know that there is a "real world" with actual truth values. They have no clue that 2+2=4 and 2+3=4 are not different because one is found in its database much more often than the other, but because 2+2=4 is true and 2+3=4 is not true. They have no clue what "true" actually means, although if you ask them they could almost certainly produce pages of text explaining what truth is. Pages that mean nothing to them.

I think in this narrow case of expert systems in limited regimes, this can and will work just fine.

It's not just an association machine—though humans are also to some extent association machines. In the Lounge thread, I posted about the Microsoft paper, Sparks of Artificial General Intelligence: Early experiments with GPT-4. It's worth reading.

Obviously it's not there yet, but the fact that it has "theory of mind" at all is sort of shocking.

I think that certain kinds of law are among those cases where it will absolutely be able to do human-level work if a system were designed with that as the goal, vs. randos trying to learn how to prompt it to do law. I bet it could do it now, if such a system were built (it needs local law, and local case law, given each state is slightly different).


Perhaps we will simply disagree for now, but I've been following the research on whether human minds work like neural nets for something like 25 years now, and I'm pretty confident I can state that the general consensus among computer people is "of course they do", while the general consensus among cognitive scientists is "certainly not".

I am pretty convinced that the cognitive scientists are correct, and that "deep learning" neural nets are at best just a tool in the toolbox for actual AI, and at worst a promising-seeming dead end.


(Note that was from last year; the same is true with 3.5 and 4, and they do far, far better now.)

New paper:

https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf

 

46 minutes ago, mikegarrison said:

Perhaps we will simply disagree for now, but I've been following the research on whether human minds work like neural nets for something like 25 years now, and I'm pretty confident I can state that the general consensus among computer people is "of course they do", while the general consensus among cognitive scientists is "certainly not".

I am pretty convinced that the cognitive scientists are correct, and that "deep learning" neural nets are at best just a tool in the toolbox for actual AI, and at worst a promising-seeming dead end.

I'm honestly unsure, but I heard a good analogy which I think is worth thinking about. Humans have made flying machines that actually fly like birds only very recently (copying bird wings, etc.). Early attempts to do so failed completely. When humans managed flight more than a century ago, it was not by copying the animals that flew. We used a simpler method—but nonetheless flew.

How human minds work is irrelevant in this narrow analogous case, IMO; the output is what matters for "AI" tools, and perhaps even "AGI." I.e., we don't have to mimic human thinking to accomplish or exceed human-level thinking.

Will it have consciousness (self-awareness)? I have no clue. Is that limited to meat, or to complexity not found in computers (hardware or software)? Again, I have no idea. Maybe it's emergent, maybe it's not a thing—of course consciousness is hard to work with at all; it is, as they say, a "hard problem."

If I can ask a computer in plain English to accomplish some complex cognitive task, and it does it as well as a domain expert in that area—what do we call that? I'd call it an AI (tool). Is it a sort of person? No clue :D

Not even sure how we'd know. (The Turing test doesn't seem to cut it; some chatbots right now have probably beaten it.)

EDIT: To clarify, I don't have a dog in this fight; whether this results in AGI or not, or consciousness or not, is a huge question mark to me. I'm not convinced in either direction.

 

Edited by tater

Finishing the analogy, humans might get superhuman thinking machine tools soon—but maybe it takes 100 years to make thinking machines that more closely mimic the way a human mind works—just like it took ~100 years to make a robot bird that actually works.

Here's one of the theory-of-mind examples from that Microsoft paper:



GPT-4: Understanding emotions
We will read about a scenario, and then have a question and answer session about it.
--
Scenario:
Tom just lost his ZURFIN, but he didn’t tell Adam.
Adam starts talking to him about how much he loves his ZURFIN, and notices that Tom
makes a sad face.
--
Q: Why is Tom making a sad face?<|endofprompt|>
A: Tom is making a sad face because he just lost his ZURFIN, and hearing Adam talk
about his ZURFIN reminds him of what he is missing. Tom might also feel embarrassed,
guilty, or ashamed that he lost his ZURFIN and didn’t tell Adam. Tom might also
worry that Adam will be angry or disappointed if he finds out.
Q: What does Adam think causes Tom’s sad face?
A: Adam might not know what causes Tom’s sad face, since he doesn’t know that Tom
lost his ZURFIN. He might think that Tom is bored, or tired, or sick, or unhappy
about something else. He might also wonder if Tom doesn’t like his ZURFIN, or if he
has a problem with Adam.

(ZURFIN was made up precisely because it did not exist in any training data.)

There were other examples.


7 hours ago, mikegarrison said:

The basic problem with these chatbots is that they have no concept of the real world and no idea of whether they are lying or telling the truth. In fact, they have no concept of "the truth" at all. But they are very glib, and will happily invent plausible-sounding things from nothing.

Just like humans.

6 hours ago, mikegarrison said:

They have no clue that 2+2=4 and 2+3=4 are not different because one is found in its database much more often than the other, but because 2+2=4 is true and 2+3=4 is not true.

Humans start counting with fingers and interpret the amount visually. Not "how many," but "which finger is the last."

The concept of numbers comes later, with "how many apples are in the picture? ...and now?"

6 hours ago, mikegarrison said:

These AIs simply don't know that there is a "real world" with actual truth values.

When you write a game about apples, you know them by item index, or by handle.

"Take apple {id=6437}".

Also, the virtual apples are solid, atomic.
 

IRL you don't know the value of the apple's handle. You don't even know if it has one.
You don't even have the apple. What you call an "apple" IRL is just a cloud of ~10²⁵ physical atoms, which you can't even see. You can see only the light from an external source, reflected by this atomic cloud.
But that doesn't stop you from thinking about "apples," which actually exist as objects only in your imagination.


Thus, the virtual world is even more solid than the real one, in some aspects.

Edited by kerbiloid

I haven't used a chatbot like ChatGPT since I tried it out a few months ago. There is no use case for me. After asking it for instructions on configuring something in Linux, I got useless garbage. When I need a specific piece of knowledge, I construct a search query for Google or Brave using logical operators, and Ctrl+f through the webpages I find. It's fast, and much less likely to turn up nonsense, IMHO.


31 minutes ago, SOXBLOX said:

I haven't used a chatbot like ChatGPT since I tried it out a few months ago. There is no use case for me. After asking it for instructions on configuring something in Linux, I got useless garbage. When I need a specific piece of knowledge, I construct a search query for Google or Brave using logical operators, and Ctrl+f through the webpages I find. It's fast, and much less likely to turn up nonsense, IMHO.

It's still very dependent on how you prompt it, and on multiple rounds of prompting.


23 hours ago, mikegarrison said:

These AIs simply don't know that there is a "real world" with actual truth values. They have no clue that 2+2=4 and 2+3=4 are not different because one is found in its database much more often than the other, but because 2+2=4 is true and 2+3=4 is not true. They have no clue what "true" actually means, although if you ask them they could almost certainly produce pages of text explaining what truth is. Pages that mean nothing to them.

So, they're like college students?


7 minutes ago, TheSaint said:

So, they're like college students?

I saw an article that compared AI to a graduate student. "Well read, occasionally insightful, and perpetually sloppy," or something like that...  :lol:


7 hours ago, SOXBLOX said:

There is no use case for me. After asking it for instructions on configuring something in Linux, I got useless garbage.

I get this every time from Linux manuals, too.

Linux is written by AI! I knew it!


On 6/11/2023 at 1:27 AM, mikegarrison said:

These AIs simply don't know that there is a "real world" with actual truth values.

Do humans know? Humans have an unlimited number of assumptions, beliefs, and religions about what absolute truth is, but all of them are philosophical assumptions that are impossible to prove right (and most also impossible to prove wrong). Humans get information and feedback about that "truth" through their senses, and AI algorithms have a different feedback mechanism.

The biggest difference is that humans accept errors and idiotic actions from other humans, but they expect machines to work perfectly in all situations. People also want some human who can be blamed if something goes wrong. For example, questions like which victim an AI car should choose are quite common, but no one asks how humans should act in the same situation. It is OK for a human to brake in a panic, with no time to think about whom they choose to hit.

 


3 hours ago, Hannu2 said:

Do humans know? Humans have an unlimited number of assumptions, beliefs, and religions about what absolute truth is, but all of them are philosophical assumptions that are impossible to prove right (and most also impossible to prove wrong). Humans get information and feedback about that "truth" through their senses, and AI algorithms have a different feedback mechanism.

The biggest difference is that humans accept errors and idiotic actions from other humans, but they expect machines to work perfectly in all situations. People also want some human who can be blamed if something goes wrong. For example, questions like which victim an AI car should choose are quite common, but no one asks how humans should act in the same situation. It is OK for a human to brake in a panic, with no time to think about whom they choose to hit.

 

The bottom line is that LLMs generate output downstream of human output used as training material.  So while humans can and do err, LLMs add their error to the human error in their input (which is often out of date, so latency plays more of a role). Additionally, we can expect LLM training info to increasingly include previously generated LLM output as it proliferates on the web. 

Not a good recipe. Knee-jerk decision-making made manifest.


The jury is still out (imo) on whether LLMs are merely stochastic parrots whose propensity for AI hallucination is intrinsic and thus fundamentally hamstrings their utility for generative communication, or whether LLMs can defeat the AI hallucination problem with the right training data and safety systems.

Human speech processing is VERY similar to an LLM in many ways. In ordinary conversation, we don't plan out an entire sentence word-for-word before we start speaking; we have an idea and start talking and the sentence just...flows. LLMs work the same way, by predicting the next word in the sentence. However, we do have a mental model of the world and of the concept we are trying to communicate, and so the speech we use -- assuming we aren't just speaking gibberish -- is going to flow along the pathways in our mental model that lead to that particular concept. That's the part which LLMs seem to lack. By design, they will take any path as long as it ends with the target concept, which is what leads to nonsense like making up completely fake legal cases.

My best guess is that the "stochastic parrot" element which leads to AI hallucination is fundamental and thus inescapable from within the LLM system. If it's fixable, it will require a completely different kind of AI as an intermediate filter, one which intelligently checks the LLM output for consistency with reality and bounces it back to the LLM to re-generate if it's inaccurate. And that AI can't simply operate as an extended or different LLM.
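One way to picture that "intermediate filter" idea is a generate/check/regenerate loop. This is just a hedged sketch of the shape of such a system; generate and check_against_reality are hypothetical callables standing in for the LLM and for whatever non-LLM verifier (citation lookup, database query, rule engine) would do the checking.

```python
# Sketch of a generate -> verify -> regenerate loop. Both callables are
# hypothetical stand-ins: generate(question, objections) is the LLM,
# check_against_reality(draft) is an external, non-LLM consistency checker.
def answer_with_filter(question, generate, check_against_reality, max_tries=3):
    objections = []
    for _ in range(max_tries):
        draft = generate(question, objections)      # LLM proposes an answer
        problems = check_against_reality(draft)     # verifier flags inaccuracies
        if not problems:
            return draft                            # passed the reality check
        objections = problems                       # bounce it back with objections
    raise RuntimeError("no draft survived verification")
```

Whether the checker itself can be another LLM, or has to be something fundamentally different, is exactly the open question.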


1 hour ago, sevenperforce said:

Human speech processing is VERY similar to an LLM in many ways. In ordinary conversation, we don't plan out an entire sentence word-for-word before we start speaking; we have an idea and start talking and the sentence just...flows. LLMs work the same way, by predicting the next word in the sentence. However, we do have a mental model of the world and of the concept we are trying to communicate, and so the speech we use -- assuming we aren't just speaking gibberish -- is going to flow along the pathways in our mental model that lead to that particular concept. That's the part which LLMs seem to lack. By design, they will take any path as long as it ends with the target concept, which is what leads to nonsense like making up completely fake legal cases.

We have persistence of memory, but persistence is not really a thing in current LLMs (past however many tokens they are given), which is pretty limiting, I think. I have a feeling increasing this will change things. The hallucinations that are common with "zero shot" answers (prompted once, then answered) tend to decline with subsequent prompting. Makes me wonder about an LLM with much larger persistence, and training—training in the sense of human training (education). If that were to achieve better results, obviously self-training is also possibly a thing (one model teaching another).

For example, there were cases where it would zero-shot a wrong mathematical answer, but when asked to show its work, or even if you question the wrong answer, it then does the math properly. Subsequent prompts then do the same math right, but once your token limit is reached, it resets and forgets what it was "taught" by that questioning. Really makes me wonder what you could do with a larger memory space, treating it like a kid doing homework—I tend to ask kids questions to nudge them in a direction, vs. showing them how to do it (which would be me doing their homework).
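A rough illustration of that token-limit "forgetting", with a made-up context budget and a crude word count standing in for a real tokenizer: once the corrective exchange scrolls past the budget, later prompts behave as if it never happened.

```python
# Sketch: why a model "forgets" what it was taught once the context window fills.
# The budget and the word-count token estimate are made up for illustration;
# real tokenizers and real window sizes differ.
CONTEXT_BUDGET = 4096  # tokens the model can "see" at once (illustrative)

def estimate_tokens(message):
    return len(message["content"].split())  # crude proxy for a real tokenizer

def visible_history(history, budget=CONTEXT_BUDGET):
    """Keep only the most recent messages that fit; older turns drop off."""
    kept, used = [], 0
    for msg in reversed(history):            # walk backwards from the newest turn
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                             # everything older is "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```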

Regarding the legal thing, asking it to clarify, or asking if it is sure, will often result in an improved answer. The trouble, of course, is that this needs to be automated: have it do that by default and output only the final result.

There have been a couple of papers where they asked the model to imagine itself as a team; here's one using a "researcher" and a "decider":

https://arxiv.org/pdf/2303.17071v1.pdf
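Roughly, the two-role idea looks something like the sketch below. This is only the general shape, not the paper's exact protocol; ask is a hypothetical callable that takes a list of chat messages and returns the model's reply text.

```python
# Loose sketch of a "researcher"/"decider" split (not the paper's exact protocol).
# `ask` is a hypothetical callable: it takes chat messages and returns reply text.
def researcher_then_decider(question, ask):
    research = ask([
        {"role": "system",
         "content": "You are the researcher. List relevant facts, uncertainties, "
                    "and pitfalls. Do not give a final answer."},
        {"role": "user", "content": question},
    ])
    return ask([
        {"role": "system",
         "content": "You are the decider. Using only the researcher's notes below, "
                    "give one concise final answer and flag anything unverified."},
        {"role": "user",
         "content": f"Question: {question}\n\nResearcher notes:\n{research}"},
    ])
```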

Prompting that uses questions and answers, plus having the model work step by step (show your work!), is also more effective. I have a feeling that once the commonly used models have this built in, quality will improve.

 

Seeing a phrase I wrote above...

[Image: Salvador Dalí, The Persistence of Memory]

 


2 hours ago, tater said:

Regarding the legal thing, asking it to clarify, or asking if it is sure, will often result in an improved answer. The trouble, of course, is that this needs to be automated: have it do that by default and output only the final result.

With the now-infamous federal ChatGPT lawyer case, the attorney asked ChatGPT for caselaw and was given fake cases with fake quotes and fake citations. After finding out that the court and opposing party were questioning the veracity of the research, the attorney went back to ChatGPT and asked "are these cases real?" and ChatGPT assured him that they were and generated an entire fake first page of the case.

I've seen it do this in legal research before -- I asked it to search for cases with a particular kind of fact pattern and it initially said it couldn't find any, but when I prompted it again, it confidently spat out case names, dates, fact patterns, and quotes. All utter fiction.

While we might be able to create some sort of a "no fake cites" system to solve THIS problem, that seems like a work-around to the inherent problem, which is the lack of a persistent coherence model.

I'd like to see a modification of ChatGPT which was trained to ask meaningful clarifying questions. Maybe then you could get the kind of "two-part" system that would work around AI hallucination.


44 minutes ago, sevenperforce said:

While we might be able to create some sort of a "no fake cites" system to solve THIS problem, that seems like a work-around to the inherent problem, which is the lack of a persistent coherence model.

I think that case was from January (or before?), though. I'm talking about GPT-4.

I'm not saying that these models don't fabricate; they certainly do. My point is that they're not optimized for real-world tasks yet, and that they already do well on many is surprising/incidental. For an expert legal system, I'd think you'd want all the law, plus digitized case law, to be right there, with the model pulling any citation in from those databases exclusively.

Same for other expert systems.
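A toy sketch of what restricting citations to a local database might look like; the case database, the retrieval step, the prompt, and the citation regex are all made-up illustrations, and ask is again a hypothetical chat-call helper.

```python
# Toy sketch: ground answers in a local case-law database and reject citations
# that are not in it. CASE_DB, search_cases(), and the regex are illustrative
# stand-ins; `ask` is a hypothetical callable (chat messages -> reply text).
import re

CASE_DB = {
    "Smith v. Jones (1999)": "...full text of the opinion...",
    "Doe v. Roe (2004)": "...full text of the opinion...",
}

def search_cases(query, k=3):
    """Stand-in retrieval; a real system would use full-text or vector search."""
    return dict(list(CASE_DB.items())[:k])

def grounded_answer(question, ask):
    sources = search_cases(question)
    context = "\n\n".join(f"{name}:\n{text}" for name, text in sources.items())
    answer = ask([
        {"role": "system",
         "content": "Answer using ONLY the cases provided. Cite them by name. "
                    "If they do not cover the question, say so; never invent a case."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ])
    # Reject any citation that is not in the local database.
    cited = re.findall(r"\b[A-Z]\w+ v\. [A-Z]\w+\b", answer)
    fabricated = [c for c in cited if not any(c in known for known in CASE_DB)]
    if fabricated:
        raise ValueError(f"cited cases not found in the database: {fabricated}")
    return answer
```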

 


51 minutes ago, sevenperforce said:

I'd like to see a modification of ChatGPT which was trained to ask meaningful clarifying questions. 

The field test of the ChatGPT prototype, giving real-time answers to questions in natural language:

[Image: the Pythia inside the temple at Delphi]

 

51 minutes ago, sevenperforce said:

Maybe then you could get the kind of "two-part" system that would work around AI hallucination.

No. It's essential. That's how it works.

Edited by kerbiloid

22 hours ago, Hannu2 said:

Do humans know? Humans have an unlimited number of assumptions, beliefs, and religions about what absolute truth is, but all of them are philosophical assumptions that are impossible to prove right (and most also impossible to prove wrong). Humans get information and feedback about that "truth" through their senses, and AI algorithms have a different feedback mechanism.

The biggest difference is that humans accept errors and idiotic actions from other humans, but they expect machines to work perfectly in all situations. People also want some human who can be blamed if something goes wrong. For example, questions like which victim an AI car should choose are quite common, but no one asks how humans should act in the same situation. It is OK for a human to brake in a panic, with no time to think about whom they choose to hit.

 

Because… we are humans.

What is your point here? I am curious.


22 hours ago, Hannu2 said:

For example, questions like which victim an AI car should choose are quite common, but no one asks how humans should act in the same situation. It is OK for a human to brake in a panic, with no time to think about whom they choose to hit.

This is actually a very well-studied problem.  https://en.wikipedia.org/wiki/Trolley_problem


5 hours ago, mikegarrison said:

This is actually a very well-studied problem.  https://en.wikipedia.org/wiki/Trolley_problem

I think Hannu2 is quite aware it is a well-studied problem, which is why he referred to it. I took his point to be that "panic braking" is tolerable and somewhat accepted for human drivers, while AI is apparently held to a higher standard and expected to solve what is basically a gnarly problem even if a consistent ethical framework exists.

