AI and pictures

jkop

I've been using OpenAIs image generator DALL-E a few times, and I'm not impressed. Anyone else here with some experience of AI and pictures?

For example, I asked DALL-E 3 to generate a picture of a three storey building situated in a park. The picture that it generated shows indeed a building situated in a park, but the building has more than 10 storeys. It's an absurd looking high-rise.

So, I ask it to reduce the number of storeys, and specify that 'three storeys' might look like three horisontal rows of windows stacked on top of each other. It generated a new picture, but it shows yet another high rise building, now with 15 or more storeys. The software claims that it has now reduced the number of stories according to my description. It's a lie. Obviously, the software does not know what it's doing.

Text-generating AI assistants seem to be better at acting as if they know what they're doing, and for coding and text analysis they might be useful even. But there are some fundamental differences between texts and pictures.

For example, pictures such as paintings or photographs are syntactically or semantically dense, i.e. between two identifiable meanings there is possibly a third meaning. AI-powered picture generators produce pictures according to verbal descriptions, but verbal descriptions are syntactically disjoint, not dense.

Does this difference explain my lack of success when I ask DALL-E 3 to produce a picture of a three story building? It talks (ChatGPT4o) as if it knows the meaning of 'three storeys', but it shows that it has no clue.

Baden

↪jkop

I one asked it (innocently) for a picture of a candle "dripping wax". I won't repeat what it produced for me. But that was no candle. :shade:

praxis

Using the same prompt, the AI that I use also produced too many floors to the building. I then revised the prompt to use the number “3” instead of the word “three” and it worked.

praxis5173-Please-generate-a-picture-of-a-3-story-building-situ-e8baba5a-1d24-4ebb-9401-e6a0431bdc5e.png

praxis5173-Please-generate-a-picture-of-a-3-story-building-situ-e8baba5a-1d24-4ebb-9401-e6a0431bdc5e.png

Nils Loc

Looks like AI generators have the same skill issue that Adolf Hitler had. Either the perspective is wrong or its just an aberration of architectural features.

Now that the story level problem is solved, how do you solve the windows and doors problem?

jkop

↪Baden

The results I got are mostly absurd a la Monty Python.

I then revised the prompt to use the number “3” instead of the word “three” and it worked. — praxis

That's interesting. When I typed '3' the number of storeys increased to 8 :lol: Perhaps I should ask it to erase its memory of my previous attempts? I'll try again tomorrow.

Either the perspective is wrong or its just an aberration of architectural features. — Nils Loc

I suppose many errors arise because the image sampling technology is blind. The AI never sees the pictures that it samples, nor the result that it generates. Instead it reads our verbal commands, and matches them to the tags or content lists that describe millions of ready-made pictures.

punos

↪jkop

You can try asking a text-based AI to optimize your image prompt. Explain the problem you're experiencing with the image results and request that it optimize your prompt to mitigate the issue.

I copied and pasted your original prompt into Google Gemini and i got this:
https://g.co/gemini/share/dedbccddd2a3

javi2541997

Folks, I would not care to live in those buildings generated by artificial intelligence. They look weird and out of perception, like Hitler's paintings but at least they have a ceiling to cover myself in case. I try to use prompts too, and the result is, let's say, unique. I ask for ten stories, but if my maths are not wrong, I only count six:

[/img]

jkop

request that it optimize your prompt to mitigate the issue — punos

Ok! Let's see:

So it did change the bottom floor, but also the rest of the building. It doesn't modify the picture according to my request but picks a different picture from its database. One step forward in one respect, two steps back in other respects. :cool:

I ask for ten stories, but if my maths are not wrong, I only count six — javi2541997

It seems to me that AI could be useful for intentional work with pictures if it had optical object or pattern recognition abilities. In some special areas it is evidently useful. But this blind image sampling that OpenAI and others offer online seems to be as useful as scrolling through a database of generic pictures.

Furthermore, we tend to react negatively because the assumptions under which we use their tools are false. AI is not intelligent, and it doesn't generate and modify pictures in the sense that one generates and modifies what there is to see.

punos

↪jkop

Yeah, these things are not "perfect" yet; remember, they're still babies. But they grow up so quickly! You should say, "This is amazing! I can see you have a great imagination. I love how you used those colors! They really stand out!" Then, promptly hang it on your refrigerator. :joke:

But really, i've heard that even professionals who use AI image generators have to go through many iterations until the AI gets it just right, or right enough. Most of these models have a parameter or method of introducing randomness into the process to enhance creativity, but at the cost of accuracy. LLMs have a "temperature" parameter that serves this purpose.

Also, companies that develop these models tend to lobotomize them in the name of content moderation and safety, which some might characterize as censorship. This incurs knock-on effects on unrelated material; in other words, it makes them dumber than they would be otherwise.

praxis

↪javi2541997

I added "cozy modern" to the prompt.

I wouldn't mind working there. With any luck the interior isn't decorated with Hitler paintings.

javi2541997

↪praxis

Cool! I wouldn't mind to work or cohabit there either. Is it me or is it similar to Murakami's 'Killing Commendatore' house? I can see myself drinking tea and writing a haiku in your AI-generated building –or duplex–.

No Hitler's paintings but Hokusai's!

frank

↪jkop

The coolest results I get from using AI (I use Wonder) come from giving it an image to start with. If I wanted a three story building, I'd give it an image of a three story building and then see what it does with it. I go through lot of iterations and sometimes feed its own images back into it.

javi2541997

@praxis

I typed the following prompts: "cozy," "autumn," "rainy," and "ideal for writing poems."

The AI generated houses with candles and lights inside, which I didn't like. I asked to remove them and generate a darker/cloudy ambient. It was impossible for the AI. This machine kept generating houses with lights on inside them. What a waste of money and energy!

By the way, this is the generated house. Looks good, but it is not what I had in mind...

praxis

No Hitler's paintings but Hokusai's! — javi2541997

I don't know, kinda weird and dark, lol.

Loved that book, btw.

jkop

..many iterations until the AI gets it just right, or right enough. — punos

When each iteration presents a new picture, and parts or features in the previous picture that one would like to keep are lost, no amount of iterations could make it right. That's very different from modifying a picture by changing or adding parts while keeping other parts.

The coolest results I get from using AI (I use Wonder) come giving it an image to start with. — frank

Sounds cool, I'll check it out. :up:

punos

↪jkop

Sometimes when i encounter issues like this, i "reboot" the session. I start a new thread in order to clear any data it has in its context window (chat session history). Every prompt you give it skews the token probabilities for all subsequent consecutive prompts. Sometimes a piece of data in the context window can persistently muck up your results.

Usually, when i notice this happening early on, i just delete the last prompt/response up to where it started having the issue, just to clear those pieces from the context window. Then i continue prompting from there and repeat the process if it happens again.

Full disclosure: I don't usually use AI image generators much, except in rare and specific cases. I rarely get the results i was hoping for.

AI and pictures

Welcome to The Philosophy Forum!

Categories

More Discussions