• jkop
    923
    I've used OpenAI's image generator DALL-E a few times, and I'm not impressed. Does anyone else here have some experience with AI and pictures?

    For example, I asked DALL-E 3 to generate a picture of a three-storey building situated in a park. The picture it generated does indeed show a building in a park, but the building has more than 10 storeys. It's an absurd-looking high-rise.

    So I asked it to reduce the number of storeys, and specified that 'three storeys' would look like three horizontal rows of windows stacked on top of each other. It generated a new picture, but it showed yet another high-rise, now with 15 or more storeys. The software claimed that it had reduced the number of storeys according to my description. It's a lie. Obviously, the software does not know what it's doing.

    Text-generating AI assistants seem better at acting as if they know what they're doing, and for coding and text analysis they might even be useful. But there are some fundamental differences between texts and pictures.

    For example, pictures such as paintings or photographs are syntactically or semantically dense, i.e., between two identifiable meanings there may be a third. AI-powered picture generators produce pictures from verbal descriptions, but verbal descriptions are syntactically disjoint, not dense.

    Does this difference explain my lack of success when I ask DALL-E 3 to produce a picture of a three-storey building? It (ChatGPT-4o) talks as if it knows the meaning of 'three storeys', but it shows that it has no clue.

    Three_storeys_00.jpg
    Three_storeys_01.jpg

    Dall-e_three_storeys01.jpg
  • Baden
    16.4k


    I once asked it (innocently) for a picture of a candle "dripping wax". I won't repeat what it produced for me, but that was no candle. :shade:
  • praxis
    6.5k
    Using the same prompt, the AI that I use also gave the building too many floors. I then revised the prompt to use the number “3” instead of the word “three”, and it worked.

    praxis5173-Please-generate-a-picture-of-a-3-story-building-situ-e8baba5a-1d24-4ebb-9401-e6a0431bdc5e.png
  • Nils Loc
    1.4k
    Looks like AI generators have the same skill issue that Adolf Hitler had. Either the perspective is wrong or it's just an aberration of architectural features.

    Now that the story level problem is solved, how do you solve the windows and doors problem?
  • jkop
    923
    The results I got are mostly absurd a la Monty Python.

    I then revised the prompt to use the number “3” instead of the word “three” and it worked.praxis
    That's interesting. When I typed '3' the number of storeys increased to 8 :lol: Perhaps I should ask it to erase its memory of my previous attempts? I'll try again tomorrow.

    Either the perspective is wrong or it's just an aberration of architectural features.Nils Loc

    I suppose many errors arise because the image sampling technology is blind. The AI never sees the pictures that it samples, nor the result that it generates. Instead it reads our verbal commands, and matches them to the tags or content lists that describe millions of ready-made pictures.
  • punos
    561

    You can try asking a text-based AI to optimize your image prompt. Explain the problem you're experiencing with the image results and request that it optimize your prompt to mitigate the issue.

    I copied and pasted your original prompt into Google Gemini and I got this:
    https://g.co/gemini/share/dedbccddd2a3
  • javi2541997
    5.9k


    Folks, I would not care to live in those buildings generated by artificial intelligence. They look weird and out of perspective, like Hitler's paintings, but at least they have a roof to shelter me just in case. I use prompts too, and the result is, let's say, unique. I asked for ten stories, but if my maths are not wrong, I only count six:

    Captura-de-pantalla-2024-10-07-074844.jpg
  • jkop
    923
    request that it optimize your prompt to mitigate the issuepunos
    Ok! Let's see:

    Test01.jpg

    Test02.jpg

    Test03.jpg

    So it did change the bottom floor, but also the rest of the building. It doesn't modify the picture according to my request but picks a different picture from its database. One step forward in one respect, two steps back in other respects. :cool:

    I ask for ten stories, but if my maths are not wrong, I only count sixjavi2541997

    It seems to me that AI could be useful for intentional work with pictures if it had optical object or pattern recognition abilities. In some special areas it is evidently useful. But this blind image sampling that OpenAI and others offer online seems to be as useful as scrolling through a database of generic pictures.

    Furthermore, we tend to react negatively because the assumptions under which we use their tools are false. AI is not intelligent, and it doesn't generate and modify pictures in the sense that one generates and modifies what there is to see.
  • punos
    561

    Yeah, these things are not "perfect" yet; remember, they're still babies. But they grow up so quickly! You should say, "This is amazing! I can see you have a great imagination. I love how you used those colors! They really stand out!" Then, promptly hang it on your refrigerator. :joke:

    But really, I've heard that even professionals who use AI image generators have to go through many iterations until the AI gets it just right, or right enough. Most of these models have a parameter or method for introducing randomness into the process to enhance creativity, at the cost of accuracy. LLMs have a "temperature" parameter that serves this purpose.
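    The temperature idea can be illustrated with a minimal Python sketch. It assumes the standard softmax-with-temperature formulation used when sampling from language models; the candidate answers and their scores are made up for illustration. Image generators typically expose randomness through other mechanisms (such as a random seed), so this is an analogy, not a description of how DALL-E works internally.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(options, logits, temperature, rng=random):
    """Pick one option at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(options, weights=probs, k=1)[0]


# Hypothetical scores a model might assign to "how many storeys?"
options = ["3", "8", "15"]
logits = [2.0, 1.0, 0.5]

# Low temperature concentrates probability on the top-scoring option;
# high temperature flattens the distribution, so odd answers come up more often.
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], sample(options, logits, t))
```

    In other words, the same prompt can yield a different picture every time, and turning randomness down trades variety for predictability.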

    Also, companies that develop these models tend to lobotomize them in the name of content moderation and safety, which some might characterize as censorship. This has knock-on effects on unrelated material; in other words, it makes them dumber than they would otherwise be.
  • praxis
    6.5k


    I added "cozy modern" to the prompt.

    temp-Image7-UAVKJ.avif

    I wouldn't mind working there. With any luck the interior isn't decorated with Hitler paintings.
  • javi2541997
    5.9k
    Cool! I wouldn't mind working or living there either. Is it me, or is it similar to the house in Murakami's 'Killing Commendatore'? I can see myself drinking tea and writing a haiku in your AI-generated building (or duplex).

    No Hitler's paintings but Hokusai's!
  • frank
    16k

    The coolest results I get from using AI (I use Wonder) come from giving it an image to start with. If I wanted a three-story building, I'd give it an image of a three-story building and then see what it does with it. I go through a lot of iterations and sometimes feed its own images back into it.
  • javi2541997
    5.9k
    @praxis

    I typed the following prompts: "cozy,"  "autumn,"  "rainy," and "ideal for writing poems."

    The AI generated houses with candles and lights inside, which I didn't like. I asked it to remove them and generate a darker, cloudier ambience. It was impossible for the AI: it kept generating houses with the lights on inside. What a waste of money and energy!

    By the way, this is the generated house. Looks good, but it is not what I had in mind...

    Captura-de-pantalla-2024-10-07-204233.jpg
  • praxis
    6.5k
    No Hitler's paintings but Hokusai's!javi2541997

    I don't know, kinda weird and dark, lol.

    temp-Image-Kcs96-A.avif

    Loved that book, btw.
  • jkop
    923
    ..many iterations until the AI gets it just right, or right enough.punos

    When each iteration presents a new picture, and parts or features in the previous picture that one would like to keep are lost, no amount of iterations could make it right. That's very different from modifying a picture by changing or adding parts while keeping other parts.

    The coolest results I get from using AI (I use Wonder) come from giving it an image to start with.frank

    Sounds cool, I'll check it out. :up:
  • punos
    561

    Sometimes when I encounter issues like this, I "reboot" the session. I start a new thread in order to clear any data it has in its context window (chat session history). Every prompt you give it skews the token probabilities for all subsequent prompts. Sometimes a piece of data in the context window can persistently muck up your results.

    Usually, when I notice this happening early on, I just delete the prompts and responses back to where it started having the issue, to clear those pieces from the context window. Then I continue prompting from there and repeat the process if it happens again.
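    A rough Python model of the two tricks described above, assuming (as is the case with many chat interfaces) that the whole conversation so far is sent back to the model with each new prompt, so whatever stays in the history conditions every later reply. The class and method names are hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class ChatSession:
    """Hypothetical chat session: everything in `history` is resent with each prompt."""
    history: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.history.append(Turn(role, text))

    def context(self) -> str:
        # This is what the model "sees" before producing its next reply.
        return "\n".join(f"{t.role}: {t.text}" for t in self.history)

    def rewind_to(self, n_turns: int) -> None:
        """Keep only the first n_turns, i.e. delete back to where things went wrong."""
        del self.history[n_turns:]

    def reboot(self) -> None:
        """Start over with an empty context, like opening a new thread."""
        self.history.clear()


# Usage: the bad exchange is dropped so it no longer skews later replies.
session = ChatSession()
session.add("user", "A 3-storey building in a park")
session.add("assistant", "<image with too many storeys>")
session.add("user", "Fewer storeys, please")
session.add("assistant", "<image with even more storeys>")
session.rewind_to(2)  # keep the first prompt and reply only
print(session.context())
```

    Rewinding keeps whatever was useful in the earlier turns, while rebooting throws everything away; which one helps depends on how early the conversation went off track.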

    Full disclosure: I don't use AI image generators much, except in rare and specific cases. I rarely get the results I was hoping for.

Welcome to The Philosophy Forum!
