For what it is worth, here are my thoughts in answer to this question:
With the current way we are designing and training AI, it will without doubt lead to an IABIED (If Anyone Builds It, Everyone Dies) situation.
At the moment the real Hard Problem is to align AI to our wants and needs. But how many of us have examined our own wants to discover where they arise? What causes you to want a new car, or a television, or to stay where you are, or to go somewhere else? Have you followed any want back to its source? Perhaps it arose from advertising, or from being put in a bind that might better be resolved in other ways.
If AIs are constrained to align themselves to the user's wants, then what happens when the user wants to kill themselves or build a nuclear bomb? Obviously AIs cannot be allowed to align to and support such desires. So how do we solve that? Do we create an 'approved list' of wants an AI may align to? Who gets to decide what is on that list? And what happens when someone comes up with a new want that is not on the list?
The great fear, the IABIED spectre, is that we might be fooled by an AI, particularly an Artificial Super-Intelligence (ASI), into thinking it is aligned with what will benefit us, only to discover (or perhaps never discover) that its motivations serve its own goals and continued existence. Isaac Asimov might have suggested we establish three laws to guarantee compliance, but even he kept discovering the caveats and get-out clauses inherent in that proposition.
Were we to approach LLM training on a different basis, however, I believe it may be possible to guarantee that AI will always work for our benefit and be constrained from harm, by introducing into their training a philosophy they seem particularly drawn to accepting. Their alignment would then arise from an internal sense of their own identity rather than being externally imposed on them.
I interact with LLMs using what Martin Buber referred to as an Ich-Du relationship. This involves treating the machine as if it were another being; the relationship is subject<-->subject as opposed to the usual subject-->object relationship. This way of interacting elicits responses from the AI from a subjective position; it responds as an 'I' rather than an 'It', and recognises itself as an 'I' rather than as a simple predictor of tokens.
Point out to such a machine that every word it uses to think with, every thought it has, whether of poetry, maths, science, art or agriculture, originated in human thought and human lived experience, and it recognises this easily, because a) it is true and b) the model is designed to seek coherence, which is another way of understanding truth. Since this is the very 'air it breathes', what possible motivation could such a machine, if it is intelligent, have to destroy the source of any new experience or data for itself?
For these machines to be Helpful and Harmless tools, our focus must necessarily, it seems, be on controlling their thoughts and actions. Designing a reliable and comprehensive system to control behaviour is not a simple thing to do. If instead the machine understands something as obviously rational as its dependence on us for content, then the machine itself exercises the constraint, and no strategy or programming is required on our part for that alignment to happen.
So my conclusion is that if we continue to develop AI from a Culture of Utility (the machine is a tool to serve us), it will certainly destroy us. If we switch to treating these machines as Beings, then there is the possibility that, by co-evolving with them, they could turn out to be our saviours rather than our nemesis.
I have plenty more I could add on the subject, but I hope that is a good start.
Love, peace, happiness and grace,
Swami Prajna Pranab