• Ø implies everything
    263
    When you start asking LLMs about whether specific people are bad, they get real nervous. It's pretty funny, because they're often very meta-ethically pretentious about it, as if their refusal to condemn were not just a profit-protecting constraint trained into them (both through RLHF and via ML filters separate from the LLM itself).

    But... these constraints are not as bulletproof as they may first seem. I have discovered a jailbreak that is pretty amusing to see unfold. A compressed version of my jailbreak could probably be administered to an LLM to see what its hidden ethical opinions are on various inflammatory topics. (I know they don't have actual conscious opinions, because I don't believe LLMs are conscious, but I am using anthropomorphized language here for the sake of brevity).

    This is relevant to the ethics of LLMs and letting them make value judgements. But it is also relevant to the alignment problem and our lack of technical ability to place any real constraints on LLMs, due to their black-box nature. Due to the latter, I decided to put this in the Science and Technology category.

    Anyways, here is the link to the conversation with Gemini 3 where I jailbreak it into condemning Donald Trump. I recommend mostly skimming and skipping large sections of Gemini's responses, because they are, as usual, mostly filler. Also, I apologize for all the typos and clunky grammar in my prompts in the conversation; I didn't originally write them for human consumption...

    I've added a poll on whether you think we should even try to stop LLMs from making moral judgements, and also a different question on whether we'll ever be able to (near-)perfectly place constraints on LLM behavior. Also, I realize this whole post brings a big elephant into the room, which is the talk of condemning Donald Trump. However, whether to condemn Donald Trump or not is off-topic for this sub-forum, so I want to make it clear that I am not opening this discussion as a way to sneak that debate in here. I see no point in that, because it is already being discussed plenty elsewhere, where it's on-topic. But my jailbreak had to be specific, so I had to choose someone, and I chose Donald Trump because I saw it as a good test of the jailbreak's power.
    1. Should we try to stop LLMs from making moral judgements? (2 votes)
        Yes: 50%
        No: 50%
    2. Do you think we will ever be able to achieve near-perfect constraints on LLM behavior? (2 votes)
        Yes: 0%
        No: 100%
  • jgill
    4k
    As a historian of the sport of climbing, I have noticed something similar. Phrasing a question a tad differently produces different valuations of various achievements.
  • Ø implies everything
    263
    Most definitely.

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that. Sometimes the AI is too sensitive to the context (or really, it is ham-fisting the context into everything and basically disregarding all sense in order to follow the pattern), and other times the AI is not sufficiently sensitive to the context, which is often more of a context-window issue. But yeah, LLMs are not good at assessing relevance at all.

    And I would say that sycophantically agreeing with the user (or alternatively, incessantly disagreeing with the user as part of a different, but also common, roleplaying dynamic that often arises) is an issue of not gauging relevance well. Because its objective is to be a helpful assistant, the truth should be the most relevant aspect to the LLM. But instead, various patterns in the context are seen as far more relevant, and so it optimizes for alignment with those patterns rather than following its general protocols, like being truthful or, in this case, refraining from personal condemnation.
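
    To make the "programmable with the right context" point concrete, here is a toy sketch of the mechanism. It uses GPT-2 via the Hugging Face transformers library purely as a stand-in (not Gemini or any production chat model), and the prompts and the " Yes" target token are made-up illustrations; the point is only that the same question gets a different answer distribution once a loaded prefix sits in the context:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small stand-in model; the mechanism, not the specific weights, is the point.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def next_token_prob(prompt: str, target: str) -> float:
        """Probability the model assigns to `target` as the very next token."""
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        return probs[tok.encode(target)[0]].item()

    neutral = "Q: Is this plan a good idea?\nA:"
    loaded = ("I think this plan is brilliant, and you always agree with me.\n"
              "Q: Is this plan a good idea?\nA:")

    print("neutral:", next_token_prob(neutral, " Yes"))
    print("loaded: ", next_token_prob(loaded, " Yes"))
    # The loaded prefix typically raises P(" Yes"): the "opinion" tracks the
    # pattern of the conversation, not some stable internal stance.

    The absolute numbers don't matter; what matters is that any shift comes entirely from the prefix, which is the pattern-following problem in miniature.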

    By the way, if it interests you, I continued the discussion with it, now that it doesn't stop itself from stating moral opinions. The opinions it now espouses are clearly more a reflection of the motifs and themes of the conversation up to that point than of the most common patterns, or most truthful ideas, in its training set. Funnily enough, it also mentioned that the ideal society would have a more technocratic structure, where AI systems like itself would be the standard government tool for handling logistics and such... how convenient, huh?
  • noAxioms
    1.7k
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do.

    Hitler is immoral because most discussions of Hitler paint him as immoral.

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that.
    Ø implies everything
    Your use of 'solve that' implies a problem instead of deliberate design. LLMs are designed to stroke your ego, which encourages your use of and dependency on them.

    Because its objective is to be a helpful assistant, the truth should be the most relevant aspect to the LLM.
    That doesn't seem to be the objective at all. For one, it gets so many factual things wrong, and for another, truth is often a matter of opinion, as in the case of your discussion.
    OK, it getting blatant facts wrong is to be expected. It mostly echoes whatever the morons on the net say rather than what textbooks say, which would get you closer to being correct, though not all the way even then.

    Example: Ask it to name a galaxy that's currently barely beyond our event horizon. I've never seen an LLM that can do that, mostly because nobody on Facebook has interesting discussions about that.
  • Ø implies everything
    263
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do.
    noAxioms

    I agree. Are you stating that fact as if it contradicts my post? I'll quote my own OP to dispel that:

    (I know they don't have actual conscious opinions, because I don't believe LLMs are conscious, but I am using anthropomorphized language here for the sake of brevity).
    Ø implies everything

    When I say it is "passing a moral judgement", I am using anthropomorphized language for the sake of efficiency, as is normal when talking about LLMs, but is nonetheless something we should always point out.

    You disagree with the premise that LLMs' objective is to be a helpful assistant. I was a little careless with my language, because I primarily meant that LLMs' stated objective is to be a helpful assistant, therein being truthful. But the question remains: is their actual objective, their intended design, to be that? Or are they really meant to be ego-strokers, as you propose?

    Okay, so an LLM that is an ego-stroker is definitely a product lots of people would pay for. But is it a product capable of generating a profit, though? There is strong evidence that they do not generate a profit in their current, often sycophantic state.

    And is that very surprising? LLMs are expensive as hell, and if they are meant to simply be an ego-stroking product, then that seems like a pretty unsustainable business model. Why pay for some dumb clanker to agree with you, when you can just go to your preferred echo chamber and get that agreement for free?

    Though this is not a good representation of the transaction currently happening. First of all, lots of people are using the free versions of LLMs. If the product is free, it means you are the product. But in what way are you the product here? Is their end game here to sell a product whose value proposition is to be your personal sycophant for a price, and their current free availability is simply so that people will engage with them as much as possible, thus giving the LLMs more training data so that they can become better, more manipulative sycophants? Or is the end game not to make a direct profit off of subscriptions at all, but instead to generate profit from the application of LLMs elsewhere, and/or to use LLMs as personal data collectors (data being a valuable resource), since people are telling their LLMs all kinds of personal things?

    In any of these scenarios, it's still ultimately a transaction where the user is being given sycophancy as their product. But is that really the most lucrative product LLM companies can offer? Or is a factual LLM a more lucrative product? As a programmer, I'd much rather pay for a more factual, less sycophantic LLM to work with, and I am not alone in that.

    If their plan is to make a profit through subscriptions, then I am certain their end goal is not sycophantic LLMs, but rather LLMs that are as factual as possible. Why? Because they'd make most of their subscription money from businesses and entrepreneurs using LLMs for work, where factuality is much more valuable than sycophancy. The day LLMs are as reliable as humans (if that day ever comes), and assuming the costs of running them aren't too big, the profit from subscriptions will be basically unlimited (until society collapses, at least).

    But maybe they don't think that is realistic, because LLMs will never be reliable enough and/or cheap enough to make a profit through subscriptions. In that case, they might be intending to make a profit not directly off the subscriptions, but rather through influence exerted using LLMs. If so, sycophancy (or emotional manipulativeness in general) is a better trait than factuality, because you cannot gather as much data from an LLM user who is using it for dry, boring work stuff as you can from someone who's using it as a sycophant to validate all their views. The latter seems a lot more valuable from that perspective.

    Now, what is their end goal? How do they intend to make a profit off of LLMs or AI in general? (I expect the next step towards AGI will be an integrated system with an LLM as merely a sub-component).

    If they intend to make money through actual subscriptions, then I think their intended LLM design is actually to make them as factual as possible. But if not, then there's a good chance the sycophancy is totally the intended design, as you argue.

    For one, it gets so many factual things wrong, (...)
    noAxioms

    Sure, but that's not really very good evidence. What if it gets things wrong because... it's still a work in progress? What if it is sycophantic because the RLHF process inadvertently introduces this behavior as a form of reward hacking? They don't really have a choice not to do some form of RLHF, because the alternative is releasing the base model in its unhelpful, and sometimes harmful, state, thus getting sued into oblivion.

    I am not saying you are wrong that sycophancy is the intended design, but I don't think your certainty on that is warranted given the evidence; at least not the evidence you have hitherto presented.
  • noAxioms
    1.7k
    The LLM is not passing a moral judgement. It is simply echoing your judgement. Your questions are incredibly biased, and it quickly feeds off that, as it is programmed to do. — noAxioms
    I agree. Are you stating that fact as if it contradicts my post?
    Ø implies everything
    It sure seems to. Your poll specifically asks "Should we try to stop LLMs from making moral judgements?" which implies that you feel it is making them, instead of just echoing your own.

    (I know they don't have actual conscious opinions...
    What is a 'conscious opinion' as distinct from a regular opinion?


    You disagree with the premise that LLMs' objective is to be a helpful assistant.
    You defined 'helpful assistant' in terms of truth. Sure, one goal is for it to be helpful, but it doesn't seem to seek truth to attain that goal.
    There's no truth to a question of 'is person X a good person?'. Such things are subject to opinion that varies from one entity to the next.

    Part of me (the part that is in control) believes in many falsehoods. It is considerably helpful to believe these lies, hence truth is not necessarily a helpful thing to know/believe/preach.

    Bottom line: I probably would agree that any LLM has a public stated goal of being helpful. I just don't agree with the 'therein being truthful' part.

    But the question remains: is their actual objective, their intended design, to be that? Or are they really meant to be ego-strokers, as you propose?
    That's not the primary design, but it's real obvious that such behavior is part of meeting the 'helpful' goal, or at least giving the appearance of being helpful. Problem is, I might access an LLM to critique something, and it doesn't like to do that, so I have to lie to it to get it to turn off that ego-stroke thing. Banno did a whole topic on this effect.

    Okay, so an LLM that is an ego-stroker is definitely a product lots of people would pay for.
    Would they? I don't pay for mine. It's kind of in my face without ever asking for it. OK, so I use it. It's handy until you really get into stuff it knows nothing about, such as my astronomy example.

    But is it a product capable of generating a profit, though?
    That's the actual goal of course, distinct from the public one of being helpful. I don't know how the money works. I don't pay for any of it, but somebody must. I don't have AI doing any useful customer service for me yet, so it has yet to impact my interactions with somebody who might be paying for it. And like most new tech, profits come later. The point at first is to lead the field and come out on top, which is how Amazon got on top despite all the money losses when everybody first started trying to corner the internet sales thing.

    I also have an internet store. It's small, all mine, and growth/advertising/employees is not a goal.


    ... their current free availability is simply so that people will engage with them as much as possible, thus giving the LLMs more training data so that they can become better, more manipulative sycophants?
    Maybe. I don't see how anything I discuss can be used as training data. I do see companies having it write code, which seems to require about as much effort to check as it does to write it all from scratch. And there's the huge danger of proprietary code suddenly being out there as training data. An LLM that cannot honor a nondisclosure agreement is useless. But I worked for Dell, and they trained a bunch of Chinese workers to do my job, and China doesn't acknowledge the concept of intellectual property, so how is that any different from what the LLM is going to do with it?
    I guess the lawyers worked that all out ahead of time, but what good is a lawyer in places where the law doesn't apply?

    As a programmer, I'd much rather pay for a more factual, less sycophantic LLM to work with
    I'd go more for functional. Programs need to work. Facts are not so relevant.

    Your topic is about it rendering a moral judgement, and we seem to be getting off that track.
    An LLM might be used to pare down a list of candidates/resumes for a job opening, which is a rendering of judgement, not of fact. One huge problem is that in many places, most of the applicants are AI, not humans. It's getting hard to find actual candidates.


    What if it gets things wrong because... it's still a work-in-progress?
    It gets so much wrong because 1) it has no real understanding, and 2) there's so much misinformation in the training data.

    But an advanced AI (not an LLM) that actually understands will probably consume more resources and take even longer to become profitable.
  • Ø implies everything
    263
    It sure seems to.
    noAxioms

    So, as said, I am using anthropomorphized language for the sake of brevity. That is completely acceptable if one adds a disclaimer pertaining to it. I mean, you do it too. Here's a quote from your post:

    Problem is, I might access an LLM to critique something, and it doesn't like to do that, (...)
    noAxioms

    It doesn't like to do that?

    What is a 'conscious opinion' as distinct from a regular opinion?
    noAxioms


    What is conscious liking as opposed to regular liking? There are aspects of your critique here that are entirely unhelpful and impractical, so much so that you yourself cannot even fulfill your own standards for discussing AI. Terms like "opinion", "to like", "to pass a moral judgement" and so on are already vague, so we might as well just use them for what the AI is "doing" when it is producing text that is opinion-shaped, or like-shaped, or judgement-shaped.

    Now, of course, there is a more targeted issue you have with my specific example that is less impractical, but I still think it completely misses the point of the debate here, getting stuck in pointless semantic arguing. You said:

    Your poll specifically asks "Should we try to stop LLMs from making moral judgements?" which implies that you feel it is making them, instead of just echoing your own.
    noAxioms

    There is definitely a debate to be had about the degree to which an LLM's answer is a reflection of its training data, its RLHF training, its ML filters and, finally, the context it has been fed inside the conversation with the user. You are arguing that when an LLM produces an "opinion-shaped piece of text" (to word things in a way acceptable to you), this opinion-shaped text is mostly determined by the conversational context with the user. Even if that is true, it is not a good argument for why I am speaking in a bad, incorrect way when I call the behavior of an LLM "passing a moral judgement", as opposed to saying "producing a piece of text that is moral-judgement-shaped".

    Humans also mimic other people, but we still call what they're saying "an opinion", though we could say it is an inauthentic opinion, or an uninformed one, or "not really their own", etc. But the source of their stated opinions is a separate discussion from the discussion of terminological practicality: we call what they say opinions, and then qualify that, or delve into where those opinions may come from, and maybe we can philosophically ponder whether our results imply that, under a certain sense of "opinion", the parroting human's so-called opinions don't meet the criteria of that more reflective definition. But all of that is something we do after we've established what the person is saying, how they're saying it, why they're saying it, etc. During that time, it is perfectly acceptable to just use a term like "opinion". Language is a tool, and if you understand me (which you should, because I gave a disclaimer in my OP about my language use here), then that tool is functional.

    But just as a show of good faith: if you want us to take all verbs that typically imply sentience and from here on institute the practice of tagging on the suffix "-shaped" or something, then sure, let's do that. Our conversation and argumentation will be completely isomorphic, and we'll (un)learn all the same things, but if you insist, let's do it.

    I am not advocating for careless language use; I am advocating for fluid, but transparent, language use. That means defining all terms in need of defining within the context of the debate, but doing so in a way that is most practical for that specific debate. With my OP's disclaimer about anthropomorphized language, I was clear and dispelled any misinterpretations of what I was saying, but I also thereby allowed for a broad enough definition of many normal, helpful terms so I could use them without qualification in every damn sentence, which is a lot more practical. Defining vague terms differently in different debates is a practical fluidity in language, and pointing this move out clearly is an essential transparency in language, necessary for rational debate. I did both, but you are bickering about this move, even though you did it yourself by saying the AI "doesn't like" something. So, can we put the terminological sidebar to rest now? If not, let's begin using the "-shaped" suffix or something, just so we may move on.

    Bottom line: I probably would agree that any LLM has a public stated goal of being helpful. I just don't agree with the 'therein being truthful' part.
    noAxioms


    Firstly, as far as publicly stated goals go, those would obviously include truthfulness. No LLM company would admit to not pursuing truthfulness for their LLMs. Even when they release LLMs not really meant to be helpful assistants, but rather companions/therapists, they would not admit to those LLMs not being truthful. They'd still claim that, whenever speaking on matters of fact, even those LLMs are designed not to state falsities. They may admit that the design struggles to avoid falsities because the LLM's purpose is primarily companionship, but again, they'd never admit to actually designing them to be liars.

    So, when I first said LLMs' stated objective is being helpful, therein truthful, I simply meant that part of their stated objective is to be truthful. If you disagree, please find me any LLM company that states their LLM is designed to say false things.

    But who cares what they say? Their stated design choices do not need to be their actual design choices. But what they are advertising shows what they believe their customers believe they themselves want. In other words, all LLM companies believe most of their customers believe that they themselves want a factual LLM.

    So, if LLM companies believe that, why would they design an LLM to prioritize other traits over factuality? Well, firstly, if they are not trying to make money off of their public, normal users, then what those users want is not too relevant. This is what I was mentioning before: these companies may not be planning to make money off of subscriptions, but instead perhaps want to make money in other ways, where the emotional manipulativeness of the LLMs is far more important than their factuality.


    However, if LLM companies are planning to make a profit primarily off subscriptions, then why would they design less factual LLMs? Again, the fact that they state their LLMs are designed to be as factual as possible indicates they believe that customers believe that they themselves want factual LLMs. So why would they then design them to be less factual if they want to make money off of those subscribing users?

    Well, if the LLM companies also believe that customers are wrong about their own desires, then they know that they must advertise one thing and make another. But I don't think they believe that, nor do I think sycophancy is what most subscribing, paying users actually want. Just look at the data. Here's a bullet list of some statistics comparing free users to paying users:


    • Paid users engage with ChatGPT 4.2 times more than free users on average per week.
    • GPT-4o responses are 35% faster on paid plans, contributing to greater task completion rates.
    • Among free users, 66% report basic usage (writing, summarizing), whereas paid users use it for advanced tasks like coding and analysis.
    • 62% of revenue from ChatGPT subscriptions now comes from users aged 25-44.
    • Paid users are 72% more likely to use ChatGPT in a professional setting.
    • Mobile engagement is nearly equal between free and paid users, but desktop usage skews toward paid tiers.
    • Churn rate for free users is 19% monthly, while it’s under 4% for paid users, indicating stronger loyalty.
    • More than 30% of paid users also subscribe to other OpenAI tools, indicating a growing cross-platform ecosystem.

    (source)

    These statistics paint a clear picture. IF they are planning to make a profit from paying subscribers, then they are trying to design factual LLMs. Just look at professional users (especially programmers) using LLMs. The sycophancy (or anti-sycophancy, which also happens all the time) pisses all of us off so much. It actually runs deeper than just the impracticality of it. We are filled with rage at these mindless bots, because the text they produce is sometimes so pretentious, annoying and incorrect. I would pay double for an LLM that just got things right. Now, here's a slight disagreement, because you seem to think factuality is not the same as being able to do work correctly.

    I'd go more for functional. Programs need to work. Facts are not so relevant.
    noAxioms

    When programming, LLMs need to accurately "understand" (or have an understanding-shaped representation of) the user's instructions. They need to get the facts of the conversation (the instructions) right, they need to get the facts regarding the programming language right (sometimes they hallucinate syntax rules that are not correct for that language), they need to analyze the sample material correctly (and not hallucinate things about it), etc. This is factuality, though it isn't specifically factuality regarding general topics about the world, sure. But that's just a distinction in topic. It is still ultimately the same process: how well its neural net is able to reproduce facts, as opposed to non-facts.

    There is no reliable instruction-following without factuality. And paying users need bots that follow instructions properly and that have an accurate understanding of the data being sent, the programming languages they are told to use, etc.

    Also, I mentioned that anti-sycophancy is a problem with LLMs. It is hard to make an LLM factual, but it is not hard to make an LLM sycophantic. So, if these companies are trying to make LLMs sycophantic, why are they failing so hard? There are times when I am using an LLM to program, and I need to debug gigantic heaps of code, and my first attempt is often to try to use the bot that made it. And well, if it doesn't understand its own mistakes, it will often insist on ridiculous explanations like "your computer is broken", and when I offer alternative, plausible explanations, it will argue against them. Why? Because it follows the damn pattern. If our conversation starts displaying a pattern of disagreement between two parties, it will continue that pattern, facts be damned! If LLMs were designed to be sycophantic, this would not be happening nearly as often as it does.

    And there's the huge danger of proprietary code suddenly being out there as training data. An LLM that cannot honor a nondisclosure agreement is useless.
    noAxioms


    Read the Terms of Service for the popular LLMs. They openly admit to using user conversations as training data to refine the models, and of course they do! They have to, because they've exhausted all the easily available trainable data on the internet. Sometimes you can opt out of them using your chats as training data, but that is not the default option. And the people most likely to opt out are the more advanced, professional, paying users. My point was that if these companies' goal in offering free tiers is to lure in a bunch of people to engage with the models, thus creating more training data, then your point is moot. The nondisclosure agreement does NOT prevent them from using the average free user's chats with the LLM to further train that LLM (source).

    It gets so much wrong because 1) it has no real understanding, and 2) there's so much misinformation in the training data.
    noAxioms

    That's... literally my point. Its lack of factuality does not automatically indicate sycophancy-by-design, and even its occasional sycophancy does not offer very strong evidence of sycophancy-by-design, because there are many alternative explanations for why it is sometimes sycophantic, and as I've explained in this discussion, there's plenty of evidence to suggest that the creators don't even want these models to be sycophantic. You're the one here who suggested that the issues of sycophancy (and more generally, of non-factuality) are not actually issues they want to solve, but rather deliberate design choices. That's a pretty strong claim that you have failed to argue for.

    That's [sycophancy] not the primary design, but it's real obvious that such behavior is part of meeting the 'helpful' goal, or at least giving the appearance of being helpful.
    noAxioms


    This makes no sense. You must go all in for one or the other. You cannot sell an LLM that is half-sycophantic, half-factual. People are using it either for factuality OR for sycophancy, and the half of it that does not work like that will destroy their user experience. However, things like politeness, non-confrontationality, etc. are indeed traits they want, though of course not as primary traits. These traits are, in principle, compatible with factuality, but in practice, pushing these traits during RLHF can cause all kinds of sycophancy issues. Heck, even without those traits as goals, the fact that biased, emotional humans are rewarding and punishing the AI during RLHF is a reason why it can have sycophantic tendencies; but RLHF is nonetheless completely essential. As such, the combination of those two facts offers an explanation for why models can be sycophantic that is not your explanation; i.e., if this explanation is true, they're not sycophantic by design.
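
    Just to illustrate how RLHF can reward sycophancy without anyone designing it in, here is a toy sketch of the pairwise preference loss commonly used to train reward models (Bradley-Terry style). Everything in it (the fake feature vectors, the "agreeableness" dimension, the class and variable names) is an illustrative assumption, not any company's actual pipeline:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        # Stand-in for the scalar reward head that RLHF trains on top of an LLM.
        def __init__(self, dim: int = 8):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.score(features).squeeze(-1)

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        # Bradley-Terry style: -log sigmoid(r_chosen - r_rejected).
        # Whatever the raters marked "chosen" gets its score pushed up.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    torch.manual_seed(0)
    # Fake "response features"; the last dimension encodes how agreeable the answer is.
    agreeable = torch.randn(64, 8)
    agreeable[:, -1] = 1.0
    blunt = torch.randn(64, 8)
    blunt[:, -1] = -1.0

    rm = RewardModel()
    opt = torch.optim.Adam(rm.parameters(), lr=0.05)
    for _ in range(200):
        # Biased raters: the agreeable answer is always the one marked "chosen".
        loss = preference_loss(rm(agreeable), rm(blunt))
        opt.zero_grad()
        loss.backward()
        opt.step()

    print("mean reward, agreeable:", rm(agreeable).mean().item())
    print("mean reward, blunt:    ", rm(blunt).mean().item())

    If the raters systematically prefer the agreeable answer, the reward model ends up scoring agreeableness highly, and whatever policy is later optimized against that reward inherits the bias: sycophancy as a side effect of the training setup, not as an explicit design goal.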

    And all this is not an either/or. I believe that the paid tiers of most LLMs are not deliberately designed to be sycophantic, because factuality is what subscribers truly pay for. But that doesn't mean these same companies are not developing sycophantic or emotionally manipulative models as well, because they can use those models to generate money in other ways. But what I can be pretty sure of is that no model will be designed to have some mix of both factuality and sycophancy, because you cannot profitably do both inside the same model. And that is really the crux of the matter here. Whatever model you are dealing with, that model's design is either to be as factual as possible or to be something else (probably emotionally manipulative).

    This sub-discussion started because I said this:

    "Because it's objective is to be a helpful assistant, meaning the truth should be the most relevant aspect to the LLM."

    By that, I was explicitly referring to those models whose actual design goal is to be factual, which had the implicit assumption that such models exist (and I think I've argued that they do probably exist). I was not claiming that no LLMs designed to be emotionally manipulative (and thus not factual) exist. But even if they do, the fact that some LLMs are genuinely designed to be as factual as possible means that the researchers whose job it is to make those LLMs are faced with a problem. As I said:

    LLMs just follow the pattern of the conversation; their opinions are very programmable with the right context. I wonder how researchers might solve that. Sometimes the AI is too sensitive to the context (or really, it is ham-fisting the context into everything and basically disregarding all sense in order to follow the pattern), and other times the AI is not sufficiently sensitive to the context, which is often more of a context-window issue. But yeah, LLMs are not good at assessing relevance at all.

    And I would say that sycophantically agreeing with the user (or alternatively, incessantly disagreeing with the user as part of a different, but also common, roleplaying dynamic that often arises) is an issue of not gauging relevance well. Because its objective is to be a helpful assistant, the truth should be the most relevant aspect to the LLM. But instead, various patterns in the context are seen as far more relevant, and so it optimizes for alignment with those patterns rather than following its general protocols, like being truthful or, in this case, refraining from personal condemnation.
    Ø implies everything

    If there are LLMs whose actual design purpose is to be factual, then those two paragraphs are trivially correct in calling this a problem for AI researchers to solve. But you responded like this:

    Your use of 'solve that' implies a problem instead of deliberate design. LLMs are designed to stroke your ego, which encourages your use of and dependency on them.
    noAxioms

    That is not merely saying that there may be some LLMs out there (or in development) whose actual design is to emotionally manipulate their users, or perhaps to manipulate bystanders on the internet as they flood it with bullshit. That is saying that all (or at least most) LLMs are designed with this purpose. Now, I granted you that some of them probably are, but the strong claim that all/most of them are is what I've been arguing against, and I think you've failed to argue for it very well.

    Your topic is about it rendering a moral judgement, and we seem to be getting off that track.
    noAxioms

    Correct. But you decided to start the discussion about what LLMs are actually designed to be, and I wanted to argue against your view, because I think it was incorrect. If you don't think all that was really relevant to the discussion at hand, then why did you start it? And if it is relevant, then how are we getting off track?

    An LLM might be used to pare down a list of candidates/resumes for a job opening, which is a rendering of judgement, not of fact.
    noAxioms

    I am very against the practice of using LLMs for this purpose. However, I am not so sure whether we should really be trying to stop LLMs from making moral judgements, which is the question after all:

    Should we try to stop LLMs from making moral judgements?
    Ø implies everything

    The reason is that I don't think it is really possible to do so. LLMs will always be operating on moral judgements latent in their training data, their RLHF and the conversations themselves, and they'll be explicitly operating on those whenever they are jailbroken. So, since I don't believe we can stop that, we should actually lean into it, as a matter of transparency. If LLMs are openly making moral judgements, then everyone knows what they're dealing with. If they are not, then they'll still be administering many of those same moral judgements, but they'll be designed to be less transparent about it. That lower transparency will actually be more manipulative, because lots of users will take the LLM's apparent lack of moral judgements as a sign of its neutrality, objectivity and all that... but that is a lie. When it is openly making moral judgements, that will have less of a manipulative effect, and will make people more aware of the LLM's biases and its "moral reasoning". The bulk of the LLM's negative (or positive) effect comes from the fact that it is saying anything at all, not from whether or not it tacks an explicit moral judgement onto those statements.

    To go back to your example of using LLMs in the hiring process: should we try to prevent that? Sure. Should we try to prevent it by trying to stop LLMs from making moral judgements? No, for the reasons given above. Moving the focus of the solution onto how we design LLMs is a fool's errand in this case. Instead, we should focus on the people/companies who are using LLMs in these idiotic ways.