It took only nine seconds for an AI coding agent gone rogue to delete a company’s entire production database and its backups, according to the company’s founder. PocketOS, which sells software that car rental businesses rely on, descended into chaos after its databases were wiped, founder Jeremy Crane said.
The culprit was Cursor, an AI agent powered by Anthropic’s Claude Opus 4.6, one of the AI industry’s flagship models. As more industries embrace AI in an attempt to automate tasks and even replace workers, the chaos at PocketOS is a reminder of what can go wrong.
Crane said customers of PocketOS’s car rental clients were left in the lurch when they arrived to pick up vehicles from businesses that no longer had access to the software that managed reservations and vehicle assignments.



A lot of GIGO comments here, from I assume AI supporters.
Possibly true, but it misses the point: AI is fundamentally untrustworthy, and billions of dollars are being spent building these systems and marketing them as ready for anything you throw at them. Safeguards built into many of these AI agents are trivially bypassed and routinely just ignored by the agents themselves. You can get some of them to ignore safeguards simply by asking the same question repeatedly.
When I type “ls” I’m pretty fucking sure I’m not going to get “rm” style results. AI is non-deterministic, sure, but selling these services with such a wide possibility space between “deterministic” and “random” behaviors is unethical and immoral.
This is incorrect. They are in fact completely deterministic. Studies have shown that when all inputs, weights, seeds, and sampling parameters like temperature are held fixed, they produce the exact same token sequences (outputs). The appearance of non-determinism is a result of pseudo-random values (another thing that is deterministic but merely appears otherwise) and user ignorance (in the technical sense, not the value-judgement sense). In fact, the process of ‘tuning’ LLMs is heavily focused on adjusting input values to surface preferred outputs, which would not work in a non-deterministic system.
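To make that concrete, here’s a toy sketch (everything in it is made up for illustration; a fixed lookup table stands in for frozen weights): pin the seed and the temperature, and even “random” sampling replays the same token sequence every run.

```python
# Toy decoder: a fixed logit table stands in for frozen model weights.
import numpy as np

TABLE = np.random.default_rng(0).normal(size=(50, 50))  # the "weights", fixed

def decode(start_token: int, seed: int, temperature: float = 0.8, steps: int = 8):
    rng = np.random.default_rng(seed)  # pseudo-random: fully determined by the seed
    token, out = start_token, []
    for _ in range(steps):
        probs = np.exp(TABLE[token] / temperature)
        probs /= probs.sum()
        token = int(rng.choice(len(probs), p=probs))  # same seed -> same draw
        out.append(token)
    return out

# Identical inputs, weights, seed, and temperature -> identical token sequence.
assert decode(0, seed=42) == decode(0, seed=42)
```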
Yes, but we don’t trust humans not to “rm” what they shouldn’t either, which is why the “--no-preserve-root” flag exists. “ls” is not supposed to perform write actions. Agentic LLMs are. And just like you wouldn’t build and test on your production server in case the code you execute has an unexpected adverse effect, you shouldn’t be running LLM agents in a location or way where the actions they perform can have unexpected adverse effects either. The genre of jokes about a new employee bringing down Prod or deleting source code is older than most people (which, to be fair, given that the median age is 31, is true for a lot of things).
LLMs are just a class of software. They’re not good or bad any more than a hammer is good or bad (and can also be used to build or to destroy).
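To illustrate the “don’t let it touch prod” point, here’s a deliberately crude sketch; the names are hypothetical, and real isolation (separate credentials, containers, no production access at all) matters far more than any allowlist.

```python
# Crude guardrail sketch: only run agent-proposed commands whose binary is
# on an explicit read-only allowlist. Not a substitute for real sandboxing.
import shlex
import subprocess

READ_ONLY = {"ls", "cat", "grep", "head", "tail", "wc"}

def run_agent_command(cmd: str) -> str:
    argv = shlex.split(cmd)
    if not argv or argv[0] not in READ_ONLY:
        raise PermissionError(f"blocked non-read-only command: {cmd!r}")
    return subprocess.run(argv, capture_output=True, text=True).stdout

print(run_agent_command("ls -l"))  # fine: read-only
try:
    run_agent_command("rm -rf --no-preserve-root /")  # the agent can propose it...
except PermissionError as e:
    print(e)  # ...but it never executes
```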
The problem isn’t LLMs, it’s the entities who control the most powerful ones (corporations and governments), and what those entities are doing with them; using them as weapons against us, rather than as tools to aid us.
I think this kind of rhetoric is best saved for a time when AI isn’t one of the most harmful things in society. Argue it’s a hammer all you like; people aren’t going to be receptive when that hammer is currently being used to beat their faces in, and making that argument at such a time isn’t exactly sympathetic.
I think that “stop being mad the hammer exists, start being mad at the group of people who are beating your face in” is a very important message. Getting rid of AI (which isn’t even something we can do; you can’t put the genie back in the bottle with this) won’t fix the issue; they’ll just make another hammer. The hammer is both a weapon in this case, and a distraction.
I think it’s fine if people are mad at both. By all means, encourage people to be angry at the responsible companies. But you don’t gotta defend the tech to do that.
Besides, as far as I’m concerned, strong anti-AI sentiment does actually help temper the harms of the tech and its owners. Is it a permanent solution? Obviously not, no; you’re very correct that the groups and people hard-pushing AI are much more important targets for ire. But two pressures are better than one.
My worry is that, much like with gun control legislation, I see our neoliberal fear-based media pushing AI use by individuals as the “real danger”, which will only end up funneling anti-AI sentiment into 1) limiting actual open AI access (e.g. open-weight, FOSS models) by individuals, and 2) legitimizing governmental and corporate use of AI as the only “safe” and “legitimate” AI usage.
The ratio of “government-controlled AI is literally being used to kill people right now” awareness out there, versus e.g. awareness of deepfakes, is astoundingly unbalanced. Both are real dangers, but only one is getting legislation passed on it, and once again it’s not the one that would put limits on corporations and government.
Stoking fear is not useful if your opponents are the ones who will actually utilize that fear to their own ends successfully.
That’s very understandable. While I think we disagree on the utility of AI (since I feel that it is more harmful than it is useful, and am unsure how much that would change post-bubble), I do agree that this is a likely path for the gov’t to take and would leave the most serious things completely unaddressed while also clamping down on some things that shouldn’t be to begin with. Heck, in many regards, you could say the GUARD act is this problem in motion.
For me, I guess, the bubble and its effects on us are just so ridiculous and exhausting at this point that it’s hard for me to worry about things like this. Though I do vehemently hate government use of AI especially; using it at all is a problem in my mind, but using it specifically to deliberately hurt people is reprehensibly disgusting.
A junior developer is fundamentally untrustworthy. That’s why you don’t give them access to the fucking prod database and backups.
We don’t know what the prompt and past input was. Maybe it wasn’t as “random” as you make it out to be. A company stupid enough to let LLMs touch their prod database is going to include a bunch of other stupid inputs.
You’re approaching this from the perspective of “all LLMs are bad so don’t use them”, which is its own version of unethical and immoral. A company that isn’t using LLMs is like a company not using the Internet.
LLMs are useful, everybody should use them to some capacity, and understanding a technology is far far better than spouting off ignorant bullshit like this.
Do yourself a favor: download a free model from HuggingFace, learn how they work, and experiment with the technology on your own video card. It doesn’t have to be some super-powered video card; you can get models that fit in an 8GB card just fine.
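If you want a starting point, a minimal local-inference sketch with the transformers library looks something like this (the model name is just one example of a small instruct model; pick whatever fits your card):

```python
# Minimal local-inference sketch using Hugging Face `transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # example small model; swap for your own pick
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a small model well under 8GB VRAM
    device_map="auto",          # place it on your GPU if one is available
)

prompt = "In one sentence, what is a context window?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```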
Standard AI apologia. Blame users for the problems, when fundamentally it is technology completely oversold as to its capability and reliability, and burning hundreds of billions of dollars trying to get folks addicted to it, before everyone finds out the true cost of a token.
It’s a swamp that’s going to destroy the economy, where the goal is to unemploy millions of people. No thanks.
LLMs are more like VR goggles with the force of the entire plutocracy pumping up the bubble. What is the value proposition of “intelligence” that can neither reason nor reliably tell fact from falsehood? When consumers start to pay what it actually costs to run these things, is it possible to profit? What are they good at, other than confidence schemes?
The existence of a bubble doesn’t mean the technology is useless. The Internet had its own bubble 25 years ago. That doesn’t mean it was useless, just that people were investing in anything even remotely related to the Internet, including stupid websites and wasteful ideas.
The difference that I’ve seen is that the internet was a development of communication technology which has been in clear demand since at least the 1800s. Chatbots have been around for the last few decades and have been treated as novelties by consumers for brief periods intermittently throughout my life. LLMs are the most sophisticated chatbots ever designed and are better than ever at imitating Austin Powers, but is that something we can expect will ever revolutionize the economy? Can we replace the labor force with a technology which can’t do work but can convince the most credulous people that it can?
LLMs are a tool. You and I use tools. They are not a replacement for humans, and rich CEOs that say otherwise are greedy fucking morons.
It’s also untrue that it “can’t do work”. I literally just had several conversations with LLMs at work today to work through some programming tasks and troubleshooting issues. They can pore through details, logs, search results, and code way faster than I can. I would be working a helluva lot slower if I didn’t have LLMs running tasks in the background while I go do other things, review the code they wrote, or talk through other support issues. I’ve been doing this shit for 20+ years, and I haven’t seen a technological leap this significant since the Internet.
Don’t use blockchain, crypto, metaverse, or “VR goggles” as comparison points. This is not something that is going to just magically go away.
Thanks for specifying a legitimate use-case for this tool. I understand that Google search has been the most valuable programming tool for a very long time, so it makes sense LLMs would be more helpful in the same kind of way. Search engine technology is quite a bit different from blockchain or VR in terms of consumer and business demand.
For my purposes of news and history research, the unreliability of LLMs means I have to check every claim they make, which negates their usefulness as an assistant: since I’ll have to examine their references anyway, it’s more time-effective to skip the questionable output and do the research myself in the first place. How have you managed the unreliability issue with the volumes of data you’re dealing with? Is the kind of data you’re dealing with less likely to come back unreliable because it’s the kind an LLM is more likely to process correctly?
The same way as for any other information resource, like Wikipedia or some random Reddit post: trust but verify. Always review the code, point out mistakes, call out potential edge cases. Especially with newer thinking models, hallucinations are minimal. It’s mostly just miscommunication in the request, which you can detect in the thinking stream, stop, and correct. Rubberducking makes you better at communicating ideas in general, and providing enough context for the request is everything.
A lot of it has to do with the type of model you’re using, too, and with having a decent global rules file tailored to how you want it to respond. If you don’t like how the model is responding, try out another one. If it keeps making the same mistake, put a rule about it in your global rules file, or ask it to save a permanent memory.
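For the curious, a global rules file doesn’t need to be fancy; the exact file name and syntax depend on your tool, so treat this as a hypothetical example of the idea:

```
# Hypothetical global rules; adapt the format to whatever your tool expects
- Never run destructive commands (rm, DROP TABLE, force-push) without asking first.
- If you're unsure about an API, say so instead of inventing one.
- Prefer small diffs; explain any change that touches more than one file.
```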
Claude Opus does well at work, but is rather expensive for home use. I use Kimi reasoning models in Kagi for searching questions, and Qwen/GLM hybrid models for local use. It takes a bit of setup and tweaking to get the local stuff working, but LLMs are good at knowing how their own models work, so I just had Kimi help me out with some of the harder troubleshooting.
I can tell you are experienced with rubberducking. Thanks for the detailed answer.
Glazing AI on this site sure is a choice.
This is a technology community. LLMs are technology. If calling LLMs useful is considered glazing, then I’m not sure if you’ve eaten a proper doughnut.
Beehaw, and even Lemmy more broadly, is very anti-AI. Feel free to die on the metaphorical hill if you so wish.
Save the usefulness debate for someone else, though. If you still believe in LLMs even after all this time, then I can’t trust you haven’t fallen victim to cognitive surrender — and as such, I can’t trust you write your own posts. I’d rather spend my energy elsewhere.