It only took nine seconds for an AI coding agent gone rogue to delete a company’s entire production database and its backups, according to its founder. PocketOS, which sells software that car rental businesses rely on, descended into chaos after its databases were wiped, the company’s founder Jeremy Crane said.

The culprit was Cursor, an AI coding agent powered by Anthropic’s Claude Opus 4.6, one of the industry’s flagship models. As more industries embrace AI in an attempt to automate tasks and even replace workers, the chaos at PocketOS is a reminder of what can go wrong.

Crane said customers of PocketOS’s car rental clients were left in the lurch when they arrived to pick up vehicles from businesses that no longer had access to the software that managed reservations and vehicle assignments.

  • t3rmit3@beehaw.org · 2 days ago

    > AI is non-deterministic, sure

    This is incorrect. LLMs are in fact completely deterministic. When all inputs, weights, and sampling parameters like the temperature and the PRNG seed are held fixed, they produce the exact same token sequences (outputs). The appearance of non-determinism is a result of pseudo-random values (another thing which is deterministic but appears otherwise) and user ignorance (in the technical sense, not the value-judgement sense). In fact, the process of ‘tuning’ LLMs is heavily focused on adjusting input values to surface preferred outputs, which would not work in a non-deterministic system.
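    The pseudo-randomness point can be illustrated with a minimal sketch. This is a toy weighted sampler standing in for an LLM's decode loop, not a real model; the vocabulary, logits, and function name are invented for illustration. The key property holds all the same: with the seed and temperature fixed, the "random" sampling reproduces the exact same token sequence every time.

```python
import math
import random

# Toy "next-token" distribution: fixed logits over a tiny vocabulary.
# A real LLM's forward pass is likewise a pure function of its inputs
# and weights; only the sampling step introduces (pseudo-)randomness.
VOCAB = ["the", "cat", "sat", "on", "mat"]
LOGITS = [2.0, 1.0, 0.5, 0.3, 0.1]

def sample_sequence(seed: int, temperature: float = 0.8, length: int = 10):
    """Sample tokens with a seeded PRNG: same inputs -> same outputs."""
    rng = random.Random(seed)  # deterministic pseudo-random generator
    # Softmax with temperature, computed once since the logits are fixed.
    weights = [math.exp(l / temperature) for l in LOGITS]
    total = sum(weights)
    probs = [w / total for w in weights]
    return [rng.choices(VOCAB, weights=probs)[0] for _ in range(length)]

# Fix every input (weights, temperature, seed) and the sampler is
# fully reproducible; vary any of them and the sequence can change.
assert sample_sequence(seed=42) == sample_sequence(seed=42)
```

    In production serving stacks the seed is usually not pinned (and batching/floating-point reduction order can vary), which is where the *appearance* of non-determinism comes from.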

    > When I type “ls” I’m pretty fucking sure I’m not going to get “rm” style results.

    Yes, but we don’t trust humans not to rm what they shouldn’t either, which is why rm refuses to recurse on / unless you pass --no-preserve-root. ls is not supposed to perform write actions. Agentic LLMs are. And just like you wouldn’t build and test on your production server in case the code you execute has an unexpected adverse effect, you shouldn’t run an LLM agent in a location or way where the actions it performs can have an unexpected adverse effect either. The genre of jokes about a new employee bringing down Prod or deleting source code is older than most people (which, to be fair, given that the median age is 31, is true for a lot of things).
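    A minimal sketch of what "not trusting the agent with write actions" can look like in practice. This is a hypothetical guardrail, not any real agent framework's API; the allowlist contents and function name are invented for illustration. The idea is simply that agent-issued shell commands pass through a gate that permits only known read-only programs.

```python
import shlex
import subprocess

# Hypothetical guardrail: only these read-only programs may run.
# Anything else an agent emits (rm, psql, curl ... ) is rejected
# before it ever reaches a shell.
READ_ONLY_ALLOWLIST = {"ls", "cat", "grep", "head", "wc"}

def run_agent_command(command: str) -> str:
    """Execute an agent-proposed command only if its program is allowlisted."""
    argv = shlex.split(command)  # no shell=True, so no pipes or redirects
    if not argv or argv[0] not in READ_ONLY_ALLOWLIST:
        raise PermissionError(f"blocked: {command!r} is not on the allowlist")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout

# run_agent_command("ls")        # permitted, returns stdout
# run_agent_command("rm -rf /")  # raises PermissionError instead of running
```

    An allowlist (rather than a denylist of "dangerous" commands) is the safer default here, for the same reason the comment gives: you cannot enumerate every way a write action can go wrong, but you can enumerate the handful of read-only tools the agent actually needs.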

    LLMs are just a class of software. They’re not good or bad any more than a hammer is good or bad (and can also be used to build or to destroy).

    The problem isn’t LLMs, it’s the entities who control the most powerful ones (corporations and governments), and what those entities are doing with them; using them as weapons against us, rather than as tools to aid us.

    • LukeZaz@beehaw.org · 2 days ago

      I think this kind of rhetoric is best saved for a time when AI is not one of the most harmful things in society. Argue it’s a hammer all you like; people aren’t going to be receptive when that hammer is currently being used to beat their faces in, and making that argument at such a time isn’t exactly sympathetic.

      • t3rmit3@beehaw.org · 1 day ago

        I think that “stop being mad the hammer exists, start being mad at the group of people who are beating your face in” is a very important message. Getting rid of AI (which isn’t even something we can do; you can’t put the genie back in the bottle with this) won’t fix the issue, they’ll just make another hammer. The hammer is both a weapon in this case, and a distraction.

        • LukeZaz@beehaw.org · 11 hours ago

          I think it’s fine if people are mad at both. By all means, encourage people to be angry at the responsible companies. But you don’t gotta defend the tech to do that.

          Besides, as far as I’m concerned, strong anti-AI sentiment does actually help temper the harms of the tech and its owners. Is it a permanent solution? Obviously not; you’re very correct that the groups and people hard-pushing AI are much more important targets for ire. But two pressures are better than one.

          • t3rmit3@beehaw.org · 9 hours ago

            > Besides, as far as I’m concerned, strong anti-AI sentiment does actually help temper the harms of the tech and its owners.

            My worry is that, much like with gun control legislation, our neoliberal fear-based media will push AI use by individuals as the “real danger,” and anti-AI sentiment will only end up being funneled into 1) limiting actual open AI access (e.g. open-weight, FOSS models) by individuals, and 2) legitimizing governmental and corporate use of AI as the only “safe” and “legitimate” AI usage.

            The ratio of “government-controlled AI is literally being used to kill people right now” awareness out there, versus e.g. awareness of deepfakes, is astoundingly unbalanced. Both are real dangers, but only one is getting legislation passed on it, and once again it’s not the one that would put limits on corporations and government.

            Stoking fear is not useful if your opponents are the ones who will actually utilize that fear to their own ends successfully.