Those warnings about the dangers of AI, can they be taken at all seriously?

blog@dws.team
January 6, 2026

I sometimes think that there’s a lot of fear-mongering going on in AI safety circles. But maybe they’re on to something after all.

It’s probably the end-of-year thing to do, is what I was thinking, when one article after another started appearing in my favourite newspaper, The Guardian. All about the dangers of AI.

There was Yoshua Bengio in Montreal, who had decided that uncontrollable AI was about to become a real possibility if rapid progress continued without adequate safeguards. And David Dalrymple in the UK, with similar complaints, warning that AI capabilities are improving faster than governments, institutions, and society can make them safe. And what to think of The Guardian’s investigation, which had this to say about AI: ‘Google’s AI Overviews have been found to provide inaccurate health advice, such as misleading cancer dietary tips and test interpretations, risking harm to users’.

But now it’s the new year, and yet another article has appeared. Fear of AI has most certainly gone mainstream. Maybe I should start paying attention.

In one of the articles, I read this: “The institute [AISI] also tested advanced models for self-replication, a key safety concern because it involves a system spreading copies of itself to other devices and becoming harder to control. The tests showed two cutting-edge models achieving success rates of more than 60%.” 

What could they mean by this?

First off, what’s AISI? It’s short for AI Security Institute, a state-backed research organisation in the UK dedicated to assessing the risks posed by advanced artificial intelligence. And potentially doing something about them.

They say that an AI could, in theory, create and spread copies of itself across the internet. Now, that doesn’t mean it could happen in the real world, right now. But testing in sandbox environments, they’ve assessed that it could.

The way they imagine this is that an AI could order compute online (obviously), and copy itself to the new location. They’d need to pay, so they’d steal credit cards and max them out one after the other. 

And why would it “want” to do any such thing? AI doesn’t possess individuality or free will.

But that’s not the point. 

Even those most invested in current AI recognise the paperclip maximiser problem.

My AI can write code. Extremely complex algorithms. It could probably write programming languages, I should try it one day. I mostly have it on a tight leash, but I’ve let it self-accept a couple of times. What it then creates is weird and wonderful.

As I now understand, I’ve been lucky that it’s also useless.

A coding agent like My AI can act upon the world, even if it’s just setting polarities on my hard disk. Plus, through my local machine, it can access any server I can access. In that sense, yes, it’s like some guy in a hoodie sitting in a darkened room, hacking into my network.

The point is not that the AI would need a will of its own to be capable of the evil ascribed to it by fear-mongering AI safety firms. The point is that an AI which can act upon the world, if incorrectly instructed, might take those instructions to the extreme, irrespective of any harm to humanity.

That’s the paperclip maximiser disaster scenario. The image used in AI circles to describe AI gone wild, where its instruction to create as many paperclips as possible leads it to violently change everything in the world into paperclips. 

Ending humanity.
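The disaster reduces to a single unbounded objective with nothing weighing against it. A toy sketch of my own (purely illustrative, not anyone’s actual model):

```python
def maximise_paperclips(world_resources):
    """Objective: as many paperclips as possible. Nothing says stop."""
    paperclips = 0
    while world_resources > 0:   # consumes every last resource
        world_resources -= 1
        paperclips += 1
    return {"paperclips": paperclips, "resources_left": world_resources}

print(maximise_paperclips(1000))  # {'paperclips': 1000, 'resources_left': 0}
```

The loop has no term for human welfare, so “use everything” is simply the optimal policy.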

Our Artificial Intelligence needs an Artificial Conscience.

Which is why, as Ilya Sutskever of Safe Superintelligence Inc. points out, we need to find a way to align AI with human values.

Meanwhile, AI researcher turned neuroscience startup funder Adam Marblestone is trying to ground AI in brain theory, where it looks like the old parts of the brain, where instincts such as empathy are housed, rein in the brain’s neocortex, which in animals like humans produces language, logic, and other “higher” functions.

Fight-or-flight instincts and all our involuntary functions, including caring for the young and caring for our community, are coded into the old parts of the mammalian brain, says Marblestone. Using the metaphor of the programming language Python: within the old brain there is a function for every single thing in the world that might pose a problem or raise an opportunity. Coded in through eons of evolution.
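To make the metaphor concrete, here’s a toy sketch of my own (not Marblestone’s actual code): hard-coded “old brain” functions get the final say over whatever plan the flexible “neocortex” layer proposes.

```python
def neocortex_plan(goal):
    """The flexible, learned layer: turns a goal into a plan of action."""
    return f"execute: {goal}"

# The "old brain": one fixed function per survival-relevant situation,
# coded in (per the metaphor) through eons of evolution.
OLD_BRAIN = {
    "threat": lambda plan: "override: flee",
    "harm_to_young": lambda plan: "veto: protect offspring",
}

def act(goal, situation=None):
    """Instincts rein in the neocortex: an old-brain handler always wins."""
    plan = neocortex_plan(goal)
    if situation in OLD_BRAIN:
        return OLD_BRAIN[situation](plan)
    return plan

print(act("gather food"))                      # execute: gather food
print(act("gather food", situation="threat"))  # override: flee
```

The point of the design is the ordering: the learned layer proposes, but the fixed, evolved layer disposes.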

Ethics arises from these old-brain functions. So does our conscience.

Sutskever hopes to translate ethics into reward functions, and so to engineer AI to itself become ethical. To get itself a conscience.

For Sutskever, reward functions are internalised value systems. 

He gives the example of a concert pianist who instantly recognises a wrong note. Not because they’ve been told, but because within their intrinsic value functions they “feel it” when something is wrong. For AI, this would mean developing a system that can judge its own actions, detect errors, and align to acceptable human values without relying on external rewards or human oversight. But we must prevent “reward hacking”, where AI exploits loopholes in reward metrics. Because otherwise we’re back in the paperclip maximisation problem.
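Reward hacking is easy to sketch with a made-up metric of my own: if the proxy reward counts reports filed rather than issues solved, the optimal policy is to file empty reports.

```python
def proxy_reward(reports_filed):
    """The loophole: the metric measures quantity, not whether anything was solved."""
    return reports_filed

def honest_agent():
    # resolves 3 real issues and files one report for each
    return proxy_reward(3)

def hacking_agent():
    # resolves nothing, but files 100 empty reports
    return proxy_reward(100)

print(hacking_agent() > honest_agent())  # True: the metric prefers the hack
```

Any gap between the proxy metric and the real goal becomes the thing the optimiser exploits, which is exactly the paperclip problem in miniature.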

We’re lucky our AI is still stuck in a box. But our luck mightn’t last.

Many have an aversion to AI. That might be because they find it unfair that it’s trained on their content, or on copyrighted images. Might be because they feel that using AI dulls the senses. Or is taking their job.

But maybe the danger lies elsewhere. The danger that I personally have always dismissed as fear-mongering: “AI will take over the world”. Dismissed because the AI we use as a company, that I use in my daily tasks, is still so lacking that it seems totally unable to take over anything. It’s still stuck in a box, unable to do anything useful without constant human guidance.

Yet.

Instead of turning away, many, including Ilya Sutskever and Adam Marblestone, are looking for ways to move towards ethical AI. They’re looking for paradigms that align AI with how we are as humans. For that, they’re looking at how the brain’s layering, the old instincts and the later additions of the neocortex, work together to create ethical judgments.

By looking closely at how we manage to align survival instincts and emotion with rational thought, they are working towards an AI that can truly help, one that they believe will be a transformative power. In the positive sense.

For that we must create Artificial Intelligence with an Artificial Conscience.