This post is part 3 of a series and assumes you’ve read the introductory post. It does not directly depend on the previous post: that post explored how these systems work, while this one is about some things to be aware of when using them.
In discussing limitations, it should be noted that (a) this is not an exhaustive list: there are limitations I’m omitting; and (b) this is not a universal list: there may be some specific instances that lack one or another of these limitations, though most systems I’ve used do have them.
Cognitive science suggests the brain recognizes patterns using deeply nested hierarchies. If you can recognize a grey grunt on sight, that’s not because you saw so many pictures of them that your brain learned the patterns in the pictures themselves; rather, you learned to interpret images as representations of 3D shapes, and to recognize the foreground shapes as more important, and the common shapes of fish, and the shape and color patterns of grunts as a genus of fish, and the specific nuances that are typical of the grey grunt species.
This hierarchy is useful in many ways. It facilitates conversations between people with different knowledge and helps scaffold learning; indeed much of education is based on adding one layer to the hierarchy at a time. But more importantly it puts bounds on mistakes in two important ways.
First, mistakes tend to be limited to the final, most detailed parts of the pattern hierarchy. I might mistake a grey grunt for another kind of grunt or even another genus of fish altogether, but I’m not going to mistake it for an octopus, refrigerator, or meadow. And in the unusual cases where something can fool us across a large part of the hierarchy, we can identify the camouflage or illusion that made that possible.
Second, the root levels of the hierarchy are fairly universal regardless of our individual experience. Even if I’ve never seen a llama and you’ve never seen a horse, we’ll both recognize them as animals and both agree that they are broadly similar to each other. The existence and universality of the mental hierarchy of ideas and perception ensures a shared context and basis for communication.
Most current AI systems do not have such a hierarchy, or have an overly flat hierarchy instead of a deep and meaningful one. They might be able to fix your grammar without having any notion of a sentence, or recognize a grey grunt by noticing patterns in the number of times the sign of the gradient of the color intensity changes across the image instead of by first identifying the distinct objects in the image and then categorizing each one.
Because of this, their errors are nonsensical, violating the parts of our cognitive hierarchy that we never violate. I might occasionally misjudge the distance to a truck or even mistake a truck for a shed, but an AI system might think the truck is a flock of seagulls or the clear blue sky. The AI might be wrong less often than I am, and it might make correct high-level decisions with good outcomes more often than I do, but when it is wrong I’m unlikely to be able to make sense of why or how.
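To make that concrete, here’s a small sketch (Python, with made-up data) of the kind of flat, object-free feature I described: it just counts how often the sign of the intensity gradient flips across an image, with no notion of objects, foregrounds, or fish anywhere in it.

```python
import numpy as np

def gradient_sign_changes(image: np.ndarray) -> int:
    """Count how often the horizontal intensity gradient flips sign, row by row."""
    grad = np.diff(image.astype(float), axis=1)      # horizontal gradient of intensity
    signs = np.sign(grad)
    flips = np.abs(np.diff(signs, axis=1)) > 0       # True wherever the sign changes
    return int(flips.sum())

# A classifier built on statistics like this can tell many grunt photos apart from
# other photos, yet it has no concept of "object", "foreground", or "fish" at all.
fake_photo = np.random.randint(0, 256, size=(64, 64))  # stand-in for a real image
print(gradient_sign_changes(fake_photo))
```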
Data-driven algorithms, which most in-the-news AI systems are, typically distinguish between the training data, which is used to find the patterns the system will use, and the live data that is responded to and generated when the AI system is running. The accuracy and flexibility of modern AI systems is directly dependent on the quantity of training data, so it is common to use all the training data that you can find. But training data is not balanced: it usually comes from public durable postings on the Internet and is hence lop-sided towards the kinds of content and people that post more durable public content online.
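As a rough sketch of that split (using scikit-learn and made-up data purely for illustration): the patterns are extracted once from whatever training data could be gathered, and at run time the live data is only ever pushed through those already-fixed patterns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: whatever could be collected ahead of time, however lop-sided.
X_train = rng.normal(size=(1000, 5))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)   # the patterns get baked in here

# Live data: what the running system actually sees. Nothing here updates the model;
# it is only pushed through the patterns already found during training.
X_live = rng.normal(size=(3, 5))
print(model.predict(X_live))
```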
Imbalances in training data have many impacts, but I’ll highlight just three.
AI systems tend to have differential accuracy based on the quantity of training data. If I train a facial recognizer on 10× as many images of light-skinned people as dark-skinned people, I’d expect it to be far more accurate at recognizing light-skinned people than dark-skinned people. More generally, the further your use-case is from the most common use-cases, the less accurate and helpful the tool will be.
AI systems tend to reflect biases in the training data. If your company disproportionately hires young men then AI hiring tools will recommend it continue to hire young men. If your police force disproportionately patrols and is less tolerant of dubious behavior in one neighborhood than in other neighborhoods then AI policing recommendations will send more police there. If women get less pay for the same work then AI-driven compensation plans will continue that trend. And so on.
AI systems often amplify imbalances in data they produce. Generative AI treats the unexpected as a proxy for bad and generally goes with the most common option. Ask it to generate a story about a profession where 80% of the training data used “he” pronouns and you’ll get back “he” pronouns far more than 80% of the time. Ask it to create an image of a woman and it will tend to give you a picture with the facial features common in the models that provide a disproportionate number of the images online. And so on.
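Here’s a toy sketch of that amplification (the 80/20 numbers are purely illustrative): if the generator always picks the most likely option rather than sampling in proportion to the training data, an 80/20 split in the data becomes 100/0 in the output.

```python
import random

training_share = {"he": 0.8, "she": 0.2}   # illustrative 80/20 split in the training data

def pronoun(most_common_only: bool) -> str:
    if most_common_only:
        # "Go with the most common option": 80/20 in the data becomes 100/0 in the output.
        return max(training_share, key=training_share.get)
    # Sampling in proportion to the data roughly preserves the 80/20 split instead.
    return random.choices(list(training_share), weights=list(training_share.values()))[0]

n = 10_000
print(sum(pronoun(False) == "he" for _ in range(n)) / n)  # about 0.8
print(sum(pronoun(True) == "he" for _ in range(n)) / n)   # exactly 1.0
```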
No single problem caused by lop-sided training data is inevitable; once we notice a specific problem we can add a specific counter-model to correct it (though the correction is not always palatable: to counter differential accuracy, for example, we either have to add a lot more training data or artificially reduce the accuracy of the common case to match the uncommon case). But we can’t fix the problem overall: it’s intrinsic to having lop-sided training data to begin with.
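As one illustration of such a counter-measure, here’s a rough sketch that reweights the rare group during training (scikit-learn’s class_weight="balanced", with made-up data); it typically improves accuracy on the rare group at some cost on the common one, exactly the trade-off noted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# 1000 examples from a "common" group and 100 from a "rare" one, with overlapping features.
X_common = rng.normal(loc=0.0, size=(1000, 2))
X_rare = rng.normal(loc=1.0, size=(100, 2))
X = np.vstack([X_common, X_rare])
y = np.array([0] * 1000 + [1] * 100)

plain = LogisticRegression().fit(X, y)
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

# The reweighted model does better on the rare group, usually at some cost on the common one.
for name, m in [("plain", plain), ("balanced", balanced)]:
    print(name,
          "common:", round(m.score(X_common, np.zeros(1000, dtype=int)), 2),
          "rare:", round(m.score(X_rare, np.ones(100, dtype=int)), 2))
```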
Some AI systems (and some other computer systems too) are designed to adapt to user behavior. But different ways of doing that have major differences in scope and long-term flexibility.
Some AI systems, but more commonly non-AI systems, learn the way a Mad Libs book learns: they have templates with various blanks we can fill in for them. These range from splash screens that say “Hello, 〈user name〉” when you log in to entire chat programs like ELIZA. Regardless of the number and complexity of the templates, they are limited to the equivalent of simple find-and-replace operations.
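In code, this kind of “learning” amounts to little more than the following (a made-up template, purely for illustration):

```python
# Mad Libs-style "learning": a fixed template whose blanks get filled in.
# However many templates there are, it is the equivalent of find-and-replace.
TEMPLATE = "Hello, {user_name}! You have {count} new messages."

def greet(user_name: str, count: int) -> str:
    return TEMPLATE.format(user_name=user_name, count=count)

print(greet("Ada", 3))   # Hello, Ada! You have 3 new messages.
```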
Many AI systems learn the way you or I would when playing twenty questions. Their knowledge of the world is fixed; all they are learning is what part of that knowledge you’re thinking about. When I put a prompt into an AI-based text or image generator, for example, or type a misspelled word to send to a spell-checker, this is the kind of “learning” it is doing. The versatility and adaptability are entirely determined by the size of the trained model and the amount of data it was trained on; no amount of clever phrasing or long explanation can give it information it doesn’t already have.
AI systems can also do what’s called “on-line learning”, meaning they add new patterns to their repertoire while they are using those patterns to do whatever they are designed to do. This is not very common in the news-making AI systems of the day (for example, the “P” in GPT stands for “pretrained” and means it is not doing any on-line learning), but it is common in some other contexts such as training robot control systems, typically because the only way to finish creating the training data is to use the model we have.
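Here’s a rough sketch of the difference, using scikit-learn’s partial_fit as a stand-in for on-line learning (the data is made up): the model keeps folding new data into its patterns even while those patterns are being used to make predictions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
model = SGDClassifier()
classes = np.array([0, 1])

for step in range(50):
    # New data keeps arriving while the system runs...
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] > 0).astype(int)

    if step > 0:
        model.predict(X_batch[:1])              # ...and is handled with the current patterns...

    model.partial_fit(X_batch, y_batch, classes=classes)  # ...which are then updated in place.

# A pretrained, frozen model would skip the partial_fit step entirely.
X_test = rng.normal(size=(200, 4))
y_test = (X_test[:, 0] > 0).astype(int)
print(model.score(X_test, y_test))
```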
I bring this up in a post about limitations because it can create incorrect perceptions and expectations. The relatively superficial kind of “learning” and “adapting” we experience is easily mistaken for the much-less-common ability to add our own training data as we go. This is one of several illusions and misunderstandings that lead people to anthropomorphize systems and expect them to be able to do things they cannot.
All that AI systems “know” are patterns in the training data, nothing actually meaningful. If I ask them to write a research paper they’ll create text that matches as many patterns in research papers as they’ve found: it will look like research, it will be presented like research, but it won’t be backed by any actual research. Citations will look like citations, and depending on how the AI was trained may even be actual citations that appeared somewhere in the training data, but they won’t defend the points the text is making. And so on.
We say an AI system is “hallucinating” when it produces something that looks like an assertion of truth but in fact asserts something false or nonsensical. Arguably it is us, the users, who are hallucinating that the AI knows anything: after all, when it gets something right, that’s only because the right answer appeared so often in the training data that expressing it fit the detected patterns better than expressing a wrong answer or no answer at all.
A subconscious mind, no matter how broad its experience, never substitutes for a conscious mind.