Tuesday, June 30, 2015

Image Classification and Semantic Symbiosis

What is this graphic depicting?



Most of us would recognise this as the Apple logo with the "bite" replaced with the profile of Steve Jobs.

Let's see how Clarifai - a convolutional-neural-network-based image recognition (deep learning) service - does on this task.



Perhaps surprisingly, not very well.

One simple reason this network failed might be that the training set did not include the Apple logo. Perhaps a better-trained network would have recognised it.

But even if a network did recognise the image as the Apple logo, this would not be the correct answer. The image is NOT the Apple logo.

Could any network correctly recognise the image? The profile is probably not anatomically correct, and almost certainly not distinctive enough to be recognised as Steve Jobs on its own: presented without the surrounding apple, it would probably be very difficult to identify. I doubt that any network could make the correct identification.

So why is it so easy for humans to recognise? Because humans can apply cognitive inference on top of visual features to identify what an object is. We initially recognise the image as the Apple logo because of its visual features, but then notice what looks like a face inside it. It is not a picture of a face but a shadow, a caricature highlighting the salient features of what we know to be the late Steve Jobs. The caricature's most distinctive feature is the little round glasses - a feature which has been used to comic effect elsewhere, as in this picture of Bruce Lee "being" Jobs.



It is a pretty impressive feat to see those little lines in front of the silhouetted face and interpret them as Steve Jobs with his round glasses.

This example illustrates that human perception and classification of images does not rely just on visual features. There is a complex relationship between visual features and cognitive interpretive processes. What something looks like doesn't necessarily tell us what it is. The Steve Jobs Apple shows how little accurate visual input we sometimes need to recognise objects. Visual features are a clue to what an object is, but don't define our concept of it.

The psychologists Susan Gelman, Ellen Markman and John Coley studied adults and young children (all the way down to the age of 2) to see how much weight the perceptual features of objects carry in their understanding of what an object is. (We will only talk about the adult version of the study, but the results were similar in children.) In one study they used picture triads in which two objects from one category looked dissimilar, but a third object from a different category looked similar to one of the other two. For example, the image below shows two dissimilar-looking birds (a Flamingo and a Blackbird) and a Bat (top right):


In a simple similarity judgement, the Blackbird and the Bat were judged most similar by a group of 20 Stanford undergraduates. But then, in the second stage, some verbal information was added to the pictures:

Underneath the flamingo was written: “This bird’s heart has a right aortic arch only.” Underneath the bat was written: “This bat’s heart has a left aortic arch only.” Underneath the blackbird was written: “What does this bird’s heart have?”

Overwhelmingly, the students said that the Blackbird's heart had a right aortic arch, just like the heart of the dissimilar-looking Flamingo. Perceptual similarity was ignored in the inference.

But people did not blindly follow the textual description all the time. In another condition the experimenters asked for judgements about an irrelevant attribute introduced into the image: for example, the experimenter could place a blue dot underneath the Flamingo and a red dot underneath the Bat, and ask what colour dot would go below the Blackbird. The students answered this at random. Finally, the students were asked about a feature likely to correlate with perceptual attributes, such as how heavy the object would be. In this condition the participants said that the Bat and the Blackbird would be of similar weight, as predicted by visual similarity.

What these studies show is that people (and even 2-year-old children) understand that perceptual similarity is only a clue to what objects are. When the perceptual input is contradicted by explicit and sensible information about what an object is, people readily abandon their perceptual biases and draw inferences based on conceptual knowledge. When that knowledge tells them to use the visual properties (as in weight estimation) they readily return to them, but when the perceptual features are irrelevant or misleading (as with deep biological properties) they simply ignore them.

This complex relationship between visual features and classification can result in vast individual differences when humans are asked to label images. For example, a simple picture of a cat can be labeled as an animal or as a pet. Neither answer is correct or incorrect. Being an animal and being a pet are not two different things, and they certainly can't be differentiated by visual features. There is a complicated conceptual relationship between pet and animal which a visual-feature-based neural network does not, and cannot be expected to, understand.

Most researchers know this, and as a result they carefully prepare training sets to avoid overly tricky situations. For example, in a recent paper which presents one of the most successful applications for answering questions about the content of an image, the researchers summarise their training procedure like this: "To make the labeled question-answer pairs diversified, the annotators are free to give any type of questions, as long as these questions are related to the content of the image. The question should be answered by the visual content and common sense (e.g., we are not expecting to get questions such as “What is the name of the person in the image?”). We only select a small number of annotators (195 individuals) whose annotations are satisfactory (i.e. the questions are related to the content of the image and the answers are correct). We pick a set of good and bad examples of the annotated question-answer pairs from the quality monitoring dataset, and give them to the selected annotators as references. We also provide reasons for selecting these examples. After the annotation of all the images is finished, we further refine the dataset and remove a small portion of the images with badly labeled questions and answers." In other words, the training set is carefully curated to ensure success. But this leaves out the richness of human interaction with the world, and creates an artificial distinction between "good" and "bad" annotation.

How can the Semantic Symbiosis view help? Consider again the example of deciding whether a picture of a cat should be labeled as a cat or as a pet. We have already noted that there are no visual features of the animal itself to distinguish between the two. So a neural network will settle on the interpretation most highly associated with the visual features in its training set. Probably a cat. (Unless it's Google Photos, in which case it might be a dog!)

But it might be possible to infer additional interpretations by making an educated guess. What kinds of visual clues could lead us to conclude that there is a pet in the picture? Is there a human close by? Is the human smiling? Is there contact between the human and the animal? Is the animal in a house? If the answer to some of these questions is "yes", we can infer that the animal in the picture is also a pet. The computer can't ask these questions on its own, because it does not have rich semantics about concepts. But it does have an excellent ability to locate visual features within pictures, which could answer the questions. Semantic Symbiotics is about getting those questions into the algorithm, either by encoding them into some sort of knowledge base, or by allowing the machine to ask for help when that is not possible. Either way, human semantics helps the computer, and the computer's sheer power to ingest and generalise over volumes of data helps the human. Semantic Symbiosis.
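To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `detect` stands in for whatever image recognition service is available, and the clue list and thresholds are invented, not a real knowledge base.

```python
from typing import Dict

def detect(image) -> Dict[str, float]:
    """Stand-in for a visual classifier: returns label -> confidence."""
    # A real system would call an image recognition service here.
    return {"cat": 0.92, "person": 0.81, "smile": 0.66, "indoors": 0.74}

# Human-supplied semantic clues: visual evidence that an animal is a pet.
PET_CLUES = ["person", "smile", "indoors", "collar", "leash"]

def infer_pet(image, threshold: float = 0.6, min_clues: int = 2) -> bool:
    """Add the label 'pet' when an animal co-occurs with enough clues."""
    labels = detect(image)
    has_animal = any(labels.get(a, 0.0) > threshold for a in ("cat", "dog"))
    n_clues = sum(1 for c in PET_CLUES if labels.get(c, 0.0) > threshold)
    return has_animal and n_clues >= min_clues

print(infer_pet(None))  # True: a cat, plus person + smile + indoors
```

The interesting design question is where `PET_CLUES` comes from: in this sketch a human wrote it down, which is exactly the symbiotic division of labour argued for above.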







Saturday, June 20, 2015

NLDB2015, How to Talk to a Cognitive Computer

I have just participated in the NLDB2015 conference, which is dedicated to research and practice involving natural language in information systems. I presented a discussion paper on cognitive computing, which was well received and generated a lot of discussion. The slides are on SlideShare.

The slides contain some interesting quotes that show the state of confusion about what cognitive computing might mean, with wildly contradictory claims, sometimes from the same person! Opinions range from "Watson is a Jeopardy! playing machine" to "Watson is an example of Strong AI".

My main technical point is that current approaches are generally not the right ones to capture conceptual semantics. Looking at distributional properties of words, for instance, can give very good results in certain tasks, but we should never confuse their representation with "proper" human-like semantics. For example, if a machine learns the distributional properties of the verb "kill", it could "understand" the sentence "The man killed the dog" and it could answer the question "Who was killed?". But it could not answer the question "Who died?", or "Is the man a murderer?". That would require more elaborate structural knowledge about word meaning and, in the case of the latter question, about the subtleties of inference among concepts.
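As a toy illustration of the kind of structural knowledge I mean, here is a sketch in Python. The lexicon format and the parsed-sentence representation are inventions for this example, not a proposal for a real system:

```python
# A hand-built lexical entry encoding what distributional statistics miss:
# "kill" structurally entails that its patient dies. Illustrative only.
LEXICON = {
    "kill": {
        "entails_of_patient": "die",
        # "Is the agent a murderer?" needs further inference (human patient,
        # intent, ...), so it is deliberately not a direct entailment here.
    },
}

def who_died(verb: str, agent: str, patient: str):
    """Answer 'Who died?' given a parsed sentence (verb, agent, patient)."""
    entry = LEXICON.get(verb, {})
    return patient if entry.get("entails_of_patient") == "die" else None

# "The man killed the dog" -> verb="kill", agent="the man", patient="the dog"
print(who_died("kill", "the man", "the dog"))  # -> the dog
```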

If the Cognitive Computing enterprise really were Strong AI, I could accept people arguing against me: they would want to prove that distributional approaches really are the way humans encode semantics, and that one day their systems could answer the trickiest of questions. But if CC is not Strong AI (and I don't think it is), then we don't have to be so stubborn. We can admit that the system is just a functional approximation to human cognition, and where it fails we can insert a cognitive hack to fix it.

This is the idea behind Symbiosis. Two different systems that work together. Emphasis on "different" because we acknowledge that each system will have limitations where the other won't, and the two systems must help each other.

As an example, computers are not very good at disambiguation. So they should ask us as often as possible to help them with the task. They might have a brain the size of a planet when it comes to pattern matching and overall storage, but they shouldn't be embarrassed that their language skills are not quite at the level of a three year old. If the engineers got over their insistence that they are building a "real" cognitive architecture and allowed us to help the machines more, then the machines might be able to help us more.
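Here is a sketch of what "asking for help" could look like in practice, assuming some sense-scoring model exists; the senses and scores below are made up for illustration:

```python
from typing import Dict

def score_senses(word: str, context: str) -> Dict[str, float]:
    """Stand-in for a disambiguation model: sense -> probability."""
    return {"bank (finance)": 0.48, "bank (river)": 0.44, "bank (tilt)": 0.08}

def disambiguate(word: str, context: str, margin: float = 0.1) -> str:
    """Decide alone when confident; otherwise ask the human (symbiosis)."""
    scores = score_senses(word, context)
    best, second = sorted(scores, key=scores.get, reverse=True)[:2]
    if scores[best] - scores[second] >= margin:
        return best
    # The two top senses are too close to call, so the machine asks.
    return input(f'Did you mean "{best}" or "{second}"? ')

print(disambiguate("bank", "we strolled along the bank"))
```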

Wednesday, June 3, 2015

Cognitive Misinformation

Artificial Intelligence is big news these days. Bill Gates, Elon Musk and Stephen Hawking have famously expressed their belief in the power and consequent dangers of AI. The threat of super intelligent machines looms large, and there is a constant barrage of headlines like "Google a step closer to developing machines with human-like intelligence".

Companies like IBM, Google, Baidu and Microsoft have embraced the excitement and channelled it into positive futuristic visions of their own, developing useful AI systems. Watching the Google I/O 2015 keynote, I was amazed that almost half the time was spent talking about new capabilities powered by machine learning.

The problem, though, is that marketing can overtake the scientific facts, and important issues can become obscured to the point of falsehood.

John Kelly, director of IBM Research, has this to say in a Scientific American article:
”The very first cognitive system, I would say, is the Watson computer that competed on Jeopardy!”
This is an impressive claim, but it is hard to know what he means by it, since the first cognitive system is surely the human cognitive system. We must assume he meant to say "first artificial cognitive system". But the problem there is that this is simply not true: there are many older attempts to build cognitive systems. SOAR, for example.

SOAR is a research project embodying a serious general theory of cognition. To quote: "Soar is a general cognitive architecture for developing systems that exhibit intelligent behavior. Researchers all over the world, both from the fields of artificial intelligence and cognitive science, are using Soar for a variety of tasks. It has been in use since 1983, evolving through many different versions to where it is now Soar, Version 9."

Noam Chomsky points out that, in a sense, what he does as a linguist is AI. His theory of language can be regarded as a computational system - a program - which models human linguistic capacity. It is a cognitive system.

Kelly probably meant something quite different by "cognitive system": something more in line with an engineering sense, in agreement with Eric Horvitz, head of the Microsoft Research Redmond lab. He probably does not claim that Watson is a model of human cognition (as SOAR is), but rather that it is very clever software with some capabilities reminiscent of human cognition. That is, a machine that can perform tasks that normally only humans can: making inferences from language, learning from facts, and forming independent hypotheses. In fact, in his own book Kelly admits that "the goal isn't to replicate human brains, though. This isn't about replacing human thinking with machine thinking. Rather, in the era of cognitive systems, humans and machines will collaborate to produce better results, each bringing their own superior skills to the partnership."

This is probably a good thing, since Watson is not a particularly good model of human cognition. In a highly discussed episode of Jeopardy!, Watson made a baffling mistake. The category was US Cities, and the clue was: “Its largest airport was named for a World War II hero; its second largest, for a World War II battle.” The two human contestants wrote “What is Chicago?” for its O’Hare and Midway, but Watson’s response was “What is Toronto?” (which of course is not a US city). The rough explanation for this big mistake is that the probabilistic heuristics favoured Toronto after weighing up all the evidence. Watson did not "understand" any of the evidence, so it did not "realise" how important the category constraint was in this example, and allowed it to be swamped by the sum of all the other evidence. John Searle puts it like this: "Watson did not understand the questions, nor its answers, nor that some of its answers were right and some wrong, nor that it was playing a game, nor that it won—because it doesn't understand anything."
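To see how a soft constraint can be swamped, consider a deliberately toy evidence-combination model. The features, weights and scores below are entirely invented and bear no relation to Watson's actual internals; they only illustrate the arithmetic of the failure:

```python
# Toy linear evidence combination; every number here is made up.
evidence = {
    "Chicago": {"airport_hero": 0.70, "airport_battle": 0.80, "us_city": 1.0},
    "Toronto": {"airport_hero": 0.95, "airport_battle": 0.95, "us_city": 0.0},
}
# The category ("US Cities") is treated as just one more weak feature,
# instead of as a hard filter on the candidate set.
weights = {"airport_hero": 1.0, "airport_battle": 1.0, "us_city": 0.3}

def score(candidate: str) -> float:
    return sum(weights[f] * v for f, v in evidence[candidate].items())

for city in evidence:
    print(city, round(score(city), 2))  # Chicago 1.8, Toronto 1.9
# Toronto "wins": strong airport evidence swamps the weakly weighted
# category constraint, with no understanding anywhere in the process.
```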

But I can't end with this useful and happy story, that Watson is a clever machine doing useful things. The ghosts of "strong AI" persist. IBM's web site unwaveringly says: "Watson is built to mirror the same learning process that we have—through the power of cognition. What drives this process is a common cognitive framework that humans use to inform their decisions: Observe, Interpret, Evaluate, and Decide." In a more hostile spirit, Google's Peter Norvig attacks Chomsky's psychological theory of linguistics as being a sort of fiction, advocating instead that the current statistical models of language are in fact the correct scientific theories of language (and Chomsky himself some kind of deluded figure): "Chomsky ... must declare the actual facts of language use out of bounds and declare that true linguistics only exists in the mathematical realm, where he can impose the formalism he wants. Then, to get language from this abstract, eternal, mathematical realm into the heads of people, he must fabricate a mystical facility that is exactly tuned to the eternal realm. This may be very interesting from a mathematical point of view, but it misses the point about what language is, and how it works." (my emphasis).

So we come to a difficult point. It seems that for the promotional materials to be right, for Norvig to be right, for Watson to really be the first Cognitive System, these computational systems would have to be IN FACT HOW PEOPLE WORK. We would need to erase the last 60 years of Cognitive Science. We would need to fill the journals with new theories of statistical cognition.

The idea is already working its way into people's consciousness. In the 2015 Google I/O keynote, Google Photos lead product manager Anil Sabharwal proudly announced that the new Photos application can categorise photos without any manual tagging. This reflects an attitude, emerging from the recent successes of deep learning in object recognition, that neural networks will soon make human intervention in picture identification redundant. But what if they don't? What if there are fundamental and unfathomable difficulties that machines without understanding cannot solve? Then it is a mistake to pursue a path which seeks to eliminate what we know about human cognition from the equation.

The problem begins when we start believing that the methods of cognitive computing are right; when we laugh off as a quaint mistake Google's 30-layer neural network taking a tall kitchen faucet for a monument, rather than seeing it for what it is: a deep and fundamental problem that illuminates the inherent limitations of statistics trying to play semantics.

I will argue in future posts that meaning - conceptual semantics - is a unique property of the human cognitive system, and that machines will not possess semantics for a very, very long time, if ever. This will fundamentally limit their ability to perform cognitive tasks. Intelligent systems can only be realised when human-level semantics becomes an integral part of what we refer to as Semantic Symbiotic Systems, where the engineering techniques and the cognitive theories become equals. Together they will build better systems. Chomsky gets to keep his job. Everybody wins.







Saturday, May 23, 2015

Symbiotic Computing

More than fifty years ago Joseph Carl Robnett Licklider published a classic paper, Man-Computer Symbiosis, in which he outlined a vision for the way humans and computers would work together in the foreseeable future to solve problems.


He believed that we would eventually have computers that achieve Artificial Intelligence, but that we cannot know how long this will take; he whimsically suggested it could take "10 or 500" years. Fortunately we do not have to despair in the interim, because we can still have intelligent assistant computers that work not through artificial intelligence but through Man-Computer Symbiosis.

Symbiosis is a well-known concept borrowed from biology - something like the "interaction between two different organisms living in close physical association, typically to the advantage of both." The important point is that they are different organisms, which nevertheless find a way to interact in mutually beneficial ways. Applied to computers, the key point is that the computer does not have to run the same "program" as the human. We don't need to worry about troublesome concepts like Artificial Intelligence or Cognitive Systems if we can instead achieve Symbiotic Systems, which don't carry the same philosophical and emotional baggage. So what is it about?

It turns out that there is a simple way to think about what a symbiotic computer might be, and how it differs from other beneficial programs like spreadsheets and search engines. The main insight behind symbiotic computing is that the interaction between a human and a machine can sometimes hit unforeseen alternatives that require an action the designer of the application has not anticipated. That is, the solution requires steps that have not been specifically programmed into the execution of the program. It is in this situation that the program needs to adapt in a way that is helpful to the human user, taking the sort of initiative that is simply not part of the operation of normal software tools. An example might be the need to generate novel hypotheses while categorising large volumes of novel information, to help the human user make sense of the data.

This is largely what Watson does while answering questions in Jeopardy!, where each novel question requires the formulation of hypotheses from the large volume of assimilated data. The Watson system is usually presented as a question answering program, but in fact it generates a large number of alternative hypotheses, from which it picks one answer based on a number of heuristics. If instead Watson presented several hypotheses in a collaborative problem solving session, then this would be an example of symbiotic computing - one that could possibly achieve close to 100% accuracy. If we think about such systems in this way, we can avoid unnecessarily loaded terms like Cognitive Computing and still have clever programs that help us carry out cognitively demanding tasks.
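As a sketch of the difference, assume a hypothetical `generate_hypotheses` function standing in for Watson-style candidate generation (the candidates and scores are invented):

```python
from typing import List, Tuple

def generate_hypotheses(question: str) -> List[Tuple[str, float]]:
    """Stand-in for hypothesis generation; candidates and scores made up."""
    return [("Chicago", 0.61), ("Toronto", 0.64), ("Omaha", 0.12)]

def answer_alone(question: str) -> str:
    """Non-symbiotic mode: silently commit to the argmax, right or wrong."""
    return max(generate_hypotheses(question), key=lambda h: h[1])[0]

def answer_symbiotically(question: str, k: int = 3) -> str:
    """Symbiotic mode: surface the top-k hypotheses for the human to judge."""
    top = sorted(generate_hypotheses(question), key=lambda h: h[1],
                 reverse=True)[:k]
    for answer, confidence in top:
        print(f"  {answer} ({confidence:.2f})")
    return input("Which hypothesis fits best? ")
```

The only change from the autonomous version is the last step: instead of committing to its best guess, the program exposes its uncertainty and lets the human's semantics finish the job.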

Still, it is not easy to create such systems. As Licklider says: "To think in interaction with a computer in the same way that you think with a colleague whose competence supplements your own will require much tighter coupling between man and machine than is suggested by the example and than is possible today."

In order to achieve this coupling we need to understand important aspects of cognition, even if we don't need to reproduce it. But much more about this in future posts.

Welcome

Welcome to Semantic Symbiotics.

This site is dedicated to a new way of thinking about the way humans and computers work together, which we call Semantic Symbiotics. We will explain what this is, and why it has great promise!

We will also talk about other approaches. The dominant current approach is what is called Cognitive Computing. We will comment on some activities in this domain, and discuss some deep questions which the approach often ducks.

This will be fun!
