Computer Vision and AGI

Why Computer Vision?

Computer vision is the science and technology that allows computers to extract information from images. It is essential to AGI because a machine that is to interact with and learn about its environment needs to be able to sense it. The world is complex and uncertain; you cannot simply predict what will happen or be told how the world works. The vast majority of real-world problems that artificial general intelligence (AGI) should be applied to require direct observation of the world. Vision provides us with the information we need to effectively and confidently learn almost everything: language, what the world contains, how the world works, how to predict things, etc.

How Do You Envision Computer Vision Working in AGI?

To put it simply, the main goals of the computer vision system would be to:
1) solve the correspondence problem
2) figure out what the images of a video contain
3) figure out what the image content represents
4) disambiguate what is happening in an image

The correspondence problem is the task of figuring out which parts of one image match which parts of another. I imagine the system learning only from high confidence conclusions in the beginning. Once it gains experience, it can use its knowledge of the world to understand more difficult observations with higher confidence.

My current ideas about how this would work are as follows. Videos for training and learning would be taken using a high-speed camera in reasonably good lighting conditions. The videos might be taken using two cameras to get more depth cues. The images would then be analyzed to extract local cues and features. Those results would be analyzed further and the image segmented to determine locations, depths, groupings, surface relationships and properties, etc.

Using the local cues, features and other information about the images, we would attempt to construct invariant representations of things we can find and match in any two sequential or simultaneous images. For example, we might match a feature, a curve or a shape. I suspect that effective knowledgeless matching will require methods that measure the likelihood of a given change between candidate matches. An example of such a method would be thresholds: if the change is within the threshold, the match is acceptable; if it is not, the match is rejected. The thresholds would allow us to establish whether a match is reasonable and likely. We may use other methods to establish likelihood as well.

If we find a highly likely match, we must then make sure that the second-best match is very unlikely, because finding a highly likely match is not enough on its own. If there are 1000 good matches within the same image, you might make a mistake by just choosing the one that is slightly better. So we have to establish the unlikelihood of the second-best match. If we can, then we have established a correspondence for that part of the image.

As we learn about the image and find partial solutions, we can use that information to explain other parts of the image. We can also develop expectations and prediction models or rules to assist in future image analysis. If we have knowledge, we can search our database of known objects or models to match the current image to our experience. We can then use the experience to assist in the analysis of the image. If we can make high confidence conclusions about the image, we can then learn from them and create more experience.
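
The threshold check and the second-best-match check described above could be sketched roughly as follows. This is only an illustration under my own assumptions, not a finished design: features are assumed to be plain tuples of numbers, "change" is assumed to be Euclidean distance between them, and the function name `match_feature` and the parameters `max_distance` and `min_margin` are hypothetical names chosen for this example.

```python
import math


def match_feature(query, candidates, max_distance=0.5, min_margin=2.0):
    """Return the index of the unique best match for `query`, or None.

    Assumptions for this sketch (not a fixed design):
      - features are equal-length tuples of numbers
      - "change" between two features is their Euclidean distance
    A match is accepted only if
      1. its distance is within `max_distance` (the change threshold), and
      2. the second-best candidate is more than `min_margin` times farther
         away, so the winner is not just slightly better than a rival.
    """
    # Distance from the query to every candidate, smallest first.
    distances = sorted(
        (math.dist(query, candidate), index)
        for index, candidate in enumerate(candidates)
    )
    if not distances:
        return None  # nothing to match against

    best_distance, best_index = distances[0]
    if best_distance > max_distance:
        return None  # change is too large to be a likely match

    if len(distances) > 1:
        second_distance = distances[1][0]
        if second_distance <= min_margin * best_distance:
            return None  # a rival is nearly as good: ambiguous, so reject

    return best_index


# One close candidate and two distant ones: an unambiguous match.
clear = match_feature((1.0, 1.0), [(1.1, 1.0), (5.0, 5.0), (9.0, 0.0)])

# Two candidates that are almost equally close: rejected as ambiguous.
ambiguous = match_feature((1.0, 1.0), [(1.1, 1.0), (1.0, 1.12)])

# The only candidate is outside the change threshold: rejected.
too_far = match_feature((1.0, 1.0), [(3.0, 3.0)])
```

Returning `None` for an ambiguous or unlikely match, rather than the nearest candidate, reflects the idea of learning only from high confidence conclusions: an unmatched feature costs little, while a wrong correspondence could mislead later analysis.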

I hope to design and implement the algorithm described above, or to find an algorithm that can effectively learn about the content of videos automatically and in a general way. Once I am there, I will proceed step by step with the outline I gave at the end of the strategy page of this website.

Once the AI is able to understand and learn about the environment in a general way, it can also begin to learn the behaviors and appearance of objects, analyze the best way to recognize an object, learn cause and effect, etc. Such abilities would put us well on our way to general AI (AGI).

Intelligence is all about prediction and using our knowledge of the world to create desired outcomes. If we can understand and interpret sensory input from the world effectively, we will be well on our way to predicting the environment and pursuing goals, which is the purpose of AGI.

 


PracticalAI.org © 2010