Do you have to look at a new object 1,000 times to recognize it? If you looked at it another 100,000 times, would you recognize it any better? I think not, and yet this is exactly how image recognition systems based on Haar cascade classifiers or neural networks work.
SaraVision is an image recognition project that is conceptually different from the approach the whole world is pinning its hopes on: ever better neural networks trained on ever bigger and more accessible databases. We assume that one glance at an object, or just a few, is sometimes enough to remember and recognize it. It all started with the need to add sight and some intelligence to one of our sub-projects, SaraCam, which raises the current Google and Alexa assistants to level 2.0.
At first, in order to test some basic assumptions, we wrote a simple program that recognizes the MNIST character set, which I describe on our blog in the slightly provocative article "About the nonsense of deep learning, neural networks in image recognition (using the MNIST kit)". Even there we managed to create a very universal program, one that recognizes characters regardless of their size, slant or font, but it was only a programming "sandbox".
The next stage was to create something more universal, capable of recognizing arbitrary objects, starting with quick detection of basic geometric figures. We also wanted to check the theory that our brain can "see" very well what the eye cannot, literally drawing the missing elements in our imagination (see: visual perception and Gestalt psychology, the theory of reification), and that our system would work similarly:
It worked. As you can see in the amateur video below, the system behaves similarly: you don't have to see the whole square to detect that a square is there.
It may seem that detecting simple figures is easy, something any programmer can do. You can use "ready-made" programs, or you can algorithmize everything, but we don't want to write a separate algorithm for every shape; we want to write one for all shapes, and, most importantly, we don't want to train the system on thousands of images.
The next step was to test whether the system could handle face detection in a camera video stream. Importantly, the system is designed to detect a face very quickly, and the face can be tilted left or right, turned slightly to the side or fully in profile, poorly lit, visible in color or in IR. It worked. Despite still being under construction, the system was almost 20x faster than standard face detection systems and, most importantly, succeeded where other systems failed entirely (for example, a face lit from one side by the sun, with the head slightly tilted):
We already know what our voice assistant SaraCam, one of the subprojects of our artificial intelligence SaraAI, will look like.
As we wrote earlier, thanks to the funding we received, we have accelerated considerably.
The SaraCam project is about raising voice assistants to a higher level by adding sight and intelligence.
You can find more information on the project website, SaraAI.com/SaraCam; here I would like to present our journey from the model to the final look.
The idea of creating Sara was born long ago, when the Internet was in its infancy, speech recognition didn't work, and there was no access to open knowledge bases. Fortunately, those limitations are behind us now, which allowed us to return to the project and begin the first tests of our long-standing assumptions. In one of our first published videos you can see our first prototype assistant, made from a regular IP camera, in which we show some aspects of the assistant we would like to develop further. Although old and amateurish, this one-and-a-half-minute video shows some key solutions, like establishing a kind of bond with the device and continuity of dialogue, which seem crucial to us and which we already described in another article, "We are looking for Artificial Intelligence, and we get... a speaker."
After the initial tests, seeing the limitations of standard IP cameras, we developed our assistant further by adding a more powerful processor, an array of 6 microphones, and fast motors, so that the camera could keep up with rapid movement. Thus the next, hybrid version of SaraCam was born:
At the same time, we also made our first video showing some of the functionality we want to include in the commercial version of SaraCam:
In late 2020, thanks to the funding we received for SaraCam and our collaboration with MindSailors Design Studio, we are finally creating the final shape and functionality of SaraCam. We will present it in action soon; for now, we can already reveal its design:
How do you like it?