SaraVision
Image recognition
Artur Majtczak

Saturday, 20 March 2021 17:18

Do you have to look at a new object 1,000 times to recognize it? And if you look at it another 100,000 times, will you recognize it any better? I think not, and yet that is exactly how image recognition systems based on Haar cascade classifiers or neural networks work.

SaraVision is an image recognition project that is conceptually completely different from the approach the whole world is pinning its hopes on: ever better neural networks learning from ever bigger and more available databases. We assume that sometimes one or a few glances at an object are enough to remember it and recognize it later. It all started from the need to add the sense of sight, and some intelligence, to one of our sub-projects called SaraCam, which raises the current Google and Alexa assistants to the 2.0 level.

At first, in order to test some basic assumptions, we wrote a simple program that recognizes the MNIST character set, which I describe on our blog in the slightly provocative article "About the nonsense of deep learning, neural networks in image recognition (using the MNIST kit)". Even there we managed to create a very universal program, recognizing characters regardless of their size, slant or font type, but it was only a programming "sandbox".
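The article does not reveal how that program works internally, so purely as a point of reference, here is a minimal few-shot baseline in the same spirit of "a few glances are enough": plain nearest-neighbour matching on scikit-learn's bundled 8x8 digit set (a small stand-in for MNIST), using just five reference samples per class instead of thousands:

```python
# A minimal few-shot baseline (NOT SaraVision's method, which is not public):
# 1-nearest-neighbour matching on scikit-learn's bundled 8x8 digits set,
# using only 5 reference samples per class instead of thousands.
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

rng = np.random.default_rng(0)
ref_idx = np.concatenate(
    [rng.choice(np.flatnonzero(y == c), size=5, replace=False) for c in range(10)]
)
test_idx = np.setdiff1d(np.arange(len(y)), ref_idx)

# Classify each test image by the label of its closest reference image.
dists = np.linalg.norm(X[test_idx, None, :] - X[None, ref_idx, :], axis=2)
pred = y[ref_idx][dists.argmin(axis=1)]

print(f"accuracy with 5 samples/class: {(pred == y[test_idx]).mean():.2%}")
```

Even such a naive matcher reaches usable accuracy from a handful of samples, although, unlike the program described above, it is not invariant to size, slant or font.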

The next stage was to create something more universal, allowing recognition of arbitrary objects. For a start, that meant quick detection of basic geometric figures, and also testing the theory that our brain can see very well what is not actually visible, literally drawing the missing elements in our imagination (see: visual perception and Gestalt psychology, the theory of reification), and that our system would work similarly:

It worked, and it works similarly, as you can see in the amateur video below: you don't have to see the whole square for the system to detect that the square is there.

It may seem that detecting simple figures is easy and any programmer can do it. You can use "ready-made" programs, or you can hand-code an algorithm for everything, but we don't want to write an algorithm for every shape; we want to write one for all shapes, and, most importantly, we don't want to teach the system with thousands of images.
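The post does not explain how this completion is implemented, but as a toy illustration of the reification idea, assuming some detector has already located three corners of a partially occluded square, the hidden fourth corner is fully determined:

```python
# Toy illustration of "reification": if a detector has found three corners of
# a (possibly occluded) square, the hidden fourth corner is fully determined.
# A sketch of the idea only, not SaraVision's actual algorithm.
import numpy as np

def fourth_corner(p1, p2, p3):
    """Given three consecutive corners of a parallelogram, return the fourth."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    return p1 + p3 - p2  # opposite corners of a parallelogram share a midpoint

def is_square(p1, p2, p3, tol=1e-6):
    """Check that the two visible sides are equal in length and perpendicular."""
    a = np.asarray(p1) - np.asarray(p2)
    b = np.asarray(p3) - np.asarray(p2)
    return abs(np.dot(a, b)) < tol and abs(np.linalg.norm(a) - np.linalg.norm(b)) < tol

# Three visible corners of a square whose fourth corner is hidden:
visible = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
if is_square(*visible):
    print("hidden corner at:", fourth_corner(*visible))  # -> [0. 2.]
```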
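For comparison, this is roughly what the conventional, per-shape route looks like with ready-made tools (a sketch using OpenCV contour approximation, not our code); note the hand-written rule it needs for each shape class:

```python
# The kind of "ready-made" per-shape approach the text argues against:
# classic OpenCV contour approximation, which needs an explicit rule for
# every shape and stops matching when part of the figure is missing.
import cv2
import numpy as np

# Draw a synthetic test image: a square outline on a black background.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(img, (40, 40), (160, 160), 255, thickness=3)

contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, closed=True), closed=True)
    if len(approx) == 4:  # hand-written rule: 4 vertices -> quadrilateral
        print("quadrilateral found at", cv2.boundingRect(approx))
```

A rule like `len(approx) == 4` has to be written for every shape, and it fails as soon as part of the figure is occluded, which is exactly the situation shown in the video above.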

The next step was to test whether the system could handle face detection in live camera video. Importantly, the system is designed to detect a face very quickly; the face can be tilted left or right, turned slightly or fully sideways, poorly lit, and captured in color or in IR. It worked. Despite still being under construction, the system coped almost 20x faster than standard face detection systems and, most importantly, coped where other systems could not cope at all (for example, a face illuminated from one side by the sun, with the head slightly tilted):


(video recorded on a Raspberry Pi 4 microcomputer, using a single CPU core at 20-30% load; the image from a moving pan-tilt camera that tracks the user's face is analyzed in real time, including occlusions)
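For readers who want a feel for the baseline being compared against, here is a minimal timing sketch of the standard detector such comparisons usually use, OpenCV's bundled Haar cascade (SaraVision itself is not publicly available, so this only reproduces the conventional side of the comparison):

```python
# Timing the standard baseline the text compares against: OpenCV's bundled
# Haar cascade face detector. A sketch for reproducing ballpark numbers on
# your own hardware, not SaraVision's code.
import time
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

cap = cv2.VideoCapture(0)  # default camera
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    t0 = time.perf_counter()
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    dt = (time.perf_counter() - t0) * 1000
    print(f"{len(faces)} face(s) in {dt:.1f} ms")
cap.release()
```

On Raspberry Pi class hardware this baseline typically lands in the hundreds of milliseconds per full frame, which is the order of magnitude of the 500 ms figure quoted in the list below.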

We are only at the beginning of the road, but the results we are getting seem sensational, and we are already thinking about a 3D space recognition system based on our method.

Although at the moment we are planning to apply our method to our other sub-project, SaraCam, the possibilities of this system are enormous. Its main advantages:

1. We don't teach the system with thousands of samples, just a few or a dozen.
2. The angle at which we analyse the recognised image is not important.
3. Up to 20 times faster object recognition.
4. Minimum computer power (Raspberry Pi microcomputer can detect a face in 10 ms, not in 500 ms).
5. No internet connection is needed.

I do not dismiss image recognition methods based on neural networks, or the huge progress these methods have made, especially in recent years; I think tools like TensorFlow are brilliant. But I also think that many things can be done differently, that not everything should be pushed into neural networks, and that if we do want to use them, we should give them the data on which networks have the best chance of working well.


Thursday, 04 March 2021 14:00

SaraCam's new look

We already know what our voice assistant SaraCam, one of the sub-projects of our artificial intelligence project SaraAI, will look like.
As we wrote earlier, thanks to the funding we received, we have accelerated considerably.

The SaraCam project is about upgrading voice assistants to a higher level by adding sight and intelligence.
You can find more information on the project website SaraAI.com/SaraCam, and here I would like to present our journey from the first model to the final look.

The idea of creating Sara was born a long time ago, back when the Internet was in its infancy, speech recognition didn't work and there was no access to open knowledge bases. Fortunately, those limitations are behind us now, which allowed us to return to the project and run the first tests of the assumptions we had thought out earlier. In one of our first published videos you can see our first prototype assistant, made from a regular IP camera, in which we show some aspects of the assistant we would like to develop further. This video, only a minute and a half long and admittedly old and amateurish, shows some key solutions, like establishing a kind of bond with the device and continuity of dialogue, which seem crucial to us and which we already described in another article, "We are looking for Artificial Intelligence, and we get.... a speaker".

After the initial tests, seeing the limitations of standard IP cameras, we developed our assistant further by adding a more powerful processor, a set of six microphones and fast motors, so that the camera could keep up with fast movement. Thus the next, hybrid version of SaraCam was born:


At the same time, we also made our first video showing some of the functionality we want to include in the commercial version of SaraCam:

In late 2020, thanks to the funding we received for SaraCam and our collaboration with MindSailors Design Studio, we finally began creating the final shape and functionality of SaraCam. We will soon present it in action; for now, we can already reveal its design:

How do you like it?

 
