Modern computer vision. Tasks and technologies of computer vision. Python Vision Programming

How can we teach a computer to understand what is shown in a picture or photograph? It seems simple to us, but for a computer an image is just a matrix of numbers from which the important information has to be extracted.
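
Concretely, a digital image is usually stored as an array of numbers. A grayscale image has one brightness value per pixel, and a color image has three. Here is a minimal sketch using NumPy, the array library that most Python vision code (including OpenCV's Python bindings) is built on. The tiny image is made up for illustration. A real photo is the same kind of array, only much larger.

```python
import numpy as np

# A tiny 4x5 grayscale "image": each entry is one pixel's brightness,
# stored as an 8-bit unsigned integer (0 = black, 255 = white).
gray = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 255, 255, 255, 0],
        [0, 255, 128, 255, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.uint8,
)

# A color image adds a third axis: height x width x channels.
# Channels are in (R, G, B) order here; OpenCV itself uses (B, G, R).
color = np.zeros((4, 5, 3), dtype=np.uint8)
color[1:3, 1:4] = (255, 0, 0)  # paint a red rectangle

print(gray.shape, gray.dtype)  # (4, 5) uint8
print(color.shape)             # (4, 5, 3)
print(gray[2, 2])              # 128: the pixel in row 2, column 2
```

Everything a vision algorithm "knows" about the picture has to be computed from numbers like these.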

What is computer vision? It is the ability of a computer to "see".

Vision is a major source of information for humans: by various estimates, 70 to 90% of the information we receive comes through it. So if we want to build a smart machine, we need to give the computer the same ability.

The task of computer vision is hard to formulate precisely. What does it mean to "see"? It means understanding what is where, just by looking. This is the difference between computer vision and human vision: for us, vision is a source of knowledge about the world and also a source of metric information, that is, the ability to judge distances and sizes.

The semantic content of an image

Looking at an image, we can describe it by a number of features, that is, extract semantic information from it.

For example, looking at this photo, we can say that it was taken outdoors, in a city with traffic, and that there are cars in it. From the architecture of the buildings and the Chinese characters, we can guess that this is East Asia. The portrait of Mao Zedong tells us that this is Beijing, and anyone who has seen video footage of the place or visited it will recognize the famous Tiananmen Square.

What else can we say about the picture by examining it? We can pick out individual objects: people in the distance, a fence closer to us, umbrellas, a building, posters. These are examples of very important object classes that systems search for today.

We can also extract features or attributes of objects. For example, we can determine that the portrait is not of some ordinary person but of Mao Zedong specifically.

From the car, we can tell that it is a moving object and that it is rigid, that is, it does not deform as it moves. The flags also move, but they are not rigid: they constantly deform. There is also wind in the scene, which we can infer from the fluttering flags, and we can even tell its direction, for example, from left to right.

The importance of distances and lengths in computer vision

Metric information, that is, all kinds of distances, is very important in computer vision. For a Mars rover, for example, it is critical: commands from Earth take on the order of 20 minutes to arrive, and the reply takes just as long, so a round trip takes about 40 minutes. Any movement plan driven by commands from Earth has to take this delay into account.

Computer vision technologies have been successfully integrated into video games. Three-dimensional models of objects and people can be built from video, and 3D models of cities can be reconstructed from users' photos, which you can then walk around in.

Computer vision is a fairly broad field, closely intertwined with various other sciences. It partly overlaps with image processing, and for historical reasons machine vision is sometimes singled out as a separate area.

Image analysis and pattern recognition: the path to a higher intelligence

Let us examine these concepts separately.

Image processing is the family of algorithms that take an image as input and produce an image as output, for example by removing noise or sharpening it.
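
Here is a minimal sketch of image processing in this sense. It assumes an image is a plain list of rows of brightness values. A 3x3 box blur takes an image and returns a new image of the same size in which each pixel is the average of its neighborhood. The function name `box_blur` is only for illustration. Libraries such as OpenCV provide optimized filters of this kind.

```python
from typing import List

Image = List[List[float]]

def box_blur(img: Image) -> Image:
    """Return a new image where each pixel is the mean of its 3x3 neighborhood.

    Pixels on the border average only the neighbors that actually exist.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out

noisy = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
smoothed = box_blur(noisy)
print(smoothed[1][1])  # 10.0: the isolated bright pixel is spread over its neighborhood
print(smoothed[0][0])  # 22.5: a corner pixel averages only 4 values
```

Both the input and the output are images, and the input is left unchanged.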

Image analysis is the area of computer vision that works with a two-dimensional image and draws conclusions from it.
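
Image analysis ends in a conclusion rather than a new picture. Here is a toy sketch that assumes a grayscale image stored as a list of rows and a hypothetical brightness threshold of 128. The hypothetical helper `find_bright_object` answers the question "is there a bright object, and where?" by returning its bounding box.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive

def find_bright_object(img: Image, threshold: int = 128) -> Optional[Box]:
    """Return the bounding box of all pixels >= threshold, or None if there are none."""
    ys, xs = [], []
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value >= threshold:
                ys.append(y)
                xs.append(x)
    if not ys:
        return None  # conclusion: no bright object in this image
    return min(ys), min(xs), max(ys), max(xs)

scene = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 210, 0],
    [0, 0, 220, 255, 0],
    [0, 0, 0, 0, 0],
]
print(find_bright_object(scene))                # (1, 2, 2, 3)
print(find_bright_object([[0, 10], [20, 30]]))  # None
```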

Pattern recognition is an abstract mathematical discipline that classifies data represented as vectors. The input is a vector, and we need to make a decision about it. Where the vector comes from does not matter much.
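
Because the input is just a vector, the simplest possible recognizer is a nearest-neighbor classifier: it assigns the label of the closest known example. This is a minimal sketch. The two-dimensional training vectors and the labels "A" and "B" are invented for illustration. In practice the vectors would be features computed from images.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical labeled examples: (feature vector, class label).
TRAINING: List[Tuple[Tuple[float, float], str]] = [
    ((0.1, 0.2), "A"),
    ((0.2, 0.1), "A"),
    ((0.8, 0.9), "B"),
    ((0.9, 0.7), "B"),
]

def classify(x: Sequence[float]) -> str:
    """Return the label of the training vector closest to x (Euclidean distance)."""
    _, label = min(TRAINING, key=lambda item: math.dist(x, item[0]))
    return label

print(classify((0.15, 0.15)))  # A
print(classify((0.85, 0.8)))   # B
```

The classifier never sees an image, only numbers. That is the abstraction the definition describes.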

Computer vision originally meant recovering the structure of a scene from two-dimensional images. The field has since broadened and can now be understood, in general, as making decisions about physical objects based on images. That makes it a task of artificial intelligence.

In parallel with computer vision, photogrammetry developed in a completely different field, geodesy. Photogrammetry is the measurement of distances between objects from two-dimensional images.
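
Measuring distances from images rests on triangulation. Take the textbook case of two identical, parallel ("rectified") cameras a baseline B apart. A point that appears shifted by a disparity of d pixels between the two images lies at depth Z = f·B / d, where f is the focal length in pixels. Below is a minimal sketch of that formula. The focal length, baseline and disparity values are made up for illustration.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a point seen by two rectified cameras: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity means the point is infinitely far away (or the match is wrong).
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 800 px focal length, cameras 10 cm apart.
print(depth_from_disparity(800, 0.10, 40))  # about 2.0 m
print(depth_from_disparity(800, 0.10, 80))  # about 1.0 m: twice the shift, half the distance
```

The inverse relationship explains why stereo depth is precise for nearby objects and coarse for distant ones: far away, a one-pixel error in disparity corresponds to a large change in depth.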

Robots can “see”

The last concept is machine vision, which refers to vision for robots, that is, solving specific industrial problems. We can say that computer vision is one big science that partly incorporates several others, and when computer vision is applied to a specific practical task, it becomes machine vision.

Computer vision has many practical applications, and it is closely tied to the automation of production. In many enterprises it is becoming more efficient to replace manual labor with machines. A machine does not get tired or sleep, has no fixed working hours, and is ready to work 365 days a year. So with machine labor we can get a guaranteed result within a known time, which is quite attractive. Another appeal is that the results of computer vision systems are visual: nothing beats seeing the outcome right away in the image, even while the computation is still running.

On the threshold of the world of artificial intelligence

One more thing about this field: it is hard! A significant part of the brain is devoted to vision, and it is believed that teaching a computer to "see" fully, that is, to solve computer vision completely, is an AI-complete problem. If we can solve vision at the human level, we will most likely solve the AI problem along the way. Which is very good! Or not so good, if you remember "Terminator 2".

Why is vision difficult? Because images of the same object can vary greatly depending on external factors. Objects look different from different viewpoints.

Take, for example, the same figure photographed from different angles. Most interestingly, depending on the view, a face can show one eye, two eyes, or one and a half. And depending on the context (say, a photo of a man wearing a T-shirt with eyes painted on it), there can be more than two.

The computer does not understand yet, but already “sees”

Another complicating factor is lighting: the same scene looks different under different lighting. Object size also varies, for objects of any class. Can we say that a person is 2 meters tall? No: a person's height can be anywhere from 80 cm to 2.3 m. The same holds for objects of other types, and yet they all belong to the same class.

Living objects in particular undergo a wide variety of deformations: people's hair, athletes' bodies, animals. Look at pictures of running horses: it is simply impossible to predict what happens to their manes and tails. Then there is occlusion, when objects overlap in the image. Give such a picture to a computer, and even the most powerful machine will struggle to interpret it correctly.

The next problem is camouflage. Some objects, animals for instance, blend into their environment quite skillfully, with spots of the same color as the background. Nevertheless, we see them, although not always from a distance.

Another problem is motion: moving objects can deform in unimaginable ways.

Many object classes are also highly variable. For example, the two photos below both show objects of the "armchair" class.

You can sit on either of them. But teaching a machine that things so different in shape, color and material all belong to the class "armchair" is very difficult. That is the challenge: applying computer vision methods means teaching a machine to understand, analyze and make assumptions.


Integration of computer vision in various platforms

Computer vision began reaching a mass audience back in 2001, when Paul Viola and Michael Jones presented their face detector. It was the first fast and fairly reliable face detection algorithm, and it demonstrated the power of machine learning methods.
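
Part of what made the Viola–Jones detector fast is the integral image, also called a summed-area table. After one pass over the image, the sum of any rectangle takes only four lookups, so simple rectangle-contrast ("Haar-like") features can be evaluated very quickly at many positions and scales. Below is a minimal pure-Python sketch of just that building block. The full detector also uses boosting and a cascade of classifiers, which are not shown. In practice, OpenCV ships pretrained Haar cascade detectors through `cv2.CascadeClassifier`.

```python
from typing import List

Image = List[List[int]]

def integral_image(img: Image) -> Image:
    """Summed-area table with a zero border: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Sum of any rectangle with four lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Haar-like "edge" feature: left half minus right half (width assumed even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)

# A bright left region next to a dark right region gives a strong response.
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 4))          # 60: sum of the whole image
print(two_rect_feature(ii, 0, 0, 3, 4))  # 48: 54 on the left minus 6 on the right
```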

A more recent practical application of computer vision is face recognition, that is, identifying a person by their face.

Recognizing a person the way it is shown in films, from arbitrary angles and under any lighting, is still impossible. However, deciding whether two photos show the same person or different people can be done with a high degree of confidence when the lighting and pose are similar, as in a passport photo.

The strict requirements for passport photos are largely driven by the limitations of face recognition algorithms.

For example, if you have a biometric passport, some modern airports let you go through automated passport control.

An unsolved task of computer vision: recognizing arbitrary text

Perhaps you have used a text recognition system. One of them is FineReader, which is very popular on the Russian-speaking Internet. Forms that you fill in are scanned perfectly, and the system recognizes the information very well. But with arbitrary text in an image, the situation is much worse. This problem remains unsolved.

Computer vision in games: motion capture

A separate large area is the creation of three-dimensional models and motion capture, which has been implemented quite successfully in computer games. The first program of this kind to use computer vision was a system for interacting with a computer through gestures, and many discoveries were made while it was being created.

The algorithm itself is quite simple, but tuning it required building a generator of synthetic images of people to produce a million pictures. Using these images, a supercomputer selected the algorithm's parameters so that it works as well as possible.

In this way, a million images and a week of supercomputer time produced an algorithm that uses 12% of a single processor and estimates a person's pose in real time. This is the Microsoft Kinect system (2010).

Content-based image search lets you upload a photo to the system and get back all the pictures with the same content, taken from the same angle.
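
A classic and very simple way to compare images by content is to compare their color distributions. Here is a toy sketch that assumes images are lists of (R, G, B) pixels. Each image is reduced to a coarse normalized color histogram, and database images are ranked by histogram intersection with the query. The image names and pixel values are made up. Real content-based search systems use far richer features, and a color histogram ignores viewpoint entirely.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]

def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized histogram over bins**3 coarse RGB cells (each channel split into `bins` ranges)."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    return [count / len(pixels) for count in hist]

def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def search(query: Sequence[Pixel], database: Dict[str, Sequence[Pixel]]) -> List[Tuple[str, float]]:
    """Rank database images by color similarity to the query, best match first."""
    q = color_histogram(query)
    scores = [(name, intersection(q, color_histogram(px))) for name, px in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical database: "images" given directly as lists of pixels.
database = {
    "forest": [(20, 100, 30)] * 100,
    "beach": [(240, 220, 150)] * 50 + [(30, 120, 220)] * 50,  # sand + sky
}
query = [(235, 215, 145)] * 40 + [(35, 125, 210)] * 60
results = search(query, database)
print(results[0][0])  # beach
```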

Another example of computer vision in use: two- and three-dimensional maps are now made with its help. Maps for car navigation systems are regularly updated using footage from dashcams.

There are databases with billions of geotagged photos. By matching a picture against such a database, you can determine where it was taken and even from what angle. Naturally, this works only if the place is popular enough that tourists have visited it and taken many photos of the area.

Robots everywhere

Robotics is everywhere nowadays. There are cars with special cameras that recognize pedestrians and traffic signs and alert the driver: in a sense, a computer program for eyesight that assists the motorist. There are also fully autonomous robotic cars, but they cannot rely on cameras alone and use a lot of additional information.

A modern camera is an analogue of a pinhole camera

Let's talk about the digital image. Modern digital cameras are built like a pinhole camera. However, instead of a small hole through which light enters and projects the outline of an object onto the back wall of the camera, they have a special optical system called a lens. Its job is to gather a wide beam of light and bend it so that all the rays pass through a single virtual point, producing a projection that forms an image on film or on a sensor.
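
When all rays pass through one center, an ideal pinhole camera maps a 3D point (X, Y, Z), given in camera coordinates, to pixel coordinates x = f·X/Z + c_x and y = f·Y/Z + c_y. Here f is the focal length in pixels and (c_x, c_y) is the principal point. Below is a minimal sketch of this standard model. Lens distortion is ignored, and the camera parameters are hypothetical.

```python
from typing import Tuple

def project(point: Tuple[float, float, float], focal_px: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Ideal pinhole projection of a camera-frame 3D point (X, Y, Z) to pixel (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("the point must be in front of the camera (Z > 0)")
    return focal_px * X / Z + cx, focal_px * Y / Z + cy

# Hypothetical camera: 1000 px focal length, principal point at the center of a 1280x720 image.
print(project((0.5, 0.25, 5.0), 1000, 640, 360))  # (740.0, 410.0)
print(project((1.0, 0.5, 10.0), 1000, 640, 360))  # (740.0, 410.0)
```

The second point is twice as large and twice as far away, yet it lands on the same pixel. A single image cannot tell these two cases apart, which is one reason metric information is hard to recover.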

The sensor of a modern digital camera consists of individual elements: pixels. Each pixel measures the total energy of the light falling on it and outputs a single number. So instead of an image, a digital camera gives us a set of brightness measurements, one for each pixel. That is why, when an image is enlarged, we see not smooth lines and sharp contours but a grid of squares of different colors: the pixels.
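
Here is a toy sketch of both points. It assumes the incoming light is given as a finer grid of numbers whose size is a multiple of the pixel block. The hypothetical helper `integrate` sums the light over each block to mimic a sensor pixel outputting one number. The hypothetical `enlarge` repeats every pixel into a square, which produces the blocky look of a zoomed-in photo. Real sensors also add noise, color filters and nonlinear processing, which are ignored here.

```python
from typing import List

Grid = List[List[int]]

def integrate(light: Grid, block: int) -> Grid:
    """Simulate a sensor: each pixel outputs the total light over a block x block area.

    Assumes the grid's height and width are multiples of `block`.
    """
    h, w = len(light), len(light[0])
    return [
        [sum(light[y][x] for y in range(by, by + block) for x in range(bx, bx + block))
         for bx in range(0, w, block)]
        for by in range(0, h, block)
    ]

def enlarge(img: Grid, factor: int) -> Grid:
    """Nearest-neighbor upscaling: every pixel becomes a factor x factor square."""
    return [[v for v in row for _ in range(factor)] for row in img for _ in range(factor)]

light = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pixels = integrate(light, 2)
print(pixels)              # [[14, 22], [46, 54]]
print(enlarge(pixels, 2))  # each value repeated as a 2x2 square
```

The fine detail inside each block is lost for good. Enlarging the image can only repeat the measured values, not restore that detail.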

Below you see the first digital image in the world.

But what is missing from this image? Color. What is color?

Psychological perception of color

Color is what we see. The same object has a different color for a human and for a cat, because the visual systems of humans and animals differ. Color is therefore a psychological property of our vision that arises when we observe objects and light, not a physical property of the object or the light. Color is the result of the interaction between the light, the scene and our visual system.
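
One practical consequence of color being tied to our visual system: when a color image is reduced to brightness, the channels are not weighted equally but according to how sensitive human vision is to each of them. Below is a minimal sketch using the ITU-R BT.601 luma weights, the same weights OpenCV documents for its RGB-to-gray conversion. The input colors are illustrative.

```python
def luma_bt601(r: float, g: float, b: float) -> float:
    """Perceived brightness (luma) of an RGB color using ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Pure green, blue and red at the same channel intensity differ greatly in brightness.
print(round(luma_bt601(0, 255, 0)))  # 150
print(round(luma_bt601(0, 0, 255)))  # 29
print(round(luma_bt601(255, 0, 0)))  # 76
```

Physically, each color has the same value of 255 in its own channel. The large difference in the results comes entirely from a model of human perception.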

Programming computer vision in Python using libraries

If you decide to study computer vision seriously, be prepared for a number of difficulties right away: this science is not easy and hides many pitfalls. However, "Programming Computer Vision with Python" by Jan Erik Solem presents everything in the simplest possible language. In it you will learn about methods for recognizing various objects in 3D, working with stereo images, virtual reality and many other computer vision applications. The book has plenty of Python examples, while the explanations are kept general so as not to overload the reader with heavy scientific material. It suits students, amateurs and enthusiasts alike. PDF versions of this and other books on computer vision can be found online.

There is also OpenCV, an open-source library of computer vision, image processing and numerical algorithms. It is written in C++ and has bindings for several other languages, including Python. It is actively developed and has a large community.

Microsoft provides API services with neural networks that can be trained to work with face images, and its computer vision services can also be used from Python.
