Robot vision design with a panorama synthesis function

Date: 2022-09-16 05:40:53

(2. School of Information Science and Technology, Tsinghua University, Beijing 100084, China)

Abstract. Panorama synthesis is the basis of target tracking. By synthesizing a panorama, the system enlarges the visual range that the cameras can capture; against this larger background, a suspicious target is more easily detected, selected, and then tracked. The existing target tracking of the robot integrated machine is extended with a panorama synthesis function and a suspicious-target selection function, so that tracking runs only on the selected target and is therefore more focused. The panorama synthesis module of the present system obtains its video stream in two ways: a real-time image stream from the cameras, or a local AVI video file. An algorithm judges the positional relationship between two adjacent frames, completes the image stitching, and displays the panorama in the user interface in real time. Once the user selects a particular target, the system performs moving-object detection and tracking in a moving-camera environment. Testing shows that the system can meet the general needs of application-oriented robot vision.

Key words: panorama; synthesis algorithm; robot integrated machine; target tracking; interaction

1. Introduction

In many fields, wide-view, high-resolution images and video are becoming more and more important, for example panoramic-view construction in photogrammetry, background reconstruction in video coding, panoramic video monitoring systems, and virtual-environment construction in virtual reality. To capture a wide scene in a single shot, one must adjust the camera focal length or use special equipment (a panoramic wide-angle or fish-eye lens), but the resulting panoramic photo has relatively low resolution. Image stitching technology is therefore used to splice many photos into one large panorama. As the scene expands, the number of images in the stitched sequence grows, and the time needed to generate a panorama lengthens. Hence, improving the panorama generation rate, up to near-real-time or even real-time performance, while preserving image quality is a hotspot and a difficulty of current research [1].

According to the projection surface, panoramas can be divided into cylindrical, spherical, and cubic panoramas. Cylindrical panorama technology is now mature, but it is only suitable for stitching image sequences obtained from a camera's one-dimensional rotation in the horizontal direction. Cubic panorama research is deepening, but its effect is not as faithful as the spherical model. The spherical panorama is suitable for describing large-scale scenes and accords with human viewing habits, but the model is complex, image-sequence acquisition is difficult, and the related research is still immature [2].

Image registration is the key step in panorama generation. The most commonly used methods are based on motion [3], on characteristic points [4], or on manifold projection [5]. These methods involve a large amount of calculation, so the panorama generation rate is slow and their practical application is limited.

On the other hand, a visual monitoring system that must detect suspicious events needs to segment the moving objects in the scene, obtain a description of each object, and then decide whether it is suspicious. In some special scenes the system must also judge the distance from a suspicious object to a particular location, for example to issue a warning when the object comes too close to a restricted zone.

Many applications share two common requirements: detecting objects in the video stream, and analyzing, tracking, and selectively storing the video information. People usually only care about the video recorded when an incident occurred, whereas earlier visual systems preserved all video information. This causes tremendous data redundancy: it occupies a large amount of space, and manually searching for abnormal events becomes sharply less efficient. If the system can instead understand the video and save only the parts worth attention, the situation improves greatly. In addition, these scenes often require the camera to move: when a single camera must monitor a large region, the only option is to let it continuously scan the entire scene and synthesize a panorama. In a robot system the camera is usually mounted on the robot, so it must move with the robot. Clearly, segmentation, recognition, and tracking of foreground objects in a moving-camera environment are very meaningful; only after this middle-level information is obtained can high-level event reasoning begin.

This research work is based on the above two aspects: panorama composition, to extend the camera's saccadic range, and target tracking, to keep the target captured in the lens.

2. System structure and scheme constraints

2.1 System structure

The robot vision discussed in this system refers to a camera mounted in the application-oriented Robot Integrated Machine (RIM) remote control scheme; the structure of the scheme is shown in Figure 1 [6].

Fig. 1 RIM remote control scheme system structure diagram

In Figure 1, the camera is the integrated machine's visual hardware and is responsible for monitoring the entire field. From the camera images and the feedback information of the teleoperation platform, the operator can accurately grasp the actual situation of the RIM and control the movement of both the RIM and the camera; application-oriented dynamic target tracking has already been realized in this scheme. On that basis, this article gives the RIM a panorama composition function and a suspicious-target tracking function, making the robot's visual capability stronger and more practical. The panorama synthesis process and the target tracking process can be switched between each other.

2.2 Scheme constraints

As the first step of the visual system design, panorama composition is the foundation of target tracking. It expands the visual range captured by a camera, and within the larger field of vision the target to be tracked is more easily detected.

The panorama synthesis module realized in the system gets its video stream in two ways: capturing the real-time image stream from the camera, or reading an AVI video file stored locally. An algorithm judges the positional relationship between two adjacent frames, completes the image mosaic, and displays the current state of the panorama on the user interface in real time.

The system assumes that after the RIM moves, the camera is fixed to scanning in the horizontal direction (the pitch angle is not considered), which greatly simplifies the panoramic mosaic calculation: the upper and lower boundaries of two adjacent frames are aligned in the horizontal direction, so the system only needs to find the matching points and splice. During splicing, of course, the system also judges in real time whether the camera's saccade direction has changed.
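Under this constraint, registration between two adjacent frames reduces to estimating a single horizontal offset. The following is a minimal sketch of that idea in pure Python (the function names and the list-of-rows grayscale image representation are our own illustration, not the system's actual code):

```python
def column_profile(frame):
    """Sum the gray values down each column, giving a 1-D profile."""
    height, width = len(frame), len(frame[0])
    return [sum(frame[y][x] for y in range(height)) for x in range(width)]

def horizontal_shift(prev, curr, max_shift):
    """Estimate the shift s such that curr[x] ~ prev[x + s], by
    minimising the mean squared difference of the overlapping parts
    of the two column profiles over candidate shifts."""
    p, c = column_profile(prev), column_profile(curr)
    width = len(p)
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        # Columns where both profiles are defined for this shift.
        xs = range(max(0, shift), min(width, width + shift))
        cost = sum((p[x] - c[x - shift]) ** 2 for x in xs) / len(xs)
        if cost < best_cost:
            best_cost, best_shift = cost, shift
    return best_shift
```

A positive result means the camera panned right; the sign and magnitude of the shift feed the direction judgment and splicing described in Section 3.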

Thus the design goal of the panorama synthesis module is to use an efficient and accurate matching algorithm to synthesize the image in real time, to judge the camera's saccade direction in real time in order to decide whether the already-stitched part of the panoramic image needs updating, and finally to display the panorama on the interactive interface in real time.

3. Panorama composition design

3.1 Algorithm design

The panorama composition module is one of the two basic function modules of the system; its role is to synthesize a panorama from the video and thereby broaden the camera's field of view. Two input methods are designed for the module: capturing video directly from a camera, and reading a local video file. When capturing from a camera, the video can also be written to the local hard disk; likewise, panorama image files can be written to the local hard disk.

While a panoramic image is being synthesized, the partially synthesized image must be displayed to the user in real time, and the shooting direction of the camera (or of the video file) must be judged from the video, to decide whether synthesis of a new panorama file should begin [7]. The flow chart of the panorama composition module is shown in Figure 2.

While this module runs, several states must be judged in real time as each video frame arrives. First, has the system state changed? The user can switch system states during panorama composition, for example converting to the target tracking function without waiting for the end of the video. Second, has the video ended? This check is designed mainly for the local-video-file input mode; for real-time camera capture it can be ignored. Last is the direction judgment: after each new frame is read, it is compared with the previous frame to judge the shooting direction of the video stream. If the direction has changed, the existing panorama synthesis result is handled according to the user's requirements and a storage area for a new result is opened; otherwise, according to the direction, the part of the new frame not yet present in the synthesized map is stored into the result area.
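The per-frame judgments above can be sketched as a small state machine. The class and method names below are illustrative only, assuming a sign convention where a positive shift means a rightward pan:

```python
LEFT, RIGHT, UNKNOWN = -1, 1, 0

class PanoramaBuilder:
    """Illustrative sketch of the per-frame judgment loop described
    in the text (not the system's actual code)."""

    def __init__(self):
        self.direction = UNKNOWN
        self.strips = []   # image strips of the current panorama, in order

    def feed(self, shift, new_strip):
        """Process one frame: `shift` is the estimated horizontal offset
        to the previous frame, `new_strip` the part of the frame not yet
        in the panorama.  Returns True if a direction change forced a
        new panorama to be started."""
        direction = RIGHT if shift > 0 else LEFT if shift < 0 else UNKNOWN
        restarted = False
        if UNKNOWN not in (direction, self.direction) \
                and direction != self.direction:
            # Saccade direction changed: open a new storage area.
            self.strips = []
            restarted = True
        if direction != UNKNOWN:
            self.direction = direction
            if direction == RIGHT:
                self.strips.append(new_strip)     # cheap append on the right
            else:
                self.strips.insert(0, new_strip)  # prepend on the left
        return restarted
```

The system-state and end-of-video checks would wrap this call in the outer read loop.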

3.2 Module Realization

This module is programmed in C++ combined with OpenCV. OpenCV is an open source computer vision library consisting of a series of C functions and a small amount of C++ [8]; it implements many common algorithms of image processing and computer vision.

The template matching algorithm used in the panorama composition process is a simplification of the spherical projection model and relies mainly on gray-scale image matching. In matching based on gray projection [9], the two-dimensional gray values of an image are projected into two independent one-dimensional data sets, and matching is then performed on the one-dimensional data. This dimension reduction greatly reduces the amount of calculation in the matching operation and therefore greatly improves the matching speed. At each step, the two images are first transformed from RGB into gray space, reducing the dimensionality of the image data to improve matching speed. The template matching algorithm then judges the positional relation between the current frame and the previous frame, determines the direction of the video, and finishes the image synthesis. OpenCV provides a powerful algorithm library; the major functions used by this module are described below [10].
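As a concrete illustration of the first step, the RGB-to-gray reduction can be written with the conventional luma weights (the same weights OpenCV's CV_RGB2GRAY conversion applies); the helper names are our own:

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a gray value using the
    conventional luma weights 0.299, 0.587, 0.114."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_gray(image):
    """Reduce an H x W RGB image (nested lists of pixel tuples) to a
    single-channel gray image, shrinking the data to one third
    before matching."""
    return [[rgb_to_gray(px) for px in row] for row in image]
```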

3.2.1 cvCvtColor: color space conversion function

RGB (red, green, blue) is most commonly used in monitor systems; a color value can equivalently be expressed in the HSB, RGB, Lab, or CMYK color space.

The HSV (hue, saturation, value) color model corresponds to a conical subset of a cylindrical coordinate system, with the top surface of the cone at V=1; the V axis of the HSV model corresponds to the main diagonal of the RGB color space. The colors on the circumference of the cone's top surface, where V=1 and S=1, are pure colors. The HSV model corresponds to a painter's way of mixing color: starting from a pure pigment, the artist changes the tint by adding white and the shade by adding black, and obtains different tones by adding white and black in different proportions at the same time.

The HSI color space derives from the human visual system; it describes color by hue, saturation (or chroma), and intensity (or brightness), which can greatly simplify the workload of image analysis and processing. HSI and RGB are different notations of the same physical quantity, so a conversion relationship exists between them.
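The RGB-to-HSV relationship sketched above can be made concrete. The following is our own illustration of the standard conversion, with r, g, b in [0, 1] and hue in degrees:

```python
def rgb_to_hsv(r, g, b):
    """Convert r, g, b in [0, 1] to (h, s, v), h in degrees.
    V is the maximum component (the cone axis); S measures the
    distance from that gray axis, matching the painter's model."""
    v = max(r, g, b)
    c = v - min(r, g, b)        # chroma
    s = 0.0 if v == 0 else c / v
    if c == 0:
        h = 0.0                  # achromatic: hue is undefined
    elif v == r:
        h = 60.0 * (((g - b) / c) % 6)
    elif v == g:
        h = 60.0 * ((b - r) / c + 2)
    else:
        h = 60.0 * ((r - g) / c + 4)
    return h, s, v
```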

Function prototype: void cvCvtColor( const CvArr* src, CvArr* dst, int code );

src: the input image: 8-bit unsigned, 16-bit unsigned, or 32-bit single-precision floating point;

dst: the output image, of the same size and depth as src;

code: the color space conversion, defined by constants of the form CV_<src color space>2<dst color space>:

a) RGB <=> XYZ (CV_BGR2XYZ, CV_RGB2XYZ, CV_XYZ2BGR, CV_XYZ2RGB);

b) RGB <=> YCrCb (CV_BGR2YCrCb, CV_RGB2YCrCb, CV_YCrCb2BGR, CV_YCrCb2RGB);

c) RGB => HSV (CV_BGR2HSV, CV_RGB2HSV);

d) RGB => Lab (CV_BGR2Lab, CV_RGB2Lab);

e) RGB => HLS (CV_BGR2HLS, CV_RGB2HLS);

f) Bayer => RGB (CV_BayerBG2BGR, CV_BayerGB2BGR, CV_BayerRG2BGR, CV_BayerGR2BGR, CV_BayerBG2RGB, CV_BayerGB2RGB, CV_BayerRG2RGB, CV_BayerGR2RGB).

3.2.2 cvMatchTemplate: compare a template against overlapped regions of an image

Function prototype: void cvMatchTemplate( const CvArr* image, const CvArr* templ, CvArr* result, int method );

The function slides the template across the whole image, compares it with each overlapped region of size w × h using the specified method, and stores the comparison results in result.

image: the image to search in; it should be a single-channel, 8-bit or 32-bit floating point image;

templ: the search template; it cannot be larger than the input image and must have the same data type;

result: the map of comparison results, single-channel 32-bit floating point; if image is W × H and templ is w × h, then result is (W−w+1) × (H−h+1);

method: specifies the matching method, as follows.

a) method = CV_TM_SQDIFF:

R(x,y) = Σ [T(x',y') − I(x+x', y+y')]²

b) method = CV_TM_SQDIFF_NORMED:

R(x,y) = Σ [T(x',y') − I(x+x', y+y')]² / sqrt( Σ T(x',y')² · Σ I(x+x', y+y')² )

c) method = CV_TM_CCORR:

R(x,y) = Σ T(x',y') · I(x+x', y+y')

d) method = CV_TM_CCORR_NORMED:

R(x,y) = Σ T(x',y') · I(x+x', y+y') / sqrt( Σ T(x',y')² · Σ I(x+x', y+y')² )

e) method = CV_TM_CCOEFF:

R(x,y) = Σ T'(x',y') · I'(x+x', y+y')

in which T'(x',y') = T(x',y') − mean(T), so the template brightness is shifted to zero mean, and I'(x+x', y+y') = I(x+x', y+y') minus the mean of I over the current patch, so the patch brightness is shifted to zero mean;

f) method = CV_TM_CCOEFF_NORMED:

R(x,y) = Σ T'(x',y') · I'(x+x', y+y') / sqrt( Σ T'(x',y')² · Σ I'(x+x', y+y')² )

All sums run over the template coordinates x' = 0…w−1, y' = 0…h−1.

After the comparisons complete, cvMinMaxLoc is used to find the global minimum (for CV_TM_SQDIFF*) or the global maximum (for CV_TM_CCORR* and CV_TM_CCOEFF*).
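To make the SQDIFF case concrete, here is a naive pure-Python sketch of the sliding comparison and the minimum search (illustrative only; the real cvMatchTemplate is heavily optimized):

```python
def match_template_sqdiff(image, templ):
    """Naive CV_TM_SQDIFF: slide `templ` over `image` (nested lists)
    and store the sum of squared differences at each placement.
    For a W x H image and w x h template the result map is
    (W-w+1) x (H-h+1)."""
    H, W = len(image), len(image[0])
    h, w = len(templ), len(templ[0])
    result = [[0.0] * (W - w + 1) for _ in range(H - h + 1)]
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            result[y][x] = sum(
                (image[y + j][x + i] - templ[j][i]) ** 2
                for j in range(h) for i in range(w))
    return result

def min_loc(result):
    """Like cvMinMaxLoc for the SQDIFF case: the best match is the
    global minimum of the result map.  Returns (x, y)."""
    best = min((val, x, y)
               for y, row in enumerate(result)
               for x, val in enumerate(row))
    return best[1], best[2]
```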

The user interface of this module was developed in Java: the interface structure was built with the Java Swing package, and the JNI interface is used to communicate and exchange data with the bottom layer. Figure 3 shows the control interface for the input and output parameters. "Input Video" selects the video source, either a camera or a local AVI file; "Frame" and "Panorama" select whether the video captured from the camera and the panorama image file, respectively, are saved to the local hard disk.

Fig.3 I/O control interface

4. Design of target tracking

Target tracking had already been achieved in the RIM scheme; here a suspicious-target selection function is added. From his observation, the operator determines the suspicious target and selects it; the selected target is then surrounded by a red oval frame. When the operator switches the control state to the tracking state, the suspicious target is tracked, whether it is static or moving. The algorithm flow is shown in Figure 4.

Fig. 4 Target tracking algorithm flow chart

The module needs a mouse operation interface, and the target parameters must be judged in the processing of each frame. If the user selects a target again, the module replaces the original tracking parameters. The main processing is the judgment of the target location in the current frame and the handling of the oval selection frame. Because the camera captures frames very quickly while the target moves randomly and unpredictably, the system must offer good real-time performance and high accuracy, which requires high calculation speed and precision.

5. Experimental results and conclusion

5.1 Panorama synthesis experiment

Fig. 5 shows the initial image frames obtained from a compressed AVI file.

Fig. 5 AVI part video frame sequence

The result of panorama synthesis for the above image sequence is shown in Figure 6 below:

Fig. 6 Panorama synthesis results map

After many repeated tests of this kind, the panorama synthesis module basically meets user requirements, achieving good synthesis quality in both real-time performance and accuracy, though with some delay when displaying results. When the video rotates from left to right, display is basically problem-free, because each refresh only loads the newly added images. When it rotates from right to left, however, the whole region of the synthesis result must be updated each time, so the display must be entirely reloaded and a certain delay appears. This still basically meets the requirements.
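The delay asymmetry can be illustrated with a toy panorama buffer (our own sketch, storing the panorama as a list of pixel rows): appending on the right repaints only the new strip, while prepending on the left moves every existing column and forces a full redraw.

```python
def paste_strip(panorama, strip, side):
    """Grow a panorama (list of pixel rows) by a strip of new columns
    on the given side and return how many columns must be repainted."""
    repaint = len(strip[0])          # right append: only the new strip
    for row, extra in zip(panorama, strip):
        if side == "right":
            row.extend(extra)        # existing pixels stay in place
        else:
            row[:0] = extra          # every existing column shifts right
    if side != "right":
        repaint = len(panorama[0])   # ...so the whole width is redrawn
    return repaint
```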

5.2 Target tracking experiment

The experimental procedure was tested with a USB camera.

Figure 7 shows the scene before any object is selected with the mouse.

Fig. 7 Before selecting the target

Figure 8 shows the scene after the target object is selected with the mouse; the object (a poster) is ringed by an oval frame.

Fig. 8 After selecting the target

The camera angle is then moved for the target tracking experiment; the real purpose is for the camera to automatically track a moving target. Because a USB camera is used here, the camera itself can be moved to realize the relative motion. Figures 9 and 10 show the camera rotated to the left and to the right, respectively.

Fig. 9 Camera rotation to the left

Fig. 10 Camera rotation to the right

The region in the red oval frame is the tracked target (the poster). As the camera moves, the system keeps the oval ring on the target, achieving the tracking purpose. Because the CamShift algorithm used here judges only the target's color information to realize tracking, the tracking precision is not very high: it is accurate only when the color difference is large, as in the above experiment. If the environment is very complex and the color information is complicated, the effect is poor.
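CamShift's color-only behaviour can be seen in its mean-shift core: the search window repeatedly moves to the centroid of target-coloured pixels, so anything of a similar colour can distract it. The following toy sketch (our own illustration on a 0/1 back-projection mask, not OpenCV's API, and omitting CamShift's adaptive window sizing) shows that core step:

```python
def mean_shift(mask, window, iters=10):
    """Move the tracking window (x, y, w, h) to the centroid of the
    target-colored pixels (mask value 1) it currently contains, and
    repeat until the window stops moving."""
    x, y, w, h = window
    for _ in range(iters):
        pts = [(i, j)
               for j in range(y, min(y + h, len(mask)))
               for i in range(x, min(x + w, len(mask[0])))
               if mask[j][i]]
        if not pts:
            break                  # no target-coloured pixels in view
        cx = sum(i for i, _ in pts) / len(pts)
        cy = sum(j for _, j in pts) / len(pts)
        # Re-center the window on the centroid (rounded half-up).
        nx = max(int(cx - w / 2 + 0.5), 0)
        ny = max(int(cy - h / 2 + 0.5), 0)
        if (nx, ny) == (x, y):
            break                  # converged
        x, y = nx, ny
    return x, y, w, h
```

Because the mask is built purely from colour, a background patch of the same colour would pull the window just as strongly, which is why the precision drops in colour-complex scenes.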

The results of the two experiments above were analysed respectively. They show that the panorama composition and suspicious-target selection functions were successfully added to the robot vision part, making the application more targeted. Although the system needs further improvement to become fully practical, the experiments show that it can satisfy general demands in situations that are not very complex. Research on the system is ongoing.

References

1. David A. Forsyth, Jean Ponce. Computer Vision: A Modern Approach [M]. Trans. Lin Xueyin, Wang Hong, et al. Beijing: Publishing House of Electronics Industry, 2004.

2. Xie Kai, Guo Heng, Zhang Tianwen. Image mosaics technology [J]. Journal of Electronics, 2004, 32(4): 630-634.

3. Szeliski R. Image mosaicing for telereality applications [J]. IEEE Computer Graphics and Applications, 1994, (6): 44-53.

4. Li Yanli, Xiang Hui. Solid spherical panorama generation algorithm [J]. Journal of Computer-Aided Design and Computer Graphics, 2007, 19(11): 1383-1398.

5. Peleg S, Herman J. Panoramic mosaics by manifold projection [C]. Proceedings of the IEEE Computer Society Conference on CVPR, 1997: 338-343.

6. Wang Wenming. Application oriented robot machine remote control scheme design [OL]. [2012-05-10]. /index.php/default/releasepaper/content/201205-166/

7. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification [M]. Trans. Li Hongdong, et al. Beijing: Mechanical Industry Press, 2003.

8. Wu Qiqing. Eclipse Program Design Classic Tutorial [M]. Beijing: Metallurgical Industry Press, 2007.

9. Ping Jie, Yin Runmin. A panorama image and its implementation [J]. Microcomputer Applications, 2007, 33(6): 59-62.

10. H. M. Deitel, P. J. Deitel. Java Programming Tutorial [M]. Trans. Shi Pingan, et al. Beijing: Tsinghua University Press, 2004.
