American Journal of Signal Processing

p-ISSN: 2165-9354    e-ISSN: 2165-9362

2015;  5(2): 40-50

doi:10.5923/j.ajsp.20150502.03

Landmark-Based Robot Localization Using a Stereo Camera System

Nguyen Tan Nhu, Nguyen Thanh Hai

Faculty of Electrical and Electronics Engineering, HCMC University of Technology and Education, Vietnam

Correspondence to: Nguyen Thanh Hai, Faculty of Electrical and Electronics Engineering, HCMC University of Technology and Education, Vietnam.

Copyright © 2015 Scientific & Academic Publishing. All Rights Reserved.

Abstract

This paper proposes a method for 2D robot localization using a stereo camera system. The robot navigates an indoor environment in which landmarks are installed for localization. The main task is therefore to detect, in the left and right images, landmark features that are invariant to scale and orientation. The Scale-Invariant Feature Transform (SIFT) algorithm is applied to transform all features of a landmark image into multi-dimensional vectors. Moreover, the pixel locations of the detected landmarks in the left and right images are converted into distances using geometric projections. Finally, the coordinates of each detected landmark and its distance to the robot are combined through Euclidean distance equations to estimate the robot location. Results illustrate the effectiveness of the proposed stereo camera system in terms of processing time and low cost while still satisfying the localization task.

Keywords: Robot localization, SIFT algorithm, Landmark recognition, Geometric projections, Distance measurement

Cite this paper: Nguyen Tan Nhu, Nguyen Thanh Hai, Landmark-Based Robot Localization Using a Stereo Camera System, American Journal of Signal Processing, Vol. 5 No. 2, 2015, pp. 40-50. doi: 10.5923/j.ajsp.20150502.03.

1. Introduction

Localization is one of the most important functions in the navigation of autonomous robots. Robots built with modern methods can move automatically from an initial location to an assigned one to accomplish tasks entrusted to them by humans. In order to determine when the robot reaches its destination, the localization function has to operate accurately and continuously. Therefore, the accuracy and reliability of robot localization are keys to the success of navigation tasks [1].
The accuracy and reliability of the localization function depend mainly on how distances from the robot to landmarks are measured [2]. Sonar sensors were the first to be commonly applied in robotics for obtaining the distances of surrounding objects from the robot, due to their simplicity and speed, but this method is limited in the kinds of landmarks it can recognize. It can detect objects with flat surfaces, such as walls; consequently, most small-surface objects, such as chairs, tables, and other moving objects, cannot be detected. Moreover, the accuracy of this method is not very high [3].
The drawbacks of applying sonar sensors in localization can be overcome by using laser sensors instead. The range of the laser is much larger than that of the sonar, but it is still unable to recognize complex objects in the surroundings [4]. In general, the data obtained from sonar and laser sensors are only distances [5]. The most valuable ability of image processing methods in localization is the recognition of landmarks placed and fixed in a navigation space. Moreover, these methods can also determine the distances from the robot to the landmarks, making complete localization systems possible [6].
When image processing techniques were applied in robotics, recognition algorithms became more important. Recognition techniques based on colours were used first, but their results depend strongly on the light intensity of the environment [7]. Other methods that address this problem are often based on Fourier and wavelet transforms, which transform the grey-level variations to the frequency domain so that the results are independent of light intensity [8].
Objects in the navigation space appear at varying scales and orientations in the images captured for recognition. Therefore, recognition algorithms tend to focus on object features such as edges, lines, and points [9]. David G. Lowe invented a technique for detecting image features that are invariant to the scale and orientation at which the object appears in the captured images. In addition, image processing algorithms have been applied to distance estimation based on stereo vision techniques in recent decades [10]. Therefore, localization systems based on image processing can perform distance estimation and object recognition simultaneously [11-15].
Several applications taking advantage of image processing for localization have recently been developed. Artificial landmarks designed specifically for underwater image recognition are used to support MCL and improve the accuracy of underwater localization [16]. Barcodes are recognized and their distances measured using image processing techniques in order to determine the location of a Kinect sensor [17]. A recently introduced open-source triangulation toolbox simplifies location estimation in 2D or 3D space for landmark-based localization tasks [18]. Based on landmark information obtained by image recognition methods, an indoor localization system has been developed to assist visually impaired individuals in their movement [19]. However, the cost of the stereo cameras utilized in these systems is one of the drawbacks of image-processing-based localization, and the total cost of such systems must be reduced so that they become suitable for everyday applications.
This research proposes a method of landmark-based robot localization using a stereo camera system. The navigation environment of the robot is set up with landmarks at known coordinates, and the main task is to localize the robot in this space. In order to determine the robot location, the left and right images are captured and landmark features that are invariant to scale and orientation are detected, in which the SIFT algorithm is employed to transform the features into multi-dimensional vectors that can be matched easily using the Euclidean distance equation. Moreover, the pixel locations of the detected landmarks are transformed into distances using the geometric projection of the calibrated stereo pair. The coordinates of each detected landmark and the distances from the landmark to the robot are then combined through Euclidean distance equations to estimate the robot location. The results also report the time consumed per localization cycle and the low cost of the proposed method.

2. Basic Methods

2.1. Location Estimation Based on Known-Position Landmarks

Figure 1. The geometric model of robot localization in 2D space
Localization in 2D space amounts to estimating the two coordinates of the robot, in the horizontal and vertical directions, as shown in Figure 1. In this figure, it is supposed that a robot with the unknown coordinate $(x, y)$ is moving in the 2D space $Oxy$, which is marked by fixed landmarks with predefined coordinates. In this paper, three landmarks $A(x_A, y_A)$, $B(x_B, y_B)$, and $C(x_C, y_C)$ are used. The distances measured from the robot to the landmarks $A$, $B$, and $C$ are denoted $d_A$, $d_B$, and $d_C$, respectively. As a result, the relationship between these distances and the coordinates of the robot, $A$, $B$, and $C$ is given by the following system of equations:

$$\begin{cases} (x - x_A)^2 + (y - y_A)^2 = d_A^2 \\ (x - x_B)^2 + (y - y_B)^2 = d_B^2 \\ (x - x_C)^2 + (y - y_C)^2 = d_C^2 \end{cases} \tag{1}$$

Subtracting the first equation of (1) from the other two removes the quadratic terms $x^2$ and $y^2$ and leaves a linear system, to which Cramer's rule is applied for determination of the robot location:

$$\begin{cases} a_{11}x + a_{12}y = b_1 \\ a_{21}x + a_{22}y = b_2 \end{cases} \tag{2}$$

Eq. (2) is then solved as follows:

$$x = \frac{\Delta_x}{\Delta}, \qquad y = \frac{\Delta_y}{\Delta} \tag{3}$$

$$\Delta = a_{11}a_{22} - a_{12}a_{21} \tag{4a}$$

$$\Delta_x = b_1 a_{22} - a_{12} b_2 \tag{4b}$$

$$\Delta_y = a_{11} b_2 - b_1 a_{21} \tag{4c}$$

In matrix form, the linear system (2) is

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \tag{5}$$

with the coefficients and right-hand sides

$$a_{11} = 2(x_B - x_A) \tag{6a}$$

$$a_{12} = 2(y_B - y_A) \tag{6b}$$

$$b_1 = d_A^2 - d_B^2 + x_B^2 - x_A^2 + y_B^2 - y_A^2 \tag{6c}$$

$$a_{21} = 2(x_C - x_A) \tag{7a}$$

$$a_{22} = 2(y_C - y_A) \tag{7b}$$

$$b_2 = d_A^2 - d_C^2 + x_C^2 - x_A^2 + y_C^2 - y_A^2 \tag{7c}$$
To solve the system of equations (1), the distance between the robot and each landmark has to be estimated, and the coordinates of each landmark have to be obtained through landmark recognition; a numerical sketch of the solution is given below.
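As a concrete illustration of Equations (1)-(7), the following Python sketch linearizes the three circle equations and solves for the robot position with Cramer's rule. The function name and the example coordinates are illustrative assumptions, not values from the paper.

```python
def localize_2d(landmarks, distances):
    """Estimate the robot position (x, y) from three known landmarks.

    landmarks: [(xA, yA), (xB, yB), (xC, yC)] predefined coordinates
    distances: [dA, dB, dC] measured distances to each landmark
    """
    (xA, yA), (xB, yB), (xC, yC) = landmarks
    dA, dB, dC = distances

    # Subtract the first circle equation of (1) from the other two to
    # remove the quadratic terms (coefficients of Equations (6)-(7)).
    a11, a12 = 2 * (xB - xA), 2 * (yB - yA)
    a21, a22 = 2 * (xC - xA), 2 * (yC - yA)
    b1 = dA**2 - dB**2 + xB**2 - xA**2 + yB**2 - yA**2
    b2 = dA**2 - dC**2 + xC**2 - xA**2 + yC**2 - yA**2

    # Cramer's rule, Equations (3)-(4); fails if the landmarks are collinear.
    delta = a11 * a22 - a12 * a21
    if abs(delta) < 1e-9:
        raise ValueError("landmarks are collinear; position is ambiguous")
    return (b1 * a22 - a12 * b2) / delta, (a11 * b2 - b1 * a21) / delta

# Hypothetical check: a robot at (750, 1000) mm seen from landmarks A, B, C.
print(localize_2d([(0, 0), (2000, 0), (0, 1500)], [1250.0, 1600.8, 901.4]))
```

Any three non-collinear landmarks give a unique solution; with more than three, a least-squares fit over all measurements would reduce the effect of range noise.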

2.2. Scale-Invariant Feature Transform

The Scale-Invariant Feature Transform (SIFT) method detects features of a sample object image that are invariant to the scale and orientation at which the object appears in another image [10]. Pixel locations in the sample object image that do not change across scales are found first, and each feature is then described so that recognition does not depend on the orientation in which the object appears in the corresponding image.
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} \tag{8}$$

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{9}$$

$$\nabla^2 G = \frac{\partial^2 G}{\partial x^2} + \frac{\partial^2 G}{\partial y^2} \tag{10}$$

In order to determine the locations of features that are stable in scale, the sample image of the object, $I(x, y)$, is smoothed to various scale versions by convolving the image matrix with the Gaussian function of Equation (8). The scale versions of the original sample image are represented by $L(x, y, \sigma)$ in Equation (9), in which successive values of $\sigma$ differ by a constant factor $k$, called the scale factor. According to the theory in [20], the minimum and maximum locations of the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$ with $\nabla^2 G$ given in Equation (10), are scale-invariant pixel locations in the sample image of the object. As shown in Equation (12), the difference of two consecutive scale versions of the Gaussian function is approximately proportional to this Laplacian, so the difference-of-Gaussian function $D(x, y, \sigma)$ of Equation (11) can be computed instead. Consequently, the scale-invariant pixel locations are the local minimum or maximum locations of $D(x, y, \sigma)$.

$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{11}$$

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G \tag{12}$$
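The scale-space construction of Equations (8)-(12) can be sketched in a few lines with OpenCV; the base sigma, the factor k, and the number of scales below are conventional choices, not values taken from the paper.

```python
import cv2
import numpy as np

def dog_images(image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """Blurred versions L(x, y, sigma) (Eq. (9)) and their differences
    D(x, y, sigma) (Eq. (11)) for one octave of the scale space."""
    gray = image.astype(np.float32)
    L = [cv2.GaussianBlur(gray, (0, 0), sigma0 * k ** i)
         for i in range(num_scales)]
    # By Eq. (12), each difference approximates (k - 1) * sigma^2 * Laplacian.
    return [L[i + 1] - L[i] for i in range(num_scales - 1)]

# Candidate keypoints are the local extrema of D over space and scale.
img = cv2.imread("landmark_sample.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
D = dog_images(img)
```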
After determining the pixel locations of scale-invariant features, each location is transformed into a multi-dimensional descriptor vector. Each dimension is computed from a histogram of gradient directions with a resolution of 45 degrees, producing the SIFT vectors. Feature locations and descriptors are computed for both the sample image and the tested image, so a match is determined using the following equation:

$$d(\mathbf{u}, \mathbf{v}) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \tag{13}$$

where $u_i$ are the elements of the first vector, $v_i$ are the elements of the second vector, and $d(\mathbf{u}, \mathbf{v})$ is the Euclidean distance between the two vectors.
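A minimal NumPy sketch of the matching rule in Equation (13); the tolerance value is a placeholder for the situation-dependent threshold described in Section 3.2, not a value from the paper.

```python
import numpy as np

def match_features(sample_desc, frame_desc, tolerance=250.0):
    """Pair each sample descriptor with its nearest frame descriptor
    under the Euclidean distance of Eq. (13).

    sample_desc: (n, 128) array of sample-image SIFT vectors
    frame_desc:  (m, 128) array of camera-frame SIFT vectors
    """
    diff = sample_desc[:, None, :] - frame_desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # all pairwise distances
    nearest = dist.argmin(axis=1)                # best match per sample vector
    return [(i, j) for i, j in enumerate(nearest)
            if dist[i, j] <= tolerance]          # keep matches within tolerance
```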

2.3. Geometric Projection

This section briefly presents the geometric projection for distance measurement using stereo vision techniques [21], as shown in Figure 2. The geometric model contains two single cameras that have the same configuration and are installed with a baseline T between the center of the left image plane and the center of the right image plane. The focal length of both the left and right cameras is denoted by f.
Figure 2. Representation of a stereo camera system
An object point P in space projected onto the left and right image planes has the pixel coordinates $(x_l, y_l)$ and $(x_r, y_r)$, respectively. Because the left and right image planes are coplanar, the y-coordinates of the left and right projections of P are the same. Thus, the difference occurs only in the x-coordinate and is represented as $d = x_l - x_r$, called the disparity between the left and right image points. The distance Z from the point P to the stereo camera is defined as the distance from the plane containing the two focal points of the left and right cameras to the point P. Therefore, the relationship between the distance and the disparity is determined using the following equation:

$$Z = \frac{fT}{d} \tag{14}$$

in which f is the focal length, T denotes the baseline between the two cameras, and $d = x_l - x_r$ is the disparity.
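Equation (14) translates directly into code; the sketch below assumes rectified images (so the disparity is a pure x-offset) and uses placeholder values for the focal length and baseline rather than the paper's calibration results.

```python
def distance_from_disparity(x_left, x_right, f_pixels, baseline_mm):
    """Z = f * T / d (Eq. (14)); focal length in pixels, baseline in mm."""
    d = x_left - x_right          # disparity in pixels
    if d <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return f_pixels * baseline_mm / d

# Hypothetical numbers: f = 700 px, T = 60 mm, image points from Figure 10.
print(distance_from_disparity(150, 92, 700.0, 60.0))  # about 724 mm
```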

3. Proposed Method

This section briefly presents the algorithm of localization in 2D space. In particular, the two most important components of the localization system are the scale-invariant object recognizer and the stereo vision distance estimator, as described in Figure 3.
Figure 3. Block diagram of the proposed localization system

3.1. Localization System

The block diagram of the proposed localization system is shown in Figure 3, in which the Left camera and the Right camera are installed with the same configuration to form a stereo camera system that captures the surrounding images, including the predefined landmarks.
The existence of landmarks is determined by the Landmark identifier, which is based on an object recognition algorithm able to recognize objects that appear at different scales and orientations compared with the original sample images. Moreover, the pixel locations of the landmarks in the left image are combined with their pixel locations in the right image to produce their distances from the stereo camera installed on the robot. The landmarks' locations in 2D space, specified by the Landmark identifier, and the distances from the robot to the recognized landmarks, measured by the Distance measurement block, are passed to the Robot localization block, which uses Equation (1) to output the robot location in the global coordinate system; a minimal outline of this loop is sketched below.
According to Figure 3, the two most important components besides the Robot localization block are the Landmark identifier and the Distance measurement. In particular, the Landmark identifier applies the SIFT recognition algorithm, which identifies objects at different scales and orientations compared with the original image in order to detect the appearance of landmarks during navigation. The Distance measurement block measures the distances from the detected landmarks to the stereo camera using the rules of geometric projection.
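The flow of Figure 3 can be summarized in a short loop. Every helper below is a hypothetical placeholder for the corresponding block (the trilateration step reuses the localize_2d sketch from Section 2.1), so this is an outline under assumed interfaces, not the paper's implementation.

```python
def localization_cycle(left_img, right_img, landmark_map):
    """One cycle of the Figure 3 pipeline.

    landmark_map: {landmark_id: (x, y)} predefined 2D coordinates
    Returns (x, y) of the robot, or None if fewer than 3 landmarks are seen.
    """
    coords, ranges = [], []
    for lm_id, left_px in identify_landmarks(left_img):     # Landmark identifier
        right_px = locate_in_right_image(right_img, lm_id)  # stereo correspondence
        ranges.append(measure_distance(left_px, right_px))  # Eq. (14)
        coords.append(landmark_map[lm_id])
    if len(coords) >= 3:
        return localize_2d(coords[:3], ranges[:3])          # Section 2.1
    return None
```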

3.2. Scale-Invariant Object Recognition

The fundamental task of the localization system is recognizing landmarks by their distinguishing features. To address this problem, the SIFT method is utilized for recognition because it suits the requirement of detecting landmarks that appear at various scales and orientations.
Figure 4 illustrates the steps of the recognition task. The most essential requirement is a library of features for each landmark, where the features specialized for each landmark are invariant to scale and orientation. Therefore, these features are determined in the sample landmark image using the Gaussian scale-space method so that their positions do not change across scales. Each feature location is then transformed into a multi-dimensional vector that is independent of the orientation at which it appears in the left image.
Figure 4. Representation of landmark recognition
Figure 5 shows the features found in the sample landmark image, in which the features detected using the SIFT algorithm are drawn as circles with different radii. The radius of each circle depends on the scale at which the feature was detected, while the direction drawn in each circle corresponds to the dominant gradient orientation of the feature.
Figure 5. Features in a landmark's sample image
The left image captured by the stereo camera mounted on the robot is analyzed to find scale-invariant features in the same way as the sample landmark images. The features detected in the left image are matched against the features stored in the recognition library to check whether any landmark appears in the image frame. A match is accepted if the Euclidean distance between the two feature descriptor vectors is within a tolerance defined according to the specific situation. If the number of matching features is larger than a minimum threshold, this is considered sufficient evidence to conclude that the landmark appears in the image frame. Otherwise, the recognition process continues with the next image frame.
A successful recognition of a landmark in the left image frame is shown in Figure 6, and a sketch of the decision rule follows the figure. Each line in this figure connects a feature location in the sample image to its matching point in the image frame, so the number of lines represents the number of matching points between the two images. After the appearance of a landmark in the left image frame is detected, the distance from the landmark to the stereo camera is measured using the geometric projections.
Figure 6. A landmark is detected on the left image frame
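A sketch of this recognition step using OpenCV's SIFT implementation; the distance threshold and the minimum match count are hypothetical values standing in for the tolerances described above.

```python
import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)  # Euclidean distance, as in Eq. (13)

def landmark_present(sample_img, frame_img, max_dist=250.0, min_matches=10):
    """Decide whether the sample landmark appears in the camera frame."""
    _, sample_desc = sift.detectAndCompute(sample_img, None)
    _, frame_desc = sift.detectAndCompute(frame_img, None)
    if sample_desc is None or frame_desc is None:
        return False                          # no features found at all
    matches = matcher.match(sample_desc, frame_desc)
    good = [m for m in matches if m.distance <= max_dist]
    # Enough matching features is taken as evidence that the landmark appears.
    return len(good) >= min_matches
```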

3.3. Distance Measurement Using Stereo Vision

Distance measurement is likewise very important in the localization system. Figure 7 shows the process of using the stereo camera to measure distance. The precision of the localization results depends strongly on the precision of the distance measurement. Although the precision of a stereo camera is not very high, using the stereo camera to estimate distances makes the localization system more compact. To obtain better information, the stereo camera system described in Figure 8 is first calibrated to determine the stereo parameters of the camera model before distances are determined. The stereo camera, with a frame size of 640×480, is configured with a resolution of 24 bits per pixel.
Figure 7. The process of utilizing stereo camera in measuring distance
Figure 8. A pair of the two sample images obtained from the left and right camera system
The first step of the process is the collection of sample images, such as a chessboard captured at various shifts, scales, and orientations. Each position of the chessboard in 3D space is captured by both the left and right cameras, and the images are stored in the calibration library as shown in Figure 8.
In processing the sample images, the calibration step determines the left, right, and stereo parameters. Figure 9 shows an image pair with sample points A, B, and C, whose real 3D coordinates are calculated to give the values shown in Table 1. Each pair of projected left and right points thus provides a correspondence between the real 3D coordinates and the 2D projected pixel locations, from which the stereo camera parameters are re-estimated; a condensed sketch of this step is given after Table 1.
Figure 9. Example of an image with acceptable resolution
Table 1. The left and right image points
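A condensed sketch of the calibration step with OpenCV's chessboard tools; the pattern size, square size, and the image_pairs list are assumptions, and error handling is omitted.

```python
import cv2
import numpy as np

PATTERN = (9, 6)     # inner chessboard corners (assumed)
SQUARE_MM = 25.0     # chessboard square size in mm (assumed)

# 3D corner coordinates in the chessboard's own frame (z = 0 plane).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, left_pts, right_pts = [], [], []
for left_file, right_file in image_pairs:        # hypothetical list of pairs
    left = cv2.imread(left_file, cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(right_file, cv2.IMREAD_GRAYSCALE)
    ok_l, cl = cv2.findChessboardCorners(left, PATTERN)
    ok_r, cr = cv2.findChessboardCorners(right, PATTERN)
    if ok_l and ok_r:                            # keep only pairs seen by both
        obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)

size = left.shape[::-1]                          # (width, height)
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
# R, T relate the two cameras; the length of T is the baseline in Eq. (14).
_, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, D1, K2, D2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```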
     
The stereo camera parameters estimated by the stereo calibration are employed to rectify the left and right image frames. The images obtained from the real left and right cameras contain errors due to the configuration of the physical stereo camera. For this reason, the left and right images are rectified to produce the better-aligned images shown in Figure 10; the rectification step is sketched after the figure. After rectification, corresponding left and right image points lie on the same horizontal lines. In Figure 10, the left image point marked by the red rectangle has the pixel location (150; 102) and the corresponding right image point has the pixel location (92; 102).
Figure 10. The rectified left and right image pair
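Given the calibration output above, the rectification step can be sketched as follows; the remap tables are computed once and then applied to every incoming frame.

```python
import cv2

# K1, D1, K2, D2, R, T come from the calibration sketch above;
# size is the frame size, e.g. (640, 480).
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)

# After remapping, corresponding points share the same row, as with the
# points (150; 102) and (92; 102) marked in Figure 10.
left_rect = cv2.remap(left_img, m1x, m1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, m2x, m2y, cv2.INTER_LINEAR)
```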
In Figure 11, landmarks are detected in the rectified left and right images. If a landmark appears in both rectified images, the disparity of the two projected points is determined from the pixel locations in the two images. The left and right image points of the detected landmark have the same y-coordinate and different x-coordinates. Therefore, the distance from the stereo camera to the detected object is calculated using Eq. (14).
Figure 11. Detected landmarks on the rectified left and right images; (a) detected left landmark; (b) detected right landmark
After the distances of at least three detected landmarks have been determined, the landmark locations and the distances from these landmarks to the stereo camera are used to calculate the location of the robot in 2D space, as in Section 2.1.

4. Experimental Results

Many factors affect the localization results; they can be classified into two main influences, the recognized landmarks and the measured distances, since these make the key contributions to determining the robot coordinates in this paper.

4.1. Results of Landmark Recognition

The proposed algorithm for landmark recognition was applied to detect landmarks appearing at various scales and orientations in other images. Figure 12a shows the result of detecting a landmark at the nearest distance in the left image. Due to the large number of matching points, with lines connecting the features in the sample image to the corresponding matching points in the captured image, the landmark is easily detected. However, when the distance from the landmark to the camera is much farther, the number of matching points dramatically decreases, so the landmark is hardly detected, as shown in Figure 12b.
Figure 12a. Detected landmark appeared from the nearest distance on the left image
Figure 12b. Detected landmark appeared from the much farther distance on the left image
At the same distance, the landmarks can be recognized effectively even while their orientations are changing, as shown in Figure 12c, in which the rotation angle is 90°, and in Figure 12d, where the rotation is 180°. In addition, the proposed algorithm can detect two or three landmarks appearing simultaneously in the captured image, as described in Figs. 12e and 12f.
Figure 12c. The detected landmark rotated 90° from the original orientation
Figure 12d. The detected landmark rotated 180° from the original orientation
Figure 12e. The two detected landmarks simultaneously appeared on the captured image
Figure 12f. The three detected landmarks simultaneously appeared on the captured image
Even when a landmark is partly obscured by obstacles, the recognizer can still detect it if there are enough matching points. Figure 12g shows a landmark obscured by two others that is nevertheless detected.
Figure 12g. The obscured landmark still detected by the algorithm
The range of recognition depends strongly on the number of features in the sample image of each landmark and on the nature of each feature. Figure 13 shows the landmark with the largest number of features (2289) among all analyzed landmarks, so its range of successful recognition is the largest, from 100 mm to 2735 mm. On the other hand, the landmark with the smallest number of features (924), shown in Figure 14, can only be detected from 300 mm to 617 mm. In some other cases, although the number of features of a landmark is small, its range of successful recognition can be considerably larger than that of the most detailed landmark, because its details change little with distance, as shown in Figure 15.
Figure 13. The landmark with the largest number of features (2289)
Figure 14. The landmark with the smallest number of features (924)
Figure 15. A landmark with a small number of features but the largest range of successful recognition
The object recognition algorithm presented in this paper is suitable for the requirements of the localization system because of its ability to detect objects appearing at different scales and orientations in other images. The recognition range depends strongly on the number and nature of the features in the sample image of the object, so the landmarks have to be chosen to suit each working environment.

4.2. Determination of Landmark Distances

This paper proposed a method of determining the distances from landmarks to the stereo camera system. The range of the distance measurement depends strongly on the resolution of the camera used to collect image data for localization. Figure 16 shows the relationship between landmark distance and measured distance at the image resolution of 240×320, in which the range of linear results is from 500 mm to 4000 mm. Figure 17 describes the results for the resolution of 480×640, in which the range of linear results is from 500 mm up to 8000 mm. Thus, the larger resolution produces longer measured distances. In addition, the accuracy of the results is affected by the image resolution; Table 2 shows the percentage error for the two resolutions. At the resolution of 240×320 the error is up to 15.47%, but it decreases dramatically to 5.32% when the resolution is doubled to 480×640.
Figure 16. Measured distances for resolution 240×320
Figure 17. Measured distances for resolution 480×640
Table 2. The percentage of error according to each value of resolution
     
Because the distance measurement task is only active when a landmark appears in the image frame, the range of measured distances is actually the range of recognition. Therefore, the effect of image resolution on the percentage error is the more noticeable one. The stereo camera used in this paper is configured with the resolution of 640×480, so the percentage of measuring error is about 5.32%.

4.3. Estimated Locations

The results of stereo localization depend directly on the quality of the landmark recognition and of the measured distances. Localization results measured at two different positions of the robot in 2D space are presented. Figure 18 shows the first position of the robot in the experimental environment. The environment contains four landmarks whose coordinates are predetermined as shown in Table 3. Moreover, the real distances from the robot to each landmark and the real location of the robot are shown in Table 4. The position was localized five times; the percentage errors and the average localization error of 4.0% are given in Table 5.
Figure 18. The first position of the robot in the dynamic environment
Table 3. The landmarks’ location in the 2D space
     
Table 4. The real distances from the robot to each landmark and the real robot location
     
Table 5. The first estimated location of the robot in five different cycles
     
Similarly, the second test gives the localization results at the second position of the robot, as shown in Figure 19. The average error of 3.6% at this position is approximately equal to that at the first position, as shown in Table 6 and Table 7.
Figure 19. The second position of the robot in the dynamic environment
Table 6. The landmarks’ location in the 2D space
     
Table 7. The second estimated position of the robot
     
Because the accuracy of the localization depends directly on the landmark recognition and the distance measurement, the range of localization is the range of recognition, and the accuracy of localization is the accuracy of the distance measurement process. More specifically, the range of localization depends strongly on the lighting conditions of the environment and the characteristics of the landmarks, while the accuracy is also affected by the resolution of the images captured by the stereo camera installed on the robot.
The two competing challenges of system design are cost and performance, and both are relatively well satisfied by the proposed localization method. In particular, the system in this paper is relatively simple because a single stereo camera system serves both the recognition and measurement tasks of localization [22].
Furthermore, the error of the estimated locations is approximately as small as that of other common localization methods, such as sonar-based localization [3] with an error of about 3% and laser-based localization [4] with a 5% error. One of the main drawbacks of relative localization is that the estimated location depends strongly on the initial position, which makes the estimate meaningless when that position is wrong [23].
The proposed method in this research focuses on landmarks pre-located in the surroundings, so the determined location can replace the initial location of the robot. Therefore, within each cycle in which the system determines the robot location, the time consumed does not much affect the behaviour of the full system. In addition to localizing based on the available landmarks, the system could be further developed to detect new landmarks that appear during navigation and to update them to the existing map. Although the proposed algorithm uses few hardware resources in order to reduce the total system cost, it still satisfies the requirements of robot systems.

5. Conclusions

This paper proposed a method of robot localization in an indoor environment using a stereo camera system. In this project, landmarks were used for robot localization, in which landmark distances were worked out using geometric projection. In particular, the SIFT algorithm was employed to determine landmark features as multi-dimensional vectors for recognition. In addition, the coordinates of each detected landmark were combined with its distance to the robot to estimate the robot location. With the proposed approach, many results were obtained and the estimates showed small errors. Moreover, results showing the dependence of the error on image resolution were presented.

References

[1]  Roland Siegwart and Illah R. Nourbakhsh, "Introduction to Autonomous Mobile Robots", The MIT Press Cambridge, Massachusetts, London, England, 2004.
[2]  R. Gonzalez and F. Rodríguez, "Comparative Study of Localization Techniques for Mobile Robots based on Indirect Kalman Filter", IEEE Computer Society, pp. 253-258, 2009.
[3]  Drumheller, "Mobile Robot Localization Using Sonar", IEEE transactions on Pattern Analysis And Machine Intelligence, pp. 325-332, 1987.
[4]  Joachim Horn, "Localization of a Mobile Robot by Matching 3D-Laser-Range-Images and Predicted Sensor Images", IEEE, pp. 345 - 350, 1994.
[5]  Carlos F. Marques, "Vision-Based Self-Localization for Soccer Robots", pp. 1193 - 1198, 2000.
[6]  Stephen Se and Jim Little, "Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features", IEEE, pp. 2051 - 2058, 2001.
[7]  Theo Gevers, "Color-based object recognition", Pattern Recognition Society, pp. 453-464, 1998.
[8]  Fukunaga, "Introduction to Statistical Pattern Recognition, Second Edition", Computer Science and Scientific Computing, 1990.
[9]  Argyle, "Techniques for edge detection", Proceedings of the IEEE, pp. 285 - 287, 2005.
[10]  David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004.
[11]  Jen-Shiun Chiang, "A Stereo Vision-Based Self-Localization System", IEEE Sensor Journal, 2013.
[12]  W.Z. Wenzheng Chi, Jason Gui and Hongliang Ren, "A Vision-based Mobile Robot Localization Method", International Conference on Robotics and Biomimetics, pp. 2703-2708.
[13]  Hui Zhang and Lingtao Zhang, "Landmark-Based Localization for Indoor Mobile Robots with Stereo Vision", Intelligent System Design and Engineering Application (ISDEA), 2012 Second International Conference, pp. 700-702, 2012.
[14]  Herath and Kodagoda, "Simultaneous Localisation and Mapping: A Stereo Vision Based Approach", Intelligent Robots and Systems, pp. 922 - 927, 2006.
[15]  Kuen-Han Lin and Chieh-Chih Wang, "Stereo-based simultaneous localization, mapping and moving object tracking", Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference, pp. 3975 - 3980, 2010.
[16]  Donghoon Kim and Donghwa Lee, "Artificial landmark-based underwater localization for AUVs using weighted template matching", Intelligent Service Robotics, pp. 175-184, 2014.
[17]  Kwiwoo Pack and JeongGeun Chae, "A Landmark Based Localization System using a Kinect Sensor", Transactions of the Korean Institute of Electrical Engineers, 2014.
[18]  Sunglok Choi, "Triangulation Toolbox: Open-source algorithms and benchmarks for landmark-based localization", Robotics and Automation (ICRA), 2014 IEEE International Conference, pp. 6440 - 6446, 2014.
[19]  Yicheng Bai, "Landmark-based indoor positioning for visually impaired individuals", Signal Processing (ICSP), 2014 12th International Conference, pp. 668 - 671, 2014.
[20]  Lindeberg, "Feature Detection with Automatic Scale Selection", Int. J. of Computer Vision, 1998.
[21]  Nguyen Tan Nhu and Nguyen Thanh Hai, "The Method of Distance Measurement using Stereo Camera", The 2nd International Conference on Green Technology and Sustainable Development, 2014.
[22]  Lopez and Perez, “Low cost indoor mobile robot localization system”, Intelligent Systems Design and Applications (ISDA), pp. 1134 - 1139, 2011.
[23]  Puneet and Roumeliotis, “Robust localization using relative and absolute position estimates”, Intelligent Robots and Systems, pp. 1134 - 1140, 1999.