A New Method for Recognizing Objects in Photos Taken by Smart Phones


1 Jaegeol Yim, 2 Jaehun Joo, 3 Silvana Trimi
1 First Author, Computer Engineering Department, Dongguk University, Gyeongju, Gyeongbuk, 780-714, Republic of Korea, [email protected]
*2 Corresponding Author, Information Management Department, Dongguk University, Gyeongju, Gyeongbuk, 780-714, Republic of Korea, [email protected]
3 Management Department, University of Nebraska – Lincoln, Lincoln, NE, 68588, USA, [email protected]

Abstract

With the advances in the computing power and storage capacity of smart phones, cloud computing, and wireless networks, new apps and services are being developed and offered to the users of smart phones. One of these new services is the mobile augmented reality system (MARS). MARS integrates virtual information into the physical environment where a smart phone is located, allowing smart phone users to receive virtual information as if it were part of their physical surroundings. Several studies have suggested techniques/methods to match the most relevant information with the physical surroundings/object. Most of them identify the objects in the photo by comparing the photo taken by a smart phone with images stored in a database. This method, however, is slow, as the image comparison takes time. In this study we propose a new method, which can identify objects in a smart phone photo much faster and with a higher degree of accuracy, by using the phone's sensor data and electronic maps.

Keywords: Augmented Reality, Object Recognition, Context Aware, Virtual Reality, Smart Phones

1. Introduction

Rapid advances in information and communication technology (ICT) have accelerated the computing power and storage capacity of smart phones. The recent developments in the speed and bandwidth of networks, and the increased usage of cloud computing and storage, have elevated competition in offering many new applications and useful services on smart phones. The context-aware service [4, 5] and the mobile augmented reality service (MARS) [7, 13] are two of the most recent and popular services. The context-aware service provides the user with the most appropriate service, considering the user's surrounding physical environment, location, personal interests, and so on. MARS adds graphics, text, and sound to a photo image. For example, in the MARS shown in Fig. 1, an arrow labeled "To CSE 591" directs the user to CSE 591 (Computer Science & Engineering Building Room 591), while the two windows inform the user that "CSE 502 is the grad office" and "CSE 503 is a conference room" [6]. The arrow and windows are not part of the picture; they are augmented onto the picture to deliver valuable information to the user.

Such augmented reality is achieved when the system identifies the objects in the picture, determines the relevant content, and displays the information on the screen to the user. Thus, identifying an object in the picture taken by the phone camera is important for both the context-aware service and mobile augmented reality. Numerous studies have developed techniques to identify objects, and these are used in several applications. For example, in an educational mobile application for children, when a child takes a picture of a marked object in a museum or a park, the application identifies the item, retrieves the information related to the item from the Internet, and displays it [9]. In a museum guide system, when a visitor takes a picture of an exhibited item, the item is identified and related multimedia content is displayed [2].

Currently, all the existing object recognition techniques are based on image comparison, comparing the photo with the images stored in a database [3, 15]. This image comparison method of comparing pixel values, pixel by pixel, is time consuming and error-prone.


For example, if image B is a one-pixel shift of image A, then even though the object in image A is identical to the object in image B, the image comparison method will erroneously conclude that the two images are not the same.

Figure 1. An Example of Mobile Augmented Reality (Source: [6])

In this paper, we propose a method which is radically faster and more accurate than any of the existing photo recognition methods. This method does not use photo image comparison at all. Rather, it leverages the high accuracy of current smart phones' sensors and electronic maps. Electronic maps of interesting domains are widely available, including most large structures. Hence, compared to other apps, such as marker-based applications, our method does not require an offline phase. The processing time to recognize an object is mostly the time it takes to find the elements of the map (around the location of the smart phone) that intersect with the line-of-sight (of the camera with the object). The execution time of this process is proportional to the number of elements in the map, which are no more than a few hundreds, thus making the proposed method radically faster than any of the existing photo recognition methods.

After this introductory section, we present the literature review related to our study. Our new approach regarding an object recognition method is then proposed, and experiment results of our approach are introduced. Finally, we discuss implications of our study and conclude.

2. Literature Review

This paper proposes a new method of recognizing objects when a smart phone points at them. This topic is important and, as such, it has been the focus of many studies. Most current methodologies for object recognition can be grouped as follows:

1) Image comparison method: This method compares the photo with the images stored in a database. Extracting the image features, an ID is created for a given image and is stored in a database [1]. The scope of the images to search and to compare with a photo taken by a camera is narrowed down by analyzing the phone's sensor data [3, 15]. One of the studies even developed a new pattern discovery algorithm for noisy images [8]. Once the closest image from the server is identified, the server sends the scene descriptions in both text and audio to the phone. Another mobile application was developed for building recognition [19]. If the user takes a picture of a building and sends it to a remote server along with its GPS data at the moment of taking the picture, then the server matches the photo with the stored GPS-tagged images, using a combination of the scale saliency algorithm for feature matching and the earth mover's distance measure for scene matching.

2) Feature based method: A feature can be anything that is used to describe an object. For example, if the object is a person, features can be the color of face, eyes or hair, weight or height, DNA sequence, etc. A feature value can be the sum of horizontal, vertical, and diagonal pixel values of the picture. By using the right feature, and by comparing the feature values of two persons, for example, we can determine if they are identical. However, feature based image comparison is also time consuming. For example, suppose that we conclude two images match when 4 feature points match. Suppose further that a camera image has 10 feature points and the map image has 32 feature points. Then there will be about 4 billion different possible 4-point correspondences to check and compare [6].
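As a rough check of that figure, the number of ordered 4-point correspondences between the two images is 10P4 x 32P4; the short sketch below (the class name is illustrative) computes it:

// Rough check of the correspondence count quoted above: ordered 4-point matches
// between a 10-point camera image and a 32-point map image.
public class CorrespondenceCount {
    // k-permutations of n: n * (n-1) * ... * (n-k+1)
    static long permutations(int n, int k) {
        long p = 1;
        for (int i = 0; i < k; i++) p *= (n - i);
        return p;
    }

    public static void main(String[] args) {
        long count = permutations(10, 4) * permutations(32, 4);
        System.out.println(count); // 5,040 * 863,040 = 4,349,721,600 (about 4 billion)
    }
}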

3) Model based methods [14, 17]: Before starting the recognition process, a model based method must build and store models of many objects of interest. A model is a human-constructed representation of a real world object. For example, for building recognition, we should have models of buildings; a 2-D model of a building might be a rectangle. During the recognition process, a segmentation algorithm is applied, where (in the most primitive segmentation algorithm) the pixel values of two adjacent pixels are compared and the pixels are grouped together if the values are not very different; otherwise they are separated. As a result of the segmentation, the boundaries of the groups are obtained. If a boundary is similar to the model of an object, the recognition process determines that the boundary is that object. For example, if a boundary is similar to a stored rectangle (model), the method concludes that it is the building. Thus, a model based building recognition method must develop at least one model for each of the buildings in the domain of practice, which is time consuming and, along with the processing time of the comparison, makes this method slow.

4) Landmark based methods [12, 16]: This method creates a database of landmarks before running the recognition process. Landmarks can be different objects, such as a big statue, a huge sculpture, or a large building. A landmark can also be a part of a building. If a landmark is in the photo, then we can conclude that we are near it. For example, the administration building might have a clock (landmark) at the top center of the building. The landmark based method concludes that the building in the photo is the administration building if it finds a clock in the photo.

5) Markers method: This method consists of two phases: the offline phase and the online phase [10, 18]. During the offline phase, markers (Quick Response (QR) code tags) are attached to each of the objects of interest in the application domain, and the marker-ID/object-ID pairs are recorded in a database. A picture of a marker, taken during the online phase, is compared (by the application) with every marker value in the database. The item's information is determined and given back to the user by retrieving the pair in the database that matches the image taken by the camera. The user can then, by taking a picture of an item's marker (QR code) in a store, for instance, see where else and for how much that item is being sold. For example, in Table 1, if the picture taken corresponds to the marker (0 1 0 0 0 0 1 0 0 0), then object5 will be returned.

Table 1. Database Example (built by the app's developer during the off-line phase)

Marker                 ObjectID
0 1 0 0 0 0 1 1 0 0    object1
0 0 0 0 0 0 1 1 0 0    object2
0 1 0 1 0 0 1 1 0 0    object3
1 1 0 0 0 0 1 1 0 0    object4
0 1 0 0 0 0 1 0 0 0    object5
...                    ...
0 1 0 0 0 1 1 1 0 0    objectn
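A minimal sketch of the lookup that Table 1 implies, assuming the marker bits are decoded into a string key; the class and method names are illustrative, not from the paper:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the marker-ID -> object-ID lookup implied by Table 1.
// The decoded marker bit string is used as the key.
public class MarkerDatabase {
    private final Map<String, String> markerToObject = new HashMap<>();

    public MarkerDatabase() {
        markerToObject.put("0100001100", "object1");
        markerToObject.put("0000001100", "object2");
        markerToObject.put("0101001100", "object3");
        markerToObject.put("1100001100", "object4");
        markerToObject.put("0100001000", "object5");
    }

    // Returns the object ID for a decoded marker, or null if the marker is unknown.
    public String lookup(String decodedMarkerBits) {
        return markerToObject.get(decodedMarkerBits);
    }

    public static void main(String[] args) {
        System.out.println(new MarkerDatabase().lookup("0100001000")); // object5, as in the text
    }
}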

Even though some researchers may claim that the marker-based application is fast and the most efficient image analysis method, the offline phase takes time and is therefore a drawback. In addition, this method cannot recognize an object unless the camera points directly at it and takes a very clear picture of the QR code (which is a 2D barcode).

Some other studies have tried to overcome the limitations in the processing speed and accuracy of recognizing objects. For example, one approach uses indoor positioning to infer the user's context through a decision tree method, using data obtained through the Wireless Local Area Network (WLAN) and a Bayesian network inference scheme [11, 20]. An augmented reality smart phone application was also developed by using an indoor positioning method to estimate the position and the pose of the camera [13].

As we can see from these discussions and examples, most of the existing methods for picture recognition rely on image comparison techniques. Even though some of them use GPS data to narrow down the number of images to be compared with the photo taken, all the current image comparison techniques rely on photo recognition at the final stage, and are therefore very time-consuming.


In this study, we propose a much faster method of recognizing objects in photos taken by a smart phone, based on the highly accurate sensors of today's smart phones and on electronic maps.

3. The Proposed Object Recognition Method

Today's smart phones are equipped with a GPS receiver, which measures the phone's location with an error as small as 10 meters, and with a compass, which measures the phone's orientation with an error of less than two degrees. Our proposed method uses the phone's location and orientation to first calculate the line-of-sight of the camera to the object, that is, the line connecting the camera and the object being photographed. The user's location is provided by the phone's GPS. The orientation information is drawn from the azimuth, pitch, and roll values, all provided by the compass of the phone. The time required to obtain the sensor data and calculate the line-of-sight is so short that it can be ignored in our method. Next, the proposed method finds the elements of the map that intersect the line-of-sight. An electronic map consists of lines, polygons, arcs, circles, and so on. The method identifies the object in the photo by investigating those elements of the map that intersect the line-of-sight: the intersecting element whose distance from the camera is closest to (or the same as) the focus distance of the camera is the one that represents the object in the photo. Methods for finding the focus distance of the camera are available, such as getFocusDistances() for Android phones.
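As a minimal illustration of that last point, the legacy android.hardware.Camera API of that era reports focus distances through Camera.Parameters; the sketch below assumes an already opened camera, and the class name is illustrative:

import android.hardware.Camera;

// Minimal sketch: read the estimated focus distances (in meters) from the
// legacy Camera API. Assumes the camera is open and the CAMERA permission is granted.
public final class FocusDistanceReader {

    // Returns the "optimal" focus distance reported by the driver, in meters.
    public static float readOptimalFocusDistance(Camera camera) {
        float[] distances = new float[3];
        camera.getParameters().getFocusDistances(distances);
        // The array is filled at the NEAR, OPTIMAL and FAR index constants.
        return distances[Camera.Parameters.FOCUS_DISTANCE_OPTIMAL_INDEX];
    }
}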

3.1 Accuracy of Sensors

The proposed method of recognizing the object in a picture taken by a smart phone camera stems from the fact that the sensors on a smart phone are nowadays very accurate. Before running our proposed algorithm, we first tested the accuracy of the sensors on a smart phone by performing the following: (1) a location test, using LocationManager, and (2) a compass test, using SensorManager.

First, we specifically chose two points A and B, with the following coordinates on the Google Map:
A: 129.196990 (longitude), 35.861779 (latitude)
B: 129.196472 (longitude), 35.861650 (latitude)

(1) To measure the accuracy of the location measurement, we ran the LocationManager on a Galaxy S2 smart phone several hundred times at the two chosen points, A and B. First, we calculated the averages of the returned values for both A and B. Then, we calculated the distance (error) of the averages from points A and B (Table 2).

Table 2. A Summary of Results of LocationManager Tests on a Galaxy S2

                         Location A                    Location B
                         longitude      latitude       longitude      latitude
Google Coordinate        129.196990     35.861779      129.196472     35.861650
Measurement (Average)    129.1968912    35.8619749     129.1964195    35.8616350
Error (degree)           0.00009880     -0.0001959     0.0000525      0.0000150
Error (meter)            7.528311       11.493384      3.843647       10.243619

It is known that the distance from longitude 129 degrees to 130 degrees, at latitude 36 degrees, is about 91.29 kilometers, and the distance from latitude 35 degrees to 36 degrees is about 110.941 kilometers. Thus, the average error of the measurements obtained at location A (B) is about 13.73948193 meters (10.94099438 m). We performed other tests at other locations, and based on the overall measurements obtained, we concluded that the average error of LocationManager on the Galaxy S2 was approximately 11.52493934 meters.
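The average errors quoted above combine the per-axis errors in the last row of Table 2; a short sketch of that arithmetic (the class name is illustrative):

// Combines the per-axis errors from the last row of Table 2 into the overall
// positioning errors quoted in the text for locations A and B.
public class LocationErrorCheck {
    // Combined horizontal error from east/north error components (meters).
    static double combinedError(double lonErrMeters, double latErrMeters) {
        return Math.hypot(lonErrMeters, latErrMeters);
    }

    public static void main(String[] args) {
        System.out.println(combinedError(7.528311, 11.493384)); // location A: about 13.7395 m
        System.out.println(combinedError(3.843647, 10.243619)); // location B: about 10.9410 m
    }
}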

(2) To measure the accuracy of the compass on the Galaxy S smart phone, we performed several experiments in which we measured the azimuth of the Galaxy S when it was aimed at point C (or B) from point A (Fig. 2). Point A is the spot where we stood and ran the SensorManager many times; C and B are the points we faced when reading the azimuth at point A.


The azimuth of the line-of-sight, defined by the start point A and the end point C (B), i.e., AtoC (AtoB), was calculated by the following equation:

Azimuth(AtoC) = 90 - Atan(Slope(AtoC))   ..... (1)

where Atan stands for arc tangent, and the slope of the line defined by the two points A (the start point) and C or B (the end point) is calculated from the latitude and longitude differences converted to meters:

Slope(AtoC) = 110941*(latitude of C - latitude of A) / (91290*(longitude of C - longitude of A))
Slope(AtoB) = 110941*(latitude of B - latitude of A) / (91290*(longitude of B - longitude of A))

Figure 2. Compass Accuracy Test at Point A

The results of the accuracy tests are shown in Table 3. Using the Google map coordinates and Equation 1, the azimuth we obtained for AtoC was 22.95307631. However, the magnetic north that a (phone's) compass points to is not the same as the (geographic) map north. Thus, we had to adjust the calculated azimuth for this difference. Using a military compass, we found that the difference between the magnetic north and the grid (map) north was about 7.5 degrees. Therefore, the corrected azimuth of our Galaxy S for AtoC would be 30.45307631, and for AtoB would be 298.7303745, which means that the average error of the measured azimuths is about 14.6 degrees (Table 3).

Table 3. Azimuth Test Results

From Point A                    Point B          Point C
Measured azimuth (average)      284.147886       15.806921
Corrected azimuth               298.7303745      30.45307631
Error                           14.5824885       14.64615531

Table 4. The Pitch Accuracy Test

                            Point A        Point B          Point C
Latitude                    35.86162       35.861776        35.862063
Longitude                   129.196271     129.195783       129.196499
Altitude (meter)            40             41               45
Height (meter)              1.67           9.71             15.51
Distance (meter)            0              47.79314721      53.372666842
Average of Measurements     -              -100.3           -110.6
Measured Pitch (Eq. 2)      -              10.3             20.6
Calculated Pitch (Eq. 3)    -              10.71086981      19.44250491
Pitch Error                 -              -0.41086981      1.15749509

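A small sketch of the azimuth calculation of Equation 1, using the per-degree distances from Section 3.1 and the 7.5-degree magnetic correction; atan2 is used here to avoid the quadrant ambiguity of a plain arc tangent, and the class name is illustrative:

// Sketch of the map-based azimuth calculation of Equation 1, with the
// magnetic-north correction described in the text. Coordinates are in degrees.
public class AzimuthCheck {
    static final double METERS_PER_DEG_LON = 91290.0;   // at this latitude (about 36 N)
    static final double METERS_PER_DEG_LAT = 110941.0;
    static final double DECLINATION_DEG = 7.5;          // magnetic vs. grid north

    // Azimuth (degrees clockwise from north, 0..360) of the line from point A to point P.
    static double gridAzimuth(double lonA, double latA, double lonP, double latP) {
        double east = (lonP - lonA) * METERS_PER_DEG_LON;
        double north = (latP - latA) * METERS_PER_DEG_LAT;
        double az = Math.toDegrees(Math.atan2(east, north));
        return (az + 360.0) % 360.0;
    }

    public static void main(String[] args) {
        // Points A and C from Table 4.
        double azAtoC = gridAzimuth(129.196271, 35.86162, 129.196499, 35.862063);
        System.out.println(azAtoC);                    // about 22.95 degrees
        System.out.println(azAtoC + DECLINATION_DEG);  // about 30.45, the corrected azimuth
    }
}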

When we collected the azimuth values, we also collected pitch values (Table 4). The pitch value is the angle of elevation of the line-of-sight determined by the camera and the object. Android provides classes and methods (similar to library functions) that return sensor values; SensorOrientation is the method that returns azimuth, pitch, and roll. SensorOrientation puts its return values in values[] (a variable), an array of three real numbers. When we hold a Galaxy S in the portrait orientation, Android's SENSOR_ORIENTATION returns the pitch value at values[1] (azimuth at values[0] and roll at values[2]). When the screen of a Galaxy S faces the sky (laid parallel to the earth), the returned pitch value (values[1]) is 0. As the top edge of the phone is pulled toward the holder, the pitch value (values[1]) decreases, reaching -90 when the phone is held in a straight portrait orientation.
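For illustration, a minimal sketch of reading the azimuth/pitch/roll triple through the (since deprecated) orientation sensor referred to above; everything apart from the Android API names is illustrative:

import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;

// Minimal sketch: read azimuth (values[0]), pitch (values[1]) and roll (values[2])
// from the legacy orientation sensor, as described in the text.
public class OrientationActivity extends Activity implements SensorEventListener {
    private SensorManager sensorManager;
    private float azimuth, pitch, roll;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
    }

    @Override
    protected void onResume() {
        super.onResume();
        Sensor orientation = sensorManager.getDefaultSensor(Sensor.TYPE_ORIENTATION);
        sensorManager.registerListener(this, orientation, SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        azimuth = event.values[0]; // degrees clockwise from magnetic north
        pitch = event.values[1];   // 0 when flat, about -90 in upright portrait
        roll = event.values[2];
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { /* not needed here */ }
}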

Thus, we can obtain the angle of elevation (in degrees) from the variable values[1] read from the sensor, with the following equation:

Measured angle of elevation = -90 - values[1]   (2)

The heights of the points are the distances between the ground and the phone. In our tests, they were specifically 1.67 for point A, 9.71 for point B, and 15.51 for point C. Distance is the distance from A to A, from A to B, and from A to C, respectively. The heights and distances were used to calculate the estimated pitches, with the following formula:

Sin(Angle of elevation) = height / distance   (3)

Thus, the measurement errors of the pitch values are calculated as the difference between the average of the measured pitches (Equation 2) and the calculated pitches (Equation 3). As shown in Table 4, the values[1] measured at point A were -100.3 when the phone was aiming at point B and -110.6 when aiming at point C. Using Equation 2, the measured pitches were about 10.3 for B and 20.6 for C. Using Equation 3, the calculated pitches were about 10.71086981 for B and 19.44250491 for C. Therefore, the pitch measurement error was about 0.41086981 degrees for point B and 1.15749509 degrees for point C (Table 4).
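The short sketch below redoes that arithmetic with the Table 4 values, treating the Distance row as the horizontal distance from point A and taking the vertical offset as the difference in (altitude + height); the class and method names are illustrative:

// Recomputes the "calculated pitch" column of Table 4 and the resulting errors.
public class PitchCheck {
    static double measuredPitch(double values1) {
        return -90.0 - values1; // Equation (2)
    }

    static double calculatedPitch(double horizontalDist, double verticalOffset) {
        double slant = Math.hypot(horizontalDist, verticalOffset);
        return Math.toDegrees(Math.asin(verticalOffset / slant)); // Equation (3)
    }

    public static void main(String[] args) {
        double phoneElevation = 40 + 1.67;                  // altitude + height at point A
        double calcB = calculatedPitch(47.79314721, (41 + 9.71) - phoneElevation);
        double calcC = calculatedPitch(53.372666842, (45 + 15.51) - phoneElevation);
        System.out.println(measuredPitch(-100.3) - calcB);  // about -0.41 (point B)
        System.out.println(measuredPitch(-110.6) - calcC);  // about 1.16 (point C)
    }
}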

3.2 The Proposed Algorithm

This paper proposes a new method of recognizing objects in photos taken by a smart phone camera.

This is new because it uses electronic maps for the first time. Thus far, the accessories attached to a mobile phone have not been accurate enough to locate the phone on a map, to determine the camera orientation, and to measure the focus distance of the camera. Therefore, all the existing picture recognition systems rely on image processing techniques. Many of them do make use of GPS and compass data to narrow down the scope of the images in a database to be compared with the phone photo. However, they all rely on image processing techniques at the final stage of photo recognition, which is time consuming, and therefore slow. Today's smart phones' sensors are highly accurate: the measurement error of a GPS (compass) on a recently released smart phone is about 10 meters (less than 15 degrees). Thus, by taking advantage of the highly accurate sensors of current smart phones, we propose a new method for object recognition that completely avoids time-consuming image processing and instead employs electronic maps. Since we used an Android phone to design and test this method, the explanations of the steps below incorporate specifics related to this type of phone.

The proposed object recognition method proceeds as follows:
1) Obtain an electronic map of the physical area that relates to the application system that uses the map.

For example, when considering a campus guide application, we need a drawing (in AutoCAD) of the campus and its buildings. These drawings, especially for the big buildings, usually already exist. However, the level of detail of the drawing is closely related to the purpose of the application. If we want a campus guide only at the building level (the lowest level), the object recognition process should be able to identify the name of the building in a photo. An illustration of an electronic map for this application is shown in Fig. 3. The electronic map consists of "edges," representing the outlines of buildings (Natural Science Building, Gymnasium Building, Student Hall, and so on). Edges are represented by a pair of points, start and end, and each point is represented by a pair of real numbers (longitude, latitude); a data-structure sketch follows Fig. 3.


Natural Science: 129.198050, 35.862546 ... 129.198050, 35.862546
Gymnasium: 129.196986, 35.862514 ... 129.196986, 35.862514
Students Hall: 129.195878, 35.862025 ... 129.195878, 35.862025
...

Figure 3. An Illustrative Electronic Map of a University Campus
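A minimal sketch of how such a map could be held in memory, following the edge/outline description above; the class and field names are illustrative, not from the paper:

import java.util.List;

// Illustrative in-memory form of the electronic map described above: each
// building outline is a closed list of edges, and each edge is a pair of
// points given as (longitude, latitude).
public class ElectronicMap {
    public static class Point {
        public final double longitude, latitude;
        public Point(double longitude, double latitude) {
            this.longitude = longitude;
            this.latitude = latitude;
        }
    }

    public static class Edge {
        public final Point start, end;
        public Edge(Point start, Point end) { this.start = start; this.end = end; }
    }

    public static class Building {
        public final String name;
        public final double bottom, top;   // elevation range of the building
        public final List<Edge> outline;   // closed outline: the last edge ends at the first point
        public Building(String name, double bottom, double top, List<Edge> outline) {
            this.name = name; this.bottom = bottom; this.top = top; this.outline = outline;
        }
    }
}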

2) With the phone's sensors, we determine the location of the smart phone. Most smart phones are equipped with a GPS receiver, a WiFi device, and a 3G or 4G communication device. In Android, the "LocationManager" class provides a method that identifies the location of the smart phone by using data from these devices (a usage sketch follows after step 5).

3) We collect the orientation data, namely the azimuth, pitch, and roll values. In Android, the "SensorManager" class provides a method that measures these values.

4) Using the "Camera" class, we determine the focus distance of the camera. In Android, the "Parameters" class, nested in the "Camera" class, provides the getFocusDistances method that returns the camera's focus distances.

5) Finally, we execute our proposed "Object Recognition Algorithm" (explained in the next section) to identify the object in the camera's picture.
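As mentioned in step 2), a minimal sketch of obtaining the location with LocationManager, along the lines later described in Section 3.4 (getBestProvider plus requestLocationUpdates); the class name and the update parameters are illustrative:

import android.app.Activity;
import android.content.Context;
import android.location.Criteria;
import android.location.Location;
import android.location.LocationListener;
import android.location.LocationManager;
import android.os.Bundle;

// Minimal sketch: obtain (longitude, latitude, altitude) with LocationManager,
// using the best available provider. Assumes the location permissions are granted.
public class LocationActivity extends Activity {
    private LocationManager locationManager;
    private double longitude, latitude, altitude;

    private final LocationListener listener = new LocationListener() {
        @Override public void onLocationChanged(Location location) {
            longitude = location.getLongitude();
            latitude = location.getLatitude();
            altitude = location.getAltitude();
        }
        @Override public void onStatusChanged(String provider, int status, Bundle extras) {}
        @Override public void onProviderEnabled(String provider) {}
        @Override public void onProviderDisabled(String provider) {}
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        locationManager = (LocationManager) getSystemService(Context.LOCATION_SERVICE);
    }

    @Override
    protected void onResume() {
        super.onResume();
        String provider = locationManager.getBestProvider(new Criteria(), true);
        locationManager.requestLocationUpdates(provider, 1000, 1, listener);
    }
}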

3.3 Proposed Object Recognition Algorithm

The proposed algorithm proceeds as follows (Fig. 4):
- Find the line-of-sight of the camera to the object, by using the location and orientation data.
- Find the element (edge) of the electronic map intersecting the line-of-sight.
- Find the distance between the phone and the element (edge).
- If the distance is close to the focus distance, then we conclude that the building (the point of interest) whose outline contains the element (edge) is the object in the photo. If there is no such element, then the algorithm concludes that the object in the photo is not something represented on the electronic map and returns Nil.

objectRecognitionAlgorithm (fd, location, azimuth, pitch, eMap)
// fd: focus distance
// location: (longitude, latitude, altitude) = (x, y, z)
Step 1: With the location (x, y, z) and the azimuth, construct the line-of-sight line
    y = a*x + b   ..... (Eq. 1)
whose slope a is obtained from the azimuth value.
Step 2: Find S = { s | s is an edge in eMap which intersects Eq. 1 }.
Step 3: If S is an empty set, then return NIL.
Step 4: Find the edge "e" in S which is closest to the location. Delete "e" from S. Let the building whose outline contains "e" be the "candidate building," and let the intersection point of "e" and Eq. 1 be (x', y').
Step 5: Let the distance between (x, y) and (x', y') be d.
Step 6: If "z + d*tan(pitch)" is between the bottom and top of the "candidate building," then go to Step 7. Else, go to Step 3.
Step 7: Let the distance between (x, y, z) and (x', y', z + d*tan(pitch)) be dist. If (|dist - fd| < threshold), where threshold is a small number representing the error of getFocusDistance, then return the "candidate building". Else return NIL.
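A compact sketch of these steps, under assumptions the paper does not spell out: the map is expressed in a local planar frame in meters, the line-of-sight is treated as a ray in the azimuth direction, and the edge/building types (and all names here) are illustrative:

import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed object recognition algorithm, assuming a local planar
// frame in meters (x east, y north, z up) and a map of building outlines.
public class ObjectRecognizer {
    public record Edge(double x1, double y1, double x2, double y2) {}
    public record Building(String name, double bottom, double top, List<Edge> outline) {}
    private record Hit(Building building, double horizontalDist) {}

    // fd: focus distance (m); (x, y, z): phone position; azimuth and pitch in degrees.
    public static String recognize(double fd, double x, double y, double z,
                                   double azimuthDeg, double pitchDeg,
                                   List<Building> map, double threshold) {
        // Step 1: direction of the line-of-sight (azimuth measured clockwise from north).
        double dx = Math.sin(Math.toRadians(azimuthDeg));
        double dy = Math.cos(Math.toRadians(azimuthDeg));

        // Step 2: collect the edges intersected by the line-of-sight.
        List<Hit> hits = new ArrayList<>();
        for (Building b : map) {
            for (Edge e : b.outline()) {
                double wx = e.x2() - e.x1(), wy = e.y2() - e.y1();
                double denom = dx * wy - dy * wx;
                if (Math.abs(denom) < 1e-9) continue;                       // parallel, no crossing
                double t = ((e.x1() - x) * wy - (e.y1() - y) * wx) / denom; // along the sight line
                double s = ((e.x1() - x) * dy - (e.y1() - y) * dx) / denom; // along the edge
                if (t > 0 && s >= 0 && s <= 1) hits.add(new Hit(b, t));     // in front of the camera
            }
        }

        // Steps 3-6: take candidate edges from the closest outward until the
        // elevation of the hit point falls within the candidate building.
        hits.sort((a, c) -> Double.compare(a.horizontalDist(), c.horizontalDist()));
        for (Hit hit : hits) {
            double d = hit.horizontalDist();
            double rise = d * Math.tan(Math.toRadians(pitchDeg));
            double hitHeight = z + rise;
            if (hitHeight < hit.building().bottom() || hitHeight > hit.building().top()) continue;

            // Step 7: accept only if the 3D distance matches the camera's focus distance.
            double dist = Math.hypot(d, rise);
            return Math.abs(dist - fd) < threshold ? hit.building().name() : null; // null = NIL
        }
        return null; // Step 3: no intersecting edge left
    }
}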


Figure 4. The Proposed "Object Recognition Algorithm"

Because we tested the proposed algorithm using an Android SDK, version 2.2, which does not support getFocusDistance(), the program we developed and tested was slightly different from the above proposed algorithm: if "z + d*tan(pitch)" is between the bottom and top of the "candidate building," the program returns the "candidate building" and terminates at Step 6, instead of jumping to Step 7.

3.4 Implementation of the "Object Recognition Algorithm" Method

(1) In onCreate, the program instantiates LocationManager to get the location data, and SensorManager to get the orientation data.
(A) LocationManager is instantiated with the following statement:
(LocationManager)getSystemService(Context.LOCATION_SERVICE);

Figure 5. The Object Recognition Algorithm

Figure 6. Test of the Reliability of the Program in a Virtual Environment


Using the getBestProvider method of LocationManager, we find the name of the most accurate location provider of the smart phone on which the program is running. In onResume, we invoke the requestLocationUpdates method of LocationManager with the best provider and the LocationListener. The LocationListener returns the set {longitude, latitude, altitude}.

(B) SensorManager usage is similar to the LocationManager's; therefore we omit the detailed explanation. In our program, if SensorManager.SENSOR_ORIENTATION is changed, then we obtain new azimuth and pitch values.

(2) The IdentifyObject, shown in Fig. 5, is the workhorse of this program. Making use of MathMethod, this activity finds:
- the closest edge that intersects with the line of sight of the object-camera,
- determines the building whose outline contains the edge, and
- returns the building name, if the line of sight hits the building.
This proposed method of recognizing the object in a picture taken by a smart phone camera is highly accurate, as the sensors on smart phones today have become quite accurate.

4. Experiment Results

In order to verify the reliability of the implemented Android program, we tested the program in a virtual situation, shown in Fig. 6. In the figure, Top, Bottom, Right and Left represent buildings, and the camera is located at the dot at the center. The coordinates of the start point are (129.194977, 35.862395), and the coordinates of the others are all shown in the figure. If the azimuth value is between 60 and 130, then the result should be Right, and so on. We ran the app, fed an arbitrary number as the azimuth value, and checked if the result was correct. From this test, we found that the implemented app is correct for all the azimuth values, thus proving that the program is reliable.

Figure 7 provides a snapshot of the smart phone running the program. The text box indicates that the edge intersecting the line-of-sight is found to be the "Top".

Figure 7. Snapshot of the Smart Phone Running the Program

Figure 8. A Screenshot of the Galaxy S2 Running the Proposed Program

After confirming the reliability of the proposed program with virtual scenarios, we performed experiments to test our object recognition program on a Galaxy S2 phone, by taking pictures of buildings on a real university campus.


The results, again, confirmed the accuracy of our program in correctly identifying objects in the picture. The screenshot of the Galaxy S2 running the proposed program, shown in Fig. 8, shows "Result: StudentsHall," which means that the object in the picture was correctly identified as "Students Hall." The proposed algorithm was tested by taking pictures and identifying buildings of many shapes and sizes, and at several distances. Running the proposed program, we found that it was 100% accurate when the distance between the building and the smart phone was less than 50 m, and about 85% correct when the distance was 90 m or more. These results demonstrate the reliability of our proposed algorithm for building identification.

However, most of the campus buildings are large, with widths greater than 70 m and heights taller than 20 m. Therefore, to verify the reliability of the proposed method for identifying any object, in addition to buildings, we performed experiments with other types of objects of different shapes and sizes. Thus, we took pictures, from different distances, of an elephant statue with a width of 5 m and a height of about 7 m. The results showed that the success rate of identifying the statue with our proposed program was 100% when the distance from the camera to the object was less than 7 m, and 47% when the distance was 10 m. Considering that in the real world pictures of objects are rarely taken from more than 10 m away, we can conclude that our proposed object recognition method is quite reliable.

5. Discussion and Conclusion

This paper proposed a new method to identify objects in pictures taken by a smart phone's camera.

The current object recognition methods are time consuming, as they are based on comparing the camera image with a large number of pictures stored in a database. Our proposed method, using the orientation, location, and focus distance obtained via the smart phone's highly accurate sensors and devices (compass, GPS, and camera) and electronic maps, eliminates the comparison process, and therefore radically reduces the time needed to identify an object in the camera phone picture.

We used an Android Galaxy S2 phone to test the accuracy of the proposed algorithm because it is equipped with highly accurate sensors. Even though our object recognition algorithm uses Android's getFocusDistance, the function does not return meaningful values when it is executed on a Galaxy S or S2; thus, the function was not used in the program we executed in our experiments. The requirement for highly accurate sensors to run the proposed program currently limits the usage of this method across mobile phones. In the near future, the Software Development Kit (SDK) and Android Development Tools (ADT) will be improved and smart phones will be equipped with much more accurate sensors. Then, our proposed algorithm will be even more accurate, and it will be useful and valuable across all smart phones.

There are positive implications of the proposed method for existing businesses and opportunities for new ventures. It can open up many opportunities for new products and services. The automatic and fast recognition of an object enables businesses to provide smart phone users with more personalized and smart services. For example, since taking a picture of an object means that the person is interested in the object, relevant content or information on the pictured object can be automatically provided to the user. Travelers can get smart services, personalized content, or augmented reality for real-world objects by simply taking a picture. Twitter already offers a service which adds a location to a photo or video; however, users must specify where the photo or video was taken. By integrating the method proposed in this research into the Twitpic service of Twitter, for example, we can remove the need for the user's manual input and make the service smarter.

ACKNOWLEDGEMENTS

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2011-0006942).

References

[1] Abe, T., Takada, T., Kawamura, H., Yasuno, T., & Sonehara, N., "Image-Identification Methods for Camera-Equipped Mobile Phones", In International Conference on Mobile Data Management, pp. 372-376, 2007.


[2] Bruns, E., Brombach, B., & Bimber, O., "Mobile Phone-Enabled Museum Guidance with Adaptive Classification", IEEE Computer Graphics and Applications, vol. 28, no. 4, pp. 98-102, 2008.

[3] Cipolla, R., Robertson, D., & Tordoff, B., "Image based Localization", In International Conference on Virtual Systems and Multimedia (VSMM), pp. 22-29, 2004.

[4] Dao, T., Jeong, S., & Ahn, H., "A novel recommendation model of location-based advertising: Context-Aware Collaborative Filtering using GA approach", Expert Systems with Applications, vol. 39, no. 3, pp. 3731-3739, 2012.

[5] Espada, J., Crespo, R., Martínez, O., G-Bustelo, B., & Lovelle, J., "Extensible architecture for context-aware mobile web applications", Expert Systems with Applications, vol. 39, no. 10, pp. 9686-9694, 2012.

[6] Hile, H., & Borriello, G., “Positioning and Orientation in Indoor Environments Using Camera Phones”, IEEE Computer Graphics and Applications, vol. 28, no. 4, pp. 32-39, 2008.

[7] Lee, J., Huang, C., Huang, T., Hsieh, H., & Lee, S., "Medical augment reality using a markerless registration framework", Expert Systems with Applications, vol. 39, no. 5, pp. 5286-5294, 2012.

[8] Lim, J., Li, Y., You, Y., & Chevallet, J., “Scene Recognition with Camera Phones for Tourist Information Access”, IEEE International Conference on Multimedia and Expo, pp. 100-103, 2007.

[9] Mitchell, K., & Race, N., “uLearn: Facilitating Ubiquitous Learning Through Camera Equipped Mobile Phones”, In IEEE International Workshop on Wireless and Mobile Technologies in Education, 2005.

[10] Mohring, M., Lessig C., & Bimber, O., “Video See Through Consumer Cell-Phones”, In International Symposium on Mixed and Augmented Reality (ISMAR 2004), pp. 252-253, 2004.

[11] Noh, H., Lee, J., Oh S., Hwang, K., & Cho S., “Exploiting Indoor Location and Mobile Information for Context-Awareness Service”, Information Processing & Management, vol. 48, no. 1, pp. 1-12, 2012.

[12] Oe, M., Sato, T., & Yokoya, N., “Estimating Camera Position and Posture by Using Feature Landmark Database”, In Scandinavian Conference on Image Analysis (SCIA 2005), pp. 171-181, 2005.

[13] Paucher, R. & Turk, M., “Location-Based Augmented Reality on Mobile Phones”, In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2010), pp. 9-16, 2010.

[14] Rosten, E., & Drummond, T., "Fusing Points and Lines for High Performance Tracking", In International Conference on Computer Vision (ICCV 2005), pp. 1508-1515, 2005.

[15] Sato, J., Takahashi, T., Ide, I., & Murase, H., “Change Detection in Streetscapes from GPS Coordinated Omni-Directional Image Sequences”, In International Conference on Pattern Recognition (ICPR 2006), pp. 935-938, 2006.

[16] Skrypnyk, I., & Lowe, D., “Scene Modelling, Recognition and Tracking with Invariant Image Features”, In International Symposium on Mixed and Augmented Reality (ISMAR 2004), pp. 110-119, 2004.

[17] Vacchetti, L., Lepetit, V., & Fua, P., “Combining Edge and Texture Information for Real-Time Accurate 3D Camera Tracking”, In International Symposium on Mixed and Augmented Reality (ISMAR 2004), pp. 48-57, 2004.

[18] Wagner, D., & Schmalstieg, D., “First Steps Towards Handheld Augmented Reality”, In International Symposium on Wearable Computers (ISWC 2003), pp. 21-23, 2003.

[19] Yeo, C., Chia, L., Cham, T., & Rajon, D., “Click4BuildingID@NTU: Click for Building Identification with GPS-enabled Camera Cell Phone”, In IEEE International Conference on Multimedia and Expo, pp. 1059-1062, 2007.

[20] Yim, J., “Introducing a Decision Tree-based Indoor Positioning Technique”, Expert Systems with Applications, vol. 34, no. 2, pp. 1296-1302, 2008.
