Geo AR, the challenges of navigation from the web browser

Thibaud Michel
Wemap
Nov 4, 2020


This post is part of a series on the challenges of building a real-world browser leveraging geo Augmented Reality (geo AR) — technical challenges of course, but also human-machine interfaces, data, and more.

We talk about mobile sensors, maps, machine learning and databases, web and mobile development, orientation and positioning systems, buildings and cities, and much more :)

In this blogpost we dive into the technical issues of the hybrid geolocation system we have set up to achieve geolocated augmented reality (Geo AR), and explain how our system fuses data coming from smartphone sensors with map data.

As a reminder, the most common approach to Geo AR consists of combining data coming from: (1) GNSS (GPS, GLONASS, Galileo, etc.) and (2) the smartphone inertial measurement unit (accelerometer, gyroscope, magnetometer). GNSS makes it possible to know the phone’s position, while the inertial measurement unit (IMU) makes it possible to estimate the phone’s orientation (north, east, up, down, etc.). The combination of position and orientation is called the geo-pose.

Geo AR example from the Eiffel Tower

In our previous blogpost we did an extensive review of fusion algorithms that combine the IMU sensors to estimate the correct orientation of the smartphone in a web browser; you can read it here. In this new blogpost we will focus on estimating the second component of the geo-pose: the smartphone position.

How to estimate the position of a smartphone?

Over the past ten years or so, we have seen the birth of many applications that use geolocation (interactive maps, targeted advertising, local information, etc.). How is this position calculated? And how can we use it to do Geo AR?

The position of a smartphone

The position of a smartphone has to be considered within a well-defined frame of reference; the one we use for our maps and for Geo AR is the Earth reference frame. In most cases, a position in the Earth frame is defined in the WGS84 geodetic system (latitude, longitude, altitude). For example, the position of Paris is {latitude = 48.85°, longitude = 2.34°, altitude = 35 m}. In our analysis, we will call a position defined in the Earth frame an absolute position. The absolute position of the smartphone can easily be retrieved if it is equipped with a Global Navigation Satellite System (GNSS) receiver.
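For illustration, the distance between two such absolute positions can be computed with the classical haversine formula (the function name and the Paris/London example are ours):

```javascript
// Great-circle distance between two WGS84 points (ignoring altitude),
// using the haversine formula with a mean Earth radius of 6371 km.
function haversineDistance(lat1, lon1, lat2, lon2) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Paris to London: roughly 343 km
const d = haversineDistance(48.85, 2.34, 51.5, -0.12);
```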

GNSS Positioning and its limitations

Depending on the GNSS chip integrated in the smartphone hardware, the device can receive signals from different satellite constellations: GPS (American), GLONASS (Russian), BeiDou (Chinese) and Galileo (European). These signals are processed by the smartphone, and a trilateration algorithm then deduces an absolute position: it is the propagation time of the radio waves between each satellite and the smartphone that is used to reconstruct the position.

Illustration: visibility of GPS satellites from a point on Earth’s surface
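To give an idea of the principle, here is a deliberately simplified 2D trilateration sketch: it recovers a position from three anchor points and the measured distances to them, by subtracting the circle equations pairwise to obtain a linear system. Real GNSS processing works in 3D and also solves for the receiver clock bias, so this is only an illustration:

```javascript
// Solve for the point at distance r1 from p1, r2 from p2, r3 from p3.
// Subtracting the squared-distance equations pairwise yields two linear
// equations A·x + B·y = C and D·x + E·y = F, solved by Cramer's rule.
function trilaterate(p1, r1, p2, r2, p3, r3) {
  const A = 2 * (p2.x - p1.x);
  const B = 2 * (p2.y - p1.y);
  const C = r1 ** 2 - r2 ** 2 - p1.x ** 2 + p2.x ** 2 - p1.y ** 2 + p2.y ** 2;
  const D = 2 * (p3.x - p2.x);
  const E = 2 * (p3.y - p2.y);
  const F = r2 ** 2 - r3 ** 2 - p2.x ** 2 + p3.x ** 2 - p2.y ** 2 + p3.y ** 2;
  const det = A * E - B * D; // zero if the three anchors are collinear
  return { x: (C * E - B * F) / det, y: (A * F - C * D) / det };
}
```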

Data from GNSS can be of good quality (&lt;5 meters) when the device is in an open space (a field, a beach, etc.). If the signal between a satellite and the smartphone is obstructed or deflected by an object (building, cliff, etc.), the position estimate will be degraded. In a city, for example, the average accuracy falls to 15 meters.

Example of GPS signals bouncing on buildings (left) and of a GNSS data rough trace (right)

Inside a building, using GNSS becomes almost impossible. A smartphone user can get signals when close to a window, but will lose them when moving away from it. We must therefore find other ways to position the device in indoor use cases.

Other positioning technologies

In this article we will not describe in detail all the existing geolocation technologies but only those that can be implemented for Geo AR. If you wish to look into geolocation topics more extensively, you should read this overview from the INRIA research lab.

To date, GNSS is the only approach that allows geolocation over the whole planet. However, there are other geolocation systems that cover much smaller areas. In particular, they were created to geolocate objects where GNSS signals cannot be received. For example, WiFi routers and Bluetooth beacons are devices that constantly emit radio signals, and these signals can be used to determine the position of the smartphone. Google, in particular, uses them (in addition to GNSS) for localization on the Android platform. Research benchmarks in the scientific literature show an accuracy of 5–7 meters for these approaches.

Example of WiFi Trilateration with 3 access points (AP)

Their reliability depends above all on the density of beacons installed and the environment in which they operate (hall, corridors, multi-storey, etc.).

Generally, these “local” systems return positions defined in a local frame chosen by the installer. The installer defines an origin point as well as an orientation, and all returned positions are then relative to that origin point, for example {x = 53.2m, y = 18.9m, z = 1.5m}. The advantage of this type of system is that once the position and orientation of a single local point are known in the Earth frame (a step called georeferencing), the system is able to express all future positions in the Earth frame. We call this type of positioning system indirect absolute.
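A minimal sketch of this georeferencing step, assuming a small area where an equirectangular approximation of WGS84 is acceptable (the function and parameter names are ours, not a standard API):

```javascript
// Convert a position expressed in a local frame (meters, with the local
// x-axis rotated `headingDeg` counterclockwise from east) into WGS84.
const EARTH_RADIUS = 6371000; // mean Earth radius in meters

function localToWgs84(origin, headingDeg, local) {
  const theta = (headingDeg * Math.PI) / 180;
  // Rotate the local coordinates into an east/north frame.
  const east = local.x * Math.cos(theta) - local.y * Math.sin(theta);
  const north = local.x * Math.sin(theta) + local.y * Math.cos(theta);
  // Small-area approximation: meters to degrees around the origin.
  const dLat = (north / EARTH_RADIUS) * (180 / Math.PI);
  const dLon =
    (east / (EARTH_RADIUS * Math.cos((origin.latitude * Math.PI) / 180))) *
    (180 / Math.PI);
  return {
    latitude: origin.latitude + dLat,
    longitude: origin.longitude + dLon,
  };
}
```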

The specific requirements of AR Positioning

Until today, geolocation systems have mostly been designed to visualize the user’s position on a 2D map. The position is often represented by a blue dot and a confidence circle which corresponds to the uncertainty of the computation. Using the systems described above, with an average positioning accuracy of 5 to 15 meters, users can quickly understand their location on a map. The confidence circle is at least as important an element as the blue dot, because it allows users to mentally compensate for a possible error of the positioning system.

Now let’s take a look at positioning in Geo AR. As a reminder, the principle is to display virtual content (in a 3D rendering engine) that overlays the real content (the video stream from the camera). To provide a truly immersive experience, the challenge is to estimate the geo-pose of the camera as precisely as possible so that the virtual content and the video stream are perfectly aligned on screen; in other words, they have to be in the exact same reference frame.

There are two main reasons why this issue is absolutely critical in AR:

  • Firstly, unlike a 2D map with its visual accuracy feedback, there is no way to visualize imprecision through a UI (User Interface) element in Augmented Reality. The geo-pose estimated by the positioning engine is directly used by the rendering engine to align the virtual 3D scene. An error of a few meters or a few degrees in the positioning system can result in great inaccuracy in the virtual environment: shifted objects, missing data, a path that goes into a wall… The experience is degraded and the service provided to smartphone users rendered useless.
Illustration showing the impact of a wrong geo-pose estimation in GeoAR
  • The second issue arises from the low frequency of position updates. Typically, a position from one of the absolute systems described above is received every 3 seconds, which corresponds to an update of the position in the virtual scene every 3 seconds as well. On a 2D map, a small jump in position every 3 seconds does not pose too much of a visualization problem; however, in Geo AR, you need a much higher frequency to make the experience smooth and immersive.

This is why relative positioning systems are indispensable for a successful Geo AR experience. A relative positioning system is a method of navigation that consists of estimating the new position of a device by using its last known position and a measure of the device’s displacement, using data returned by the sensors. These systems are both precise and fluid but are of no use for real-world navigation if they are not tied to an absolute system to provide a position in the Earth frame.

The most common system is Pedestrian Dead Reckoning (PDR). It combines data from a pedometer (step detection and step length) and a magnetometer (compass) in order to reconstruct the path of the device. For example: {one step of 75 cm, at 28° from north}, then 500 ms later: {one step of 73 cm, at 26° from north}, and so on.

Illustrations showing the basic working of Pedestrian Dead Reckoning (PDR)
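The reconstruction described above can be sketched in a few lines — a toy version, assuming step lengths and compass headings have already been estimated from the sensors:

```javascript
// Pedestrian Dead Reckoning sketch: accumulate steps (length in meters,
// compass heading in degrees clockwise from north) into an east/north
// displacement from the last known position.
function deadReckon(start, steps) {
  let { east, north } = start;
  for (const { length, headingDeg } of steps) {
    const h = (headingDeg * Math.PI) / 180;
    east += length * Math.sin(h);  // heading 90° = due east
    north += length * Math.cos(h); // heading 0°  = due north
  }
  return { east, north };
}

// The two steps from the example above, roughly northward
const pos = deadReckon({ east: 0, north: 0 }, [
  { length: 0.75, headingDeg: 28 },
  { length: 0.73, headingDeg: 26 },
]);
```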

There is a second relative positioning system that is getting more common in our smartphones: visual odometry with IMU, or Visual Inertial Odometry (VIO). The device calculates a new position using the video feed from the camera, the accelerometer and the gyroscope. These types of algorithms can be found natively in libraries like ARCore (Google) and ARKit (Apple).

PDR makes it possible to obtain a new position with each new step, i.e. at around 2 Hz. For VIO, the refresh rate is even higher: a new position is estimated at around 30 Hz.

The fusion of relative and absolute positioning systems then becomes very interesting. It allows, on the one hand, a much more frequent position update than with absolute positioning alone, and on the other hand, it contains the uncertainty created by the jumps of the absolute system, improving precision.
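A deliberately simplified sketch of such a fusion (our own toy version, not the production engine): relative displacements are applied at high frequency, and each low-frequency absolute fix corrects the estimate with a small gain instead of replacing it outright, which would cause a visible jump in the AR scene:

```javascript
// Complementary-style fusion of a relative system (PDR/VIO displacements)
// with an absolute system (GNSS/WiFi fixes), in a local east/north frame.
class FusedPosition {
  constructor(initial, gain = 0.2) {
    this.pos = { ...initial };
    this.gain = gain; // how strongly each absolute fix corrects the estimate
  }
  // High-frequency update from the relative system (meters).
  applyDisplacement(dEast, dNorth) {
    this.pos.east += dEast;
    this.pos.north += dNorth;
  }
  // Low-frequency update from the absolute system: pull, don't jump.
  applyAbsoluteFix(fix) {
    this.pos.east += this.gain * (fix.east - this.pos.east);
    this.pos.north += this.gain * (fix.north - this.pos.north);
  }
}
```

A Kalman filter would additionally weight the correction by the reported accuracies; the fixed gain above is the simplest stand-in for that idea.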

Position estimation in web browsers

Implementing the approaches described in this blogpost is always feasible in a native iOS or Android application (although the analysis of WiFi signals for geolocation is not available on iOS). Web browsers, on the other hand, provide very little sensor access to developers. We saw in our previous blogpost how to use the devicemotion, deviceorientation and deviceorientationabsolute events to obtain data from the IMU. The web consortium regularly offers new specifications to open access to sensors (Geolocation API, WiFi Information API, Web Bluetooth API, Sensor APIs, WebXR Device API, etc.), but unfortunately, web browsers do not always implement them. Only the Geolocation API and the “device[…]” events are generic enough to be usable by a web app today. We will come back to their uses in the next section.

Constraints

Although it is possible to retrieve some data for geolocation through the functions available on browsers, it is not always easy to use them to create a positioning engine specifically for Geo AR:

  • The Geolocation API enables subscription to position updates using the watchPosition method. This position, in WGS84 format, can come from different technologies (GNSS, WiFi, Cellular, IP?) or even from a fusion of some of them. Unfortunately, it is impossible to know its exact source: the Geolocation API operates as a black box. This complicates the development of geolocation algorithms because, since the sources cannot be separated, the fusion with other technologies can only be crude. For example, we cannot rely on the watchPosition function to know whether the user is in a building or outside, unlike the native GNSS sensor, whose absence of returned data leads us to assume that the user is inside a building (because walls block the signals).
  • To develop a PDR algorithm it is necessary to have access to the accelerometer, the gyroscope and the smartphone orientation. This is possible but the constraints are the same as those described in an earlier blog post.
  • While Google promotes the integration of ARCore on the web, there is no off-the-shelf VIO API for the web that is usable today. We could rebuild these algorithms ourselves, but their complexity and the limited sensor access on the web make such an implementation inefficient.
  • Web workers allow web apps greater flexibility thanks to the support of multi-threading. However, neither the IMU nor the camera are accessible in a web worker. Although the data can be transferred by message to a web worker, if the main thread does a big calculation which blocks the sending of messages, a lot of data will be lost and some algorithms requiring high frequency data input like those for rebuilding the orientation cannot guarantee good results.
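For reference, subscribing to the black-box position from the first point above looks like the following; `watchPosition` and its options are part of the standard Geolocation API, while the normalization helper is just ours:

```javascript
// Normalize a GeolocationPosition-like object into the fields a
// positioning engine consumes (helper name is ours, not a standard API).
function toFix(position) {
  const { latitude, longitude, altitude, accuracy } = position.coords;
  return { latitude, longitude, altitude, accuracy, timestamp: position.timestamp };
}

// In the browser: subscribe to updates from the (black-box) fused source.
if (typeof navigator !== "undefined" && navigator.geolocation) {
  navigator.geolocation.watchPosition(
    (position) => console.log(toFix(position)),
    (error) => console.warn("geolocation error", error.code),
    { enableHighAccuracy: true, maximumAge: 0 }
  );
}
```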

Despite these limitations, the creation of a positioning engine for Geo AR remains possible: using map data is key to improving performance.

Map-matching: maps to help positioning

Whether outdoors or indoors, modeling our environment to represent it on a map takes time, a lot of time. However, this time spent, which naturally produces a nice map, can also improve the positioning engine. Two approaches can be used in our Geo AR context.

The first — which is very CPU intensive — is the particle filter. The principle is (1) to randomly generate particles around the initial position (typically with a Gaussian distribution), (2) to move all the particles along the direction of travel, (3) to give a very low probability to, or remove, the particles that have passed through an impassable element (such as a wall), and (4) to resample: re-generate particles to return to the initial particle count, then restart the process at step (2).

Example of a particle filter for localization (Source: Illinois State University website)
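The four steps above can be sketched as a toy particle filter on a 2D floor plan, here with a single wall along x = 0; the particle count and noise levels are illustrative:

```javascript
// One iteration of steps (2)-(4) of the particle filter described above.
function particleFilterStep(particles, move, isBlocked) {
  // (2) move every particle along the measured displacement, plus noise
  const moved = particles.map((p) => ({
    x: p.x + move.dx + (Math.random() - 0.5) * 0.2,
    y: p.y + move.dy + (Math.random() - 0.5) * 0.2,
  }));
  // (3) drop particles that crossed into an impassable element
  const survivors = moved.filter((p) => !isBlocked(p));
  // (4) resample back to the initial count by duplicating survivors
  const resampled = [];
  for (let i = 0; i < particles.length; i++) {
    resampled.push({ ...survivors[Math.floor(Math.random() * survivors.length)] });
  }
  return resampled;
}

// (1) generate particles around the initial position (here, uniform noise)
const particles = Array.from({ length: 200 }, () => ({
  x: 1 + (Math.random() - 0.5),
  y: Math.random() - 0.5,
}));
const isBlocked = (p) => p.x < 0; // wall along x = 0
const next = particleFilterStep(particles, { dx: 0.5, dy: 0 }, isBlocked);
```

The estimated position is then typically the (weighted) mean of the particle cloud, and its spread gives the uncertainty.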

The second approach, Point to Network, assumes that the user travels over a network of segments. This network of segments can either be generated automatically or manually extracted from the map. The principle is to project the estimated position on the nearest network segment and to consider this new position as the user’s position. To avoid certain issues, an improvement is to ignore (in the algorithm) the segments of the network which are not in the same direction as the displacement.

Example of a “Point to Network” projection
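A minimal implementation of this projection, assuming the network is given as a list of 2D segments (direction-of-travel filtering, mentioned above, is left out):

```javascript
// Project point p onto segment [a, b], clamped to the segment's extent.
function projectOnSegment(p, a, b) {
  const abx = b.x - a.x, aby = b.y - a.y;
  const len2 = abx * abx + aby * aby;
  let t = len2 === 0 ? 0 : ((p.x - a.x) * abx + (p.y - a.y) * aby) / len2;
  t = Math.max(0, Math.min(1, t)); // clamp to the segment
  return { x: a.x + t * abx, y: a.y + t * aby };
}

// "Point to Network": return the projection onto the nearest segment.
function mapMatch(p, segments) {
  let best = null, bestDist = Infinity;
  for (const [a, b] of segments) {
    const proj = projectOnSegment(p, a, b);
    const d = (proj.x - p.x) ** 2 + (proj.y - p.y) ** 2;
    if (d < bestDist) { bestDist = d; best = proj; }
  }
  return best;
}
```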

We are familiar with such an approach in our cars, where it is almost always used in conjunction with GPS. For pedestrians, its use is more complex, since it is difficult to model the entire environment with segments (pedestrian squares or halls, for example). However, when the user has to follow a well-defined route (navigation), this approach becomes very efficient.

Our approach

Today, scientific research on positioning and navigation systems is very active. The latest advances in this area are regularly published at the international IPIN (Indoor Positioning and Indoor Navigation) conference [disclaimer: I am part of the IPIN review committee]. There are regular competitions to evaluate and compare the behavior of algorithms in real environments (shopping centers, factories, stores, etc.). However, algorithms that must both run embedded on a smartphone and serve augmented reality (see above) concentrate the most advanced technological challenges, which led us to develop an original approach inspired by the most recent advances in the field.

For example, the step detection algorithms in the scientific literature classify the way the phone is held (at the ear, in a bag, arm outstretched, in the pocket, etc.) in order to obtain good-quality detection. In our case, not all of these modes are necessary; they could even add instability. However, it is important that we consider “augmented reality” movements, that is, avoid detecting steps when the user queries the virtual environment around them without actually moving. And believe me, it is not easy to isolate this behavior when the only signals available are the accelerometer and the gyroscope! To do this, we carried out tests on user samples to observe their behavior with an augmented reality application (navigating without AR, navigating with AR, querying the surroundings to discover virtual information, etc.) and recorded the data from the sensors. At the same time, we annotated the different phases of the user’s movement in order to use them as ground truth for the comparison of algorithms.

Sample graph of our step detection algorithm

It is thanks to this type of analysis that we have been able to improve our algorithms and gradually gain in accuracy.

However, the ultimate objective is to guide a user in the real world through Augmented Reality: high accuracy is a good thing, but exactness can sometimes be detrimental to the quality of the user experience. That is why we had to make heuristic trade-offs between the precision of the positioning engine and the immersiveness of the virtual scene. For example, when the user is asked to follow a proposed route, an assumption must be made to distinguish a deviation that has to be corrected through map-matching from an explicit departure that has to “suspend” map-matching. In some cases, this can lead us to give greater weight to the map-matching system, even at the cost of a little accuracy.

Graph illustrating the convergence of PDR with point-to-network map-matching

Another best practice inherited from 2D positioning is position interpolation. It ensures a fluid movement of the blue dot, and it is all the more important in AR because a jump in position also causes a jump in the virtual scene and therefore ruins the user experience.

Example of interpolation using data from our positioning engine
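A basic linear interpolation between two position fixes can look like this; easing curves or velocity-aware smoothing would refine it further:

```javascript
// Ease the displayed position from `from` toward `to` over `duration` ms,
// instead of jumping to each new fix when it arrives.
function interpolate(from, to, startTime, duration, now) {
  const t = Math.min(1, Math.max(0, (now - startTime) / duration));
  return {
    east: from.east + t * (to.east - from.east),
    north: from.north + t * (to.north - from.north),
  };
}
```

In practice this function is called on every render frame with the current timestamp, so the virtual scene glides between fixes at the display's frame rate.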

Ultimately, our approach aims to combine both portability and performance, so our system works both in a web environment and in a native environment. In order to benefit from the features of a native environment we have designed our system so that it can also integrate resources and data that are natively accessible such as WiFi, Bluetooth and VIO. The data from these technologies are transferred directly to our positioning engine, where they are taken into account to improve accuracy. The engine has been designed to automatically adapt to the different types of data it receives and provide the best possible experience regardless of the usage environment.

Conclusion

The web is a complex environment where sensor access limitations constrain the development of geolocation algorithms, yet it is still possible to provide a lightweight and immersive AR experience. The filters described above (with the notable exception of VIO) are not very resource intensive compared to an approach based on image processing, which gives them a significant advantage for a web implementation. Contact us if you would like more information.
