The Center for Data Innovation spoke with Botond Bognar, Co-Founder and Chief Product Officer of REScan, a California-based company that uses head-mounted scanning technology to make spaces remotely viewable, analyzable, editable, and ready for spatial computing applications. Bognar discussed the company’s human-centric approach to spatial mapping and the potential use cases as mobile and wearable technologies advance.
This interview has been edited for clarity.
Ellysse Dick: You have described REScan’s service as “mapping from the human point of view.” Can you tell me a little more about what that means, and how it works?
Botond Bognar: Where we are really different compared to other companies is the eye-level recording. This is because we believe that technology should be bent around human needs and humans themselves, not the other way around. I believe that we have to capture reality the way that we want to revisit reality as humans. That’s one thing. The second thing is that, because of that approach, the capture is close to what a human actually sees, so humans can annotate it. We created an annotation system that lets people annotate with very high precision, so we can feed machine learning applications with really high-grade, high-quality data.
In practice, “mapping from the human point of view” means we put a helmet on a human who is walking around, and this helmet has all the gadgets of an autonomous car: lots of cameras, LiDAR, stereo vision, and an inertial measurement unit (IMU). We do this because we believe that we need to record the space and the environment from the human point of view, because when someone uses our viewer, it’s from the human point of view. Therefore, algorithms don’t have to translate what it would look like if a human were walking around. This also enables us to provide very good quality data for machine learning applications.
As opposed to, let’s say, a pair of glasses that would look only forward, our helmet looks everywhere, all the time. It’s as if you have a pair of eyes on the side, on the back, and on the top of your skull. So basically, the wearer is looking everywhere at the same time, and therefore we don’t have to go everywhere in a space—it’s enough to walk through one prominent part of it. We also have a real-time feedback mechanism, a pair of glasses on the scanning person, which shows them what has been seen by the LiDAR. That way the person has a good understanding of what was captured and what is still missing.
Dick: What can analytics from this kind of 3D mapping tell us that a 360-degree video recording could not?
Bognar: A 360-degree camera, at least as of today, is made of two big fisheye lenses. They compile and stitch together these two images, which are highly distorted, to give you the 360-degree image. We broke that up into smaller cameras, so our quality is much better, or more precise, and we have LiDAR, a hybrid IMU, and stereo vision. So we are not only recording RGB data, but a lot more information. We record spatial qualities through the LiDAR, and we also record non-visual data like Wi-Fi, GPS, and GSM signals. This set of data gives us a richer foundation not only to create meshes and point clouds (the usual suspects), but also to power machine learning-based applications.
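As a rough illustration of how such a multi-sensor capture might be organized, here is a minimal Python sketch of one capture frame bundling RGB, LiDAR, IMU, and radio-signal data. The field names are hypothetical and do not reflect REScan’s actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch of one capture frame bundling the data streams described
# above. Field names are illustrative only, not REScan's actual format.
@dataclass
class CaptureFrame:
    timestamp: float                                  # seconds since the walk started
    rgb_images: List[bytes]                           # one encoded image per helmet camera
    lidar_points: List[Tuple[float, float, float]]    # 3D points in the sensor frame
    imu_sample: Tuple[float, ...]                     # accelerometer and gyroscope readings
    wifi_rssi: Dict[str, float] = field(default_factory=dict)  # BSSID -> signal strength (dBm)
    gps_fix: Optional[Tuple[float, float]] = None     # (lat, lon) when a fix is available

def merge_point_cloud(frames: List[CaptureFrame]) -> List[Tuple[float, float, float]]:
    """Naively merge LiDAR returns from all frames into a single point cloud.
    A real pipeline would first transform each frame into a common world frame
    using the estimated walking trajectory."""
    cloud: List[Tuple[float, float, float]] = []
    for frame in frames:
        cloud.extend(frame.lidar_points)
    return cloud
```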
When somebody walks around and captures from the human-eye level, you can reverse engineer the locations to determine, let’s say, where cameras could be or should be. For example, in an airport, we can do a CCTV audit and say, “your CCTV is not covering this and this area, therefore you will need to put more there.” Or we can use this for indoor 5G antenna placement, since 5G antennas also need line of sight to the devices that will use them. We can also do 3D segmentation, and with that we have a very accurate data source, or foundation, for training algorithms.
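To illustrate the idea behind such a coverage audit, here is a minimal Python sketch that casts rays from candidate camera positions over an occupancy grid derived from a scan and reports the free cells no camera can see. The grid representation and function names are assumptions for illustration, not REScan’s implementation.

```python
import math
from typing import List, Set, Tuple

# Minimal 2D sketch of a line-of-sight coverage check, assuming the scan has
# already been reduced to an occupancy grid (True = wall/obstacle).
def visible_cells(grid: List[List[bool]], camera: Tuple[int, int],
                  max_range: int = 50, n_rays: int = 360) -> Set[Tuple[int, int]]:
    """Cast rays from a candidate camera position and collect the free cells it can see."""
    rows, cols = len(grid), len(grid[0])
    cx, cy = camera
    seen: Set[Tuple[int, int]] = set()
    for i in range(n_rays):
        angle = 2 * math.pi * i / n_rays
        for r in range(1, max_range):
            x = int(round(cx + r * math.cos(angle)))
            y = int(round(cy + r * math.sin(angle)))
            if not (0 <= x < rows and 0 <= y < cols):
                break
            if grid[x][y]:          # hit a wall: nothing behind it is visible
                break
            seen.add((x, y))
    return seen

def coverage_gap(grid: List[List[bool]],
                 cameras: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Free cells not seen by any camera, i.e., where more coverage is needed."""
    free = {(x, y) for x in range(len(grid))
            for y in range(len(grid[0])) if not grid[x][y]}
    covered: Set[Tuple[int, int]] = set()
    for cam in cameras:
        covered |= visible_cells(grid, cam)
    return free - covered
```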
Dick: REScan has been a helpful tool for real estate companies to map out their properties. What other industries could benefit from this kind of 3D mapping and analytics?
Bognar: Basically, all industries operate in some sort of real estate. Because we are capturing the building, or the building environment, as it is, we enable the industries that are using these spaces. For example, an oil refinery uses tens of thousands of IoT sensors across its campus. Now we can place these sensors at their real locations in a 3D model, so when people read the data they can put it in a 3D context. On the other hand, and this ties in to similar uses in other industries, that IoT sensor can now be read with a phone pointed at, say, a pipe. It’s an augmented reality (AR) overlay, which is actually very similar to a digital advertisement in a shopping center.
These are examples of persistent, location-based AR content. Our system enables these applications on top of the digital copy. We can even take a big jump to cleaning robots in a shopping center. These also need prior knowledge before rolling out and cleaning the floor: to not fall down stairs or go into unwanted territory. This digital understanding enables machine and IoT applications on top of that digital layer, applications that wouldn’t be possible unless a human could first walk around the space in 3D, or on a phone, and say “the robot can go here” or “the robot cannot go there.” So we are creating human understanding and machine understanding at the same time, with the same capture.
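As a rough sketch of how human annotations on the digital copy could gate a robot’s movement, the following Python snippet checks planned waypoints against annotated no-go polygons. All names, shapes, and coordinates are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical sketch: a human annotates "no-go" polygons (stairs, restricted
# areas) on the digital copy, and the robot checks each planned waypoint
# against them before moving.
def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Standard ray-casting test for whether a 2D point lies inside a polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def waypoint_allowed(waypoint: Point, no_go_zones: List[List[Point]]) -> bool:
    """A waypoint is allowed only if it falls outside every annotated no-go zone."""
    return not any(point_in_polygon(waypoint, zone) for zone in no_go_zones)

# Example: a stairwell annotated as a rectangle the cleaning robot must avoid.
stairwell = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]
print(waypoint_allowed((1.0, 1.5), [stairwell]))   # False: inside the stairwell
print(waypoint_allowed((5.0, 5.0), [stairwell]))   # True: safe to go
```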
Dick: What are some of the challenges to indoor mapping from a human point of view that might not exist in other tools like vehicle-based, 360-degree street mapping?
Bognar: We focus primarily on locations that are indoor or pedestrian. We don’t go to roads; other companies are doing that pretty well. But when a human is walking, their head is bobbing, or they might turn quickly because somebody shows up. So the technical challenges come from the specific way we move our heads: even a steady handheld scanner is steadier than your head in practice, because you’re always moving your head.
It was also a challenge to minimize the tech into a hard hat. Maybe this is a good time to highlight that the reason we use a hard hat is that it is socially acceptable, way more acceptable than a huge camera rig on a backpack. You know, you show up with that in a shopping center and people ask, “Hey, are you recording me? What’s going on?” We also have a vest that communicates that we are scanning right now. When somebody has the vest and the helmet, they are perceived more like somebody from maintenance, so they blend into the fabric of the space.
This hardware was one of the biggest challenges. Actually, what you see now is generation six: we had to go through five generations of prototypes, both hardware and software. The hardware and software challenges were largely about the movement of the head: How do we still maintain a very precise trajectory? How do we construct precise 3D data despite all that motion?
There were two challenges in developing these prototypes. One was the hardware: for each of the five hardware prototypes, I could tell you where we failed, and we had to throw out that solution and try a new one, and a new one, and a new one. So each iteration was a major upgrade. It was trial and error, because it hasn’t been done before; that’s the curse when you’re working on the edges of technology. The other challenge was the software. Of course, the software developed in parallel with the hardware, and there we didn’t have to throw out too much, so it was a lot easier to adapt to the next update. But the hardest software challenge was creating a steady trajectory from the compiled data coming in. Because we work in batch mode, that is, we don’t create it in real time but on our servers, we had a little bit more time to tinker with it.
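To illustrate why batch processing helps with trajectory estimation, here is a toy Python sketch: offline, each pose can be smoothed with samples recorded both before and after it, something a real-time filter cannot do. This stands in for the far more sophisticated optimization a production pipeline would use; all names are illustrative.

```python
from typing import List, Tuple

Pose = Tuple[float, float, float]   # (x, y, z) position of the helmet at one timestamp

# Toy centered moving average over a noisy head trajectory. In batch mode,
# the smoother at index i can look at FUTURE samples (i+1, i+2, ...), which
# a real-time filter never has access to.
def smooth_trajectory(poses: List[Pose], window: int = 5) -> List[Pose]:
    half = window // 2
    smoothed: List[Pose] = []
    for i in range(len(poses)):
        lo = max(0, i - half)
        hi = min(len(poses), i + half + 1)   # includes future samples: a batch-only luxury
        neighborhood = poses[lo:hi]
        n = len(neighborhood)
        smoothed.append((
            sum(p[0] for p in neighborhood) / n,
            sum(p[1] for p in neighborhood) / n,
            sum(p[2] for p in neighborhood) / n,
        ))
    return smoothed
```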
Dick: You have said that your goal is to “map the world.” What are some of the barriers to achieving this, and what are some factors that would help you reach your goal?
Bognar: Well, first let’s clarify “map the world,” because the world is quite large! I would say we want to capture wherever we are allowed to go and wherever we are requested to do a scan. Our purpose is not to go into people’s homes or those types of areas. We are doing universities, large-scale facilities, public areas. Wherever a person can walk, and has been asked to go, that’s where we go. That’s the legal part, so we are always in compliance with the owner of the place.
Having said that, once we do a master scan, which is this large-scale capture at speed and at scale, someone can then come with a phone that might be enabled with LiDAR and put their own version of reality onto it. Now here comes the challenge: quite interestingly, the hardware would be easier to make. But server-wise, distribution of data is going to be a huge challenge, because right now our server capacity is tiny compared to what we’d need for the rest of the world. So that’s one of the challenges.
Going forward, people are going to want higher and higher resolution. If I were to show you what Google Earth looked like on the day it came out, you’d say “oh, this is a smudge,” but I tell you it was magic, and everybody remembers the first time they looked up their address. Now, though, people expect it to get better day by day. That’s why we like this combination of the two: one is the industry-scale scan, which is what we provide with the helmet, and we believe the other side of the coin will be the person standing there with a phone, a tablet, or eventually even glasses, stitching on a high-res version of reality for that person. Hence the company name “RE”-Scan.
Your glasses, or your phone, will spatially understand about 15 feet, maybe 20. Beyond that they don’t see much. The phone is almost like a little flashlight: you need to have some kind of ambient light as well, so that these devices can relocalize themselves in space and have some prior knowledge about what is there. Then the phone or the glasses, when visiting a new location, wherever you are, are confident about what’s around the corner. When there’s going to be AR content, like a monster jumping out from around the corner, it will need to know where and what reality is. There are two important components: the “where” (the geometry) and the “what” (the semantics). So it also needs to understand, for example, what is a table or what is a door.
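As a simple illustration of combining a device’s short sensing range with a prior map, the following Python sketch returns labeled objects that lie beyond what the phone can see on its own but are known from the stored scan. The function names, data layout, and numbers are assumptions for illustration, not an actual REScan API.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical prior map: a list of labeled objects with 2D positions in feet,
# e.g. {"label": "table", "position": (x, y)}.
PriorMap = List[Dict]

SENSING_RANGE_FT = 18.0   # roughly the 15-20 feet a phone perceives on its own

def query_beyond_range(device_pos: Tuple[float, float],
                       prior_map: PriorMap,
                       radius_ft: float = 60.0) -> List[Dict]:
    """Return labeled objects the device cannot sense itself but the prior map knows about."""
    results = []
    for obj in prior_map:
        dx = obj["position"][0] - device_pos[0]
        dy = obj["position"][1] - device_pos[1]
        dist = math.hypot(dx, dy)
        if SENSING_RANGE_FT < dist <= radius_ft:
            results.append({**obj, "distance_ft": round(dist, 1)})
    return results

# Example: the AR app learns there is a door about 40 feet away, around the
# corner, so a character can be scripted to appear from it.
prior_map = [{"label": "table", "position": (10.0, 5.0)},
             {"label": "door", "position": (35.0, 20.0)}]
print(query_beyond_range((0.0, 0.0), prior_map))
```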
This is especially useful when augmented reality is coupled with entertainment, so you can literally have 3D characters walking around. A 3D character needs to understand what it can sit on, what it can walk through, and so on.