The Center for Data Innovation spoke to Katya Serova, vice president and co-founder of Habidatum, an international data analysis firm that builds tools to visualize city data in three dimensions. Serova discussed the complexity of modern cities, how we can better plan for unexpected anomalies, and why people need a new “language” to understand big data.
This interview has been edited for clarity.
Nick Wallace: In 2015, you kicked off your speech at the Smart Cities Expo World Congress in Barcelona by saying that “cities are incomprehensible.” What is it about cities that makes them so hard to understand, and is this something new, or did cities always confound us?
Katya Serova: It’s not just now that cities are incomprehensible; they always needed deeper analysis and deeper understanding of the behaviors and activities of people. But it’s today that the complexity of cities and their dynamics has grown dramatically, and the need has arisen for other means of comprehension and of understanding cities’ development. New data and technologies have appeared, too, but we have to find the right link between them.
Traditional statistics are being replaced by new kinds of spontaneously generated data, produced as a by-product of people’s behavior. But this new data requires proper translation for analysts: it needs to be turned into something that is comprehensible, and which can be relied upon for important decisions. Without that, it remains useless. So that’s the drama of contemporary cities that needs to be studied in more detail, and more dynamically.
There are three major changes that matter. One is the change in the lifestyles of citizens. Especially in European and American cities, the inhabitants are more and more time- and cash-rich. They have more free time and they demand more stimulation from this time. They want to fill this time with more activity. People’s behavior and their mobility are very varied and very unpredictable, and this is above all because of the changes in time, in geography, and in people’s lifestyles.
The second and third are technology and data. Technology, which has entered all spheres of urban life, teaches citizens to demand more interactivity and quicker responses from their urban environment. This is hard for city managers to comprehend and adapt to. Data is leading to the creation of completely new layers and phenomena of urban life. These are self-coordinated communities around services, for instance the Uber community or the Airbnb community: a sharing economy of self-sufficient services based on communities. All this is something new, which cities and urban managers are not used to and are unprepared for. That needs to be studied and it demands more flexible urban planning.
There are some common words attached to big data, like variety, volume and velocity, that can be linked to what I just said. The volume of time has increased, the velocity of life has increased, the variety that people demand has increased. This is driven by technology and data, and cities are still trying to work out how to deal with it.
Wallace: You co-founded Habidatum, which uses 3D images to visualize city data, both geographically and over time, in a single model. Can you give us a few examples of what kind of data one might want to represent in this way?
Serova: The idea for Habidatum came from these new dynamics of life, when we need to understand the city as a process and link space to time. So we created this analytical visualization engine, which helps connect space and time visually, and we started experimenting with various types of data, which can be injected into it. Actually, from a technical point of view, any data linked to a time and a place can go into the engine. So we have a comprehensive engine for transforming all time and geo-location data into a single format, which is then visualized in our platform.
But the most popular types of data that we deal with in projects, and which we visualize for our clients, are activity data: that is, mobility patterns, purchases, or social media activity. We call these types of data “spontaneous data,” and they are a kind of by-product of people’s activity. People are using cellphones, and they’re not thinking about the fact they’re generating mobility patterns that can be analyzed for better planning. They are posting to social media networks publicly, and creating a very useful stream for semantic analysis and for understanding demand patterns.
We’re trying to link not only the spatial and temporal dimensions of the city, but also the behavior and semantic patterns, meaning that we’re looking not only at the activity patterns, but also at what people think about the areas and periods of time where and when they are active. We read through what people post on social media and we identify which places they’re talking about and their attitudes to these places. We put the activity into the platform, which could be a geo-tagged tweet, for instance, and the semantic activity, which could be the place mentioned in the tweet. We use those two layers to look for links between people’s perceptions of the city and the activity they’re undertaking. That’s one of the most important components that we bring to cities and businesses in them. We help them understand how behaviors evolve through space and time, and also what the background is behind that, what’s driving this behavior. All that is based on publicly available data provided through online APIs or open data portals, or on proprietary data held by certain data carriers and holders, like telco operators, with whom we work together to use their data in aggregated, anonymized ways.
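Editor's illustration: the idea of bringing any time- and place-stamped record into a single format and then viewing it in space and time can be sketched roughly as follows. This is a minimal sketch, not Habidatum's engine. The `Event` record, the `cube_cell` and `build_cube` helpers, the 0.01-degree grid, and the one-hour time bins are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
import math


@dataclass(frozen=True)
class Event:
    """One record tied to a time and a place (hypothetical common format)."""
    when: datetime
    lat: float
    lon: float
    layer: str  # e.g. "activity" (a geo-tagged post) or "semantic" (a place mentioned)


def cube_cell(event, cell_deg=0.01, hours_per_bin=1):
    """Map an event to a (row, col, time_bin) cell of a space-time grid."""
    row = math.floor(event.lat / cell_deg)
    col = math.floor(event.lon / cell_deg)
    time_bin = event.when.hour // hours_per_bin  # hour-of-day bin (assumption)
    return row, col, time_bin


def build_cube(events, cell_deg=0.01, hours_per_bin=1):
    """Count events per (layer, cell) so layers can be compared cell by cell."""
    counts = Counter()
    for event in events:
        counts[(event.layer, cube_cell(event, cell_deg, hours_per_bin))] += 1
    return counts


# Made-up sample records, for illustration only.
events = [
    Event(datetime(2016, 5, 1, 19, 5), 41.3809, 2.1730, "activity"),
    Event(datetime(2016, 5, 1, 19, 40), 41.3805, 2.1735, "semantic"),
    Event(datetime(2016, 5, 1, 9, 0), 41.4036, 2.1744, "activity"),
]
cube = build_cube(events)
```

Keeping the layer name in the key means an activity count and a semantic count for the same cell can be looked up side by side, which is the kind of comparison between the two layers described above.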
Wallace: What does this way of looking at data allow governments or businesses to do that they could not do before? What kinds of decisions does it help them make?
Serova: The most evident value that this kind of work brings is better understanding of the community and more responsive planning and management of the urban environment. There are two more things we add through the connection of space and time. One is seeing time as a resource, which we put under the umbrella name of “chronotope city.” Another is using time as a tool for a concept we’re trying to promote called “lean planning.”
So chronotope city, seeing time as a resource, is how we are trying to explain that time expands the city several fold; its economic resources expand. This is an opportunity for small businesses and for better development in the urban environment. For instance, metropolitan cities like London or New York or Moscow or Tokyo are too expensive in certain areas for small and medium-sized enterprises to get in there.
We had a project in New York that focused on identifying these kinds of places, where the real estate prices are so high that small businesses are squeezed out. Here, time management and time coordination could be the cure. Of course, a single small enterprise cannot afford to be in this place. But if it co-locates with another enterprise and they share the time and the rent, they can cope with the high prices there.
We started by exploring the under-used resources in the city through time, and what could be done with this time—whether some businesses could use the time, or coordinate with one another to share space over time. It’s developed into a new kind of business model. In New York, some places are left for “pop-up” activities by small and medium-sized businesses, who rent space for short periods, and then that space is time-managed, which allows these firms to operate in areas where it was previously impossible. We can support that with spatial-temporal analysis, and analysis of people’s activity, along with the prices and demand associated with that.
Another thing is seeing time as a tool for management, which we call “lean planning.” That is the understanding of time as something that gives us the opportunity to better understand and anticipate changes in demand, and therefore the changes we as urban managers need to make in the urban environment to anticipate and adapt to changes in demand.
There is a popular adage that says architecture is always late. A lot of demand is coming from architectural and civil engineering firms, who want to monitor the process of planning and changing an area, by using new types of data about the community to better identify and control for their own mistakes. Mistakes in the project concept and planning stages can be extremely expensive if you do not catch them in time and do something about them. So what architects, real estate developers, and construction firms are interested in is catching possible mistakes, or mismatches, or changes in demand at the proper point in time and intervening in the process of development so that the project does not go in the wrong direction, but adapts.
In order to implement such planning on a citywide level, you need to properly understand the patterns of behavior and study the anomalies in these trends, and then learn how to anticipate those anomalies. That’s the direction we are focusing on now; we’re trying to integrate the functions of trend and anomaly detection into our projects and into our software, and also get an understanding of how to catch anomalies more quickly and more accurately, and help planners avoid planning mistakes and adapt to changes.
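Editor's illustration: the interview does not say how trend and anomaly detection is implemented. As one simple, generic way to flag anomalies in a series of activity counts, the sketch below compares each value with the mean and standard deviation of a trailing window. The function name `find_anomalies`, the window length, and the threshold are assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(counts, window=6, threshold=3.0):
    """Return indices of values that deviate strongly from the trailing window.

    A value is flagged when it lies more than `threshold` standard deviations
    from the mean of the previous `window` values. The window must hold at
    least two values so a standard deviation can be computed.
    """
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any departure from it as anomalous.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up hourly activity counts with one obvious spike at index 7.
hourly_counts = [10, 12, 11, 9, 10, 11, 10, 40, 11, 10]
spikes = find_anomalies(hourly_counts)
```

Because the window trails the current value, the series stays continuous rather than being cut into separate slices, and a spike inflates the baseline for the next few values, so thresholds would need tuning on real data.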
Wallace: You work with “digital traces,” data that is produced as a by-product of digital activity. I’ve also heard this called “data exhaust,” and you’ve already mentioned “spontaneous data.” Unlike traditional statistics, this data is a by-product, collected without any particular purpose in mind. What is different about working with “digital traces,” and how do you go about turning them into something you can analyze and use?
Serova: The spontaneous data we deal with is both a solution and a problem. From one point of view, it’s generated in large amounts and there are already cases of proper use of social media data or mobility data in smart city projects that bring value to the city. But at the same time, it still has not been introduced as a market standard. There are two major reasons for that. First, this data has a high degree of complexity. Some people refer to its volume, or the speed of update, which are both issues to consider. But we, first of all, refer to the variety and complexity of the metadata and the angles from which you can interpret certain types of data.
You might look at a social media post as an activity point, you might look at it as a text with some semantic information inside, you might look at it as a characteristic of a certain demographic or behavioral group. And all these interpretations get multiplied many times if you set this type of data against another type of data, like mobility data, which in turn can also be interpreted in several diverse ways.
This complexity makes the dialogue with data much more difficult for analysts. First, you can no longer deal with spreadsheets, because the data is too complex and too large for you to just look at it in a table. You need it to be visualized, and visualized properly. If you think of a city, the first type of visualization that comes to mind is a map, looking at the city space. What we’ve done is add a temporal dimension to that, which helps to get more detail from the data that’s being generated.
The common mistake people make is taking this kind of data and putting it into outdated models with traditional visualizations, which may lead to certain mistakes and does not help analysts use this data in its complete variety and scale. So if you’re cutting a time series of mobility data into temporal slices, you lose the continuity of the process and cannot understand changes in trends or anomalies that may appear. That’s an issue of proper tools and proper approaches to get some kind of understanding of this kind of data.
Another issue is that there’s some resistance to new types of data when they don’t fit current methods of analysis. Decision makers need the data they act on to be financially and analytically reliable; they need to be absolutely confident that “this is the truth,” and not a mistake that will lead to enormous losses of budgets, or result in dissatisfaction on a citywide level. The analysts they work with are used to traditional models of regular indicators that have already proven reliable. These models are becoming outdated, but there is no language that can translate these new types of data into these reliable models.
So even if you just put Twitter posts on the map and show them to analysts doing financial projects, this will not be enough for them to take that data and utilize it, because there’s no easy way to feed it into the models they are using. You need to translate it to the language of more traditionally reliable data. We are trying to promote ourselves more as analytical than technological: technology and data services are not enough, you need to find the proper language of data translation for the area you operate in.
A large part of our team comes from architectural and urban planning backgrounds. So the field we’re focusing on, which is smart cities, is familiar to us and there is a way for us to actually transcribe the complexity of the data we get into something that can be easily used, easily checked, and easily relied upon by the analysts working in this market. Most likely, if such approaches and companies get more integrated into smart cities and other data-dependent and data-demanding fields, this may at some point in time make spontaneous data and big data a new market standard. It isn’t there yet, but we’re working in that direction.
Wallace: In working with all this data, is there anything you’ve learned about cities that has especially shocked or surprised you?
Serova: We learned a lot from the projects we did, and some of the findings we came across were striking for us as urban analysts used to some traditional rules and understanding of the city. We are all used to seeing the city as a collection of functional zones: places where there is some dominant kind of activity, where those who are involved in that are the “owners” of that place. You see residential areas, business areas, retail and entertainment areas, and can make assumptions about who the people there are, and what they’re doing there. We’re used to that as something stable and static.
But what we’re seeing in modern cities now is that they are too vibrant to retain these set categories of understanding. Places are changing throughout the day. You cannot now understand well who “owns” the place, who generates the demand there, who is looking for a friendly environment there.
In Barcelona, we looked at tourist flows into the city, and the crowds they create that annoy locals. We split the activity into two groups: locals and tourists. Then we looked at the distribution of this activity through space and time. Though we found that these communities do cross, they only meet in very specific locations for limited periods of time: during the evening on La Rambla, and that alone leads to all the dissatisfaction about crowds of tourists in Barcelona. These communities get into enormous conflict just by meeting for a short period on a certain street. This is a very interesting finding, which gives us an understanding of a city in space and time as a crucial part of urban planning and urban management.
I already mentioned the price of mistakes. If you’re just living in the city, or even if you’re an urban analyst responsible for some small, neat issue, you might underestimate how much people running construction projects need to keep track of and plan for to avoid disaster. As we started to work with these kinds of projects, we started to get to know how much dissatisfaction might be caused even by a very small mistake in the project concept. An improper location for a connection between two buildings may trigger multiple effects that lead to enormous dissatisfaction and losses of time and money. This is exacerbated by the growing complexity of cities, increasing the need for data analysis. It isn’t just a nice thing to have; it’s the only way you can avoid extreme losses in urban projects.
It wasn’t long ago that people were only just beginning to talk about big data with regard to things like finance and healthcare, and they were just starting to discuss smart cities too. The use of new types of data in the city was taken as a concept that needed to be tested. Now, some say smart cities are already becoming an outdated buzzword. Big data is taken for granted, and people are talking about spontaneous data and lean governance. The market no longer perceives data just as something to experiment with. The companies we work with come to us with specific requests to help minimize their risks of mistakes and better understand what’s expected of them. They already believe these new types of data can be transformed into something they can rely on. The speed with which they’ve come to accept that is very surprising.