Geospatial Data Manageme

Geospatial (and, more generally, spatiotemporal) data are pervasive across many diverse human activities, and have therefore been an active topic of research and application development for decades. In recent years, the advances in and widespread use of position tracking devices (especially GPS) and the increasing availability of crowd-sourced geospatial data on the Web have spawned new challenges and opportunities in this area. GPS positioning devices are becoming a commodity sensor platform with the emergence and popularity of smartphones and ubiquitous networking. Collecting and analyzing large amounts of such data enables new and advanced services for traffic intelligence, fleet management, fleet analytics, and more generally transport planning and sustainable mobility. In parallel, GeoWeb 2.0, the geographic embodiment of the Web 2.0 moniker, is transforming the way geographic information is published, discovered and (re-)used. In addition to traditional, professionally produced geospatial data, the public has also been encouraged to make its content available online to everyone as User-Generated Content (UGC). In this new landscape, the challenge is to develop advanced and intelligent techniques for collecting, storing, analyzing, processing, reconciling, and making use of the large amounts of semantically rich user-generated geospatial information available on the Web. In the following, we describe in more detail our activities and future directions in the above areas.

Routing services in road networks

Our work focuses on implementing and augmenting state-of-the-art Shortest Path (SP) algorithms to support new use cases, such as one-to-all queries (i.e., finding the distance of every graph node from a single node) and range / isochrone queries (i.e., finding all nodes within a specified range). We have also worked on crowdsourcing SP pre-processing to Web clients using JavaScript, and on calculating isochrones for road networks. For this research, large amounts of live GPS tracking data from vehicles moving in Athens and other European cities are collected, stored, analyzed and used for traffic monitoring and analytics. Among the highlights of this work is the best poster award received at the SIGSPATIAL 2012 conference. Moreover, a substantial code base has been built that implements most state-of-the-art SP algorithms along with the augmented versions.
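To make the two query types concrete, the following is a minimal sketch using plain Dijkstra search, not one of the augmented state-of-the-art algorithms mentioned above. The function name shortest_distances, the limit parameter and the toy ROADS graph are assumptions made for illustration only. Without a limit the function answers a one-to-all query; with a limit it answers a range / isochrone query by never expanding beyond the given cost.

```python
import heapq

def shortest_distances(graph, source, limit=float("inf")):
    """Dijkstra from `source` over non-negative edge weights.

    With the default limit this is a one-to-all query; with a finite
    limit it returns only the nodes reachable at cost <= limit
    (a range / isochrone query).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical toy road network: node -> list of (neighbour, travel cost).
ROADS = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 5.0)],
    "C": [("B", 2.0), ("D", 7.0)],
    "D": [],
}

all_distances = shortest_distances(ROADS, "A")             # one-to-all
within_three = shortest_distances(ROADS, "A", limit=3.0)   # isochrone
```

Because edge weights are non-negative, every node inside the range is reached through nodes that are also inside the range, so pruning at the limit does not miss any result.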

Map generation

Road networks, and more generally transportation networks, represent the principal dataset for a large range of applications, including GIS, transportation systems, location-based services and Web mapping. Our work addresses the challenges of evolving map datasets specifically by working towards: (a) automatic map and attribute generation and (b) algorithms for adequately utilizing such evolving datasets. So far, our work on automatic generation of road network maps has focused on static datasets, i.e., tracking data collected over long periods of time. We have also been working on developing a benchmarking tool set over existing approaches using different datasets with diverse characteristics (sampling rate, speed profiles, type of vehicles, etc.).

Crowdsourcing geospatial data processing

Text and multimedia content shared by users on various social networks and media-sharing sites (e.g. Flickr) is increasingly tagged with geo-location metadata. Depending on the nature of the application, harvesting these user-generated content datasets can be even more appealing than using official datasets. However, the main concern is that the individual pieces of information they hold have limited credibility, as they are usually not moderated. This can be addressed with clustering and averaging techniques, so that the overall opinion of the many is expressed (a concept known as the “wisdom of the crowds”). Our main focus has been the research and development of an algorithm and a complete web application that allow users to collaboratively search for the location and the spatial extent of objects on a map. The algorithm runs on crowd-sourced datasets and leverages the connected clients as computation units, speeding up the overall process by scheduling tasks in parallel.
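As a simple illustration of the averaging idea, the sketch below combines several unmoderated location reports for the same object into one estimate. It is not the algorithm used in the web application described above: the function name consensus_location, the max_dev threshold and the median-based outlier filter are all assumptions chosen to keep the example short.

```python
from statistics import fmean, median

def consensus_location(reports, max_dev):
    """Estimate one location from many noisy (x, y) reports.

    Reports whose x or y differs from the component-wise median by
    more than `max_dev` are treated as outliers and ignored; the
    remaining reports are averaged.
    """
    mx = median(x for x, _ in reports)
    my = median(y for _, y in reports)
    kept = [(x, y) for x, y in reports
            if abs(x - mx) <= max_dev and abs(y - my) <= max_dev]
    if not kept:
        return mx, my  # fall back to the median point itself
    return fmean(x for x, _ in kept), fmean(y for _, y in kept)

# Hypothetical reports: three users roughly agree, one is far off.
reports = [(10.0, 20.0), (10.2, 19.8), (9.8, 20.2), (50.0, 60.0)]
estimate = consensus_location(reports, max_dev=1.0)
```

Here the fourth report is discarded and the estimate is the mean of the three reports that agree with each other, so a single unreliable contributor does not shift the result.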

Spatiotextual search

Our work here focuses on semantically enriching k-Nearest Neighbor search by taking user experience into account. We introduce the concept of a Link-Of-Interest (LOI) between two Points Of Interest (POIs) to express their mutual relevance, e.g. for finding the nearest POIs to a location that are also related to one another. Relevance is inferred by extracting pairs of POIs that are frequently mentioned together in the same context by users. We extract this information by analyzing travel blogs and represent it as a graph. The challenge is how to efficiently combine relevance, captured by a graph, with location, captured by a spatial index. A software tool has been developed comprising: (a) a spatial index (regular grid) and a semantic index (relevance graph), and (b) two algorithms for k-RNN query processing, namely GR-Sync and GR-Link, operating on top of the implemented spatial and semantic indexes.
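The interplay of the two indexes can be sketched as follows. This is a naive filter-then-rank baseline and does not reproduce GR-Sync or GR-Link, whose details are not given here. The class GridIndex, the function related_nearest, the links dictionary layout and the sample POIs are all hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Regular grid over 2D points: each cell lists the POIs inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        self.coords = {}

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size),
                math.floor(y / self.cell_size))

    def insert(self, poi, x, y):
        self.coords[poi] = (x, y)
        self.cells[self._cell(x, y)].append(poi)

    def within(self, x, y, radius):
        """Return (distance, poi) pairs within `radius` of (x, y),
        scanning only the grid cells that overlap the search square."""
        cx0, cy0 = self._cell(x - radius, y - radius)
        cx1, cy1 = self._cell(x + radius, y + radius)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for poi in self.cells.get((cx, cy), ()):
                    px, py = self.coords[poi]
                    d = math.hypot(px - x, py - y)
                    if d <= radius:
                        hits.append((d, poi))
        return hits

def related_nearest(index, links, x, y, seed, k, radius):
    """k nearest POIs to (x, y) that have a link to `seed` in the
    relevance graph, nearest first."""
    related = links.get(seed, {})
    hits = [(d, p) for d, p in index.within(x, y, radius) if p in related]
    return [p for _, p in sorted(hits)[:k]]

# Hypothetical data: POI coordinates and co-mention counts from blogs.
index = GridIndex(cell_size=1.0)
for name, (px, py) in {"museum": (0, 0), "park": (0.5, 0.5),
                       "cafe": (1, 1), "stadium": (3, 0)}.items():
    index.insert(name, px, py)
links = {"museum": {"cafe": 5, "stadium": 2}}

result = related_nearest(index, links, 0, 0, "museum", k=2, radius=5)
```

In this toy data the park is the closest POI to the query point, but it is skipped because it has no link to the museum; the sketch ignores link weights, which a real ranking would presumably take into account.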

Mining spatial relations

Our work on this topic focuses on the extraction, analysis and representation of spatial relationships found in user-generated content. In particular, we formulate a quantitative approach for representing qualitative spatial relations extracted from user-generated text, based on training probabilistic models. We have developed a method that returns estimates of uncertain object locations described by human reporters in relation to other, known object locations. Distance and orientation features, combined with a Greedy Expectation Maximization (EM) algorithm, are used to train Gaussian Mixture Models (GMMs), which represent the extracted spatial relationships in a probabilistic framework. We have implemented algorithms for processing textual descriptions of POIs in order to generate stochastic maps that illustrate the probable positions of uncertain object locations.

Emerging Research Directions

Our research agenda in this area is to further pursue and extend the research activities described above, converging towards what we refer to as “dynamic, interactive and crowdsourced urban profiling and sustainable mobility”. More specifically, our goal is to pursue the following main research areas and application domains:

  • Harvesting of crowdsourced tracking data from vehicles (e.g. GPS, onboard/OBD-II sensors) and citizens (e.g. mobile phones, social network activities) to allow for real-time traffic monitoring, and hence real-time routing taking into consideration current conditions. This can further be exploited in collective route planning to optimize resource allocation in fleet management applications.
  • Aggregation and mining of historical traffic data to derive profiles of the various segments of the transportation network, which can be used to support sustainable commuting (e.g. multimodal transportation, electric cars, bike-sharing) and the more effective, efficient and long-term design of transportation services and infrastructure. This can be further extended and complemented by mining trajectories of moving vehicles and citizens to identify, model, classify and predict movement behaviors and patterns, and to identify evolving ‘hotspots’ (e.g. fashionable areas) in urban environments.
  • Mining of crowdsourced data from social networks to bring citizens (closer) into the loop. Text mining techniques, such as topic detection and sentiment analysis, will be developed to identify events and evolving hotspots (e.g. festivals, sport events, accidents, disasters) and to understand and characterize the way various parts of the city are used by the people who live or conduct their activities in them.
  • Mining of events and hotspots for social and event-based route/trip planning, and furthermore for integrated personalization and recommendation services for urban mobility and activities.
  • Automatic generation and maintenance of “area profiles” on multiple dimensions (e.g. health and social care, social, environmental and economic externalities) to support the online monitoring of indicators about the quality of life and the economic development across different areas of a city. The area profiles will have a strong temporal aspect, making it possible to project how an area will evolve in the future. A range of applications will make use of different views of an area profile, for example, helping citizens to select an area in which to rent or acquire real estate property, or to establish new businesses.