IMIS conducts basic and applied research on data management problems arising in Web applications and in distributed computing in general. Its work focuses on four main areas: (a) the Semantic Web, Linked Data and the Data Web, (b) Open Data, (c) Data Evolution, Preservation, and Annotation, and (d) Privacy Preservation. IMIS also has a strong interest in data modeling and management, the integration of heterogeneous data sources, web services, and the storage and querying of semistructured data.
Semantic & Data Web
The objective of the Data Web is to extend the current Web infrastructure with a global data space that connects data from diverse domains. A vast and rapidly growing quantity of scientific, corporate, government and crowd-sourced data is being published openly on the emerging Data Web. Linked Open Data (LOD) is one of the key technologies transforming the Web from an environment for publishing documents only (the Web of documents) into a vibrant information ecosystem. LOD refers to the recent W3C efforts toward a unifying, machine-readable data representation infrastructure that makes it possible to semantically access and interlink heterogeneous resources at the data level, regardless of their structure and semantics, who created them, or where they come from. The core idea of LOD is to use HTTP URIs to identify not only Web documents but also arbitrary real-world entities or things and, most importantly, to create meaningful links between them.
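To make the idea of URIs for things, and links between them, more concrete, here is a minimal sketch in plain Python (no RDF library) that stores data as subject-predicate-object triples and merges what two datasets say about the same entity by following `owl:sameAs` links. The `example.org` and `example.com` URIs, the vocabulary terms under them, and the helper names `aliases` and `describe` are hypothetical; only `owl:sameAs` and `rdfs:label` are real W3C terms. A real deployment would use an RDF store and SPARQL instead.

```python
# Sketch only: triples as plain tuples. Every example.org / example.com URI
# below is hypothetical; OWL_SAME_AS and RDFS_LABEL are real W3C terms.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Dataset A (e.g., a government portal) names a city with its own URI.
dataset_a = [
    ("http://example.org/city/athens", RDFS_LABEL, "Athens"),
    ("http://example.org/city/athens", "http://example.org/vocab/region", "Attica"),
]

# Dataset B (e.g., a gazetteer) describes the same city under a different URI.
dataset_b = [
    ("http://example.com/place/123", "http://example.com/vocab/country", "Greece"),
]

# The link that connects the two datasets at the data level.
links = [
    ("http://example.org/city/athens", OWL_SAME_AS, "http://example.com/place/123"),
]


def aliases(uri, triples):
    """Return `uri` plus every URI reachable from it via owl:sameAs (either direction)."""
    seen, stack = {uri}, [uri]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == OWL_SAME_AS and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def describe(uri, triples):
    """Collect the (predicate, object) facts stated about `uri` or any of its aliases."""
    names = aliases(uri, triples)
    return sorted((p, o) for s, p, o in triples if s in names and p != OWL_SAME_AS)


all_triples = dataset_a + dataset_b + links
for predicate, value in describe("http://example.org/city/athens", all_triples):
    print(predicate, value)
```

Without the single `owl:sameAs` triple, a query about the first URI would never reach the country fact published by the second dataset; the link is what turns two separate datasets into one data space.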
Open Data
Every day, the public sector produces data on every type of its activity and at all levels of its operation. Free access to such data is needed to enhance transparency and accountability and to improve services to citizens. More importantly, open data is a powerful tool for development. The potential benefits to the EU are estimated at 200 billion euros per year, arising from savings and more efficient operation as well as from the development of new services and value-added products. This new form of economic activity, based on combining open public data with other large collections of data (big data), is called the “data economy”.
IMIS is the leading organization in Greece in theoretical and applied research on open data management, developing fundamental technologies for the functioning of the data economy. IMIS is particularly active, with significant results, in the following areas: creating data catalogues and general data infrastructures; developing methodologies for making data available and documenting the data life cycle; managing and querying large collections of data; managing linked data; developing web services for querying and analysis; and developing value-added services.
Since 2010, IMIS has developed and operated geodata.gov.gr, a unique service for disseminating open data in Greece, which has so far yielded gains of more than 20 million euros for the Greek economy. Moreover, IMIS is the sole Greek representative to the EU and the international community (Open Government Partnership) for national agencies and open data initiatives. IMIS also contributes on a volunteer basis to shaping and strengthening a community of citizens and young researchers who will use open data and consolidate the knowledge economy.
Data Evolution, Preservation, Annotation
A key problem in reasoning about data evolution stems from the fact that information systems usually treat changes as isolated events. In reality, a number of changes occurring at disparate and seemingly unrelated pieces of data conceptually constitute a single complex change event. Such high-level changes are more meaningful than the individual changes they encompass and offer a richer interpretation of the evolution process. In our approach, changes are discrete objects that have a complex structure and retain their semantic and temporal characteristics, rather than isolated low-level transformations on data. For example, the high-level change operation "move" is a complex object composed of the atomic change objects "remove" and "add". Information systems that treat changes as first-class citizens can provide a better understanding of the evolution and provenance of data and can support synchronization between databanks.
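The "move" example can be sketched by modeling changes as objects. This is a minimal illustration under our own assumptions: the classes `AtomicChange` and `ComplexChange`, their `target`/`parent` fields, and the pairing rule in `group_moves` are hypothetical, and the rule is a simple heuristic rather than the change-detection method of our approach.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AtomicChange:
    kind: str            # "add" or "remove"
    target: str          # identifier of the affected data item (hypothetical)
    parent: str          # where the item is attached, e.g. a class or collection
    timestamp: datetime


@dataclass(frozen=True)
class ComplexChange:
    kind: str            # e.g. "move"
    parts: tuple         # the atomic changes this high-level change is built from

    @property
    def timestamp(self):
        # Temporal characteristic retained: the change completes with its last part.
        return max(part.timestamp for part in self.parts)

    @property
    def source(self):
        return self.parts[0].parent

    @property
    def destination(self):
        return self.parts[-1].parent


def group_moves(changes):
    """Heuristic: a 'remove' of an item followed later by an 'add' of the same
    item under a different parent is grouped into one 'move'. Unpaired atomic
    changes are returned unchanged."""
    result, pending = [], {}   # pending: target -> most recent unmatched remove
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.kind == "remove":
            if change.target in pending:
                result.append(pending[change.target])  # keep the older remove
            pending[change.target] = change
        elif (change.kind == "add" and change.target in pending
              and pending[change.target].parent != change.parent):
            removed = pending.pop(change.target)
            result.append(ComplexChange("move", (removed, change)))
        else:
            result.append(change)
    result.extend(pending.values())
    return result


log = [
    AtomicChange("remove", "item42", "ClassA", datetime(2024, 1, 1, 10, 0)),
    AtomicChange("add", "item7", "ClassC", datetime(2024, 1, 1, 10, 5)),
    AtomicChange("add", "item42", "ClassB", datetime(2024, 1, 1, 10, 9)),
]
for change in group_moves(log):
    print(change.kind, change.timestamp)
```

The resulting "move" object keeps its constituent "remove" and "add" changes, so both the high-level interpretation and the low-level provenance remain available.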
Another area of interest is the management of schema evolution in data-centric ecosystems. Such systems, comprising a large number of applications (e.g., web forms, stored procedures, workflows) and datastores, are highly vulnerable to schema evolution. In our research, we have identified three fundamental needs of the developer, the administrator, and the designer of a data-centric ecosystem. Developers would benefit from a facility that predicts and evaluates the effect of a schema evolution event and highlights the places where syntactic or semantic inconsistencies must be resolved. Administrators need a means to control how far the event's impact propagates to the affected constructs. Finally, designers can benefit greatly from a set of objective metrics that report the vulnerabilities of the system to potential evolution events.
Hecataeus is a tool we implemented that represents a data-centric ecosystem (e.g., a database schema along with its dependent views and queries) as a uniform directed graph. The tool enables the user to create evolution scenarios and examine their impact on the graph, as well as to define rules that retain both the syntactic and the semantic correctness of the affected constructs. It also supports an extensible suite of design metrics that can be used to detect critical and vulnerable parts of the system with respect to potential evolution events.
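The kind of impact analysis described above can be sketched as a traversal of a dependency graph. This is not Hecataeus code: the construct names, the `blocked` set standing in for an administrator's propagation policy, and the `vulnerability` count standing in for a design metric are hypothetical simplifications.

```python
from collections import deque

# Hypothetical ecosystem: each construct maps to the constructs it depends on.
DEPENDS_ON = {
    "view_active_customers": {"table_customers"},
    "query_monthly_report": {"view_active_customers", "table_orders"},
    "web_form_signup": {"table_customers"},
    "workflow_billing": {"query_monthly_report"},
}


def dependents_index(depends_on):
    """Invert the edges: provider -> set of constructs that use it directly."""
    index = {}
    for consumer, providers in depends_on.items():
        for provider in providers:
            index.setdefault(provider, set()).add(consumer)
    return index


def impact(changed, depends_on, blocked=frozenset()):
    """Breadth-first propagation of a change event from `changed` to every
    construct that transitively depends on it. Constructs in `blocked` are
    reported as affected but do not pass the impact any further."""
    index = dependents_index(depends_on)
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in index.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                if dependent not in blocked:
                    queue.append(dependent)
    return affected


def vulnerability(depends_on):
    """A simple design metric: how many constructs a change to each node can reach."""
    nodes = set(depends_on) | {p for ps in depends_on.values() for p in ps}
    return {node: len(impact(node, depends_on)) for node in nodes}


print(sorted(impact("table_customers", DEPENDS_ON)))
print(sorted(impact("table_customers", DEPENDS_ON, blocked={"view_active_customers"})))
print(vulnerability(DEPENDS_ON))
```

A blocked construct is still reported as affected, since it must itself be checked; only the further propagation of the event is stopped.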
Privacy Preservation
More and more data related to sensitive areas of human activity are published for statistical and commercial reasons, and also for transparency, especially when public bodies are involved. Publication can be fully open (e.g., posting on a website) or restricted (e.g., within professional associations). In all cases, it raises several issues concerning the privacy of the people described in the data.
Our aim is to develop techniques that ensure privacy for published data by providing anonymity guarantees. We emphasize that simply removing the attributes that directly link a person's identity to a set of data, for example the VAT number, does not ensure that this connection will remain hidden. A person's identity can still be discovered with the help of external catalogs (such as voter registration lists or telephone directories) that identify people based on attributes, e.g., age and place of residence, which on their own generally do not characterize a single individual.
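The linking attack described above can be reproduced on toy data. In the sketch below, all records and names are fictional, the choice of `age` and `residence` as quasi-identifiers is an assumption, and `k_anonymity` computes the standard k-anonymity level of a table (the size of its smallest group of records sharing the same quasi-identifier values). k-anonymity is shown here as a common textbook guarantee, not necessarily the one our tools enforce.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "residence")   # assumption for this example

# Published records: the VAT number has been removed. All values are fictional.
published = [
    {"age": 34, "residence": "Marousi", "income": 41000},
    {"age": 34, "residence": "Marousi", "income": 18000},
    {"age": 71, "residence": "Kifisia", "income": 95000},
]

# External catalog (e.g., a voter list) with names and the same attributes.
voter_list = [
    {"name": "A. Papadopoulos", "age": 34, "residence": "Marousi"},
    {"name": "B. Georgiou", "age": 34, "residence": "Marousi"},
    {"name": "C. Nikolaou", "age": 71, "residence": "Kifisia"},
]


def qi(record):
    """The record's quasi-identifier values as a hashable tuple."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def link(published, external):
    """Re-identify published records whose quasi-identifier values match
    exactly one person in the external catalog."""
    people = {}
    for person in external:
        people.setdefault(qi(person), []).append(person["name"])
    matches = {}
    for i, record in enumerate(published):
        candidates = people.get(qi(record), [])
        if len(candidates) == 1:
            matches[i] = candidates[0]
    return matches


def k_anonymity(records):
    """Largest k for which the table is k-anonymous."""
    return min(Counter(qi(r) for r in records).values())


print(link(published, voter_list))
print(k_anonymity(published))
```

Here the third record is re-identified although no direct identifier was published, and the table is only 1-anonymous; generalizing ages into ranges or places of residence into wider regions is a common way to raise k.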
Our research focuses on sparse multidimensional data, multiple publications of data from various sources, spatiotemporal data, and life sciences data. IMIS has developed a series of prototypes for anonymizing high-dimensional data, such as set-valued data, trajectory data, and tree-structured data. Moreover, IMIS has developed tools specialized for anonymizing tax-related data. IMIS has worked with the Greek General Secretariat of Information Systems to produce an anonymization solution that would allow tax data to be published with rich information content, without compromising the privacy of taxpayers.
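For set-valued data (e.g., the set of products bought or diagnoses received by each person), one guarantee studied in the anonymization literature is k^m-anonymity: an attacker who knows up to m items of a person's record should always find at least k matching records. The sketch below checks this condition and applies a simple global generalization. The item names, the generalization mapping, and the helper names are hypothetical, and the code is an illustration of the guarantee, not code from the IMIS prototypes.

```python
from collections import Counter
from itertools import combinations


def km_violations(transactions, k, m):
    """Return every itemset of size 1..m that occurs in at least one but fewer
    than k transactions; an empty result means the data is k^m-anonymous."""
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))          # canonical order for the keys
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return {combo: count for combo, count in support.items() if count < k}


def generalize(transactions, hierarchy):
    """Global recoding: replace each item by its generalized value everywhere;
    items not listed in `hierarchy` are kept unchanged."""
    return [{hierarchy.get(item, item) for item in t} for t in transactions]


# Hypothetical set-valued records, one set of items per person.
records = [{"a", "b"}, {"a", "b"}, {"a", "c"}]

print(km_violations(records, k=2, m=2))      # {('c',): 1, ('a', 'c'): 1}
# Generalizing the items "b" and "c" into one value removes the violations.
generalized = generalize(records, {"b": "b_or_c", "c": "b_or_c"})
print(km_violations(generalized, k=2, m=2))  # {}
```

Enumerating all itemsets up to size m grows quickly with record size, which is one reason sparse, high-dimensional data calls for specialized algorithms rather than this brute-force check.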
Emerging Research Directions
IMIS has an active research and development record in this area, involving several ongoing research projects. This line of work will continue, focusing on the following points:
- Change management, evolution and preservation of linked information on the Data Web
- Integration and fusion of heterogeneous information in linked dataspaces
- Publishing and querying geospatial, scientific and multidimensional data on the Data Web
- Privacy and anonymization techniques for publishing linked data