Intelligent Systems and Multimedia

Research in Intelligent Systems and Multimedia at ILSP focuses on developing solutions and prototypes that intelligently integrate devices, data, information and media content, while keeping people in the loop, in demanding everyday application settings. Such systems can make extensive use of computational intelligence, smart human-machine interfaces, algorithms for data processing and mining, media and content processing capabilities, as well as knowledge extraction and information management techniques. Integrating these capabilities within the new breed of mobile, context-adaptive, social and cloud-based computing presents challenging research opportunities that remain largely unexploited.

The convergence of media delivery channels, such as Web, TV and mobile networking, shapes the current multimedia landscape and poses new challenges to the research community. Today's devices offer advanced functionality, giving users ready access to a wealth of shared but often heterogeneous content. Such data- and media-rich content encompasses text, measurements, 3D graphics, animation, sound, speech, images, video and point clouds. Data and multimedia collections need to be organised and analysed in a structured manner in order to offer added value to enterprises, governmental services, community-based services and personal archives. We pay particular attention to representational issues in information modelling and in multimedia analysis and indexing.

New and emerging computing and media technologies increasingly enable users to manipulate and interact with content in ways that were not possible in the past. The combination of diverse computing devices, including PCs, smartphones and tablets, with heterogeneous network transmission channels allows individuals to create, edit, transmit, share, aggregate, personalize and interact with multimedia content in increasingly flexible ways. Our research therefore aims to support data, information and media modelling, processing and analysis, up to the level of mediating meaningful information and services to the end user. It focuses primarily on the following areas:

  • Document image analysis and handwriting recognition
  • Robotics and machine perception
  • Multimodal semantic multimedia analysis
  • Music Information Retrieval
  • 3D content-based indexing and retrieval
  • Virtual environments, visualization and creative technologies
  • Context-adaptive computing
  • Intelligent data processing and analytics

Document image analysis (DIA) focuses on determining the structure of a document, identifying the entities that make up the document image and their relations, and recognising written language in images (Nagy, 2000). The algorithms proposed for these tasks come mainly from image processing, computer vision, machine learning and pattern recognition. Some of them are very effective on machine-printed document images and on on-line handwritten notes, and have therefore been incorporated into the workflows of well-known OCR systems and into the ICR modules of hand-held devices, respectively. By contrast, no comparably effective systems have been developed for off-line handwritten documents, mainly because the layout of a handwritten manuscript and the writing style depend entirely on the author's choices. The key challenges in processing such document images are text localization (i.e. separating text from non-text segments such as smears, shadows and drawings), text segmentation (i.e. segmentation into words, text lines, paragraphs, columns, etc.) and text recognition (character or word recognition).
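
As a rough illustration of the text segmentation step, the sketch below splits a binarized page into text lines using a horizontal projection profile, a classical baseline rather than a method attributed to the work cited above. The function name `segment_text_lines`, the `min_ink` threshold and the toy `page` are assumptions made for this example. The baseline presumes straight, well-separated lines, which is the kind of regularity that machine-printed pages tend to have and off-line handwriting often lacks.

```python
from typing import List, Tuple


def segment_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of rows that contain ink.

    `image` is a binarized page: 1 marks an ink pixel, 0 marks background.
    `min_ink` (hypothetical parameter) is the ink count a row needs to be
    treated as part of a text line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins here
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # the page ends inside a text line
        lines.append((start, len(profile) - 1))
    return lines


# Toy page (assumed data): two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_text_lines(page))  # [(1, 2), (5, 5)]
```

Skewed, curved or touching lines produce a profile without clean gaps, so this baseline would merge or split lines incorrectly on many handwritten pages.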

Robotics and machine perception now benefit from capable CPUs and GPUs, which make it increasingly feasible to equip robotic devices with advanced perceptual features (Nalpantidis et al., 2008). Significant challenges remain both in advanced perception, such as 3D vision and understanding and action recognition, and in linking perception with higher-level cognitive features. Since robots should rely on their own perceptual symbolism in order to perform both action and planning, integrating perceptual components can lead to holistic perceptual experiences. Semantically enriched representations can play this role and offer a way to associate higher-level cognitive features with lower-level, feature-based imaging representations. Increasingly, robotic implementations are expected to perform in real application settings rather than in laboratory environments.

Multimodal semantic multimedia analysis techniques exploit information from multiple content modalities, together with prior knowledge of the domain, to overcome the limitations of traditional unimodal approaches. In general, multimodal techniques can be classified into those that jointly exploit low-level features from different modalities (Lin and Hauptman, 2002), (Lie and Su, 2005), and those that process each modality (visual, audio, text) independently and subsequently apply a fusion technique (Laudy and Ganascia, 2008), (Wahlster, 2002) that takes into account domain knowledge, typically encoded in an appropriate ontology.
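
The second family, in which each modality is analysed on its own and the results are fused afterwards, can be sketched as a weighted combination of per-modality concept scores. This is a minimal illustration and not the fusion scheme of any of the cited works. The function `late_fusion`, the modality weights, the scores, and the use of a plain set `domain_concepts` as a stand-in for an ontology are all assumptions made for the example.

```python
from typing import Dict, Set


def late_fusion(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    domain_concepts: Set[str],
) -> Dict[str, float]:
    """Fuse per-modality concept scores with a weighted average.

    Concepts outside `domain_concepts` are discarded; this set is a crude
    stand-in for domain knowledge that would normally live in an ontology.
    A concept missing from a modality contributes a score of 0 for it.
    """
    total_weight = sum(weights[m] for m in scores)
    fused: Dict[str, float] = {}
    for modality, concept_scores in scores.items():
        share = weights[modality] / total_weight
        for concept, score in concept_scores.items():
            if concept not in domain_concepts:
                continue  # ruled out by the domain knowledge
            fused[concept] = fused.get(concept, 0.0) + share * score
    return fused


# Assumed outputs of three independent unimodal classifiers for one video shot.
scores = {
    "visual": {"goal": 0.6, "interview": 0.3, "tennis_serve": 0.4},
    "audio": {"goal": 0.9, "interview": 0.1},
    "text": {"goal": 0.5, "interview": 0.4},
}
weights = {"visual": 0.5, "audio": 0.3, "text": 0.2}   # hypothetical reliabilities
football_concepts = {"goal", "interview"}               # hypothetical domain model

fused = late_fusion(scores, weights, football_concepts)
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))
```

The design choice worth noting is that the unimodal classifiers stay independent, so a modality can be added or re-weighted without retraining the others, whereas the joint low-level approach would need a new combined feature space.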

Content-based indexing and retrieval is becoming increasingly important in a range of fields, such as retrieval of cultural 3D objects (Koutsoudis et al., 2012), engineering design retrieval and synthesis (Chakrabarti et al., 2011), media indexing and retrieval (Papageorgiou et al., 2005), and music classification and indexing (Gkiokas et al., 2012). Content-based indexing and retrieval of information from large web repositories and databases has been a key area of content management and delivery research.

Music Information Retrieval (MIR) aims to extract high-level features that describe musical signals efficiently, both in their formal mathematical representation and from the perspective of human music perception. These features are highly representative of a musical piece and well suited to the semantic annotation, indexing and retrieval of music documents. Music audio analysis combines advanced Digital Signal Processing techniques with Machine Learning approaches in order to extract low-level features and to generate higher-level features and descriptors for specific tasks such as audio tempo estimation (Eronen et al. 2010, Gkiokas et al. 2010, Seyerlehner et al. 2007), audio beat tracking (Alonso et al. 2004), audio chord estimation, audio melody extraction, audio key detection, audio similarity and retrieval, etc.
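
To make the step from a low-level feature to a higher-level descriptor concrete, the sketch below estimates tempo by autocorrelating an onset-strength envelope and picking the strongest periodicity within a plausible tempo range. It is a textbook-style baseline and not the method of the cited tempo estimation papers. The function name `estimate_tempo`, the 60 to 180 BPM search range and the synthetic envelope are assumptions made for the example.

```python
from typing import List


def estimate_tempo(
    onset_env: List[float],
    frame_rate: float,
    min_bpm: float = 60.0,
    max_bpm: float = 180.0,
) -> float:
    """Estimate tempo in beats per minute from an onset-strength envelope.

    `onset_env` holds one onset-strength value per analysis frame and
    `frame_rate` is the number of frames per second (both assumed inputs).
    """
    n = len(onset_env)
    mean = sum(onset_env) / n
    x = [v - mean for v in onset_env]          # remove the DC offset

    # Convert the BPM search range into a range of lags measured in frames.
    min_lag = max(1, int(round(60.0 * frame_rate / max_bpm)))
    max_lag = min(n - 1, int(round(60.0 * frame_rate / min_bpm)))

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the envelope at this lag.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return 60.0 * frame_rate / best_lag


# Synthetic envelope (assumed data): one onset every 0.5 s at 100 frames/s.
frame_rate = 100.0
onset_env = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
print(estimate_tempo(onset_env, frame_rate))  # 120.0
```

On real recordings the autocorrelation usually also peaks at half and double the true beat period, so practical systems tend to add further cues to resolve such octave ambiguities.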

3D content-based retrieval (3DCBR) mechanisms can provide efficient management of 3D models in terms of indexing, searching and retrieving (Lew et al., 2006) (Tangeldar and Veltkamp, 2004). Although keyword-based search engines have made progress over the last decade (Carpineto et al., 2009) (Agualimpia et al., 2010), keywords are ill-suited to formulating complex queries in which morphological features need to be described (Gorisse et al., 2007). 3DCBR technology allows keywords to be replaced by the actual 3D data, which offers an intuitive way of expressing the criteria and constraints the user has in mind. Because it relies only on morphological properties, 3DCBR overcomes the multi-language barrier introduced by metadata and allows the discovery of supra-regional typology coherencies (Hörr and Brunnett, 2008). Sketch-based 3D search and retrieval can benefit diverse application domains, such as engineering design retrieval (Li and Johan, 2013) and virtual tour applications in tourism (Koutsoudis et al., 2008), while 3D scene analysis can be employed for similar retrieval tasks (Koutsoudis and Pavlidis 2011).
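
Querying with the 3D data itself can be sketched with a simple shape-distribution descriptor: a histogram of distances between randomly sampled point pairs, normalised so that overall scale does not matter, with models ranked by the L1 distance between histograms. This is one well-known style of descriptor chosen for brevity, not necessarily the one used in the work cited above. The names `d2_descriptor` and `retrieve`, the bin and pair counts, and the three synthetic point sets are assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def d2_descriptor(points: Sequence[Point], bins: int = 16,
                  pairs: int = 2000, seed: int = 0) -> List[float]:
    """Histogram of distances between randomly chosen pairs of points.

    Distances are divided by the largest sampled distance, so the
    descriptor does not depend on the overall size of the model.
    """
    rng = random.Random(seed)
    dists = [math.dist(*rng.sample(points, 2)) for _ in range(pairs)]
    longest = max(dists)
    hist = [0] * bins
    for d in dists:
        hist[min(int(bins * d / longest), bins - 1)] += 1
    return [count / pairs for count in hist]


def retrieve(query: Sequence[Point],
             database: Dict[str, Sequence[Point]]) -> List[Tuple[str, float]]:
    """Rank database models by L1 distance between shape descriptors."""
    q = d2_descriptor(query)
    ranked = []
    for name, pts in database.items():
        d = d2_descriptor(pts)
        ranked.append((name, sum(abs(a - b) for a, b in zip(q, d))))
    return sorted(ranked, key=lambda item: item[1])


rng = random.Random(42)


def on_sphere(radius: float) -> Point:
    """Random point on a sphere surface (normalised Gaussian vector)."""
    x, y, z = (rng.gauss(0, 1) for _ in range(3))
    norm = math.sqrt(x * x + y * y + z * z)
    return (radius * x / norm, radius * y / norm, radius * z / norm)


# Synthetic point-sampled models (assumed data).
database = {
    "sphere": [on_sphere(1.0) for _ in range(300)],
    "box": [(rng.random(), rng.random(), rng.random()) for _ in range(300)],
    "rod": [(10 * rng.random(), 0.1 * rng.random(), 0.1 * rng.random())
            for _ in range(300)],
}

# The query is the sphere at twice the size: no keyword, only geometry.
query = [(2 * x, 2 * y, 2 * z) for x, y, z in database["sphere"]]
ranking = retrieve(query, database)
print(ranking[0][0])
```

Because the query here is a rescaled copy of a stored model, the example only demonstrates scale invariance. Matching independently acquired scans of similar objects is a considerably harder problem.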

Virtual Environments and Creative Technologies. The immediacy and real-life feel of virtual environments offer unique advantages in various application domains, such as learning (Shin et al., 2010), gaming (Kotsia et al., 2013) and tourism (Huang et al., 2013) (Guttentag 2010). Advanced 3D computer graphics and 3D-capable hardware devices have significantly improved the everyday applicability of virtual environments and creative technologies. Recent research has focused on efficient real-time visualization of, and interaction with, data using volume renderings, isosurfaces, 3D contours, slices, and scatter and vector plots.

Personalised and Context-adaptive computing. Mobile applications are already making headway into everyday practice, and the global mobile applications market was forecast to reach $25bn by the end of 2015 (Holtzer and Ondrus, 2011). The key advantage of mobile applications lies in combining mobility with 24/7 network multi-connectivity in order to deliver contextualized application services. The notion of context has long been linked with computing, largely through computational linguistics, and more attention is now paid to its role in adaptive computing. With the prevalence of service-oriented computing, adaptation capacity has become synonymous with adapting the services offered, and consequently a context-aware system is expected to tailor service delivery to the apparent usage context. Although context-dependent delivery can be relevant to non-mobile applications too, the flexibility offered by device and user mobility places mobile applications at the very heart of context-aware computing. Furthermore, as mobile devices and tools are increasingly employed in collaborative settings, the prospect of true mobile collaboration is raising expectations for deeper business penetration of mobile applications. Such expectations are supported by the emerging characteristics of mobile applications, including active data management, enhanced web-based interactivity, ready access to knowledge and information, and the use of advanced networks. Such developments have wide-reaching application potential within the Internet of Things applications ecosystem (Perera et al., 2013).

In Intelligent Data Processing and Analytics there is a profound need for solutions and systems that offer flexibility, adaptation capacity, and large-scale data and knowledge processing capability. Statistical learning and computational intelligence techniques, algorithms for data processing and mining, and decision analysis and support tools, often aided by semantic information modelling and reasoning, are invaluable aids in many such domains. Data science research has been boosted in recent years both by the increasing availability of powerful computing hardware and by algorithmic advances in machine learning and computational intelligence. Nonetheless, the increasing complexity of data silos and streams generated by a multitude of heterogeneous sources poses new challenges for research. From an end-user perspective, a key requirement is to combine the efficiency of algorithmic approaches with ease of practical implementation in specific application domains. Four main streams of research are emerging as key areas in intelligent data processing: search and analysis; semantic processing; learning for classification, diagnostics and prognostics; and visualisation and representation (Bierig et al., 2013). Intelligent data processing needs arise in a wide range of application areas, from manufacturing, logistics and retail, through healthcare, tourism, transport, government, crime detection and law enforcement, to finance, telecoms, energy and utilities, as well as education and e-science. While the techniques employed are relevant to the research areas mentioned above, e.g. image analysis, information retrieval, signal/music processing and context-adaptive computing, our research has also targeted application areas in tourism and physical asset management.

Institute for Language and Speech Processing