Data-intensive systems have flourished in recent decades, driven by an ever-increasing rate of data production. The Web graph is a characteristic example. Given the dynamism of the Web, we aim to study the Web graph by learning models of it and monitoring its evolution. The main problems we study are:
The proposed research will result in a framework of approaches and algorithms that enables effective and efficient:
- Query-based top-k list prediction (both future and historical lists)
- Prediction-based crawling: using our predictive ranking models, crawling resources can be optimized while maintaining satisfactory top-k quality.
All of the above directly benefit resource management in large-scale Web search, and their added value lies in the potential adoption of these techniques by the Web search industry.