Semantic-Driven K-Walker Search in Unstructured Peer-to-Peer Networks

Xiaoqi Cao and Matthias Klusch
German Research Center for Artificial Intelligence
Stuhlsatzenhausweg 3, Campus D3 2
66123 Saarbrücken, Germany,

In unstructured peer-to-peer (P2P) networks, achieving good precision and recall of data search while at the same time maintaining reasonable communication overhead is a challenging problem, in particular when it comes to rare data items. To this end, we propose S2P2P, a novel approach to semantic-driven query routing and item information propagation in such networks. In S2P2P, each peer maintains its observation on the semantics of received queries (demands) and data information (supplies), as well as a local view on the network topology. Each peer, when forwarding a query, disseminates its known data information to a selected set of remote peers by taking advantage of query piggybacked data. That is, each peer locally observes the query and item semantics, aka local view on the semantic overlay of the P2P network, during the k-walker search of each query it receives or issues. For routing a query, each peer, instead of merely introducing an immediate neighbor, suggests a query routing path consisting of a sequence of peers with expertise on topics which are semantically equivalent or sufficiently similar to the query topic.
This is achieved by a path suggestion heuristics that iteratively applies Dijkstra's algorithm in a greedy manner. Each iteration manages to detect one more expert peer and augments the current path suggestion with the shortest path from its tail to the detected expert peer. That is, inspired by the literature on gossiping in P2P networks, the optimal expertise-based routing path per query is collaboratively determined by peers as the shortest path with the maximal number of peers which are actually known to have semantic expertises for the considered query topics. In particular, the collaborative path adjustments by peers on this path are done in order to maximize the gain of the total answer for the query. Our comparative experimental evaluation revealed that S2P2P may outperform representative approaches to semantic flooding while being at least as robust against changes in the network.

There are many approaches known which allow users to discover relevant items in peer-to-peer (P2P) networks. We focus on directory-less item discovery in unstructured P2P networks without any given overlay structure. In this case, each peer initially knows only about items provided by itself or its direct neighbor peers.
For example, social networks can be considered as unstructured P2P networks. Prominent examples of item location or query routing schemes in such networks are query flooding and k-random walks [11] with replication and caching strategies, as well as informed probabilistic adaptive search. This type of item search is effective for finding popular but not rare items and provides only probabilistic search guarantees, i.e., incomplete recall. The performance of unstructured P2P networks in terms of search precision/recall and network traffic highly depends on the chosen strategies for item information dissemination and query routing. Many approaches to semantic P2P search rely on (restricted) flooding [7], [10], [12] or selectivity enforced gossiping mechanisms [4], [8], [13], [16] in order to propagate query or item information to remote peers. In general, these approaches offer a fairly high recall but suffer from large network traffic cost. Moreover, a periodically fashioned gossiping of item information dissemination comes at the risk of propagating duplicated messages for identical items and does not pay off for unpopular but potentially relevant items in social networks. Other semantic-driven P2P search approaches such as [2], [15], [17], [6], [9] employ intelligent query routing strategies based on machine learning, network topology and query analysis. Though, in these cases, the query routing focuses on one peer which is determined to be most relevant for a given query. Those approaches do not consider sets of relevant expert peers which are collaboratively determined by peers in the network based on their observed semantic network overlay.

To this end, we propose S2P2P, a k-walker-based semantic search scheme for item information dissemination and query routing in unstructured P2P networks. According to this admittedly simple but surprisingly very efficient strategy, each peer in S2P2P is able to suggest a path for query routing which contains as many known and relevant expert peers as possible instead of only one direct neighbor or remote peer. The S2P2P scheme is agnostic to the kind of semantic description of data items as well as to the kind of semantic similarity measurements which are employed by peers to determine the semantic relevance of items. In the following, we briefly describe the S2P2P search scheme and summarize some of the results of our experimental evaluation of its performance.

Each S2P2P peer in an unstructured P2P network performs a path suggestion-based query routing. Besides, semantic descriptions of items which are known to it are selectively propagated as piggybacked data of a query message during the routing process. That way, other peers along the query route are facilitated to build and improve local observation records of the semantic overlay of the network. Whenever a peer receives a query during the query routing process, it adds semantic descriptions of items it knows about to the query answer set, if those items are determined to be of sufficiently high semantic relevance to the query topic.

On forwarding a query, each peer follows to some extent the classic k-walker protocol but driven by local knowledge on the semantic overlay rather than by random choice. In fact, a peer suggests not one direct neighbor peer at random to which the query should be forwarded to but a path of peers with an expertise that is relevant to the query topic.

In brief, a routing path suggestion for the query walker is computed, which maximises the total semantic expertise gain with respect to the query topic, while considering TTL (time to live; number of hops of a walker) restriction. This suggested path contains as many relevant expert peers as possible. The path suggestion is computed by a greedy path augmentation that iteratively applies Dijkstra's algorithm in order to find the shortest paths to expert peers which can be added to the currently suggested routing path.

If the submitted query contains a non-empty path suggestion in its piggybacked data, the peer adjusts it by comparing the total expertise gain of this path with a new path computed in accordance with its own local view on the network overlay. The new suggested routing path then includes the originally contained relevant expert peers together with those of the peer currently in charge of the query routing. Additionally, a peer checks the currently unsatisfied demands of other peers on this suggested path. That is, it identifies which other peers have issued queries in the past which did not return with satisfying results. If the peer currently checking this has items that would satisfy such a demand, the corresponding item descriptions are disseminated together with the currently processed query walker as piggybacked data. This simple semantic-driven k-walker-based search scheme is inspired by the literature on gossiping and related P2P systems [5] [1].

Fig. 1. Simple example of semantic query routing with S2P2P.

A very simple but illustrative example of query routing in S2P2P networks is shown in Fig. 1. Given an unstructured social P2P network in which the user of the peer node H is searching for a video on "car races" (topic of query q). Peer H does not know yet any items in the network and issues the given query q with one walker
(and TTL = 4) to the network following the classic k-random search protocol. That is, it randomly forwards q to its direct neighbor peer D. On receiving q, peer D makes a suggestion for a query routing path for q based on its own knowledge of the semantic overlay relevant to q. In fact, peer D determines item i2, a car race video, which description i2.desc it has received in the past (in piggybacked data of some other query) from peer F as semantically relevant for q. As a result, peer D suggests the shortest path Pa(D,q) = D -> C -> E -> F for routing q containing peer F as the nearest expert peer for q. Subsequently, peer D forwards the query q to its neighbor peer C according to its initial P(D,q) = D -> C.

On receiving q, peer C compares the semantic description of the topic of q with that of the topics of items known by peers it knows of and eventually finds out that some item i1, a video on 'German Grand Prix' offered by peer B is relevant to q as well. Therefore, B is also considered an expert peer for answering q.
As a result, C computes a new path suggestion P(C,q) for q based on P(D,q) that has been received by C in the piggybacked data of q from D.

For this purpose, C updates its local knowledge of remote peer expertises and semantic overlay based on P(D,q) and then computes (a) the shortest path C -> B from itself to the nearest expert peer B and (b) the shortest path from B to the next expert peer F it knows who has knowledge on videos on 'Belgian Grand Prix' and 'Formula 1 races'. Eventually, the path suggestion P(C,q) computed by C is the concatenated path from C to F: C -> B -> E -> F. Once this path has been computed, C routes q to the first peer B on it.

Peer B checks whether it has something new, adds its relevant video description(s) and follows the suggested path by routing the query further to peer E who eventually forwards it to peer F. Since the TTL of the query walker is set to 4, the path is iteratively, and in some sense jointly, augmented by the peers D, C, B, E, F, though the one computed first by C never changed afterwards and remained to be the shortest path with a maximum number of expert peers for q. After the query routing finishes and the semantic-driven walker returns to peer D on the inverse path, the relevant videos are loaded by D directly from peers B, F and G through the underlying physical network (TCP/IP).

Some remarks: First, a path shortcut from C to F via E instead of via B and E would have been shorter but less informative in the response to this one (k = 1) query walker. Second, the relevant item i3 is owned by peer G and not F, but the latter knows about this fact from previous observations. Similarly, C aquired this knowledge from piggybacked data of former queries issued by F. Given the classical k-ramdon walker approach alone, peer G would not have been reached within the given TTL at all. Using the item description propagation and semantic query routing presented here, the item of G is found nevertheless, thus showcasing the benefit of this approach.

Our experimental evaluation of S2P2P is performed on unstructured P2P networks with up to one million peers and a topology based on random graphs (RG) with averaged connectivity of 3.2, as well as random power law graphs (RLPG). In particular, the latter topology model is known to be typical for real-world social networks.

Further, we employed two models of item popularity distribution in these networks which are used for many real-world item popularity rankings: Uniform at random (R) and distribution according to Zipf's law (Z). For testing purposes, the initial value of TTL for each walker in S2P2P is 10 and the number of walkers is limited to 4.
As test collection we employed a random subset of 20k RDF linked data items (in files: instance_types_en.nt.bz2 and mappingbased_properties_en.nt.bz2) taken from DBpedia ( with its ontology (dbpedia_3.7.owl.bz2) O of 319 defined concepts and 1635 roles.

We compared S2P2P with a prominent and representative approach to semantic P2P search based on flooding, INGA [10], in terms of search performance. INGA essentially relies on a shortcut-based restricted semantic flooding search strategy. Each peer of the INGA system creates a semantic network overlay (shortcuts of paths to data) by query analysis which is a common feature with S2P2P. In addition, both the maximum fanout k and maxTTL of flooding performed by INGA are set to 3 which ensures that a query of INGA can traverse about 40 peers in a random graph-based network with averaged connectivity of 3.2.

In the following, we will focus on the search performance of S2P2P and INGA in random graph-based networks with uniform at random distribution of item popularity over all items (R) and a Zipf (Z) distribution (beta=1.03) over pre-clustered 79 topics. The experimental results revealed that S2P2P can significantly outperform INGA in terms of macro-averaged precision at recall (MAP@Recall) and averaged precision (AP), regardless of the kind of item popularity distribution, as can be seen in Fig. 2 and Fig. 3 respectively.

In fact, S2P2P achieved around 24% more precision at intermediate recall levels and around 20% more averaged precision than INGA. The main reason for that improvement is that an INGA peer cannot transitively propagate its detected shortcut information, while S2P2P peers may do this. That is, they propagate received item descriptions to those groups of peers which are located topologically farther in the network than INGA peers. In other words, in S2P2P networks, a request has a higher chance of being satisfied by more relevant items. This holds even though the maximum number of peers which a query walker can reach is the same for S2P2P and INGA in the experiment setup.

In addition, our experiments also revealed that S2P2P when combined with the dynamic semantic replication scheme DSDR [3] outperforms each of the following combinations of means for P2P search and replication in terms of an average precision of 0.813 (at 10k queries):

  • S2P2P without any replication (0.681),
  • S2P2P with non-semantic but near-optimal P2P replication scheme P2R2 [14] (0.698),
  • non-semantic P2P k-random search with replication using P2R2 (0.716),
  • non-semantic P2P k-random search with semantic replication DSDR (0.752)

Fig. 2. MAP@Recall of S2P2P and INGA.

Fig. 3. Averaged precision of S2P2P and INGA.

Moreover, both semantic search strategies appear to be not quite sensitive to the kind of item popularity distribution because the item information dissemination is mainly driven by the existence of items rather than the observed demands. In INGA, the exploration in terms of flooding for building the shortcuts is triggered independently from a query.

In this paper, we presented a semantic-driven k-walker search scheme called S2P2P for item information dissemination and query routing in unstructured P2P networks. S2P2P offers a novel query path suggestion heuristics which allows peers to jointly answer a query within a maximum set of relevant expert peers on demand topics
within the given TTL for the query walkers. The experimental results revealed that S2P2P can outperform a prominent and representative approach to semantic P2P search, INGA, in terms of MAP@Recall and AP for random and Zipf distribution of items.


[1] Aberer, K.; Cudr´e-Mauroux, P.; Datta, A.; Despotovic, Z.; Hauswirth, M.; Punceva, M.; Schmidt, R. (2003): P-Grid: a self-organizing structured P2P system. ACM SIGMOD Record, 32(3), 29–33. ACM.

[2] Basters, U.; Klusch, M. (2006): RS2D: fast adaptive search for semantic web services in unstructured P2P networks. Proceedings of International Semantic Web Conference, 87–100. Springer.

[3] Cao, X.; Klusch, M. (2012): Dynamic Semantic Data Replication for K-Random Search in Peer-to-Peer Networks. Proceedings of 11th IEEE International Symposium on Network Computing and Applications (NCA), 20–27. IEEE.

[4] Cheng, A.; Joung, Y. (2006): Probabilistic file indexing and searching in unstructured peer-to-peer networks. Computer Networks, 50(1), 106–127. Elsevier.

[5] Datta, A.; Sharma, R. (2011): GoDisco: selective gossip based dissemination of information in social community based overlays. Distributed Computing and Networking, 227–238. Springer.

[6] Filali, I.; Huet, F. (2010): Dynamic TTL-based search in unstructured peer-to-peer networks. Proceedings of 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), 438–447. IEEE.

[7] Haase, P.; Broekstra, J.; Ehrig, M.; Menken, M.; Mika, P.; Olko, M.; Plechawski, M.; Pyszlak, P.; Schnizler, B.; Siebes, R. (2004): Bibster – A semantics-based bibliographic peer-to-peer system. Proceedings of 3rd International Semantic Web Conference, 122–136. Springer.

[8] Kermarrec, A.; Van Steen, M. (2007): ACM SIGOPS Operating Systems Review, 41(5), 2–7. ACM.

[9] Liu, Y.; Wu, S.; Xiong, N.; Park, J.; Zhang, M. (2012): A cache-based search algorithm in unstructured P2P networks. Intelligent Manufacturing, 23(6), 2101–2107. Springer.

[10] Loser, A.; Staab, S.; Tempich, C. (2007): Semantic social overlay networks. Selected Areas in Communications, 25(1), 5–14. IEEE.

[11] Lv, Q.; Cao, P.; Cohen, E.; Li, K.; Shenker, S. (2002): Search and replication in unstructured peer-to-peer networks. Proceedings of 16th International Conference on Supercomputing, 84–95. ACM.

[12] Margariti, S. V.; Dimakopoulos, V. V. (2011): A Novel Probabilistic Flooding Strategy for Unstructured Peer-to-Peer Networks. Proceedings of 15th Panhellenic Conference on Informatics, 149–153. IEEE Computer Society.

[13] Patel, J.; Gupta, I.; Contractor, N. (2006): JetStream: Achieving predictable gossip dissemination by leveraging social network principles. Proceedings of 5th IEEE International Symposium on Network Computing and Applications, 32–39. IEEE.

[14] Sozio, M.; Neumann, T.; Weikum, G. (2008): Near-Optimal Dynamic Replication in Unstructured Peer-to-Peer Networks. Proceedings of 27th ACM SIGMOD Symposium on Principles of Database Systems, 281–290. ACM.

[15] Tempich, C.; Staab, S. (2005): Semantic query routing in unstructured networks using social metaphors. Chapter in Staab, S.; Stuckenschmidt, H. (eds.): Semantic Web and Peer-to-Peer, 107–123. Springer.

[16] Vigfusson, Y.; Birman, K.; Huang, Q.; Nataraj, D. (2009): GO: Platform support for gossip applications. Proceedings of 9th IEEE International Conference on Peer-to-Peer Computing, 222–231. IEEE.

[17] Xu, M.; Zhou, S.; Guan, J.; Hu, X. (2010): A path-traceable query routing mechanism for search in unstructured peer-to-peer networks. Network and Computer Applications, 33(2), 115–127. Elsevier.

Xiaoqi Cao.
He received his M.Sc. degree on computer science from Saarland University, Saarbrücken, Germany, in 2011. He is currently a Ph.D. student in computer science department of Saarland University, Germany. Supervised by PD. Dr. Matthias Klusch, he is working in the intelligent information systems research team (I2S) of the DFKI department of agents and simulated reality (ASR). His research focuses on the field semantic-driven P2P data retrieval and composition. He is also interested in the theory and application of semantic technologies.

Matthias Klusch.
PD Dr Matthias Klusch is a Senior Researcher and Research Fellow of the German Research Center for Artificial Intelligence (DFKI) as well as Private Lecturer (PD) for computer science at the Saarland University in Saarbruecken, Germany, and Adjunct Professor of computer science at the Swinburne University of Technology in Melbourne, Australia. He is head of the Intelligent Information Systems (I2S) research team of the DFKI department of Agents and Simulated Reality. His research is concerned with theory and applications of semantic web technologies, service-oriented computing and intelligent software agents for the Internet of Services, the Internet of Things, and the 3D-Internet. He received his MSc (1992) and PhD (1997) from the University of Kiel and his Habilitation (2009) in computer science from the University of the Saarland, Germany. Besides, he has been assistant professor at the Free University of Amsterdam (Netherlands), the Chemnitz University of Technology and the University of Kiel (Germany), and a postdoctoral researcher at Carnegie-Mellon University in Pittsburgh (USA). Among other, he co-founded the German conference series on multi-agent system technologies (MATES), is an editorial board member of several major international journals in the areas of Information Systems, Semantic Web, AI and Intelligent Agents. He was nominated for the ACM SIGART Award for Excellence in Autonomous Agent Research in 2008, acquired and coordinated many funded research projects, served on program committees of about 170 conferences, and has co-authored more than 180 publications. Web: