Xiaoqi Cao and Matthias KluschABSTRACT
German Research Center for Artificial Intelligence
Stuhlsatzenhausweg 3, Campus D3 2
66123 Saarbrücken, Germany
In unstructured peer-to-peer (P2P) networks, achieving good precision and recall of data search while at the same time maintaining reasonable communication overhead is a challenging problem, in particular when it comes to rare data items. To this end, we propose S2P2P, a novel approach to semantic-driven query routing and item information propagation in such networks. In S2P2P, each peer maintains its observation on the semantics of received queries (demands) and data information (supplies), as well as a local view on the network topology. Each peer, when forwarding a query, disseminates its known data information to a selected set of remote peers by taking advantage of query piggybacked data. That is, each peer locally observes the query and item semantics, aka local view on the semantic overlay of the P2P network, during the k-walker search of each query it receives or issues. For routing a query, each peer, instead of merely introducing an immediate neighbor, suggests a query routing path consisting of a sequence of peers with expertise on topics which are semantically equivalent or sufficiently similar to the query topic.
This is achieved by a path suggestion heuristics that iteratively applies Dijkstra's algorithm in a greedy manner. Each iteration manages to detect one more expert peer and augments the current path suggestion with the shortest path from its tail to the detected expert peer. That is, inspired by the literature on gossiping in P2P networks, the optimal expertise-based routing path per query is collaboratively determined by peers as the shortest path with the maximal number of peers which are actually known to have semantic expertises for the considered query topics. In particular, the collaborative path adjustments by peers on this path are done in order to maximize the gain of the total answer for the query. Our comparative experimental evaluation revealed that S2P2P may outperform representative approaches to semantic flooding while being at least as robust against changes in the network.
There are many approaches known which allow users to discover relevant items in peer-to-peer (P2P) networks. We focus on directory-less item discovery in unstructured P2P networks without any given overlay structure. In this case, each peer initially knows only about items provided by itself or its direct neighbor peers.
For example, social networks can be considered as unstructured P2P networks. Prominent examples of item location or query routing schemes in such networks are query flooding and k-random walks  with replication and caching strategies, as well as informed probabilistic adaptive search. This type of item search is effective for finding popular but not rare items and provides only probabilistic search guarantees, i.e., incomplete recall. The performance of unstructured P2P networks in terms of search precision/recall and network traffic highly depends on the chosen strategies for item information dissemination and query routing. Many approaches to semantic P2P search rely on (restricted) flooding , ,  or selectivity enforced gossiping mechanisms , , ,  in order to propagate query or item information to remote peers. In general, these approaches offer a fairly high recall but suffer from large network traffic cost. Moreover, a periodically fashioned gossiping of item information dissemination comes at the risk of propagating duplicated messages for identical items and does not pay off for unpopular but potentially relevant items in social networks. Other semantic-driven P2P search approaches such as , , , ,  employ intelligent query routing strategies based on machine learning, network topology and query analysis. Though, in these cases, the query routing focuses on one peer which is determined to be most relevant for a given query. Those approaches do not consider sets of relevant expert peers which are collaboratively determined by peers in the network based on their observed semantic network overlay.
To this end, we propose S2P2P, a k-walker-based semantic search scheme for item information dissemination and query routing in unstructured P2P networks. According to this admittedly simple but surprisingly very efficient strategy, each peer in S2P2P is able to suggest a path for query routing which contains as many known and relevant expert peers as possible instead of only one direct neighbor or remote peer. The S2P2P scheme is agnostic to the kind of semantic description of data items as well as to the kind of semantic similarity measurements which are employed by peers to determine the semantic relevance of items. In the following, we briefly describe the S2P2P search scheme and summarize some of the results of our experimental evaluation of its performance.
2 ROUTING PATH SUGGESTIONS AND ITEM INFORMATION DISSEMINATION
Each S2P2P peer in an unstructured P2P network performs a path suggestion-based query routing. Besides, semantic descriptions of items which are known to it are selectively propagated as piggybacked data of a query message during the routing process. That way, other peers along the query route are facilitated to build and improve local observation records of the semantic overlay of the network. Whenever a peer receives a query during the query routing process, it adds semantic descriptions of items it knows about to the query answer set, if those items are determined to be of sufficiently high semantic relevance to the query topic.
On forwarding a query, each peer follows to some extent the classic k-walker protocol but driven by local knowledge on the semantic overlay rather than by random choice. In fact, a peer suggests not one direct neighbor peer at random to which the query should be forwarded to but a path of peers with an expertise that is relevant to the query topic.
In brief, a routing path suggestion for the query walker is computed, which maximises the total semantic expertise gain with respect to the query topic, while considering TTL (time to live; number of hops of a walker) restriction. This suggested path contains as many relevant expert peers as possible. The path suggestion is computed by a greedy path augmentation that iteratively applies Dijkstra's algorithm in order to find the shortest paths to expert peers which can be added to the currently suggested routing path.
If the submitted query contains a non-empty path suggestion in its piggybacked data, the peer adjusts it by comparing the total expertise gain of this path with a new path computed in accordance with its own local view on the network overlay. The new suggested routing path then includes the originally contained relevant expert peers together with those of the peer currently in charge of the query routing. Additionally, a peer checks the currently unsatisfied demands of other peers on this suggested path. That is, it identifies which other peers have issued queries in the past which did not return with satisfying results. If the peer currently checking this has items that would satisfy such a demand, the corresponding item descriptions are disseminated together with the currently processed query walker as piggybacked data. This simple semantic-driven k-walker-based search scheme is inspired by the literature on gossiping and related P2P systems  .
A very simple but illustrative example of query routing in S2P2P networks is shown in Fig. 1. Given an unstructured social P2P network in which the user of the peer node H is searching for a video on "car races" (topic of query q). Peer H does not know yet any items in the network and issues the given query q with one walker
(and TTL = 4) to the network following the classic k-random search protocol. That is, it randomly forwards q to its direct neighbor peer D. On receiving q, peer D makes a suggestion for a query routing path for q based on its own knowledge of the semantic overlay relevant to q. In fact, peer D determines item i2, a car race video, which description i2.desc it has received in the past (in piggybacked data of some other query) from peer F as semantically relevant for q. As a result, peer D suggests the shortest path Pa(D,q) = D -> C -> E -> F for routing q containing peer F as the nearest expert peer for q. Subsequently, peer D forwards the query q to its neighbor peer C according to its initial P(D,q) = D -> C.
On receiving q, peer C compares the semantic description of the topic of q with that of the topics of items known by peers it knows of and eventually finds out that some item i1, a video on 'German Grand Prix' offered by peer B is relevant to q as well. Therefore, B is also considered an expert peer for answering q.
As a result, C computes a new path suggestion P(C,q) for q based on P(D,q) that has been received by C in the piggybacked data of q from D.
For this purpose, C updates its local knowledge of remote peer expertises and semantic overlay based on P(D,q) and then computes (a) the shortest path C -> B from itself to the nearest expert peer B and (b) the shortest path from B to the next expert peer F it knows who has knowledge on videos on 'Belgian Grand Prix' and 'Formula 1 races'. Eventually, the path suggestion P(C,q) computed by C is the concatenated path from C to F: C -> B -> E -> F. Once this path has been computed, C routes q to the first peer B on it.
Peer B checks whether it has something new, adds its relevant video description(s) and follows the suggested path by routing the query further to peer E who eventually forwards it to peer F. Since the TTL of the query walker is set to 4, the path is iteratively, and in some sense jointly, augmented by the peers D, C, B, E, F, though the one computed first by C never changed afterwards and remained to be the shortest path with a maximum number of expert peers for q. After the query routing finishes and the semantic-driven walker returns to peer D on the inverse path, the relevant videos are loaded by D directly from peers B, F and G through the underlying physical network (TCP/IP).
Some remarks: First, a path shortcut from C to F via E instead of via B and E would have been shorter but less informative in the response to this one (k = 1) query walker. Second, the relevant item i3 is owned by peer G and not F, but the latter knows about this fact from previous observations. Similarly, C aquired this knowledge from piggybacked data of former queries issued by F. Given the classical k-ramdon walker approach alone, peer G would not have been reached within the given TTL at all. Using the item description propagation and semantic query routing presented here, the item of G is found nevertheless, thus showcasing the benefit of this approach.
3 EXPERIMENTAL EVALUATION
Moreover, both semantic search strategies appear to be not quite sensitive to the kind of item popularity distribution because the item information dissemination is mainly driven by the existence of items rather than the observed demands. In INGA, the exploration in terms of flooding for building the shortcuts is triggered independently from a query.
 Aberer, K.; Cudr´e-Mauroux, P.; Datta, A.; Despotovic, Z.; Hauswirth, M.; Punceva, M.; Schmidt, R. (2003): P-Grid: a self-organizing structured P2P system. ACM SIGMOD Record, 32(3), 29–33. ACM.
 Basters, U.; Klusch, M. (2006): RS2D: fast adaptive search for semantic web services in unstructured P2P networks. Proceedings of International Semantic Web Conference, 87–100. Springer.
 Cao, X.; Klusch, M. (2012): Dynamic Semantic Data Replication for K-Random Search in Peer-to-Peer Networks. Proceedings of 11th IEEE International Symposium on Network Computing and Applications (NCA), 20–27. IEEE.
 Cheng, A.; Joung, Y. (2006): Probabilistic file indexing and searching in unstructured peer-to-peer networks. Computer Networks, 50(1), 106–127. Elsevier.
 Datta, A.; Sharma, R. (2011): GoDisco: selective gossip based dissemination of information in social community based overlays. Distributed Computing and Networking, 227–238. Springer.
 Filali, I.; Huet, F. (2010): Dynamic TTL-based search in unstructured peer-to-peer networks. Proceedings of 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), 438–447. IEEE.
 Haase, P.; Broekstra, J.; Ehrig, M.; Menken, M.; Mika, P.; Olko, M.; Plechawski, M.; Pyszlak, P.; Schnizler, B.; Siebes, R. (2004): Bibster – A semantics-based bibliographic peer-to-peer system. Proceedings of 3rd International Semantic Web Conference, 122–136. Springer.
 Kermarrec, A.; Van Steen, M. (2007): ACM SIGOPS Operating Systems Review, 41(5), 2–7. ACM.
 Liu, Y.; Wu, S.; Xiong, N.; Park, J.; Zhang, M. (2012): A cache-based search algorithm in unstructured P2P networks. Intelligent Manufacturing, 23(6), 2101–2107. Springer.
 Loser, A.; Staab, S.; Tempich, C. (2007): Semantic social overlay networks. Selected Areas in Communications, 25(1), 5–14. IEEE.
 Lv, Q.; Cao, P.; Cohen, E.; Li, K.; Shenker, S. (2002): Search and replication in unstructured peer-to-peer networks. Proceedings of 16th International Conference on Supercomputing, 84–95. ACM.
 Margariti, S. V.; Dimakopoulos, V. V. (2011): A Novel Probabilistic Flooding Strategy for Unstructured Peer-to-Peer Networks. Proceedings of 15th Panhellenic Conference on Informatics, 149–153. IEEE Computer Society.
 Patel, J.; Gupta, I.; Contractor, N. (2006): JetStream: Achieving predictable gossip dissemination by leveraging social network principles. Proceedings of 5th IEEE International Symposium on Network Computing and Applications, 32–39. IEEE.
 Sozio, M.; Neumann, T.; Weikum, G. (2008): Near-Optimal Dynamic Replication in Unstructured Peer-to-Peer Networks. Proceedings of 27th ACM SIGMOD Symposium on Principles of Database Systems, 281–290. ACM.
 Tempich, C.; Staab, S. (2005): Semantic query routing in unstructured networks using social metaphors. Chapter in Staab, S.; Stuckenschmidt, H. (eds.): Semantic Web and Peer-to-Peer, 107–123. Springer.
 Vigfusson, Y.; Birman, K.; Huang, Q.; Nataraj, D. (2009): GO: Platform support for gossip applications. Proceedings of 9th IEEE International Conference on Peer-to-Peer Computing, 222–231. IEEE.
 Xu, M.; Zhou, S.; Guan, J.; Hu, X. (2010): A path-traceable query routing mechanism for search in unstructured peer-to-peer networks. Network and Computer Applications, 33(2), 115–127. Elsevier.