
Towards a semantic peer-to-peer information retrieval system


Academic year: 2022


(1)

Towards a semantic peer-to-peer information retrieval system

Mihai Lupu & Cornelius Croitoru

(2)

Motivation

Exponential increase of available data

Bad results of term matching techniques (due to synonymy and polysemy)

High computational cost of statistical (“semantic”) approaches

Existing experience with P2P file sharing (Napster, Gnutella, Kazaa, iMesh…)

(3)

Aim

To help the system “understand” the “meaning” of documents and queries

To exploit available files on many machines

To take advantage of user behavior (e.g. scientific communities)

(4)

Vector Space Model

[Figure: documents and a query drawn as vectors; the most “relevant” document is the one closest to the query]
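A minimal sketch of the ranking idea on this slide. The slide only says the most relevant document is the "closest" one; using cosine similarity as the closeness measure is an assumption here, and the vocabulary, document names and counts are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return document names ordered from closest to farthest from the query."""
    return sorted(documents, key=lambda name: cosine(query, documents[name]), reverse=True)

# Hypothetical term counts over the vocabulary ["peer", "semantic", "matrix"].
documents = {
    "doc_a": [3, 0, 1],
    "doc_b": [0, 2, 2],
    "doc_c": [1, 1, 0],
}
query = [1, 0, 0]  # a query that mentions only "peer"
ranking = rank(query, documents)
```

Note that `doc_b` shares no term with the query and gets similarity 0, which is the term-matching weakness mentioned in the motivation.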

(5)

Latent semantic indexing (LSI)

Generate term-document matrix

Apply Singular Value Decomposition → 3 new matrices with “special” features

Use the new matrices to compute an approximation of the original TD matrix

Compute similarity between documents and query (as in the Vector Space Model)

Sort and return “relevant” documents
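The steps above can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the function name `lsi_rank`, the toy vocabulary and the choice of cosine similarity are all hypothetical.

```python
import numpy as np

def lsi_rank(td, query, k):
    """Rank documents against a query using a rank-k approximation of td.

    td    : term-document matrix, shape (terms, documents)
    query : term vector, shape (terms,)
    k     : number of singular values to keep
    """
    u, s, vt = np.linalg.svd(td, full_matrices=False)
    td_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]     # approximation of the TD matrix
    denom = np.linalg.norm(td_k, axis=0) * np.linalg.norm(query)
    denom[denom == 0] = 1.0                          # avoid division by zero
    sims = (query @ td_k) / denom                    # cosine similarity per document
    return np.argsort(-sims), sims

# Hypothetical vocabulary: ["car", "automobile", "engine", "flower"]
td = np.array([
    [1.0, 0.0, 0.0],   # car
    [0.0, 1.0, 0.0],   # automobile
    [1.0, 1.0, 0.0],   # engine
    [0.0, 0.0, 2.0],   # flower
])
query = np.array([1.0, 0.0, 0.0, 0.0])   # the single term "car"
order, sims = lsi_rank(td, query, k=2)
```

In this toy matrix the second document never contains "car", so plain term matching gives it similarity 0; after the rank-2 approximation it receives a positive score because it shares "engine" with the first document.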

(6)

Singular value decomposition (SVD)

$A_{m \times n} = U_{m \times r} \, \Sigma_{r \times r} \, V_{r \times n}$

A: original matrix

U: term vectors' matrix

Σ: singular values matrix (diagonal)

V: document vectors' matrix

the folklore view of the singular values is that they represent the “features” of the original matrix

(7)

Singular value decomposition (SVD)

take the first k columns of U, the first k singular values and the first k rows of V: the k “most important features” of the matrix

$\hat{A}_{m \times n} = U_{m \times k} \, \Sigma_{k \times k} \, V_{k \times n}$

Â: approximated matrix

U: term vectors' matrix (first k columns)

Σ: singular values matrix (first k values)

V: document vectors' matrix (first k rows)
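A small numerical sketch of the truncation, assuming NumPy. The matrix values and the helper name `truncate_svd` are made up for illustration. It also checks a standard property of the truncated SVD: the Frobenius-norm error of the rank-k approximation equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def truncate_svd(a, k):
    """Return the k leading factors of the SVD of a, plus all singular values."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :], s

# Hypothetical 4 x 3 matrix (m = 4, n = 3).
a = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
])
k = 2
u_k, s_k, vt_k, s_all = truncate_svd(a, k)
a_k = u_k @ np.diag(s_k) @ vt_k                 # (m x k)(k x k)(k x n) = m x n

error = np.linalg.norm(a - a_k)                 # Frobenius norm of the residual
expected = np.sqrt(np.sum(s_all[k:] ** 2))      # norm of the discarded singular values
```

The approximation has the same shape as the original matrix but rank k, which is what makes it usable as a drop-in replacement for the TD matrix.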

(8)

Peer-to-peer

client-server architecture

[Figure: one server and several clients; labels: request, reply]

(9)

Peer-to-peer

Peer-to-peer architecture

[Figure: network diagram; node labels: server, peer]

(10)

PeerVOIRE

Objective:

a P2P IR application that overlays a semantic level (LSI) over a p2p infrastructure

Assumption:

the user community behaves according to the small worlds paradigm

(11)

PeerVOIRE – Small Worlds

scientific communities, the internet etc.

(12)

PeerVOIRE - P2P infrastructure

based on the CAN approach

Vector Space Model:

peers, documents or queries are just vectors in some space

documents are lines in the TD matrix

peers are assigned coordinates according to some policy
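A rough sketch of how a CAN-style coordinate space could decide which peer is responsible for a document or query vector. Everything concrete here is an assumption: the slide does not give the coordinate policy, so the two-dimensional space, the axis-aligned `Zone` boxes and the peer names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    owner: str    # hypothetical peer name
    low: tuple    # inclusive lower corner of the zone
    high: tuple   # exclusive upper corner of the zone

    def contains(self, point):
        """True if the point lies inside this axis-aligned zone."""
        return all(lo <= x < hi for lo, x, hi in zip(self.low, point, self.high))

def owner_of(point, zones):
    """Return the peer whose zone contains the point, or None if no zone does."""
    for zone in zones:
        if zone.contains(point):
            return zone.owner
    return None

# Three peers partition the unit square (an invented layout).
zones = [
    Zone("peer_a", (0.0, 0.0), (0.5, 1.0)),
    Zone("peer_b", (0.5, 0.0), (1.0, 0.5)),
    Zone("peer_c", (0.5, 0.5), (1.0, 1.0)),
]
document_vector = (0.7, 0.2)
query_vector = (0.75, 0.3)
```

Because documents and queries live in the same space, a query that lands near a document is handled by the same peer (`peer_b` for both vectors above) or by a neighboring one.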

(13)

PeerVOIRE – “semantic” overlay

idea:

instead of using the TD matrix, use its approximation, computed by SVD

problem:

we need information from all the nodes (all terms and all documents in the network!)

solution:

keep an image of the network on each node and update it periodically

(14)

small worlds

[Figure: groups of documents, one group per line]

document1, document2, document3, document4, document5

document1, document3, document6, document2

document6, document7

document4, document5, document9

document10

document12, document10, document9

(15)

small worlds

[Figure: groups of documents, one group per line; the groups are labeled “peers”]

document1, document2, document3, document4, document5

document1, document3, document6, document2

document6, document7

document4, document5, document9

document10, document11, document9

document12, document10, document9

(16)

small worlds

[Figure: groups of documents, one group per line; the peers are numbered 1 to 6]

document1, document2, document3, document4, document5

document1, document3, document6, document2

document6, document7

document4, document5, document9

document8

document12, document10, document9

documents on a peer may belong to different areas

(17)

peer address

Q: what address (vector) should we assign to these nodes? blue or red?

A: both!

we use multiple realities.

each node may have more than one address.

[Figure: two document groups, labeled 1 and 6: {document1, document2, document3, document4, document5} and {document4, document5, document9}]

(18)

adding a new node

(19)

adding a new node

(20)

adding a new node

compute the distance between the new

(21)

adding a new node

if the distance is bigger than half the maximum on a certain dimension, slice the zone in half

(22)

adding a new node

if the distance is bigger than half the maximum on a certain dimension, slice the zone in half

(23)

adding a new node

otherwise, do not slice in half. let both nodes manage the zone

(24)

adding a new node

otherwise, do not slice in half; let both nodes manage the zone
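The join rule on these slides can be sketched as follows, under several assumptions that the slides do not confirm: the distance is taken per dimension between the new node and one node already managing the zone, "the maximum" is read as the zone's extent on that dimension, and zones are axis-aligned boxes. The names `Zone` and `add_node` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    low: list                                     # lower corner, one value per dimension
    high: list                                    # upper corner, one value per dimension
    members: dict = field(default_factory=dict)   # node name -> coordinates

def add_node(zone, name, coords):
    """Add a node to a zone and return the resulting zone(s)."""
    if not zone.members:
        zone.members[name] = coords
        return [zone]
    # ASSUMPTION: compare the new node with the first node already in the zone.
    other = next(iter(zone.members.values()))
    for dim, (a, b) in enumerate(zip(coords, other)):
        extent = zone.high[dim] - zone.low[dim]
        if abs(a - b) > extent / 2:
            # Far apart on this dimension: slice the zone in half here.
            mid = (zone.low[dim] + zone.high[dim]) / 2
            lower = Zone(list(zone.low), list(zone.high))
            upper = Zone(list(zone.low), list(zone.high))
            lower.high[dim] = mid
            upper.low[dim] = mid
            everyone = {**zone.members, name: coords}
            for member, point in everyone.items():
                target = lower if point[dim] < mid else upper
                target.members[member] = point
            return [lower, upper]
    # Close on every dimension: both nodes manage the same zone.
    zone.members[name] = coords
    return [zone]

far = add_node(Zone([0.0, 0.0], [1.0, 1.0], {"n1": (0.1, 0.5)}), "n2", (0.8, 0.6))
near = add_node(Zone([0.0, 0.0], [1.0, 1.0], {"n1": (0.1, 0.5)}), "n3", (0.2, 0.6))
```

In the first call the nodes are 0.7 apart on the first dimension, more than half the zone width, so the zone is split and each node gets one half. In the second call the nodes are close on every dimension, so the zone stays whole with two managers.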

(25)

border nodes

all nodes in a zone know everything except the neighboring zones

we need border nodes

(26)

routing

complete graph inside the area

reach outside through border nodes
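One way to picture this routing scheme is the sketch below. It is an assumption-laden illustration: the data structures (`zone_of`, `borders`), the node names, and the use of breadth-first search to choose the sequence of areas are not taken from the slides.

```python
from collections import deque

def route(source, target, zone_of, borders):
    """Return the list of nodes a message visits, or None if unreachable.

    zone_of : node -> area name
    borders : (area, neighboring area) -> (border node in area, border node in neighbor)
    """
    start, goal = zone_of[source], zone_of[target]
    # Breadth-first search over areas to find a shortest sequence of areas.
    previous = {start: None}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        if zone == goal:
            break
        for (a, b) in borders:
            if a == zone and b not in previous:
                previous[b] = zone
                queue.append(b)
    if goal not in previous:
        return None
    zone_path = []
    zone = goal
    while zone is not None:
        zone_path.append(zone)
        zone = previous[zone]
    zone_path.reverse()
    # Inside an area any node reaches any other in one hop (complete graph),
    # so only the border nodes need to be listed between source and target.
    path = [source]
    for a, b in zip(zone_path, zone_path[1:]):
        for node in borders[(a, b)]:
            if path[-1] != node:
                path.append(node)
    if path[-1] != target:
        path.append(target)
    return path

# Hypothetical network: three areas A, B, C in a line.
zone_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
borders = {
    ("A", "B"): ("a2", "b1"), ("B", "A"): ("b1", "a2"),
    ("B", "C"): ("b2", "c1"), ("C", "B"): ("c1", "b2"),
}
inside = route("a1", "a2", zone_of, borders)
across = route("a1", "c1", zone_of, borders)
```

A message inside one area takes a single hop, while a message from area A to area C passes through the border nodes of each area it crosses.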

(27)

node deletion

voluntary or involuntary

involuntary deletion is treated in a lazy manner

node manages area alone or with others

merging of areas is required in the first case

node is a border node or not

a replacement must be found

(28)

conclusion

Semantic Level over P2P infrastructure

Small Worlds architecture is a characteristic of the web and most user communities

multiple realities → increased resilience to errors
