MC714 - Distributed Systems

Kademlia, BitTorrent and Magnetic Links

Islene Calciolari Garcia

Instituto de Computação - Unicamp

First Semester of 2015

Outline

Assignment to hand in

Review

Kademlia

BitTorrent

Magnetic links

Assignment to hand in

• Research a search/sharing method for peer-to-peer networks that has not been covered in class, and compare it with one that has already been presented.
• Format: .pdf file (3 to 6 pages). Include references!
• Due date: March 31.
• Only one person per group needs to submit. The file must contain the names of the group members.

SEARCHING TECHNIQUES IN PEER-TO-PEER NETWORKS

Author: Xiuqi Li and Jie Wu
Presenter: Zia Ush Shamszaman, ANLAB, ICE, HUFS

Source: Zia Ush Shamszaman, based on the work of Xiuqi Li and Jie Wu

CONCEPT OF P2P NETWORK

P2P networks are overlay networks on top of the Internet, where nodes are end systems in the Internet and maintain information about a set of other nodes (called neighbors) in the P2P network.

P2P networks offer the following benefits:
• They do not require any special administration or financial arrangements.
• They are self-organized and adaptive. Peers may come and go freely; P2P systems handle these events automatically.
• They can gather and harness the tremendous computation and storage resources on computers across the Internet.
• They are distributed and decentralized. Therefore, they are potentially fault-tolerant and load-balanced.

P2P NETWORK CLASSIFICATION-1/2

P2P networks can be classified based on the control over data location and network topology.

There are three categories:
• Unstructured: In an unstructured P2P network such as Gnutella, no rule exists which defines where data is stored, and the network topology is arbitrary.
• Loosely structured: In a loosely structured network such as Freenet and Symphony, the overlay structure and the data location are not precisely determined.
• Highly structured: In a highly structured P2P network such as Chord, both the network architecture and the data placement are precisely specified.

P2P NETWORK CLASSIFICATION-2/2

P2P networks can also be classified as centralized or decentralized:
• In a centralized P2P network such as Napster, a central directory of object location, ID assignment, etc. is maintained in a single location.
• Decentralized P2P networks adopt a distributed directory structure. These systems can be further divided into:
  – Purely decentralized systems, such as Gnutella and Chord, in which peers are totally equal.
  – Hybrid systems, in which some peers, called dominating nodes or super-peers, serve the search requests of other regular peers.

Another classification of P2P systems is hierarchical versus non-hierarchical, based on whether the overlay structure is a hierarchy or not:
• All hybrid systems and a few purely decentralized systems, such as Kelips, are hierarchical. Hierarchical systems provide good scalability, the opportunity to take advantage of node heterogeneity, and high routing efficiency.
• Most purely decentralized systems have flat overlays and are non-hierarchical. Non-hierarchical systems offer load balance and high resilience.

WHAT IS "SEARCHING" IN P2P?

• Searching means locating desired data.
• Most existing P2P systems support simple object lookup by key or identifier.
• Some existing P2P systems can handle more complex keyword queries, which find documents containing the keywords in the query.
• More than one copy of an object may exist in a P2P system. There may be more than one document that contains the desired keywords.
• Some P2P systems are interested in a single data item; others are interested in all data items, or as many data items as possible, that satisfy a given condition.
• Most searching techniques are forwarding-based. Starting with the requesting node, a query is forwarded (or routed) to the desired node(s).

DESIRED FEATURES OF SEARCHING ALGORITHMS IN P2P SYSTEMS

• High-quality query results
• Minimal routing state maintained per node
• High routing efficiency
• Load balance
• Resilience to node failures
• Support of complex queries

QUALITY OF QUERY RESULT

• The quality of query results is application dependent. Generally, it is measured by the number of results and their relevance.
• The routing state refers to the number of neighbors each node maintains.
• The routing efficiency is generally measured by the number of overlay hops per query. In some systems, it is also evaluated using the number of messages per query.
• Different searching techniques make different trade-offs between these desired characteristics.

Kademlia: A Peer-to-peer Information System Based on the XOR Metric

Based on slides by Amir H. Payberah

Source: https://www.kth.se/social/upload/516479a5f276545d6a965080/3-kademlia.pdf

Kademlia Basics

• Kademlia is a key-value (object) store.
• Each object is stored at the k closest nodes to the object's ID.
• Distance between id1 and id2: $d(id_1, id_2) = id_1 \oplus id_2$ (bitwise XOR).
• If the ID space is 3 bits: $d(1, 4) = d(001_2, 100_2) = 001_2 \oplus 100_2 = 101_2 = 5$
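The XOR distance can be tried out directly. The sketch below is only an illustration: the function names `xor_distance` and `k_closest` are hypothetical, and IDs are plain Python integers rather than 160-bit digests.

```python
def xor_distance(id1: int, id2: int) -> int:
    """Kademlia distance between two IDs: the bitwise XOR, read as an integer."""
    return id1 ^ id2


def k_closest(target: int, node_ids: list[int], k: int) -> list[int]:
    """Return the k node IDs closest to target under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


# Worked example from the slide: 3-bit ID space, d(1, 4) = 5.
print(xor_distance(0b001, 0b100))            # 5
# The two IDs closest to 101 among all 3-bit IDs are 101 itself and 100.
print(k_closest(0b101, list(range(8)), 2))   # [5, 4]
```

Sorting the known IDs by XOR distance is also how "the k closest nodes to the object's ID" can be picked in a small simulation.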

Kademlia Routing Table

[Figure: node P and its k-bucket list; each entry of the list is one k-bucket.]

• K-bucket: each node keeps a list of references to nodes (contacts) at distance between $2^i$ and $2^{i+1}$, for $i = 0$ to $N - 1$, where $N$ is the number of bits in an ID.
• Each k-bucket has at most k entries.
• Bucket ranges shown in the figure: [1, 2), [2, 4), [4, 8), [8, 16), [16, 32), [32, 64), [64, 128), [128, 256)
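Since bucket i covers the distances in [2^i, 2^(i+1)), the bucket for a contact follows from the position of the highest set bit of the XOR distance. A minimal sketch, with the hypothetical helper names `bucket_index` and `bucket_range`:

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the k-bucket whose range [2**i, 2**(i+1)) contains the XOR distance."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not keep itself in a k-bucket")
    # bit_length() - 1 is the position of the highest set bit.
    return distance.bit_length() - 1


def bucket_range(i: int) -> tuple[int, int]:
    """Half-open distance range [low, high) covered by bucket i."""
    return 2 ** i, 2 ** (i + 1)


# Seen from node 110 in a 3-bit ID space:
print(bucket_index(0b110, 0b111))  # 0, distance 1 lies in [1, 2)
print(bucket_index(0b110, 0b100))  # 1, distance 2 lies in [2, 4)
print(bucket_index(0b110, 0b001))  # 2, distance 7 lies in [4, 8)
```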

Kademlia Tuning Parameters

• B is the size in bits of the keys used to identify nodes and to store and retrieve data; in basic Kademlia this is 160, the length of a SHA-1 digest (hash).
• k is the maximum number of contacts stored in a k-bucket; this is normally 20.
• alpha (α) represents the degree of parallelism in network calls, usually 3.
• Other constants used in Kad:
  – tExpire = 86400 s, the time after which a key/value pair expires; this is a time-to-live (TTL) from the original publication date.
  – tRefresh = 3600 s, after which an otherwise unaccessed bucket must be refreshed.
  – tReplicate = 3600 s, the interval between Kademlia replication events, when a node is required to publish its entire database.
  – tRepublish = 86400 s, the time after which the original publisher must republish a key/value pair.

FIND_NODE in Kademlia

[Figures: node P looks up Q in its k-bucket list, queries nodes A, B and C, then nodes M, N and O, and continues round by round.]

• Node P looks up Q. The closest nodes to Q (closest in ID space) are stored in the appropriate k-bucket of P's k-bucket list.
• P selects α nodes (A, B and C in the figure) from the appropriate k-bucket.
• P sends FIND_NODE(Q) to A, B and C.
• A, B and C each find the k closest nodes to Q that they know about.
• A, B and C each return their k closest nodes to Q.
• Update k-buckets: when P receives a response from a node, it updates the appropriate k-bucket for the sender's node ID.
• P issues up to α new requests to nodes it has not yet queried (M, N and O in the figure) from the set of nodes received in the responses.
• P sends FIND_NODE(Q) to M, N and O.
• P repeats this procedure iteratively until the information received in rounds n-1 and n is the same.
• P resends the FIND_NODE to the k closest nodes it has not already queried ...
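The rounds of FIND_NODE can be simulated in a few lines. This is a simplified sketch under stated assumptions, not the reference algorithm: the network is a plain dictionary from node ID to the set of contacts that node knows, requests are issued sequentially rather than in parallel, k-bucket updates are left out, and the names `find_node` and `iterative_find_node` are hypothetical.

```python
def find_node(network: dict, node_id: int, target: int, k: int) -> list:
    """Simulated FIND_NODE RPC: node_id returns the k contacts it knows closest to target."""
    return sorted(network[node_id], key=lambda n: n ^ target)[:k]


def iterative_find_node(network: dict, start: int, target: int, k: int = 2, alpha: int = 3) -> list:
    """Iterative lookup from `start`; returns the k closest nodes to `target` that were found."""
    shortlist = set(find_node(network, start, target, k))  # P's own closest contacts
    queried = {start}
    while True:
        closest = sorted(shortlist, key=lambda n: n ^ target)[:k]
        # Up to alpha of the current k closest candidates that were not queried yet.
        to_query = [n for n in closest if n not in queried][:alpha]
        if not to_query:
            # Every one of the k closest candidates has answered: nothing new can arrive.
            return closest
        for n in to_query:
            queried.add(n)
            shortlist.update(x for x in find_node(network, n, target, k) if x != start)


# Hypothetical 3-bit network in which every node knows the three IDs that differ in one bit.
network = {
    0: {1, 2, 4}, 1: {0, 3, 5}, 2: {0, 3, 6}, 3: {1, 2, 7},
    4: {0, 5, 6}, 5: {1, 4, 7}, 6: {2, 4, 7}, 7: {3, 5, 6},
}
print(iterative_find_node(network, start=0, target=7))  # [7, 6]
```

The loop ends when a round brings no unqueried node into the k closest candidates, which mirrors the stopping rule on the slides.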

Let's Look Inside Kademlia

Node State

• K-bucket: each node keeps a list of information for nodes at distance between $2^i$ and $2^{i+1}$, for $0 \le i < 160$.
• Each list is sorted by time last seen.

[Figure: k-buckets in a 3-bit ID space. The contacts listed are consistent with the view of node 110.]

• [1, 2): 111 (first two bits in common)
• [2, 4): 100, 101 (first bit in common)
• [4, 8): 000, 011, 010, 001 (no common prefix)

Kademlia RPCs

• PING: probes a node to see if it is online.
• STORE: instructs a node to store a <key, value> pair.
• FIND_NODE: returns information for the k nodes it knows about closest to the target ID. They can come from one k-bucket or more.
• FIND_VALUE: like FIND_NODE, but if the recipient has stored the <key, value> pair, it just returns the stored value.
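The four RPCs map naturally onto four methods of a node object. The sketch below is a local, in-memory illustration only: there is no networking, the contacts are kept in a flat set instead of k-buckets, and the class name `KademliaNode` and its method names are hypothetical.

```python
class KademliaNode:
    """In-memory stand-in for the receiving side of the four Kademlia RPCs."""

    def __init__(self, node_id: int, k: int = 20):
        self.node_id = node_id
        self.k = k
        self.contacts = set()   # known node IDs (a flat set, for brevity)
        self.storage = {}       # key -> value pairs stored at this node

    def ping(self) -> bool:
        """PING: an online node simply answers."""
        return True

    def store(self, key: int, value) -> None:
        """STORE: keep the <key, value> pair."""
        self.storage[key] = value

    def find_node(self, target: int) -> list:
        """FIND_NODE: the k known contacts closest to target under XOR."""
        return sorted(self.contacts, key=lambda n: n ^ target)[: self.k]

    def find_value(self, key: int):
        """FIND_VALUE: the stored value if present, otherwise the same answer as FIND_NODE."""
        if key in self.storage:
            return ("value", self.storage[key])
        return ("nodes", self.find_node(key))


node = KademliaNode(0b110, k=2)
node.contacts.update({0b111, 0b100, 0b000})
node.store(0b101, "hello")
print(node.find_value(0b101))  # ('value', 'hello')
print(node.find_value(0b001))  # ('nodes', [0, 4])
```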

Store Data

• The <key, value> data is stored at the k closest nodes to the key.

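Lookup Service

[Figure: three routing tables in a 3-bit ID space, labelled Step 1, Step 2 and Step 3. The owner of each table is not legible in the transcript; the node IDs below are inferred from the XOR distances of the listed contacts.]

• First table (consistent with node 001): [1, 2): 000; [2, 4): 011, 010; [4, 8): 110, 100, 111
• Second table (consistent with node 110): [1, 2): 111; [2, 4): 100; [4, 8): 000, 011, 010, 001
• Third table (consistent with node 100): [1, 2): 101; [2, 4): 110, 111; [4, 8): 011, 001, 000, 010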

Maintaining K-bucket List (Routing Table)

• When a Kademlia node receives any message from another node, it updates the appropriate k-bucket for the sender's node ID.
• If the sending node already exists in the k-bucket: it is moved to the tail of the list.
• Otherwise:
  – If the bucket has fewer than k entries: the new sender is inserted at the tail of the list.
  – Otherwise, the k-bucket's least-recently seen node is pinged:
    · If the least-recently seen node fails to respond: it is evicted from the k-bucket and the new sender is inserted at the tail.
    · Otherwise: it is moved to the tail of the list, and the new sender's contact is discarded.
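The update rule above translates almost line by line into code. In this sketch a k-bucket is a Python list ordered from least-recently seen (head) to most-recently seen (tail), and `ping` is a caller-supplied function; both choices, and the name `update_bucket`, are assumptions made for illustration.

```python
from typing import Callable


def update_bucket(bucket: list, sender: int, k: int, ping: Callable[[int], bool]) -> None:
    """Apply the k-bucket update rule in place after a message from `sender`."""
    if sender in bucket:
        # Known contact: move it to the tail (most recently seen).
        bucket.remove(sender)
        bucket.append(sender)
    elif len(bucket) < k:
        # Room left: insert the new contact at the tail.
        bucket.append(sender)
    else:
        oldest = bucket[0]  # least-recently seen contact
        if ping(oldest):
            # Still alive: it moves to the tail and the new sender is discarded.
            bucket.remove(oldest)
            bucket.append(oldest)
        else:
            # No answer: evict it and insert the new sender at the tail.
            bucket.pop(0)
            bucket.append(sender)


bucket = [1, 2, 3]
update_bucket(bucket, 2, k=3, ping=lambda n: True)
print(bucket)  # [1, 3, 2]
update_bucket(bucket, 4, k=3, ping=lambda n: True)
print(bucket)  # [3, 2, 1]  (node 1 answered, so 4 is discarded)
update_bucket(bucket, 4, k=3, ping=lambda n: False)
print(bucket)  # [2, 1, 4]  (node 3 did not answer, so it is evicted)
```

Preferring old contacts that still answer means a full bucket cannot be flushed just by sending messages from many new node IDs.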

Maintaining K-bucket List (Routing Table)

• Buckets should generally be kept constantly fresh, due to the traffic of requests travelling through nodes.
• When there is no traffic: each peer picks a random ID in the k-bucket's range and performs a node search for that ID.

Join

• Node P contacts an already participating node Q.
• P inserts Q into the appropriate k-bucket.
• P then performs a node lookup for its own node ID.

Leave And Failure

• No action!
• If a node does not respond to the PING message, remove it from the table.

Kademlia vs. Chord

• Like Chord: when α = 1, the lookup algorithm resembles Chord's in terms of message cost.
• Unlike Chord: the XOR metric is symmetric, while Chord's metric is asymmetric.

Summary

[Figure: the three-step lookup example with three 3-bit routing tables, repeated from the Lookup Service slide.]

References

• Kademlia Specification. http://xlattice.sourceforge.net/components/protocol/kademlia/specs.html
• Petar Maymounkov and David Mazieres, "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric", IPTPS '02. http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf
• Daniel Stutzbach and Reza Rejaie, "Improving Lookup Performance over a Widely-Deployed DHT", INFOCOM '06. http://www.barsoom.org/~agthorr/papers/infocom-2006-kad.pdf
• Raul Jimenez, Flutra Osmani and Bjorn Knutsson, "Sub-Second Lookups on a Large-Scale Kademlia-Based Overlay", P2P '11. http://people.kth.se/~rauljc/p2p11/jimenez2011subsecond.pdf

The BitTorrent Protocol

Source: Prof. Sukumar Ghosh

What is BitTorrent?

• Efficient content distribution system using file swarming.
• Does not perform all the functions of a typical P2P system, like searching.
• The throughput increases with the number of downloaders via the efficient use of network bandwidth.

File sharing

To share a file or group of files, the initiator first creates a .torrent file, a small file that contains:
• Metadata about the files to be shared, and
• Information about the tracker, the computer that coordinates the file distribution.

Downloaders first obtain a .torrent file, and then connect to the specified tracker, which tells them from which other peers to download the pieces of the file.

How it works

The file to be distributed is split up into pieces and a SHA-1 hash is calculated for each piece.
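Splitting a file into pieces and hashing each one can be sketched with the standard `hashlib` module. The function names `piece_hashes` and `verify_piece` and the tiny piece length are illustrative assumptions; real torrents use much larger pieces.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces (the last may be shorter) and SHA-1 each piece."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, expected_hash: bytes) -> bool:
    """Check a downloaded piece against the hash published in the metadata."""
    return hashlib.sha1(piece).digest() == expected_hash


data = b"0123456789"
hashes = piece_hashes(data, piece_length=4)      # pieces: b"0123", b"4567", b"89"
print(len(hashes), len(hashes[0]))               # 3 20
print(verify_piece(b"4567", hashes[1]))          # True
print(verify_piece(b"4568", hashes[1]))          # False
```

Per-piece hashes let a downloader check each piece on arrival, before offering it to other peers.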

BT Components

The peers first obtain a metadata file for each object. The metadata contains:
• The SHA-1 hashes of all pieces
• A mapping of the pieces to files
• Piece size
• Length of the file
• A tracker reference

BT Components

• The tracker is a central server keeping a list of all peers participating in the swarm.
• A swarm is the set of peers that are participating in distributing the same files.
• A peer joins a swarm by asking the tracker for a peer list and connects to those peers.

BitTorrent Lingo

• Seeder = a peer that provides the complete file.
• Initial seeder = a peer that provides the initial copy.
• Leecher = one who is downloading (not a derogatory term).

[Figure: an initial seeder, a seeder and two leechers.]

Simple example

[Figure: seeder A holds pieces {1,2,3,4,5,6,7,8,9,10}. Downloaders B and C each start with {} and then hold {1,2,3}. The piece sets {1,2,3,4}, {1,2,3,5} and {1,2,3,4,5} appear in later steps as the downloaders obtain further pieces.]

Basic Idea

• As a leecher downloads pieces of the file, replicas of the pieces are created. More downloads mean more replicas available.
• As soon as a leecher has a complete piece, it can potentially share it with other downloaders.
• Eventually each leecher becomes a seeder by obtaining all the pieces. It then assembles the file and verifies the checksum.

[Figure slides: Operation; Download in progress]

Pipelining

• When transferring data over TCP, always have several requests pending at once (typically 5), to avoid a delay between pieces being sent.
• Every time a piece or a sub-piece arrives, a new request is sent out.

Piece Selection

• The order in which pieces are selected by different peers is critical for good performance.
• If an inefficient policy is used, then peers may end up in a situation where each has an identical set of easily available pieces, and none of the missing ones.
• If the original seed is prematurely taken down, then the file cannot be completely downloaded! What are "good policies"?

Piece Selection

• Small overlap is good.
• Large overlap is bad: it wastes bandwidth.

Piece selection

• Strict Priority
• Rarest First
  – General rule
• Random First Piece
  – Special case, at the beginning
• Endgame Mode
  – Special case

Random First Piece

• Initially, a peer has nothing to trade.
• It is important to get a complete piece ASAP.
• Select a random piece of the file and download it.

Rarest Piece First

• Determine the pieces that are most rare among your peers, and download those first.
• This ensures that the most commonly available pieces are left till the end to download.
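Rarest-first selection amounts to counting how many peers hold each piece and sorting the missing pieces by that count. The sketch below makes two simplifying assumptions: the piece sets of all peers are already known locally, and ties are broken by piece index so the output is deterministic. The name `rarest_first` is hypothetical.

```python
from collections import Counter


def rarest_first(my_pieces: set, peer_pieces: dict) -> list:
    """Order the pieces we still need from rarest to most common among our peers."""
    availability = Counter(p for pieces in peer_pieces.values() for p in pieces)
    wanted = [p for p in availability if p not in my_pieces]
    # Fewest holders first; ties broken by piece index (an arbitrary choice for this sketch).
    return sorted(wanted, key=lambda p: (availability[p], p))


peers = {
    "B": {1, 2, 3},
    "C": {1, 2, 3, 5},
    "D": {1, 4, 5},
}
print(rarest_first({1}, peers))  # [4, 2, 3, 5]
```

Piece 4 comes first because only one peer holds it; if that peer left, the piece would have to come from the seed again.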

Endgame Mode

• Near the end, missing pieces are requested from every peer containing them.
• This ensures that a download is not prevented from completing by a single peer with a slow transfer rate.
• Some bandwidth is wasted, but in practice this is not too much.

BT: internal mechanism

• Built-in incentive mechanism (where all the magic happens):
  – Choking Algorithm
  – Optimistic Unchoking

Choking

• Choking is a temporary refusal to upload. It is one of BT's most powerful ideas for dealing with free riders (those who only download but never upload).
• The tit-for-tat strategy is based on game-theoretic concepts.

Choking

• Reasons for choking:
  – Avoid free riders
  – Network congestion
• A good choking algorithm caps the number of simultaneous uploads for good TCP performance.

More on Choking

• Peers try out unused connections once in a while to find out if they might be better than the current ones (optimistic unchoking).

Optimistic unchoking

• A BT peer has a single "optimistic unchoke" to which it uploads regardless of the current download rate from it. This peer rotates every 30 s.
• Reasons:
  – To discover currently unused connections that are better than the ones being used
  – To provide minimal service to new peers
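One round of the choking decision can be sketched as follows. The slides do not say how many regular upload slots a peer keeps, so `regular_slots=3` is an assumption, as are the function name `choose_unchoked` and the dictionary of measured download rates. The optimistic unchoke is drawn at random from the remaining peers; a real client would redraw it every 30 s.

```python
import random


def choose_unchoked(download_rates: dict, regular_slots: int = 3, rng=random):
    """Pick the peers to upload to in this round.

    download_rates maps each interested peer to the rate at which we currently
    download from it (tit-for-tat: good uploaders are rewarded).
    Returns (regular, optimistic); optimistic is None if no peer is left over.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    regular = ranked[:regular_slots]
    remaining = ranked[regular_slots:]
    # Optimistic unchoke: chosen regardless of its current rate.
    optimistic = rng.choice(remaining) if remaining else None
    return regular, optimistic


rates = {"a": 50, "b": 10, "c": 30, "d": 0, "e": 40}
regular, optimistic = choose_unchoked(rates)
print(regular)      # ['a', 'e', 'c']
print(optimistic)   # 'b' or 'd', chosen at random
```

Peer "d" has uploaded nothing so far, yet it can still be picked as the optimistic unchoke, which is how a new peer gets its first pieces.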

Upload-Only mode

• Once download is complete, a peer can only upload. The question is, which nodes to upload to?
• Policy: upload to those with the best upload rate. This ensures that pieces get replicated faster, and new seeders are created fast.

Questions about BT

• What is the effect of bandwidth constraints?
• Is the Rarest First policy really necessary?
• Must nodes perform seeding after downloading is complete?
• How serious is the Last Piece Problem?
• Does the incentive mechanism affect the performance much?

Trackerless torrents

BitTorrent also supports "trackerless" torrents, featuring a DHT implementation that allows the client to download torrents that have been created without using a BitTorrent tracker.

Magnet Links

An Introduction

Karwan Jacksi
Faculty of Science, Computer Science Department, University of Zakho
22/04/2012

Faculty - Department Seminar

Source: Karwan Jacksi

Outline:

• Background
• Client-Server vs. Peer to Peer Model.
• BitTorrent Protocol.
• DHT Networks.
• Peer Exchange.
• Magnet Links
• History
• Use of Content Hashes
• Technical Description
• The Pirate Bay

Background

• Client Server Model
  – The server has to upload the file to all clients that are requesting it.
  – The server bandwidth is the bottleneck when many concurrent clients make requests.
  – The server would get congested and overloaded with too many requests.
  – Lacks robustness, since it has a single point of failure.
• Peer to Peer (P2P) Model
  – Offers more than a single source for files to be downloaded.
  – The number of pieces available from other peers increases as the number of concurrent peers increases.
  – Bandwidth is used efficiently.

Background

• BitTorrent Protocol
  – One of many P2P file sharing protocols in existence, e.g. Napster, Kazaa, etc.
  – One of the few P2P protocols that has managed to attract millions of users.
  – One of the most common protocols for transferring large files.
  – Its power comes from splitting the file into several smaller pieces: once a piece is obtained by a peer, it can be shared with other peers in the swarm.
  – To download a file via BitTorrent, you need:
    · Torrent file: a small metadata file with the .torrent extension. It contains information about the files (e.g. names, size, etc.) and the URL of a tracker.
    · Tracker: a navigation centre for the swarm, responsible for helping clients find each other in their swarm.

Background

• Distributed hash table (DHT)
  – A class of decentralized distributed systems that provides a lookup service similar to a hash table.
  – Most file sharing programs use a distributed hash table.
  – The DHT network is used to find the IP addresses of peers present in a swarm, instead of those provided by a tracker.
  – DHT allows searching for peers using queries based on the info hash and requires no interaction whatsoever with the tracker(s) of that torrent.
  – Search engines use DHT networks to look up:
    · what search terms are the most popular, and
    · what parts of the search engine people use most frequently.

Background

• Peer Exchange (PEX)
  – A communications protocol that augments the BitTorrent protocol.
  – It is used among a group of peers that are collaborating to share a given file.
  – In the original design of the BitTorrent protocol, peers in a "swarm" relied upon a central computer server, the "tracker", to find each other.
  – PEX greatly reduces the reliance of peers on a tracker by allowing each peer to directly update others in the swarm as to which peers are currently in the swarm.
  – By reducing dependency on a centralized tracker, PEX increases the speed, efficiency, and robustness of the BitTorrent protocol.

What is a Magnet Link?

– According to the original BitTorrent design, .torrent files are downloaded from torrent web sites (usually index sites).
– Upon downloading the file, the BitTorrent client calculates a 20-byte SHA-1 hash of the info key from the .torrent file, which it uses in the query made to the tracker to uniquely identify the torrent and find out the IP addresses of other peers sharing that torrent, to which it will subsequently connect to download the contents referred to in the .torrent file.
– Magnet links take that a step further: they contain, embedded as a parameter, not the link to a .torrent file but the info-hash value already calculated for that specific torrent file.
– Therefore, by clicking on a magnet link, your client gets the info-hash of the torrent passed to it, which it then uses to query the DHT network and find other peers that share that torrent.
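The info-hash computation can be sketched in a few lines. Two pieces of background are not stated on the slides and are added here: .torrent files are encoded in a format called bencode, and the info-hash is the SHA-1 of the bencoded value of the info key. The minimal encoder below, the function names and the sample `info` dictionary are all illustrative; a real client hashes the exact bytes found in the .torrent file instead of re-encoding them.

```python
import hashlib


def bencode(obj) -> bytes:
    """Minimal bencode encoder for ints, bytes, str, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def info_hash(info: dict) -> bytes:
    """20-byte SHA-1 digest of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()


# Hypothetical single-file torrent: one 4-byte piece.
info = {
    "name": "example.txt",
    "length": 4,
    "piece length": 16384,
    "pieces": hashlib.sha1(b"data").digest(),
}
digest = info_hash(info)
print(len(digest))                        # 20
print(bencode({"b": 1, "a": "x"}))        # b'd1:a1:x1:bi1ee'
```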

Background

• '.torrent' files
  – For years, BitTorrent clients, trackers and indexers have relied on .torrent files to store information on the files shared with the popular P2P protocol.
  – These files are stored by indexing sites and are used by BitTorrent clients to connect to the tracker sites.
  – The files hold several types of data: a URL of the tracker site, names of the files shared, as well as hash codes of the files.
  – All of this is used by the client to connect with peers that have the files in the torrent, or portions of them, and also to ensure that the downloaded data is accurate.
  – This system has several disadvantages, some technical, but one of the biggest is that BitTorrent indexers have to store the .torrent files on their servers, which leaves them vulnerable to legal threats if the content shared happens to be infringing, even though the .torrent files contain no actual infringing data themselves.

Magnet Links

– Magnet links, though, are just links: they have no files associated with them, just data.
– The links are an evolving URI standard developed primarily to be used by P2P networks.
– They differ from URLs in that they don't hold information on the location of a resource but rather on the content of the file or files to which they link.
– Technically, magnet links are made up of a series of parameters containing various data in no particular order.
– In the case of BitTorrent:
  · they hold the hash value of the torrent, which is then used to locate copies of the files among the peers.
  · they may also hold file name data or links to trackers used by the torrent.

Magnet Links

– With magnet links, BitTorrent indexers don't have to store any file at all, just a few snippets of data, leaving the individual client apps to do all the heavy lifting.
– Magnet links can be copy-pasted as plain text by users and shared via email, IM or any other medium.
– For the indexer sites, the allure is clear: using magnet links makes it harder for them to be accused of any wrongdoing in court.
– Theoretically, magnet links should not have any disadvantages for the users over .torrent files either.
– They could also potentially make downloads faster, as they would enable the clients to download from peers which have identical files but with different names.

Magnet Links

– In practice, though, since the technology is still being actively developed, some kinks still crop up.
  · Up until very recently, many of the major BitTorrent clients didn't support magnet links at all.
  · After the Pirate Bay introduced them, this is no longer a problem, but there are still things to work out.
– Indexer sites haven't agreed on a single link format, so it's up to the clients to support the various implementations.

Magnet Links

– And for the users, the experience isn't on par with using plain .torrent files yet.
  · For example, magnet links on the Pirate Bay don't have any additional data on the torrent other than its content hash, so when the link is opened in uTorrent, for example, the torrent won't have a name or list the files in it.
  · This leads to a second problem: without knowing the contents of the torrent, uTorrent starts downloading it directly to the default location, preventing users from selecting a custom location or selecting just some files in a multiple-file torrent.
  · These are likely to be just temporary setbacks: the recently launched TorrIndex, the world's first magnet link-only BitTorrent indexer, is listing links which have additional information like tracker URLs and the torrent's name.
  · And with broader support from BitTorrent clients and indexers, magnet links may replace .torrent files sooner than you might expect.

Magnet Links

– Magnet links don't require a tracker (since they use DHT), nor do they require you to download a separate file before starting the download, which is convenient.
– The main reason torrent sites are moving toward magnet links, apart from convenience to the user, is that these links (probably) free torrent sites like The Pirate Bay from legal trouble. Since The Pirate Bay won't be hosting files that link to copyrighted content (that is, the torrent files), it's more difficult to claim the site is directly enabling the downloading of copyrighted material.

Magnet Links

• History
  – The standard was developed in 2002, partly as a "vendor- and project-neutral generalization" of the ed2k: and freenet: URI schemes used by eDonkey2000 and Freenet, respectively, and attempts to follow official Internet Engineering Task Force (IETF) URI standards as closely as possible.
  – Applications supporting magnet links include μTorrent, aMule, BitComet, BitSpirit, BitTorrent, DC++, Deluge, FrostWire, gtk-gnutella, I2P, KTorrent, MLDonkey, Morpheus, qBittorrent, rTorrent, Shareaza, Transmission and Vuze.

Magnet Links

• Use of content hashes
  – The most common use of magnet links is to link to a particular file based on a hash of its contents, producing a unique identifier for the file, similar to an ISBN or catalog number.
  – Unlike traditional identifiers, however, content-based signatures can be generated by anyone who already has the file, and so do not need a central authority to issue them.
  – This makes them popular for use as "guaranteed" search terms within the file sharing community, where anyone can distribute a magnet link to ensure that the resource retrieved by that link is the one intended, regardless of how it is retrieved.

Magnet Links

• Use of content hashes

– While it is theoretically possible that two files could have the same hash value (known as a "hash collision"), cryptographic hash functions are designed so that the probability of this event is a practical impossibility, even if an expert is intentionally looking to find two files with the same hash value.

– Another advantage of magnet links is their open nature and platform independence: the same magnet link can be used to download a resource from one of any number of applications on almost any operating system.

– Because magnets are concise and plain-text, users can simply copy and paste the links into emails or instant messages, a property not found in, for example, BitTorrent files.


Magnet Links

• Technical description

– Magnet links consist of a series of one or more parameters, the order of which is not significant, formatted in the same way as the query string at the end of many HTTP URLs.

– The most common parameter is "xt", meaning "exact topic", which is generally a URN formed from the content hash of a particular file, e.g.

magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

where the value after "urn:sha1:" is the Base32-encoded SHA-1 hash of the file in question.

– Note that although this refers to a particular file, a search must still be carried out by the client application to determine where, if anywhere, it can obtain that file.
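A link of the form shown on this slide can be built from a file's bytes in a few lines. The sketch below is an illustration under stated assumptions: the function name `sha1_magnet` is hypothetical, and it produces only the "xt" parameter with a "urn:sha1:" value as in the example above (other hash URNs used by real clients are not covered).

```python
import base64
import hashlib


def sha1_magnet(data: bytes) -> str:
    """Build a magnet link whose xt parameter is the Base32 SHA-1 of data.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.sha1(data).digest()  # 20 raw bytes (160 bits)
    # 160 bits split evenly into 32 Base32 characters, so no '=' padding appears.
    encoded = base64.b32encode(digest).decode("ascii")
    return "magnet:?xt=urn:sha1:" + encoded


print(sha1_magnet(b"example file contents"))
```

The result only names the content. As the last bullet says, a client still has to search for a peer that can supply it.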


Magnet Links

• Technical description

– Other parameters defined by the draft standard are:

  – "dn" ("display name"): a filename to display to the user, for convenience

  – "kt" ("keyword topic"): a more general search, specifying search terms rather than a particular file

  – "mt" ("manifest topic"): a URI pointing to a "manifest", e.g. a list of further items

– The standard also suggests that multiple parameters of the same type can be used by appending ".1", ".2", etc. to the parameter name, e.g.:

magnet:?xt.1=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&xt.2=urn:sha1:TXGCZQTH26NL6OUQAJJPFALHG2LTGBC7
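Because the parameters use the query-string format, a magnet link can be taken apart with an ordinary query-string parser. The sketch below is one possible reading of the description above, not a reference implementation: `parse_magnet` is a hypothetical name, and grouping "xt.1", "xt.2", ... under the base name "xt" is an assumption about how a client might treat the numbered suffixes.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def parse_magnet(uri: str) -> dict:
    """Split a magnet link into {parameter name: [values]} (illustrative helper)."""
    prefix, _, query = uri.partition("?")
    if prefix != "magnet:":
        raise ValueError("not a magnet link: " + uri)
    params = defaultdict(list)
    for key, value in parse_qsl(query):
        # "xt.1" and "xt.2" are two values of the same parameter type "xt".
        base_name = key.split(".", 1)[0]
        params[base_name].append(value)
    return dict(params)


link = ("magnet:?xt.1=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C"
        "&xt.2=urn:sha1:TXGCZQTH26NL6OUQAJJPFALHG2LTGBC7")
for name, values in parse_magnet(link).items():
    print(name, values)
```

Since the result is a dictionary keyed by parameter name, two links that differ only in parameter order parse to equal results, matching the statement that order is not significant.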


Magnet Links

• The Pirate Bay

– The world's largest BitTorrent tracker is shutting down!

– As of January 2012, The Pirate Bay switched to magnet links as the default option, with the stated possibility of eventually using them exclusively.

– On February 28, 2012, The Pirate Bay started using magnet links entirely.

– "[The Pirate Bay] has decided that there is no need to run a tracker anymore, so it will remain down! It's the end of an era, but the era is no longer up to date. We have put a server in a museum already, and now the tracking can be put there as well," the Pirate Bay announced.


Magnet Links

• The Pirate Bay

– Recently though, technologies like Distributed Hash Table (DHT) and Peer Exchange (PEX) have made trackers largely unnecessary, as they are able to find peers without the need for a tracker server. This makes the whole system a lot more stable and resilient to technical problems, but perhaps even more important, it makes it a lot harder for anti-piracy organizations to attack.

– Instead, The Pirate Bay now features a magnet link, allowing users to get access to a torrent without the need to download any file, making the site even less susceptible to legal threats.


Magnet Links

• Advantages

– They do not need a central authority to issue them.

– They are open and platform independent: the same magnet link can be used on any system that has an appropriate application.

– They are more user based and easy to use.

– All the system needs is an application that supports magnet links.

– Since they are user based, sharing resources is easy.

– Most magnet link applications have a search function.

• Disadvantages

– Slower speed.

– Less control over speed and over the content that is being downloaded.


Torrents

• Advantages

– Faster connection.

– Easier to search for on the web.

• Disadvantages

– Trackers are needed when downloading content. If the tracker is down and there are no existing connections, the download may never finish.

– Most torrent clients do not have a search function. Torrents usually have to be found on the internet.

– If a torrent has been stored on the web for a long time, its tracker may already have expired, and it is almost impossible to find existing seeders or leechers.


