

JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN

COMPUTER ENGINEERING

INFO HASH TORRENT SEARCHING TECHNIQUE


1 SURAJIT KARMAKAR, 2 POOJA MEHTA, 3 VAIBHAVI SHAH, 4 HARDIK SOMAIYA

Department Of Computer Engineering

1 K. J. Somaiya Institute of Engineering and Information Technology, Sion, Mumbai.
2 Shah and Anchor Kutchhi Engineering College, Chembur, Mumbai.
3, 4 K. J. Somaiya College of Engineering, Vidyavihar, Mumbai.


ABSTRACT: In this paper, we look closely at the BitTorrent P2P protocol. We identify problems with the protocol that have already been studied and discuss them. We propose a system for efficient searching that indexes torrents from multiple sources, so that users can access a large number of torrents from a single source.

KEYWORDS: Peer-to-Peer, BitTorrent, SHA-1, hash, database, caching.

1. INTRODUCTION

Peer-to-peer networking, often referred to as P2P, is perhaps one of the most useful and yet misunderstood technologies to emerge in recent years. When people think of P2P they usually think of one thing: sharing music files, often illegally. This is because file-sharing applications such as BitTorrent have risen in popularity at a staggering rate, and these applications use P2P technology to work. Although P2P is used in file-sharing applications, that does not mean it has no other applications. Indeed, P2P can be used for a vast array of applications, and is becoming more and more important in the interconnected world in which we live. Two widely used P2P file-sharing protocols are:

1. Direct Connect protocol
2. BitTorrent protocol

Direct Connect clients connect to a central hub and can download files directly from one another. Hubs feature a list of the clients or users connected to them. Users can search for files and download them from other clients, as well as chat with other users. It is a text-based protocol in which commands and their information are sent in clear text. Because clients connect to a central source of distribution of information (the hub), the hub is required to have a substantial amount of upload bandwidth available.

The biggest disadvantage is that, when downloading from public hubs, a receiver with a higher-bandwidth connection is limited by the lower bandwidth of the sender, which wastes time and bandwidth. To share any file the sender must be online; while the sender is offline, transmission of files is not possible with Direct Connect (DC).

The BitTorrent protocol is peer-to-peer in nature. Its innovative approach, in the beginning, was that it was not centered on the creation of a real distributed network but on specific shared resources, in this case files, preferably large files: users connect to each other directly to send and receive portions of a large file from other peers who have also downloaded either the file or parts of it. These pieces are then reassembled into the full file. Each downloader reports to all of its peers what pieces it has. To verify data integrity, the SHA-1 hashes of all the pieces are included in the .torrent file, and peers do not report that they have a piece until they have checked its hash. Since users download from each other and not from one central server, the bandwidth load of downloading large files is divided among the many sources the user is downloading from. This decreases the bandwidth cost for people hosting large files and increases download speeds for the people downloading them, because the protocol uses the upstream bandwidth of every downloader to increase the effectiveness of the distribution as a whole and to benefit each downloader. However, there is a central server (called a tracker) which coordinates the action of all such peers. The tracker only manages connections; it does not have any knowledge of the contents of the files

ISSN: 0975 – 6760| NOV 12 TO OCT 13 | VOLUME – 02, ISSUE – 02 Page 273
being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth. By reducing dependency on a centralized tracker, peer exchange (PEX) increases the speed, efficiency, and robustness of the BitTorrent protocol.

Within BitTorrent, a torrent file is a computer file that contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. A seeder is a client that has a complete copy of the torrent and still offers it for upload. The more seeders there are, the better the chances of getting a higher download speed. A downloader (or leecher) is any peer that does not have the entire file and is downloading it. Bram Cohen chose the term downloader over leech because BitTorrent's tit-for-tat mechanism ensures that downloaders also upload and thus do not deserve to be called leeches.

With the adoption of distributed hash tables (DHT), the BitTorrent protocol becomes more than a semi-centralized distribution network around a single resource: it becomes more decentralized and removes the static point of control, the tracker. This is done by relying on DHTs and the PEX extension, which enable volatile peers to operate as trackers as well. Even though this addresses the need for static tracker servers, the network is still centralized around the content: peers do not have any default ability to contact each other outside of that context.

2. CHARACTERISTICS OF BITTORRENT

Permanent DHT tracking:
With the PEX implementation and reliance on the distributed hash table (DHT), evolving into a real P2P overlay network that is completely serverless was the next logical step. The DHT takes information not only from old trackers but also from the PEX implementation, creating something like a distributed database of shared torrents that acts as a backup tracker when all other trackers are down or cannot deliver enough peers, as well as enabling trackerless torrents. The DHT acts as a pseudo-tracker and is added to torrents as such if the client has the option enabled; DHT trackers can be enabled and disabled per torrent just like regular trackers. Clients using this permanent DHT tracking form a fully connected decentralized P2P network, each entering the DHT as a new node. This makes it necessary for private trackers (or non-public distributions) to exclude themselves from participating.

Magnet links:
The Magnet URI scheme refers to resources available for download via peer-to-peer networks. Such a link typically identifies a file not by location but by content, more precisely by the content's cryptographic hash value. Although it could be used for other applications, it is particularly useful in a peer-to-peer context, because it allows resources to be referred to without the need for a continuously available host. Traditionally, .torrent files are downloaded from torrent sites, but several clients also support the Magnet URI scheme. A magnet link provides the torrent hash needed to find the nodes sharing the file in the DHT, and may also include a tracker for the file.

The attributes of BitTorrent are web seeds, PEX, global and local connections, tracker URL, piece hash values, info hash, file length, piece length, and bencode.

Bencode is the encoding used by the P2P file sharing system BitTorrent for storing and transmitting loosely structured data. It supports four types of values: byte strings, integers, lists, and dictionaries (associative arrays). Bencoding is most commonly used in torrent files; these metadata files are simply bencoded dictionaries.

Message digest:
A message digest is a fixed-size hash (fingerprint) computed from a plaintext block. All the information in the message is used to construct the digest, but the message cannot be recovered from it. For this reason, the functions that produce message digests are also known as one-way hash functions.

SHA-1:
SHA-1 is the most widely used of the existing SHA hash functions, and is employed in several widely used applications and protocols. SHA-1 produces a 160-bit message digest. SHA-1 and SHA-2 are the secure hash algorithms required by law for use in certain U.S. Government applications, including use within other cryptographic algorithms

and protocols, for the protection of sensitive unclassified information.

Disadvantages:
The main disadvantage of the BitTorrent network is that many torrents are not accessible to the users participating in the file-sharing process. There is no single place that gives access to all the torrents in the system. The websites that host or cache torrent files either impose restrictions or do not index all files efficiently.

3. PROPOSED SYSTEM

The major disadvantage of the BitTorrent system as a whole is that no single point gives access to all of the available torrents, which limits sharing among peers. Some harvesters/bots collect torrent information from a considerable number of websites that host torrent files, but their coverage is limited. Another approach is to use torrent caching sites, which cache torrent files on their servers and make them accessible only through their hash. Many torrent sites provide such a cache, but users cannot search them unless they already have the hash of the torrent they want. This is very inconvenient for a naive user. One solution is to map the info hash value of each torrent to the name of the torrent by parsing the torrent file. Each hash is stored in a database together with the torrent name and a set of URLs and magnet links from which the torrent file can be downloaded, so that the user can search for torrents by name. This can be implemented either in client software that interacts with the database on the server or as a web-based search.

The prerequisites for such a system are a robust database capable of handling a large number of records at any given time, a high-bandwidth internet connection (possibly that of a server), and some knowledge of the BitTorrent protocol.

The database is first populated by mapping the hash value of each torrent to its other key properties and inserting these records into the database. After this step, a search function needs to be implemented that searches the database for the keywords specified by the user and returns the links from which the torrent file can be downloaded. If the system is implemented as standalone application software, the software accepts a search string from the user, queries the database on the server, and returns the results related to that search string to the user. If it is implemented as a web-based system, the system accepts a search string from the client and returns the results to the client browser.

Through this system, users gain access to a large number of torrents on the network, through which they can share more data, and each torrent becomes accessible to a large number of users in the BitTorrent network.

Advantages:
The proposed system allows a user to access torrents uploaded to websites the user is not familiar with. Its major advantage is access to any remote torrent, which makes the required search efficient.

Also, a torrent uploaded to multiple sites is shown as a single result in the proposed system, unlike other search engines, which return multiple results for a single torrent.

4. CONCLUSION

In this paper, we have presented the terms and characteristics of the BitTorrent protocol. The BitTorrent protocol addresses the disadvantages of the Direct Connect protocol; still, as every coin has two sides, BitTorrent has disadvantages of its own. As shown above, the proposed system gives access to a large number of torrents that might not be accessible from familiar websites, thus allowing efficient torrent searching for everyone, including naive users.
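
The piece-integrity check described in the Introduction (a peer announces a piece only after its SHA-1 hash matches the one in the .torrent file) can be illustrated with a short Python sketch. It assumes the piece hashes are stored as one concatenated byte string of 20-byte SHA-1 digests, as in the BitTorrent specification; the function names split_piece_hashes and verify_piece, and the tiny 4-byte piece length, are hypothetical and chosen only for illustration.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is 160 bits, i.e. 20 bytes


def split_piece_hashes(pieces: bytes) -> list:
    """Split the concatenated piece-hash field into 20-byte SHA-1 digests."""
    if len(pieces) % SHA1_LEN != 0:
        raise ValueError("piece-hash field must be a multiple of 20 bytes")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


def verify_piece(index: int, data: bytes, piece_hashes: list) -> bool:
    """Return True only if the downloaded data matches the expected hash."""
    return hashlib.sha1(data).digest() == piece_hashes[index]


# Illustrative data: a 10-byte "file" cut into 4-byte pieces (assumed values).
piece_length = 4
content = b"abcdefghij"
chunks = [content[i:i + piece_length]
          for i in range(0, len(content), piece_length)]

# What the creator of the torrent would store as the piece-hash field.
pieces_field = b"".join(hashlib.sha1(c).digest() for c in chunks)
hashes = split_piece_hashes(pieces_field)

# A downloader checks each received piece before announcing it to peers.
ok = [verify_piece(i, c, hashes) for i, c in enumerate(chunks)]
corrupted_ok = verify_piece(0, b"abcX", hashes)
```

Here every entry of ok is True, while the corrupted first piece is rejected, so a client following this check would never advertise damaged data.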

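
Section 3 proposes obtaining the info hash by parsing the torrent file, and Section 2 describes bencode and magnet links. The following Python sketch shows one way these pieces could fit together; it is an illustration under stated assumptions, not the authors' implementation. It assumes the torrent file is canonically bencoded, so that re-encoding the decoded info dictionary reproduces the original bytes (a production parser would hash the raw byte range instead). The functions bdecode, bencode, info_hash and magnet_link, and the sample tracker URL and file name, are hypothetical.

```python
import hashlib
from urllib.parse import quote


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                          # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                          # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                          # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                        # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        length = int(data[pos:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)


def bencode(value) -> bytes:
    """Encode ints, byte strings, lists and dicts (keys sorted as bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        body = b"".join(bencode(k) + bencode(value[k]) for k in sorted(value))
        return b"d" + body + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(torrent_bytes: bytes) -> str:
    """SHA-1 of the bencoded info dictionary, as 40 hex characters."""
    meta, _ = bdecode(torrent_bytes)
    return hashlib.sha1(bencode(meta[b"info"])).hexdigest()


def magnet_link(hash_hex: str, name: str, tracker: str = "") -> str:
    """Build a magnet URI from the info hash, a display name, and a tracker."""
    link = "magnet:?xt=urn:btih:" + hash_hex + "&dn=" + quote(name)
    if tracker:
        link += "&tr=" + quote(tracker, safe="")
    return link


# A made-up torrent used only to exercise the functions above.
sample_torrent = bencode({
    b"announce": b"http://tracker.example/announce",
    b"info": {b"name": b"example.iso", b"length": 1024,
              b"piece length": 512, b"pieces": b"\x00" * 40},
})
sample_hash = info_hash(sample_torrent)
sample_magnet = magnet_link(sample_hash, "example.iso",
                            "http://tracker.example/announce")
```

Because the info hash depends only on the info dictionary, the same torrent downloaded from two different sites yields the same 40-character key even when the tracker list differs, which is what makes it usable as the key in the proposed mapping.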

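
The database described in Section 3 (info hash mapped to the torrent name and to a set of download URLs and magnet links, searchable by name) could be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the paper does not prescribe a database engine or schema, so the table names, column names, and the helper functions create_index, add_torrent and search are all assumptions. Keyword matching uses a simple SQL LIKE and does not escape wildcard characters in the search string.

```python
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    """Create one row per torrent (keyed by info hash) plus its source links."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS torrents (
            info_hash TEXT PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS sources (
            info_hash TEXT NOT NULL REFERENCES torrents(info_hash),
            link      TEXT NOT NULL,
            UNIQUE (info_hash, link)
        );
    """)


def add_torrent(conn, info_hash: str, name: str, link: str) -> None:
    """Record a torrent seen at a given link; repeated hashes are merged."""
    conn.execute("INSERT OR IGNORE INTO torrents (info_hash, name) "
                 "VALUES (?, ?)", (info_hash, name))
    conn.execute("INSERT OR IGNORE INTO sources (info_hash, link) "
                 "VALUES (?, ?)", (info_hash, link))


def search(conn, keywords: str) -> dict:
    """Return {info_hash: {"name": ..., "links": [...]}} for matching names."""
    words = keywords.split()
    # Every keyword must occur in the name; with no keywords, match all rows.
    where = " AND ".join("t.name LIKE ?" for _ in words) or "1"
    rows = conn.execute(
        "SELECT t.info_hash, t.name, s.link FROM torrents t "
        "JOIN sources s ON s.info_hash = t.info_hash "
        "WHERE " + where + " ORDER BY t.name, s.link",
        ["%" + w + "%" for w in words],
    )
    results = {}
    for info_hash, name, link in rows:
        entry = results.setdefault(info_hash, {"name": name, "links": []})
        entry["links"].append(link)
    return results


# Illustrative records: the same torrent found at two places, plus another.
conn = sqlite3.connect(":memory:")
create_index(conn)
hash_a = "a" * 40
hash_b = "b" * 40
add_torrent(conn, hash_a, "Example Linux ISO",
            "http://site-a.example/example.torrent")
add_torrent(conn, hash_a, "Example Linux ISO", "magnet:?xt=urn:btih:" + hash_a)
add_torrent(conn, hash_b, "Other File", "http://site-b.example/other.torrent")

results = search(conn, "linux iso")
```

Keying the torrents table on the info hash is what produces the behaviour claimed under Advantages: a torrent uploaded to several sites appears as a single search result carrying all of its links.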