Project Documentation
Project report in partial fulfillment of the requirements for the award of the
degree of
Bachelor of Technology
In
COMPUTER SCIENCE & ENGINEERING
Submitted By
Anirban Chowdhury, Arpan Maity, Archan Biswas, Monojeet Jana
University of Engineering & Management, Kolkata
University Area, Plot No. III – B/5, New Town, Action Area – III, Kolkata – 700160.
ACKNOWLEDGEMENT
We would like to take this opportunity to thank everyone whose cooperation and
encouragement throughout the course of this project have been invaluable to us.
We are sincerely grateful to our guides, Prof. Sukalyan Goswami and Prof. Sumit Anand of the
Department of Computer Science & Engineering, UEM, Kolkata, for the wisdom, guidance and
inspiration that helped us carry this project through and take it to where it stands now.
We would also like to express our sincere gratitude to Prof. Sukalyan Goswami, HOD,
Computer Science & Engineering, UEM, Kolkata, and all other departmental faculty for their
ever-present assistance and encouragement.
Last but not least, we would like to extend our warm regards to our families and peers who
have kept supporting us and always had faith in our work.
Anirban Chowdhury
Arpan Maity
Archan Biswas
Monojeet Jana
ABSTRACT
We present a pipelined cloud system with system caching enabled and task monitors in the
gateways to map the exit functions of network relays. Distributed traffic handlers encrypt and
decrypt relay traffic data with immediate effect. The pipeline is generated at runtime for the
user to interact, through a set of APIs, with the containers that store the data for the web app.
The file system is mounted for local data transfer during a login session and unmounted as
soon as the session expires. Tags, issued as tokens by the cloud-end services each time the
user interacts, ensure that data is carried to the service end by the user and by no one else.
Processing of threads and data is extended to multiple cores when additional resources are
required, helping to achieve data parallelism. The cluster orchestration mechanism, deployed
using K8s, interacts with an abstraction layer before reaching the containers. The containers
hold object blocks with a tag-link maintenance mechanism that switches data blocks within a
container cluster to help prevent data breaches.
INTRODUCTION
To understand the working and necessity of such works we need to first understand the cloud
system as a whole its uses , varieties the benefits and problems associated with it ,which will
help us to develop new things.
Although this project started off with a bare problem that we face daily , which is data leak and
high latency for data transferring/processing operations , we proposed few hypothesis for
solving the problems which we are developing into theorems and products .
Starting off with the cloud/cloud computing , cloud computing simply refers to storing/accessing
data from storage devices over the internet.
Cloud services are mainly categorized as:
● Infrastructure as a service (IaaS)
● Platform as a service (PaaS)
● Software as a service (SaaS)
To understand how the cloud works: essentially, all the resources your program needs are held
"somewhere" on the internet. You interact with them via a defined service contract (SOAP,
REST, POX, or whatever), and what happens after that is up to the service provider. You don't
care how your information is stored or how the service is provided, just that it is.
If, for example, you wanted to store files, you may choose to use Google’s storage system. You
connect to the service and upload your files; you don't know or care where the files are stored,
only the location of the entry point to that service.
If you have an application, it may also be run in the cloud, assuming it is suitable. Live Mesh,
for example, is a virtual machine which you can code against and run your software both
locally and within the cloud; your user simply goes to a URL and finds your program, and you
don't care where it is beyond it being available somewhere in the cloud.
Diving deeper into the mechanism, we get to see the services that are responsible for the
cloud operations. In brief, we use vsftpd to build an FTPS server system with SSL/TLS
binding, a root-encrypted Unix client with embedded SSL/TLS protocols, and isolated remote
storage behind an endpoint gateway; the full architecture is described in the SOLUTION
section.
LITERATURE SURVEY
Although several attempts have been made to ensure the cloud data transaction remains safe
and less complex , it fails sometime , attempts have been made in terms of technology.
The publication of Homomorphic Encryption Applied to the cloud and internet security highlights
two very important aspects of cloud computing , how to make sure the data clients are storing in
the cloud is always secure and how to keep client identity confidential even to the cloud provider
for the sake of anonymity.
Their proposal was simple: encrypt data before sending it to the cloud provider. To execute
calculations, however, the data had to be decrypted every time one needed to work on it; until
now it was impossible to encrypt data, trust a third party to keep it safe, and still perform
remote calculations on it. Allowing the cloud provider to perform operations on encrypted data
without decrypting it requires cryptosystems based on homomorphic encryption.
Definition: an encryption scheme is homomorphic if, from Enc(a) and Enc(b), it is possible to
compute Enc(f(a, b)), where f can be +, ×, or ⊕, without using the private key. Among
homomorphic encryption schemes we distinguish, according to the operations they allow on
the raw data, additive homomorphic encryption (only additions of the raw data), as in the
Paillier and Goldwasser-Micali cryptosystems, and multiplicative homomorphic encryption
(only products of the raw data), as in the RSA and ElGamal cryptosystems.
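To illustrate the multiplicative case, here is a minimal Python sketch using textbook
(unpadded) RSA with toy parameters; the numbers are illustrative only, and unpadded RSA is
not secure in practice:

    # Toy RSA parameters: n = 61 * 53, e public, d private (e*d = 1 mod phi(n)).
    n, e, d = 3233, 17, 2753

    def enc(m):  # public-key encryption
        return pow(m, e, n)

    def dec(c):  # private-key decryption
        return pow(c, d, n)

    # Multiplying ciphertexts yields the encryption of the product:
    # Enc(a) * Enc(b) mod n == Enc(a * b mod n), with no private key involved.
    a, b = 12, 7
    assert dec((enc(a) * enc(b)) % n) == (a * b) % n  # decrypts to 84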
Similar works have been done to ensure safe storage and retrieval, but the leaks still happen.
Consider the Tor project, originally funded by the US defence establishment and supported by
Mozilla: it has some noticeable faults in its data transfer system and network configuration, as
discussed below. The cryptosystems pose further challenges to the already sluggish
architecture, such as latency, huge overheads, and large numbers of concurrent accesses
leading to denial of service.
As per statistics, outdated libraries and caching systems can cause high latency and can
bottleneck the whole system. Encryption key management for this complicated process is not
a failsafe against breaches and high latency either, because most of the time the keys are
stored as plaintext files in containers; keeping the keys encrypted in a virtual container adds
further delay when storing and retrieving data from a storage space.
Multichannel architectures are another problem, as there is no proper orchestration: servers
and clients rely on gateway routing and encryption standards, and, to add cream to the coffee,
the gateways cache and store every piece of data that passes through their pipeline for
analysis. This is critical because, if revealed, that cache can be used to leak not only the data
from the cloud but personal records from the client end too.
TorPS
The Tor Path Simulator (TorPS) is a tool for efficiently simulating path selection in Tor. It
chooses circuits and assigns user streams to those circuits in the same way that Tor does.
TorPS is fast enough to perform thousands of simulations over traffic periods spanning
months, but it cannot be used to predict paths for immediate data transfer, since such
simulations cover months of traffic rather than the present moment.
As a point of comparison for secure-pipe performance, spiped (a secure pipe daemon)
operates at approximately 300 Mbps on its author's 2.5 GHz Intel Core 2 laptop.
Redis Security
Redis has been designed for use within a trusted private network and does not support
SSL-encrypted connections. While some cloud providers offer private networks, not all of
them do, so if you want to run a Redis master on one server and your application on another,
you have no choice but to leave that connection unencrypted, leaving that sensitive traffic to
be sent across the cloud provider's network, or even the general internet, with no protection
from someone with a network sniffer.
Redis is designed to be accessed by trusted clients inside trusted environments. This means
that it is usually not a good idea to expose the Redis instance directly to the internet or, in
general, to an environment where untrusted clients can directly access the Redis TCP port or
UNIX socket. For instance, in the common case of a web application using Redis as a
database, cache, or messaging system, the clients inside the front end (web side) of the
application will query Redis to generate pages or to perform operations requested or triggered
by the web application user. In this case, the web application mediates access between Redis
and untrusted clients (the user browsers accessing the web application). This is a specific
example but, in general, untrusted access to Redis should always be mediated by a layer
implementing ACLs, validating user input, and deciding what operations to perform against
the Redis instance.
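As a minimal sketch of such a mediating layer, assuming the redis-py client and a hypothetical
per-user key namespace:

    import redis  # assumes the redis-py client library

    r = redis.Redis(host="127.0.0.1", port=6379)

    def get_user_value(user_id: str, key: str):
        """Mediated read: the web tier, never the browser, talks to Redis."""
        # Hypothetical ACL rule: validate the untrusted key name, then
        # confine the user to their own namespace so no request can ever
        # name another user's key.
        if not key.isalnum():
            raise ValueError("invalid key name")
        return r.get(f"user:{user_id}:{key}")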
What is stunnel
The stunnel application is an SSL encryption wrapper that can tunnel unencrypted traffic (like
Redis) through an SSL-encrypted tunnel to another server. While stunnel adds SSL
encryption, it does not guarantee that the traffic will never be captured unencrypted: if an
attacker were able to compromise either the server or the client machine, they could capture
the local traffic unencrypted as it is being sent to stunnel.
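A minimal stunnel configuration for this scenario might look like the sketch below; hostnames,
ports, and the certificate path are placeholders, and the two sections would live on the
application machine and the Redis machine respectively:

    ; application machine: accept local plaintext, forward over TLS
    [redis-client]
    client = yes
    accept = 127.0.0.1:6379
    connect = redis.example.com:6390

    ; Redis machine: accept TLS, hand plaintext to the local Redis
    [redis-server]
    cert = /etc/stunnel/redis.pem
    accept = 6390
    connect = 127.0.0.1:6379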
Google's encryption-at-rest design is a useful reference point for key management at scale. To
summarize it:
● Data is chunked and encrypted with DEKs (data encryption keys).
● DEKs are encrypted with KEKs (key encryption keys).
● KEKs are stored in KMS (the key management service).
● KMS is run on multiple machines in data centers globally.
● KMS keys are wrapped with the KMS master key, which is stored in Root KMS.
● Root KMS is much smaller than KMS and runs only on dedicated machines in each data
center.
● Root KMS keys are wrapped with the root KMS master key, which is stored in the root KMS
master key distributor.
● The root KMS master key distributor is a peer-to-peer infrastructure running concurrently in
RAM globally on dedicated machines; each instance gets its key material from other running
instances.
● If all instances of the distributor were to go down (a total shutdown), a master key is stored
in (different) secure hardware in (physical) safes in limited Google locations.
● The root KMS master key distributor is currently being phased in, to replace a system that
operated in a similar manner but was not peer to peer.
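The same DEK/KEK layering can be sketched in a few lines of Python; the Fernet recipe from
the cryptography package below is only a stand-in for Google's internal ciphers and KMS:

    from cryptography.fernet import Fernet  # stand-in cipher, not Google's

    kek = Fernet(Fernet.generate_key())  # key encryption key, held by "KMS"

    # Encrypt a chunk of data with a fresh data encryption key (DEK)...
    dek_bytes = Fernet.generate_key()
    ciphertext = Fernet(dek_bytes).encrypt(b"a chunk of user data")

    # ...then wrap the DEK with the KEK; only the wrapped DEK is stored
    # next to the ciphertext, never the plaintext DEK.
    wrapped_dek = kek.encrypt(dek_bytes)

    # Reading reverses the layers: unwrap the DEK, then decrypt the chunk.
    dek = Fernet(kek.decrypt(wrapped_dek))
    assert dek.decrypt(ciphertext) == b"a chunk of user data"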
Pico Computing has announced that it has achieved the highest-known benchmark speeds for
56-bit DES decryption, with reported throughput of over 280 billion keys per second achieved
using a single, hardware-accelerated server.
Hulton's DES cracking algorithm uses brute force methods to analyze the entire DES 56-bit
keyspace. The massively parallel algorithm iteratively decrypts fixed-size blocks of data to find
keys that decrypt into ASCII numbers. This technique is often used for recovering the keys of
encrypted files containing known types of data. The candidate keys that are found in this way
can then be more thoroughly tested to determine which candidate key is correct.
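The logic of this search can be sketched in Python; des_decrypt_block below is a hypothetical
stand-in for a real DES implementation, since the actual attack runs on FPGA hardware rather
than in software:

    def des_decrypt_block(key: bytes, block: bytes) -> bytes:
        """Hypothetical stand-in: XOR is NOT DES, it only shapes the sketch."""
        return bytes(b ^ k for b, k in zip(block, key))

    def is_ascii_digits(plaintext: bytes) -> bool:
        # Keep keys whose trial decryption comes out as ASCII numbers.
        return all(0x30 <= byte <= 0x39 for byte in plaintext)

    def search_keyspace(block: bytes, start: int, stop: int):
        """Scan one slice of the 2**56 keyspace; slices run in parallel."""
        for k in range(start, stop):
            key = k.to_bytes(8, "big")  # parity bits ignored in this sketch
            if is_ascii_digits(des_decrypt_block(key, block)):
                yield key  # a candidate key for more thorough testing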
Such brute force attacks are computationally expensive and beyond the reach of software
algorithms running on standard servers or PCs, even when equipped with GPU accelerators.
According to Hulton, current-generation CPU cores can process approximately 16 million DES
key operations per second. A GPU card such as the GTX-295 can be programmed to process
approximately 250 million such operations per second.
The 56-bit Data Encryption Standard (DES) is now considered obsolete, having been replaced
by newer and more secure Advanced Encryption Standard (AES) encryption methods.
Nonetheless DES continues to serve an important role in cryptographic research, and in the
development and auditing of current and future block-based encryption algorithms.
When using a Pico FPGA cluster, however, each FPGA is able to perform 1.6 billion DES
operations per second. A cluster of 176 FPGAs, installed into a single server using standard
PCI Express slots, is capable of processing more than 280 billion DES operations per second.
This means that a key recovery that would take years to perform on a PC, even with GPU
acceleration, could be accomplished in less than three days on the FPGA cluster.
"Our research efforts in cryptography and our real-world customer deployments have given us
unique insights into parallel computing methods for other domains, including genomics and
simulations," added Pico Computing's Robert Trout. "The use of an FPGA cluster greatly
reduces the number of CPUs in the system, increases computing efficiency and allows the
system to be scaled up to keep pace with the data being processed."
Block Storage
Block storage devices provide fixed-sized raw storage capacity. Each storage volume can be
treated as an independent disk drive and controlled by an external server operating system.
This block device can be mounted by the guest operating system as if it were a physical disk.
The most common examples of Block Storage are SAN, iSCSI, and local disks.
Block storage is the most commonly used storage type for most applications. It can be either
locally attached or network-attached, and volumes are typically formatted with a file system
such as FAT32, NTFS, EXT3, or EXT4.
Use cases
● Ideal for databases, since a DB requires consistent I/O performance and low-latency
connectivity.
● Use block storage for RAID Volumes, where you combine multiple disks organized
through striping or mirroring.
● Any application which requires server-side processing, like Java, PHP, and .Net, will
require block storage.
● Running mission-critical applications like Oracle, SAP, Microsoft Exchange, and
Microsoft SharePoint.
Block storage volumes can only be accessed when they are attached to an operating system.
Data kept on object storage devices, by contrast, which consists of the object data and its
metadata, can be accessed directly through APIs or over HTTP/HTTPS.
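For example, an object can be read directly over the API with boto3; the bucket and key
names below are placeholders:

    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")

    # Objects are addressed by bucket and key, not by a mounted device,
    # so no operating-system attachment is needed to read them.
    obj = s3.get_object(Bucket="example-bucket", Key="reports/2018.pdf")
    body = obj["Body"].read()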
PROBLEM STATEMENT
The problem statement has been subdivided into categories based on cloud topology,
necessary functions, and related dependencies.
1. Understanding the traffic flow through a system, using sniffing tools between clients and
servers. Modelling the traffic with heavy-tailed variables is not efficient, but it can be
achieved if fractal sets are used for node generation.
2. The pipeline for data transmission has to be secured with encryption without hurting the
latency of the medium. Standard pipeline encryption and processing won't help, because
the processes responsible for crypto operations are expected to return immediate results
without considering bandwidth and data congestion.
3. Distributing data processing over cores, by refactoring process threads, to reduce server
workload.
4. The kernel Crypto API in Linux does not include a garbling module for encrypted
computation; one has to be developed for wrapping up client tasks in primary memory.
5. Devising a syncing and storage mechanism to store data for processing, during sessional
logins, without affecting data traffic.
6. Cloud storage systems do provide encryption standards, but the key stays with the
provider. Some offer no encryption at rest while others do, and the ones providing it bear a
huge processing overlay to handle the crypto operations, which needs to be solved.
SOLUTION
The first part was developing an FTPS server on a remote machine. We did this with vsftpd on
a Linux instance created over LXD, running virtually on an Amazon EC2 instance, with an
encrypted root and storage for server configuration and platform security.
We chose vsftpd over ProFTPd for creating the FTPS server system with SSL/TLS binding;
the client is a root-encrypted Unix system running a Python client with embedded SSL/TLS
protocols.
The server runs as a virtual instance in an Amazon EC2 system with public DNS and Elastic
IP bindings; it runs an Ubuntu system over virtualization to help isolate the environment. All
the modules running in the server clusters are orchestrated using a flavor of Kubernetes. The
storage consists of isolated disk drives on a remote server with no random access and with
endpoint gateway security, giving a single access-and-retrieval pathway for every userspace.
The client software was written in Python, using ftplib and pyftpdlib for FTP access over an
SSL/TLS secure channel. That alone was not enough, so we decided to compress data while
transferring it to the remote FTPS server and decompress it while downloading, using the
DEFLATE compression algorithm. The DEFLATE compressor is given a great deal of flexibility
as to how to compress the data; it is up to the programmer to design smart algorithms that
make the right choices. There are three modes of compression available to the compressor:
● Not compressed at all. This is an intelligent choice for, say, data that has already been
compressed. Data stored in this mode expands slightly, but not by as much as it would if
one of the other compression methods were tried on it.
● Compression, first with LZ77 and then with Huffman coding. The trees that are used to
compress in this mode are defined by the Deflate specification itself, and so no extra
space needs to be taken to store those trees.
● Compression, first with LZ77 and then with Huffman coding with trees that the
compressor creates and stores along with the data.
The data is broken up into "blocks," and each block uses a single mode of compression. If the
compressor wants to switch from non-compressed storage to compression with the trees
defined by the specification, or to compression with specified Huffman trees, or to compression
with a different pair of Huffman trees, the current block must be ended and a new one begun.
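A minimal sketch of this client-side flow, using only standard-library modules (ftplib for FTPS,
zlib for DEFLATE); the host name, credentials, and file names are placeholders:

    import io
    import zlib
    from ftplib import FTP_TLS

    # Connect over explicit FTPS and switch the data channel to TLS.
    ftps = FTP_TLS("ftps.example.com")  # placeholder host
    ftps.login("user", "password")      # placeholder credentials
    ftps.prot_p()                       # protect the data connection

    # DEFLATE-compress the payload before it ever leaves the client.
    with open("report.pdf", "rb") as f:
        compressed = zlib.compress(f.read(), 9)
    ftps.storbinary("STOR report.pdf.z", io.BytesIO(compressed))
    ftps.quit()

    # Downloading is the inverse: retrbinary into a buffer, then zlib.decompress.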
The FTPS server uses SSH with RSA 2048-bit key sharing to establish a connection with the
S3 storage server, which runs in a virtual environment with user policies governing access to
the S3 bucket for individual FTPS server users.
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object
format for files, allowing use of other tools like AWS CLI.
s3fs runs natively on the FTPS server, where the user mounts a partition at runtime upon
logging in with the client; the files are cached locally and are then synced to the remote
storage server as encrypted objects using Boto3. Boto3 is the Amazon Web Services (AWS)
SDK for Python. It enables Python developers to create, configure, and manage AWS
services, such as EC2 and S3, and provides an easy-to-use, object-oriented API as well as
low-level access to AWS services.
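A minimal sketch of this sync step with Boto3; the bucket name and server-side encryption
setting are placeholders, and our deployment's exact policies may differ:

    import boto3

    s3 = boto3.client("s3")

    def sync_cached_file(local_path: str, user: str) -> None:
        """Push one locally cached file to the user's prefix in the bucket."""
        with open(local_path, "rb") as f:
            s3.put_object(
                Bucket="example-ftps-store",    # placeholder bucket name
                Key=f"{user}/{local_path}",     # one prefix per FTPS user
                Body=f.read(),
                ServerSideEncryption="AES256",  # keep the object encrypted at rest
            )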
The remote bucket storage session stays active while the partition is mounted and data
transfer takes place; it disconnects from the server as soon as the client ends the session,
reducing the risk of man-in-the-middle attacks.
The Amazon S3 bucket system helps ensure data stability over block storage.
Features
And since the files are already compressed, with nested compression, before being stored in
either the local storage cache or the remote storage, data cannot be spoofed in the middle of
a transfer by any other tool without corrupting it. Furthermore, object storage ensures that the
data stays encrypted in the remote server as whole objects, with only the server maintaining
access to it; no third party, and not even the cloud provider, gets access to the files in the
storage server.
SETUP AND RESULT
The server setup was done both locally and on virtual instances in Amazon's AWS EC2, over
an Ubuntu Linux distro; the server configuration was done manually using shell scripting and
Python.
CONCLUSION
The project was tested in both ideal and real-life scenarios and can be turned into a product,
an alternative to other cloud storage environments, with high stability and a secure model.
Most of the problems were looked into, but further work on pattern recognition for the data
traffic can be done on top of this model, to help visualize the dataflow and analyze the traffic
loads.
The first problem statement can be addressed by applying fractals, since internet traffic tends
to show a recursive, fractal pattern. Without an appropriate model for internet traffic it is
impossible to obtain the insight required to efficiently plan, manage, or operate a network that
renders a satisfactory quality of service to its users. Although parsimony in models is
desirable, internet traffic, due to its complicated structure and stochastic behaviour, may
require a number of parameters to specify its behaviour.
A multi-process pipeline for data striping and storing is the next target of this project and can
be achieved with GPU coding.
GPU-enabled encryption and decryption based on CUDA, for a significant performance gain,
can also be looked into, but GPU variation and legacy devices pose a challenge to this idea.
A Python library for the autonomous creation of file storage servers is a topic of much interest
to us, given that we used libraries that deliver significant performance gains over regular
Python libraries.
BIBLIOGRAPHY
https://www.statista.com/statistics/321215/global-consumer-cloud-computing-users/
https://ijarcce.com/wp-content/uploads/2016/10/IJARCCE-88.pdf
https://www.researchgate.net/profile/Abdellatif_El_Ghazi/publication/261083917_Homomorphic_encryption_method_applied_to_Cloud_Computing/links/59c4cc42458515548f28738d/Homomorphic-encryption-method-applied-to-Cloud-Computing.pdf
https://patents.google.com/patent/US7401154B2/en
https://www.cs.columbia.edu/~angelos/Papers/2013/cloudfence.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.6391&rep=rep1&type=pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.3693&rep=rep1&type=pdf
http://keddiyan.com/files/AHCI/week2/8.pdf
http://xuekecn.com/FileUpload/images/2014/05/11/12050808029772.pdf
https://s3.amazonaws.com/academia.edu.documents/27476239/10.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1543265274&Signature=YUr5cnJibyIRStBa4Pd%2FMDvCGGg%3D&response-content-disposition=inline%3B%20filename%3DA_survey_on_security_issues_in_service_d.pdf
https://s3.amazonaws.com/academia.edu.documents/30867268/281.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1543265232&Signature=%2BtaTHoYBvQarBFavMyiEzCiYNLI%3D&response-content-disposition=inline%3B%20filename%3DEnabling_Public_Verifiability_and_Data_D.pdf
https://pdfs.semanticscholar.org/58b1/d75e64cafee0609bc2e11187874ac6f405e8.pdf
http://www.pnas.org/content/pnas/99/21/13382.full.pdf
https://patentimages.storage.googleapis.com/68/7d/ec/42d03c146484d4/US6944603.pdf
https://ieeexplore.ieee.org/abstract/document/1458100
https://patents.google.com/patent/US6842422B1/en
https://crypto.stackexchange.com/questions/37991/what-exactly-is-a-garbled-circuit