Ghost Imputation Project
Of
RAHUL BARIHA
Examination Roll no: UBS18CSC-004
CERTIFICATE
It is certified that the work contained in this report titled “GHOST IMPUTATION PROJECT” is the
original work of SNEHA KAUSHAL (UBS18CSC-001) and RAHUL BARIHA (UBS18CSC-004)
and has been carried out under my supervision.
________________________
Madhusmita Panda
Head of the Department
(Project Supervisor)
GANGADHAR MEHER UNIVERSITY, AMRUTA VIHAR SAMBALPUR
DECLARATION
We hereby declare that the work which is being presented in this project entitled “GHOST
IMPUTATION PROJECT”, submitted to GANGADHAR MEHER UNIVERSITY,
AMRUTA VIHAR SAMBALPUR in partial fulfilment of the requirements for the degree
of Bachelor of Science in Computer Science, is an authentic record of our own work carried out
under the supervision of Dr. Madhusmita Panda. The matter embodied in this project report has
not been submitted by us for the award of any other degree.
SNEHA KAUSHAL(UBS18CSC-001)
RAHUL BARIHA(UBS18CSC-004)
Date:
Place:
ACKNOWLEDGEMENT
The success and final outcome of this project required a great deal of guidance and
assistance from many people, and we are extremely fortunate to have received it all
along the completion of our project work. Whatever we have done is only due
to such guidance and assistance, and we will not forget to thank them. First
of all, we would like to thank the university for giving us such a great
opportunity to develop ourselves technically. The Vice Chancellor, Prof. Atanu
Kumar Pati, has supported us in every way possible. Secondly, we express our
gratitude to the Head of the Department, Dr. Madhusmita Panda, for
motivating and encouraging us every time. We are extremely grateful and
remain indebted to our guide, Dr. Madhusmita Panda, for being a source of
inspiration and for their constant support in the design, implementation and
evaluation of the project. We are thankful to them for their constant
constructive criticism and invaluable suggestions, which benefited us a lot
while developing the project “GHOST IMPUTATION PROJECT”. They have
been a constant source of inspiration and motivation for hard work, and they
have been very co-operative throughout this project work.
ABSTRACT
Noise and missing data are intrinsic characteristics of real-world data, leading to uncertainty
that negatively affects the quality of knowledge extracted from the data. The burden imposed
by missing data is often severe in sensors that collect data from the physical world, where
large gaps of missing data may occur when the system is temporarily off or disconnected.
How can we reconstruct missing data for these periods? We introduce an accurate and
efficient algorithm for missing data reconstruction (imputation) that is specifically designed
to recover off-period segments of missing data. This algorithm, Ghost, searches the
sequential dataset to find data segments whose prior and posterior segments match
those of the missing data. If there is a similar segment that also satisfies the constraint, such
as location or time of day, then it is substituted for the missing data. A baseline approach
results in quadratic computational complexity; therefore, we introduce a caching approach that
reduces the search space and improves the computational complexity to linear in the common
case.
Introduction
With recent technological advances and increases in computing capabilities, data-intensive
scientific discovery is being widely used. This has led to the introduction of
methods for analyzing data collected from multiple sources of information, i.e., “multivariate
data”. One of the inevitable challenges of real-world data analysis is uncertainty arising from
noise and missing data. This uncertainty negatively affects the quality of knowledge extracted
from the data. Indeed, the burden imposed by missing data is often severe in applications
collecting data from the physical world, e.g., mobile sensing [33] or genome sequencing
[6]. For example, consider battery-powered devices, such as smart watches, equipped with
inexpensive sensors such as ambient light and accelerometer. Due to sensor quality, battery
limits, and user preferences, context-sensing applications cannot continuously and
effectively collect data [33], and there are often segments of missing data, e.g., when the device
is turned off. These missing segments affect the quality of knowledge-extraction methods [31].
Although missing data reconstruction is an important requirement of these systems, it has not
received much attention. There are longstanding efforts in statistics [10], [40], [43] to
reconstruct missing data. These imputation methods assume the missing data points occur at
random, i.e., missing at random (MAR) or missing completely at random (MCAR).
If the data is missing not at random (MNAR) [40], the imputation process is more
challenging. In this report we propose a novel algorithm for imputation of multivariate sensor
data. This algorithm only uses (i) a constraint, such as time of day or location, (ii) the data
values immediately prior to the missing event, and (iii) the data values immediately following
the missing event. Since our method does not rely on statistical methods, it might be able to
handle some MNAR data, but only if a similar segment exists in the dataset. In particular, our
algorithm operates on multivariate and sequential data streams. It reads two adjacent data
segments, one before and one after the missing data (the missing segment), and searches the
dataset to find two segments similar to the adjacent segments of the missing segment. If the
segment between these two similar segments is of the same length as the missing segment, it is a
candidate recovery segment. Next, if the constraint values of the segment of interest match
the constraint values of the missing segment, the algorithm substitutes the missing segment
with the content of this candidate recovery segment. A naïve approach imposes a quadratic
computational complexity, so we add a pre-processing step that reads all data segments and
their indexes into a cache, achieving a linear computational complexity in the common case.
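The following is a minimal sketch of this baseline search, written in Java since the project is built on a Java platform. The class and method names (GhostImputer, findRecovery, constraintMatches) and the data layout are illustrative assumptions made for this report, not the authors' actual implementation; the stream is assumed to have already been converted to categorical symbols, with a placeholder symbol marking missing positions.

import java.util.Arrays;

// Illustrative sketch only: names and structure are assumptions, not the
// project's actual code. The stream is an array of categorical symbols and
// missing positions hold a placeholder (e.g. "?") so that candidate windows
// overlapping the gap fail the equality test.
public class GhostImputer {

    private final String[] data;        // categorical symbols, one per time step
    private final String[] constraint;  // constraint value per step, e.g. hour of day or location
    private final int w;                // window size: length of the prior and posterior segments

    public GhostImputer(String[] data, String[] constraint, int windowSize) {
        this.data = data;
        this.constraint = constraint;
        this.w = windowSize;
    }

    // Tries to recover the missing segment data[gapStart .. gapEnd] (inclusive),
    // assuming at least w observed values exist on each side of the gap.
    // It scans the stream for a position whose surrounding windows equal the
    // gap's prior and posterior windows and whose constraint values match,
    // and returns the first such candidate segment, or null if none exists.
    public String[] findRecovery(int gapStart, int gapEnd) {
        int gapLen = gapEnd - gapStart + 1;
        String[] prior = Arrays.copyOfRange(data, gapStart - w, gapStart);
        String[] posterior = Arrays.copyOfRange(data, gapEnd + 1, gapEnd + 1 + w);

        // Baseline: a full scan, which is what makes the naive approach quadratic.
        for (int i = w; i + gapLen + w <= data.length; i++) {
            if (i == gapStart) continue;                           // skip the gap itself
            String[] candPrior = Arrays.copyOfRange(data, i - w, i);
            String[] candPost  = Arrays.copyOfRange(data, i + gapLen, i + gapLen + w);
            if (Arrays.equals(prior, candPrior)
                    && Arrays.equals(posterior, candPost)
                    && constraintMatches(gapStart, i, gapLen)) {
                return Arrays.copyOfRange(data, i, i + gapLen);    // first match wins
            }
        }
        return null;  // no similar segment satisfying the constraint was found
    }

    // The constraint acts as a controlling variable: every step of the candidate
    // must carry the same constraint value as the corresponding step of the gap.
    private boolean constraintMatches(int gapStart, int candStart, int len) {
        for (int k = 0; k < len; k++) {
            if (!constraint[gapStart + k].equals(constraint[candStart + k])) {
                return false;
            }
        }
        return true;
    }
}

Under the same assumptions, one way to realize the cached variant described above would be to pre-index every window of length w in a hash map from its symbol sequence to its start positions, so that only positions whose prior window already matches the gap's prior window need to be examined; this is what brings the common case down to linear time.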
The characteristics and contributions of our algorithm are as follows. To allow full
reproducibility of our claims, we share the algorithm and its implementation with the
community; all of our code and links to acquire access to the datasets are available
online: https://sites.google.com/view/ghostimputation.
Heterogeneous Multivariate Real-World Data: Statistical imputation approaches are
optimized to handle numerical data [40], [43]. Real-world systems, however, produce data in
numerical, categorical, or binary forms. Our algorithm relies on a categorical abstraction of
the original data, converting data values to categorical symbols, e.g., by bucketing numeric
data into categories. Therefore, in contrast to statistical imputation, any data type,
regardless of its distribution, can be fed into this algorithm, i.e., nonparametric imputation.
All datasets we employed in this work are real-world datasets, and most of them (wearable,
mobile, IoT, and news media datasets) have not previously been used for imputation studies.
We recommend using this algorithm mainly for multivariate sensor data used in consumer
IoT and mobile devices, but to demonstrate its versatility, we also evaluate it on two other
real-world datasets (clinical data and real estate data).
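As a concrete illustration of this categorical abstraction, the short sketch below buckets a numeric reading into equal-width bins and returns a symbol; the class name, bin count, value range, and symbol labels are assumptions made only for the example, not the project's actual encoding.

// Illustrative equal-width bucketing of a numeric reading into a categorical
// symbol; the range and number of bins are assumptions for this example.
public final class Bucketizer {

    private final double min;
    private final double max;
    private final int bins;

    public Bucketizer(double min, double max, int bins) {
        this.min = min;
        this.max = max;
        this.bins = bins;
    }

    // Maps a raw value, e.g. an ambient-light reading, to a symbol such as "B3".
    public String toSymbol(double value) {
        double clipped = Math.max(min, Math.min(max, value));
        int bin = (int) ((clipped - min) / (max - min) * bins);
        if (bin == bins) {
            bin = bins - 1;   // keep the maximum value inside the last bin
        }
        return "B" + bin;
    }
}

// Example: new Bucketizer(0, 1000, 5).toSymbol(742.0) returns "B3"; categorical
// or binary attributes can be passed through as symbols unchanged.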
Instance-based Learning: Inspired by algorithms that learn from a single labelled instance,
our algorithm tries to estimate the missing data from the first similar instance that can be
found in a sequential search of the dataset. Clearly, relying on a single label (similar instance)
is prone to false-positive errors. Therefore, we also rely on a constraint as a controlling variable
that significantly reduces false positives. Our definition of a constraint is inspired by binary
constraints in constraint satisfaction problems (CSPs), but, unlike in traditional CSPs, it is not
used to reduce the search space.
Search Space: Continuously collecting and storing data can be expensive in terms of resource
usage, especially in battery-powered wireless devices. Data is typically not stored long-term
on these devices, and most data processing is conducted in cloud servers. Our algorithm can
reconstruct the missing data merely by finding the first match for the missing segment,
without a need to search the entire dataset. For instance, we used only three days of data for a
smartphone dataset and only seven days for a smartwatch dataset. These datasets are fairly
small, but in both of these examples our algorithm outperforms state-of-the-art algorithms
and reconstructs the missing data with higher accuracy. Note that all versions of our
algorithm, i.e., the baseline and cache-based ones, have only one effective parameter: the
window size. In the evaluation section we identify an optimal window size value for each
dataset. Therefore, based on the target dataset (or application), this parameter can be
assigned automatically, and a user with domain knowledge does not need to
decide its optimal value. There is another parameter used for tolerating slight
dissimilarity; we will demonstrate why users should not tolerate dissimilarity.
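For illustration, the toy example below runs the hypothetical GhostImputer sketched above with a window size of 2 on a tiny symbol stream; the data values, the constraint labels, and the chosen window size are all invented for this example.

// Toy usage of the illustrative GhostImputer; all values here are invented.
public class GhostDemo {
    public static void main(String[] args) {
        // A short stream of bucketed symbols with a missing segment at indices 6..7 ("?").
        String[] stream = {"A","B","C","D","A","B","?","?","A","B","C","D","A","B","C","D","A","B"};
        // Constraint value (e.g. a coarse time-of-day label) for every time step.
        String[] hour   = {"h1","h1","h1","h1","h2","h2","h2","h2","h3","h1","h1","h1","h1","h2","h2","h2","h2","h3"};

        GhostImputer imputer = new GhostImputer(stream, hour, 2);   // window size is the only tuned parameter
        String[] recovered = imputer.findRecovery(6, 7);
        if (recovered != null) {
            System.arraycopy(recovered, 0, stream, 6, recovered.length);
        }
        System.out.println(String.join(" ", stream));   // the gap is filled with "C D"
    }
}

In a real deployment the window size would be fixed once per dataset, as described above, rather than tuned by the end user.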
SCOPE
Heterogeneous Multivariate Real-World Data: Statistical imputation approaches are
optimized to handle numerical data. Real-world systems, however, produce data in
numerical, categorical, or binary forms. Our algorithm relies on a categorical abstraction of
the original data, converting data values to categorical symbols, e.g., by bucketing numeric
data into categories. Therefore, in contrast to statistical imputation, any data type,
regardless of its distribution, can be fed into this algorithm, i.e., nonparametric imputation. All
datasets we employed in this work are real-world datasets, and most of them (wearable,
mobile, IoT, and news media datasets) have not previously been used for imputation studies.
We recommend using this algorithm mainly for multivariate sensor data used in consumer
IoT and mobile devices, but to demonstrate its versatility, we also evaluate it on two other
real-world datasets (clinical data and real estate data).
Our algorithm can reconstruct the missing data merely by finding the first match for the
missing segment, without a need to search the entire dataset. For instance, we used only
three days of data for a smartphone dataset and only seven days for a smartwatch dataset.
These datasets are fairly small, but in both of these examples our algorithm outperforms
state-of-the-art algorithms and reconstructs the missing data with higher accuracy.
System Requirements Specification
Introduction
A Software Requirements Specification (SRS) – a requirements specification
for a software system – is a complete description of the behaviour of a system to be
developed. It includes a set of use cases that describe all the interactions the users will have
with the software. In addition to use cases, the SRS also contains non-functional
requirements. Non-functional requirements are requirements which impose constraints on the
design or implementation (such as performance engineering requirements, quality standards,
or design constraints).
In software engineering, the same meaning of requirements applies, except that the focus of
interest is the software itself.
FEASIBILITY STUDY
Technical Feasibility
Operational Feasibility
Economic Feasibility
ECONOMIC FEASIBILITY
A system that can be developed technically and that will be used if installed must still be a
good investment for the organization. In economic feasibility, the development cost of
creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware or
software. Since the interface for this system is developed using the existing resources and
technologies available at NIC, there is only nominal expenditure, and economic feasibility is
certain.
OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information
systems that meet the organization’s operating requirements. Operational feasibility
aspects of the project are to be taken as an important part of the project implementation.
Some of the important issues raised to test the operational feasibility of a project include
the following:
The well-planned design would ensure the optimal utilization of the computer resources and
would help in the improvement of performance status.
TECHNICAL FEASIBILITY
The technical issues usually raised during the feasibility stage of the investigation
include the following:
Functional Requirements:
Admin
1. Home
2. View All Data Users
3. View Inactive Data Users
4. Import Dataset
5. User Dataset Requests (Accept/Reject)
6. Logout
User
1. Home
2. Search Dataset Records, View, and Recover the Data
3. Send request to Admin
4. Logout
Non-Functional Requirements:
The major non-functional requirements of the system are as follows:
Usability:
The system is designed as a completely automated process; hence, there is little or no user
intervention.
Reliability:
The system is more reliable because of the qualities that are inherited
from the chosen platform, Java. Code built using Java is more reliable.
Performance:
This system is developed in high-level languages using advanced front-end and
back-end technologies. It gives a response to the end user on the client system within a very
short time.
Chapter 3
3.1 Introduction
ER diagrams are related to data structure diagrams (DSDs), which focus on the relationships of
elements within entities instead of relationships between entities themselves. ER diagrams are also
often used in conjunction with data flow diagrams (DFDs), which map out the flow of information for
processes or systems.
E-R Diagram:
3.3 DATA FLOW DIAGRAMS
Use case diagrams model the functionality of a system using actors and use cases. Use cases
are services or functions provided by the system to its users.
System
Draw your system's boundaries using a rectangle that contains use cases. Place actors outside
the system's boundaries.
Use Case
Draw use cases using ovals. Label the ovals with verbs that represent the system's functions.
Actors
Actors are the users of a system. When one system is the actor of another system, label the
actor system with the actor stereotype.
Relationships
Illustrate relationships between an actor and a use case with a simple line. For relationships
among use cases, use arrows labeled either "uses" or "extends." A "uses" relationship
indicates that one use case is needed by another in order to perform a task. An "extends"
relationship indicates alternative options under a certain use case.
Use Case Diagram
4.1.5 Home
4.1.17 Logout
4.2 Database Table Design (Screenshots)
4.2.1 Tables
4.2.2 Tables With Data
users
4.2.3 dataset
4.2.4 reqdata
CHAPTER 5 TESTING & TEST CASES
5.1 TESTING
TESTING METHODOLOGIES
The following are the Testing Methodologies:
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
During unit testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are
tested for the expected results. All error-handling paths are also tested.
2. Bottom-up Integration
This method begins the construction and testing with the modules at the lowest level
in the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs is
eliminated. The bottom up integration strategy may be implemented with the following steps:
The low-level modules are combined into clusters that perform a specific
software sub-function.
A driver (i.e., a control program for testing) is written to coordinate test case input
and output.
The cluster is tested.
Drivers are removed and clusters are combined, moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality.
Text Field:
The text field can contain only a number of characters less than or equal to its
size. The text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect
entry always flashes an error message.
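A minimal sketch of the kind of field checks described in this and the following subsection is given below; the class name, method names, and regular expressions are assumptions made for illustration and are not taken from the project code.

// Illustrative field checks; names and patterns are assumptions for this sketch.
public final class FieldValidator {

    // Alphanumeric text field limited to the column size.
    public static boolean isValidAlphanumeric(String value, int maxSize) {
        return value != null && value.length() <= maxSize && value.matches("[A-Za-z0-9]*");
    }

    // Alphabetic-only text field limited to the column size.
    public static boolean isValidAlphabetic(String value, int maxSize) {
        return value != null && value.length() <= maxSize && value.matches("[A-Za-z]*");
    }

    // Numeric field: digits 0 to 9 only; any other character is rejected.
    public static boolean isValidNumeric(String value) {
        return value != null && value.matches("[0-9]+");
    }
}

// On an invalid entry the caller would flash an error message instead of saving the value.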
Numeric Field:
The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message.
The individual modules are checked for accuracy and for what they have to perform. Each
module is subjected to a test run along with sample data. The individually tested modules are
integrated into a single system. Testing involves executing the program with real data; the
existence of any program defect is inferred from the output. The testing should be planned so
that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data and
produces an output revealing the errors in the system.
The above testing is done by taking various kinds of test data. Preparation of test data plays
a vital role in system testing. After preparing the test data, the system under study is tested
using that test data. While testing the system using test data, errors are again uncovered
and corrected using the above testing steps, and the corrections are also noted for future use.
Using Live Test Data:
Live test data are those that are actually extracted from organization files. After a system is
partially constructed, programmers or analysts often ask users to key in a set of data from their normal
activities. Then, the systems person uses this data as a way to partially test the system. In other
instances, programmers or analysts extract a set of live data from the files and have them entered
themselves.
It is difficult to obtain live data in sufficient amounts to conduct extensive testing.
And, although it is realistic data that will show how the system will perform for the typical
processing requirement, assuming that the live data entered are in fact typical, such data
generally will not test all combinations or formats that can enter the system. This bias toward
typical values then does not provide a true systems test and in fact ignores the cases most
likely to cause system failure.
Using Artificial Test Data:
Artificial test data are created solely for test purposes, since they can be generated to test all
combinations of formats and values. In other words, the artificial data, which can quickly be
prepared by a data-generating utility program in the information systems department, make
possible the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than those
who wrote the programs. Often, an independent team of testers formulates a testing plan,
using the systems specifications.
The package “GHOST IMPUTATION PROJECT” has satisfied all the requirements specified in the
software requirements specification and was accepted.
USER TRAINING
Whenever a new system is developed, user training is required to educate users
about the working of the system so that it can be put to efficient use by those for whom the
system has been primarily designed. For this purpose, the normal working of the project was
demonstrated to the prospective users. Its working is easily understandable, and since the
expected users are people who have a good knowledge of computers, the use of this system is
very easy.
MAINTENANCE
This covers a wide range of activities, including correcting code and design errors. To reduce
the need for maintenance in the long run, we have more accurately defined the user’s
requirements during the process of system development. Depending on the requirements, this
system has been developed to satisfy the needs to the largest possible extent. With
developments in technology, it may be possible to add many more features based on future
requirements. The coding and design are simple and easy to understand, which will
make maintenance easier.
TESTING STRATEGY :
A strategy for system testing integrates system test cases and design techniques into a
well-planned series of steps that results in the successful construction of software. The testing
strategy must incorporate test planning, test case design, test execution, and the resultant data
collection and evaluation. A strategy for software testing must accommodate low-level
tests that are necessary to verify that a small source code segment has been correctly
implemented, as well as high-level tests that validate major system functions against user
requirements.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing represents an interesting anomaly
for software. Thus, a series of tests is performed on the proposed system before
the system is ready for user acceptance testing.
SYSTEM TESTING:
Software, once validated, must be combined with other system elements (e.g., hardware,
people, databases). System testing verifies that all the elements mesh properly and that overall
system function and performance are achieved. It also tests to find discrepancies between the
system and its original objectives, current specifications, and system documentation.
UNIT TESTING:
In unit testing, different modules are tested against the specifications produced during the
design of the modules. Unit testing is essential for verification of the code produced during
the coding phase, and hence the goal is to test the internal logic of the modules. Using the
detailed design description as a guide, important control paths are tested to uncover errors
within the boundary of the modules. This testing is carried out during the programming stage
itself. In this testing step, each module was found to be working satisfactorily with
regard to the expected output from the module.
Test cases can be divided into two types: positive test cases and negative test cases.
Positive test cases are conducted by the developer with the intention of obtaining the expected
output. Negative test cases are conducted by the developer with the intention of verifying that
no output is produced for invalid input.
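As an illustration, the pair of test cases below follows this positive/negative split for the hypothetical FieldValidator sketched earlier in this chapter; JUnit 5 is assumed as the test framework.

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Illustrative positive and negative test cases (JUnit 5 assumed).
class FieldValidatorTest {

    @Test
    void positiveCase_numericFieldAcceptsDigits() {
        // Positive case: valid input, the expected output (acceptance) is produced.
        assertTrue(FieldValidator.isValidNumeric("12345"));
    }

    @Test
    void negativeCase_numericFieldRejectsLetters() {
        // Negative case: invalid input, the output must not be produced (entry rejected).
        assertFalse(FieldValidator.isValidNumeric("12a45"));
    }
}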
FUTURE WORK
As future work, we will try to develop a distance function that can identify prior and
posterior segments that are in the proximity of (not adjacent to) the missing segments. Finding
prior and posterior patterns and their distances to the missing segments could increase the
number of recovery segments, and thus the accuracy of the algorithm.