Unit-1: 1. What Is Big Data? Discuss Different Challenges of Conventional System. Answer
Process Challenges
o It can take significant exploration to find the right model for analysis, and the
ability to iterate very quickly and ‘fail fast’ through many (possibly throwaway)
models, at scale, is critical. Process challenges with deriving insights
include:
Capturing data.
Aligning data from different sources (e.g., resolving when two objects
are the same).
Transforming the data into a form suitable for analysis.
Modeling it, whether mathematically, or through some form of
simulation.
Understanding the output, visualizing and sharing the results (consider,
for example, how to display complex analytics on an iPhone or other mobile
device).
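The process steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the two sources, their field names and the matching rule (two records are the same object when their trimmed, lower-cased e-mail addresses are equal) are all hypothetical.

```python
from statistics import mean

# Capture: two hypothetical sources describing the same customers.
crm_rows = [
    {"email": "Ann@Example.com ", "age": 34},
    {"email": "bob@example.com", "age": 51},
]
shop_rows = [
    {"mail": "ann@example.com", "spend": 120.0},
    {"mail": "bob@example.com", "spend": 80.0},
    {"mail": "eve@example.com", "spend": 15.0},
]


def normalize(email):
    """Assumed matching rule: same trimmed, lower-cased e-mail means same person."""
    return email.strip().lower()


def align(crm, shop):
    """Align: join records from both sources that refer to the same object."""
    by_key = {normalize(r["email"]): r for r in crm}
    merged = []
    for s in shop:
        key = normalize(s["mail"])
        if key in by_key:
            merged.append(
                {"email": key, "age": by_key[key]["age"], "spend": s["spend"]}
            )
    return merged


# Transform: keep only the numeric column needed for the analysis.
records = align(crm_rows, shop_rows)
spend = [r["spend"] for r in records]

# Model: the simplest possible summary, an average.
average_spend = mean(spend)
print(len(records), average_spend)
```

Only the two customers present in both sources survive the alignment step; the unmatched record is dropped before modeling.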
Management Challenges
o The main management challenges are
Data privacy
Security
Governance
Ethics
o The challenges are: Ensuring that data are used correctly (abiding by its
intended uses and relevant laws), tracking how the data are used,
transformed, derived, etc., and managing its lifecycle.
2. How can Big Data be represented on a platform? Discuss the traits of Big Data.
Answer
Volume
o The quantity of data that is generated is very important in this context.
o It is the size of the data which determines the value and potential of the data
under consideration and whether it can actually be considered big data or
not.
o The name ‘Big Data’ itself contains a term which is related to size, hence this
characteristic.
Variety
o The next aspect of Big Data is its variety.
o This means that the category to which the data belongs is also an essential
fact that needs to be known by the data analysts.
o This helps the people who closely analyze the data, and are associated with
it, to use the data effectively to their advantage, thus upholding the
importance of Big Data.
Velocity
o The term ‘Velocity’ in this context refers to the speed of generation of data or
how fast the data is generated and processed to meet the demands and the
challenges which lie ahead in the path of growth and development.
Variability
o This is a factor which can be a problem for those who analyze the data.
o It refers to the inconsistency which the data can show at times, thus
hampering the process of handling and managing the data
effectively.
Complexity
o Data management can become a very complex process, especially when large
volumes of data come from multiple sources.
o These data need to be linked, connected and correlated in order to be able to
grasp the information that is supposed to be conveyed by these data.
o This situation is therefore termed the ‘complexity’ of big data.
3. Write a short note on web data analysis and modern data analytics tools?
Answer
Web analysis is the methodological study of online/offline patterns and trends.
It is a technique that you can employ to collect, measure, report and analyze your
website data.
It is normally carried out to analyze the performance of a website and optimize its web
usage.
We use web analytics to track key metrics and analyze visitors’ activity and traffic flow.
It is a tactical approach to collect data and generate reports.
Web analytics is an ongoing process; the data it collects helps in attracting more traffic
to a site, thereby increasing the return on investment.
Analytics tools offer an insight into the performance of your website, visitors’
behavior and data flow.
These tools are inexpensive and easy to use; sometimes they are even free.
These tools are basically used to generate reports on:
o Acquisition analysis
o Behavior analysis
o Conversion analysis
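As a rough illustration of the three report types, the sketch below computes one simple metric for each. The session log, its field names and the choice of metrics (sessions per source, bounce rate, purchase rate) are hypothetical; real analytics tools define and collect these in their own ways.

```python
from collections import Counter

# Hypothetical visit log: one dict per session.
sessions = [
    {"source": "search", "pages": 5, "purchased": True},
    {"source": "search", "pages": 1, "purchased": False},
    {"source": "social", "pages": 3, "purchased": False},
    {"source": "direct", "pages": 4, "purchased": True},
]


def acquisition(rows):
    """Acquisition analysis: how many sessions each traffic source brought."""
    return Counter(r["source"] for r in rows)


def bounce_rate(rows):
    """Behavior analysis: share of sessions that viewed only a single page."""
    return sum(1 for r in rows if r["pages"] == 1) / len(rows)


def conversion_rate(rows):
    """Conversion analysis: share of sessions that ended in a purchase."""
    return sum(1 for r in rows if r["purchased"]) / len(rows)


print(acquisition(sessions))
print(bounce_rate(sessions), conversion_rate(sessions))
```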
6. Define neural network. How can one help a neural network to learn and
generalize on a specific topic?
Answer
A neural network is a network inspired by biological neural networks, which is
used to estimate or approximate functions that can depend on a large number of
inputs and that are generally unknown.
The principal reason why neural networks have attracted such interest is the
existence of learning algorithms for neural networks: algorithms that
use data to estimate the optimal weights in a network to perform some task.
There are three basic approaches to learning in neural networks:
o Supervised: Learning uses a training set that consists of a set of pattern pairs:
an input pattern and the corresponding desired output pattern.
o Reinforcement: If a network aims to perform some task, then the
reinforcement signal is a simple yes or no at the end of the task to indicate
whether the task has been performed satisfactorily.
o Unsupervised: Learning uses only input data; there is no training signal.
o The role of neural network training is to identify the unknown “mystery function”
given only the training data.
What the training process does is to estimate the parameters of the function so that
it replicates the data as well as possible and generalizes well to new data.
Generalization is a measure that tells us how well the network performs on the actual
problem once training is complete.
This can be measured by looking at the performance of the network on evaluation
data unseen during the training process.
The following three principles apply to generalization:
o Good performance on the training data does not necessarily lead to good
generalization performance.
o Simple solutions are better than complex solutions.
o Larger networks require more training data.
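Supervised learning and the measurement of generalization can be sketched with the smallest possible network, a single sigmoid neuron. Everything specific here is an assumption made for illustration: the task (label is 1 when x0 + x1 > 1), the learning rate, the number of epochs and the data sizes. The point is only that weights are estimated from training pairs and then scored on evaluation data the training loop never sees.

```python
import math
import random


def make_data(n, rng):
    """Hypothetical task: the label is 1 when x0 + x1 > 1, else 0."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, 1 if x[0] + x[1] > 1 else 0))
    return data


def predict(weights, x):
    """A single sigmoid neuron: weighted sum of the inputs plus a bias."""
    z = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    z = max(-60.0, min(60.0, z))  # keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, rate=0.5):
    """Supervised learning: nudge the weights to reduce the error on each pair."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, target in data:
            error = predict(weights, x) - target
            weights[0] -= rate * error * x[0]
            weights[1] -= rate * error * x[1]
            weights[2] -= rate * error
    return weights


def accuracy(weights, data):
    """Fraction of examples whose thresholded output matches the label."""
    hits = sum(1 for x, t in data if (predict(weights, x) >= 0.5) == (t == 1))
    return hits / len(data)


rng = random.Random(0)
training_set = make_data(200, rng)
evaluation_set = make_data(200, rng)  # never shown to train()

weights = train(training_set)
train_accuracy = accuracy(weights, training_set)
eval_accuracy = accuracy(weights, evaluation_set)
print(train_accuracy, eval_accuracy)
```

The number that matters for generalization is `eval_accuracy`; a high `train_accuracy` alone says nothing about performance on new data.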
9. How is time series analysis helpful? Discuss linear system analysis with an
example.
Answer
Time series analysis
Time series analysis comprises methods for analyzing time series data in order to
extract meaningful statistics and other characteristics of the data.
Time series analysis can be applied to real-valued, continuous data, discrete
numeric data or discrete symbolic data.
The usage of time series models is twofold:
Obtain an understanding of the underlying forces and structure that produced the
observed data.
Fit a model and proceed to forecasting, monitoring or even feedback and
feed-forward control.
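Two common uses of a time series, describing its underlying trend and forecasting the next value, can be sketched with a moving average. The sales figures, the window size and the naive forecasting rule below are hypothetical choices for illustration; practical work usually relies on richer models.

```python
def moving_average(series, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def forecast_next(series, window):
    """Naive forecast: the mean of the last `window` observations."""
    return sum(series[-window:]) / window


# Hypothetical monthly sales figures.
sales = [10, 12, 14, 13, 15, 17, 16, 18]

trend = moving_average(sales, 3)
next_value = forecast_next(sales, 3)
print(trend)       # [12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(next_value)  # 17.0
```

The smoothed series rises steadily even though the raw values go up and down, which is the kind of underlying structure the analysis is meant to expose.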
Linear system analysis
Linear system analysis is concerned with the study of equilibrium and change in
dynamical systems, that is, in systems that contain variables that may change with
time.
To perform the analysis, relationships between these variables are described by a
set of equations known as a model.
In order for linear system analysis to be applicable, the model must possess the
linearity property: it must be a linear model.
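Example
If input x1(t) produces output y1(t) and input x2(t) produces output y2(t), then a
linear system maps the combined input Ax1(t) + Bx2(t) to the output Ay1(t) + By2(t):
Ax1(t) + Bx2(t) -> [Linear System] -> Ay1(t) + By2(t)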
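The linearity property can be checked numerically: scaling and adding the inputs must give the same result as scaling and adding the individual outputs. The sketch below assumes a hypothetical discrete-time system, a two-point averaging filter, and contrasts it with a squaring system that is not linear.

```python
import math


def averager(x):
    """Hypothetical linear system: y[n] = 0.5*x[n] + 0.5*x[n-1], with x[-1] = 0."""
    return [0.5 * x[n] + 0.5 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def squarer(x):
    """A non-linear system for contrast: y[n] = x[n] ** 2."""
    return [v * v for v in x]


def obeys_superposition(f, x1, x2, a, b):
    """Check f(a*x1 + b*x2) == a*f(x1) + b*f(x2), sample by sample."""
    combined_input = [a * u + b * v for u, v in zip(x1, x2)]
    left = f(combined_input)
    right = [a * u + b * v for u, v in zip(f(x1), f(x2))]
    return all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(left, right))


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, -1.0]

print(obeys_superposition(averager, x1, x2, a=2.0, b=3.0))  # True
print(obeys_superposition(squarer, x1, x2, a=2.0, b=3.0))   # False
```

Passing this check for one pair of inputs does not prove linearity, but failing it, as the squaring system does, is enough to rule it out.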
10. Why is Fuzzy Logic named fuzzy? How can one extract data from a Fuzzy
model?
Answer
Fuzzy logic is a form of many-valued logic in which the truth values of variables may
be any real number between 0 and 1, and are therefore considered to be “fuzzy”.
Fuzzy logic has been employed to handle the concept of partial truth, where the
truth value may range between completely true and completely false. Hence it
is termed fuzzy.
A static or dynamic system which makes use of fuzzy sets is called a fuzzy system.
These are rule-based systems, also known as fuzzy models, since extracting fuzzy
rules from data allows relationships in the data to be modeled by ‘if-then’ rules that
are easy to understand, verify and extend.
Example
If antecedent proposition then consequent proposition.
A classic example of extracting data from a fuzzy model is the ‘fuzzy decision tree’.
Fuzzy decision trees combine fuzzy representation and its approximate reasoning with
symbolic decision trees to give an approximate output, or to generate data, based on the
condition of the rule specified at each node of the tree.
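Partial truth and an ‘if-then’ rule can be sketched as follows. The membership functions, their breakpoints, the fan-speed rule and the use of the minimum as fuzzy AND are illustrative assumptions; other shapes and operators are equally valid choices.

```python
def warm(temp_c):
    """Hypothetical membership: degree to which a temperature is 'warm'.

    0 at or below 15 degrees C, 1 at or above 25 degrees C, linear in between.
    """
    if temp_c <= 15:
        return 0.0
    if temp_c >= 25:
        return 1.0
    return (temp_c - 15) / 10


def humid(rel_humidity):
    """Hypothetical membership: 0 at or below 40 %, 1 at or above 80 %."""
    if rel_humidity <= 40:
        return 0.0
    if rel_humidity >= 80:
        return 1.0
    return (rel_humidity - 40) / 40


def fuzzy_and(a, b):
    """A common choice for fuzzy AND: the minimum of the two truth values."""
    return min(a, b)


def rule_fan_fast(temp_c, rel_humidity):
    """IF temperature is warm AND air is humid THEN fan speed is fast.

    Returns the degree (0..1) to which the consequent holds.
    """
    return fuzzy_and(warm(temp_c), humid(rel_humidity))


print(warm(20))               # 0.5: partially true, neither 0 nor 1
print(rule_fan_fast(20, 70))  # 0.5: limited by the weaker antecedent
```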
12. What is a Fuzzy decision tree? How does it help in getting data from a huge
database?
Answer
A decision tree is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs and utility.
Fuzzy decision trees are more advanced in the sense that they model uncertainty
around the split values of the features, represented by soft instead of hard splits.
A fuzzy decision tree combines fuzzy representation and its approximate reasoning
with a symbolic decision tree.
As such, fuzzy decision trees provide for handling of language-related uncertainty,
noise, and missing or faulty features, and for robust behavior, while also
providing comprehensible knowledge interpretation.
Fuzzy decision trees help in forming flexible queries: a set of induced fuzzy
rules can be associated with a database as a knowledge base that can be used to help
answer frequent queries.
Fuzzy decision trees help in information retrieval and mining because of their
capability to represent miscellaneous data in a synthetic way, their robustness with
regard to changes in the parameters of the user environment, and their unique
expression.
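The difference between a hard and a soft split can be sketched with a one-node tree. The feature (age), the split point, the width of the blurred boundary and the leaf values are all hypothetical; the sketch only shows how both branches contribute near the split instead of one branch taking everything.

```python
def soft_left(value, split, width):
    """Degree (0..1) to which `value` goes to the left branch.

    A hard split would return 1 below `split` and 0 above it; here the
    boundary is blurred linearly over [split - width, split + width].
    """
    if value <= split - width:
        return 1.0
    if value >= split + width:
        return 0.0
    return (split + width - value) / (2 * width)


def fuzzy_stump(age):
    """One-node fuzzy tree with hypothetical leaves.

    The left leaf ("young") predicts risk 0.2, the right leaf ("old") 0.8.
    Both leaves contribute, weighted by branch membership.
    """
    left = soft_left(age, split=40, width=10)
    right = 1.0 - left
    return left * 0.2 + right * 0.8


print(fuzzy_stump(20))  # fully in the left branch
print(fuzzy_stump(40))  # exactly on the split: both leaves count equally
print(fuzzy_stump(60))  # fully in the right branch
```

A small change in age near 40 shifts the output gradually rather than flipping it, which is the robustness to noisy feature values that soft splits are meant to provide.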