Major Issues in Data Mining
Data mining faces a number of challenges, partly because data is not always available in one place and must be compiled from a variety of sources, which causes issues of its own. So now, I will discuss the major issues in data mining. They fall into three groups: Mining Methodology and User Interaction, Performance Issues, and Diverse Data Types Issues.
The first and second issues, mining methodology and user interaction, are grouped together under “Mining Methodology and User Interaction”. This group covers seven parts: mining different kinds of knowledge in databases, interactive mining of knowledge at multiple levels of abstraction, incorporation of background knowledge, data mining query languages and ad hoc data mining, presentation and visualization of data mining results, handling noisy or incomplete data, and pattern evaluation.
The first part is mining different kinds of knowledge in databases. Different users may be interested in different kinds of knowledge, so data mining must cover a broad range of knowledge discovery tasks.
The second part is interactive mining of knowledge at multiple levels of abstraction. Interactive mining is important because it allows the user to focus the search for patterns and to provide and refine data mining requests based on the returned results.
The third part is the incorporation of background knowledge. Background knowledge can be used to guide the discovery process and to express the discovered patterns or trends, both in concise terms and at multiple levels of abstraction.
The fourth part is data mining query languages and ad hoc data mining. A data mining query language allows users to describe ad hoc mining tasks, and it should be integrated with a data warehouse query language.
The fifth part is the presentation and visualization of data mining results. Once patterns or trends are discovered, they need to be expressed in high-level languages and visual representations that are easy for users to understand.
The sixth part is handling noisy or incomplete data. Data cleaning methods are needed to handle noise and incomplete objects during mining. Without them, the accuracy of the discovered patterns suffers, and the patterns will be poor in quality.
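To make the data cleaning step more concrete, here is a minimal Python sketch of two textbook cleaning techniques: filling a missing value with the mean of the attribute, and smoothing noisy values by bin means. The function names, the use of None to mark a missing value, the bin size, and the sample prices are assumptions for illustration; real cleaning pipelines choose among many other strategies.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    avg = mean(observed)
    return [avg if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size values each,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed

# Hypothetical attribute values: one price is missing.
prices = [4, 8, None, 15]
print(fill_missing_with_mean(prices))

# Hypothetical noisy prices, smoothed in bins of three.
print(smooth_by_bin_means([21, 4, 8, 15, 21, 24, 25, 28, 34], bin_size=3))
```

Smoothing by bin means returns the values in sorted order, since the bins are formed over the sorted list.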
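The last part is pattern evaluation. A data mining system can discover a large number of patterns, but many of them may be uninteresting because they either represent common knowledge or lack novelty, so the discovered patterns must be evaluated to find the ones that are actually interesting.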
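The text does not name a particular way to measure interestingness. As an assumption, the sketch below uses three common measures for association rules (support, confidence, and lift) on a hypothetical set of shopping transactions. It shows how a rule can have high confidence and still be uninteresting: when lift is close to 1, the rule only restates something that is already common.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent.
    Raises ZeroDivisionError if the antecedent never occurs."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's own support; 1.0 means no association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical transactions: every customer buys milk.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# bread -> milk has confidence 1.0, but its lift is also 1.0,
# so the rule tells us nothing beyond "everyone buys milk".
print(confidence(transactions, {"bread"}, {"milk"}), lift(transactions, {"bread"}, {"milk"}))

# butter -> bread has lift above 1, hinting at a real association
# (on a sample this small, that is only a hint).
print(lift(transactions, {"butter"}, {"bread"}))
```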
The third issue of data mining is performance. This issue includes the efficiency and scalability of data mining algorithms, and parallel, distributed, and incremental mining algorithms. The first part, efficiency and scalability, says that to extract information effectively from the huge amounts of data in a database, data mining algorithms must be efficient and scalable.
The next part is parallel, distributed, and incremental mining algorithms. The need for parallel and distributed algorithms comes from several factors: the huge size of many databases, the wide distribution of data, and the complexity of some data mining methods. Such an algorithm works in three stages. First, it divides the data into partitions. Next, the partitions are processed in parallel. Finally, the results from the partitions are merged. Incremental algorithms, in turn, update the mined results when the database changes, without mining all of the data again from scratch.
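The partition, parallel processing, and merge stages, along with the idea of incremental updating, can be sketched in Python for one simple mining task: counting how many transactions contain each item. The task, the function names, and the sample data are assumptions for illustration. Threads are used only to keep the sketch self-contained and portable; for CPU-heavy work a process pool or a distributed framework would be the more usual choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_parts):
    """Stage 1: split the records into n_parts partitions (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]

def count_items(part):
    """Stage 2, run once per partition: count the transactions containing each item."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts

def merge(partial_counts):
    """Stage 3: integrate the per-partition results into one global result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total

def parallel_item_counts(records, n_parts=4):
    parts = partition(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_counts = list(pool.map(count_items, parts))
    return merge(partial_counts)

def incremental_update(current_counts, new_records):
    """Fold counts for newly added records into an existing result
    instead of re-scanning the whole database."""
    return merge([current_counts, count_items(new_records)])

# Hypothetical transactions.
transactions = [["bread", "milk"], ["milk", "eggs"], ["bread", "milk"], ["eggs"]]
counts = parallel_item_counts(transactions, n_parts=2)
counts = incremental_update(counts, [["bread", "butter"]])
print(counts.most_common())
```

Because per-partition counts simply add up, merging them gives the same result as one sequential scan, and the same property is what lets the incremental update avoid re-reading the old data. Tasks whose partial results do not combine this simply need a more careful merge stage.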
The last issue of data mining is Diverse Data Types Issues. This issue has two parts: handling of relational and complex types of data, and mining information from heterogeneous databases and global information systems. In the first part, the difficulty is that a database may contain complex data objects, multimedia data objects, spatial data, temporal data, and other types of data, and it is not possible for a single system to mine all of these kinds of data. In the second part, the data is available at different data sources on a LAN or WAN. These data sources may be structured, semi-structured, or unstructured, so mining knowledge from them adds challenges to data mining.
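One common way to approach heterogeneous sources is to map every source into a shared record format before mining. The minimal Python sketch below does this for one structured source (CSV), one semi-structured source (JSON), and one unstructured source (free text). The field names, the JSON layout, and the text pattern are all hypothetical; real sources need far more robust parsing and schema matching.

```python
import csv
import io
import json
import re

def from_csv(text):
    """Structured source: every row has the same fixed columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer": row["customer"], "amount": float(row["amount"])} for row in reader]

def from_json(text):
    """Semi-structured source: fields may be nested or missing."""
    records = []
    for item in json.loads(text):
        amount = item.get("order", {}).get("total")
        if amount is not None:  # skip records that lack the field we need
            records.append({"customer": item["name"], "amount": float(amount)})
    return records

# Hypothetical pattern for sentences such as "erin paid $12.25".
AMOUNT_PATTERN = re.compile(r"(\w+) paid \$?(\d+(?:\.\d+)?)")

def from_text(text):
    """Unstructured source: pull values out of free text with a pattern."""
    return [{"customer": m.group(1), "amount": float(m.group(2))}
            for m in AMOUNT_PATTERN.finditer(text)]

def integrate(*sources):
    """Combine the normalized records from all sources into one list."""
    combined = []
    for records in sources:
        combined.extend(records)
    return combined

csv_data = "customer,amount\nalice,20.5\nbob,10\n"
json_data = '[{"name": "carol", "order": {"total": 7}}, {"name": "dave"}]'
text_data = "Yesterday erin paid $12.25 in cash."

records = integrate(from_csv(csv_data), from_json(json_data), from_text(text_data))
print(records)
```

Only after this kind of integration can a single mining algorithm treat the three sources uniformly; the record for "dave" is dropped because the semi-structured source did not supply an amount.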