MANUSCRIPT
Title Accurate and efficient clustering algorithms for very large data sets / Syed Abdul Quddus.
Published Mt. Helen, Vic. : Federation University Australia, 2017.

ITEM LOCATION        CALL NO.            STATUS
MTH Closed Access    519.53 Q29A 2017    LIB USE ONLY
Descript. 152 leaves : illustrations ; 30 cm.
Notes This thesis is submitted in total fulfilment of the requirement for the degree of Doctor of Philosophy, Faculty of Science and Technology, Federation University Australia.
Principal Supervisor: Assoc. Prof. Adil Bagirov.
"Submitted in April 2017" - Title page.
Thesis Thesis (PhD) -- Federation University Australia, 2017.
Bibliog. Bibliography: leaves 142-152.
Notes Mt Helen Closed Access. For use only within the Library.
Summary "The ability to mine and extract useful information from large data sets is a common concern for organizations. Data over the internet is rapidly increasing and the importance of development of new approaches to collect, store and mine large amounts of data is significantly increasing. Clustering is one of the main tasks in data mining. Many clustering algorithms have been proposed but there are still clustering problems that have not been addressed in depth especially the clustering problems in large data sets. Clustering in large data sets is important in many applications and such applications include network intrusion detection systems, fraud detection in banking systems, air traffic control, web logs, sensor networks, social networks and bioinformatics. Data sets in these applications contain from hundreds of thousands to hundreds of millions of data points and they may contain hundreds or thousands of attributes. Recent developments in computer hardware allows to store in random access memory and repeatedly read data sets with hundreds of thousands and even millions of data points. This makes possible the use of existing clustering algorithms in such data sets. However, these algorithms require a prohibitively large CPU time and fail to produce an accurate solution. Therefore, it is important to develop clustering algorithms which are accurate and can provide real time clustering in such data sets. This is especially important in a big data era. The aim of this PhD study is to develop accurate and real time algorithms for clustering in very large data sets containing hundreds of thousands and millions of data points. Such algorithms are developed based on the combination of heuristic algorithms with the incremental approach. These algorithms also involve a special procedure to identify dense areas in a data set and compute a subset most informative representative data points in order to decrease the size of a data set. 
It is the aim of this PhD study to develop the center-based clustering algorithms. The success of these algorithms strongly depends on the choice of starting cluster centers. Different procedures are proposed to generate such centers. Special procedures are designed to identify the most promising starting cluster centers and to restrict their number. New clustering algorithms are evaluated using large data sets available in public domains. Their results will be compared with those obtained using several existing center-based clustering algorithms." -- Abstract.
Notes Online version available through FedUni ResearchOnline. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/162586
Subject Cluster analysis -- Data processing.
Data mining -- Mathematical models.
Data structures (Computer science).
Algorithms -- Mathematical models -- Research.
Computer algorithms -- Research.
Other Author Federation University Australia. Faculty of Science and Technology