MANUSCRIPT

Title

Accurate and efficient clustering algorithms for very large data sets / Syed Abdul Quddus.

Author

Quddus, Syed Abdul, author

Published

Mt. Helen, Vic. : Federation University Australia, 2017.

ITEM LOCATION	CALL NO.	STATUS
MTH Closed Access	519.53 Q29A 2017	LIB USE ONLY

Descript.	152 leaves : illustrations ; 30 cm.
Notes	This thesis is submitted in total fulfilment of the requirement for the degree of Doctor of Philosophy, Faculty of Science and Technology, Federation University Australia.
	Principal Supervisor: Assoc. Prof. Adil Bagirov.
	"Submitted in April 2017" - Title page.
Thesis	Thesis (PhD) -- Federation University Australia, 2017.
Bibliog.	Bibliography: leaves 142-152.
Notes	Mt Helen Closed Access. For use only within the Library.
Summary	"The ability to mine and extract useful information from large data sets is a common concern for organizations. Data over the internet is rapidly increasing and the importance of development of new approaches to collect, store and mine large amounts of data is significantly increasing. Clustering is one of the main tasks in data mining. Many clustering algorithms have been proposed but there are still clustering problems that have not been addressed in depth especially the clustering problems in large data sets. Clustering in large data sets is important in many applications and such applications include network intrusion detection systems, fraud detection in banking systems, air traffic control, web logs, sensor networks, social networks and bioinformatics. Data sets in these applications contain from hundreds of thousands to hundreds of millions of data points and they may contain hundreds or thousands of attributes. Recent developments in computer hardware allows to store in random access memory and repeatedly read data sets with hundreds of thousands and even millions of data points. This makes possible the use of existing clustering algorithms in such data sets. However, these algorithms require a prohibitively large CPU time and fail to produce an accurate solution. Therefore, it is important to develop clustering algorithms which are accurate and can provide real time clustering in such data sets. This is especially important in a big data era. The aim of this PhD study is to develop accurate and real time algorithms for clustering in very large data sets containing hundreds of thousands and millions of data points. Such algorithms are developed based on the combination of heuristic algorithms with the incremental approach. These algorithms also involve a special procedure to identify dense areas in a data set and compute a subset most informative representative data points in order to decrease the size of a data set. 
It is the aim of this PhD study to develop the center-based clustering algorithms. The success of these algorithms strongly depends on the choice of starting cluster centers. Different procedures are proposed to generate such centers. Special procedures are designed to identify the most promising starting cluster centers and to restrict their number. New clustering algorithms are evaluated using large data sets available in public domains. Their results will be compared with those obtained using several existing center-based clustering algorithms." -- Abstract.
Notes	Online version available through FedUni ResearchOnline. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/162586
Subject	Cluster analysis -- Data processing.
	Data mining -- Mathematical models.
	Data structures (Computer science).
	Algorithms -- Mathematical models -- Research.
	Computer algorithms -- Research.
Other Author	Federation University Australia. Faculty of Science and Technology
