Muhammad Qasim Pasta

personal website

Models to generate complex networks and benchmark graphs

PDF COPY

 

Abstract:

We live in an inter-connected world and any inter-connected system can be represented by a graph, also known as a network. The World Wide Web is a good example which is an inter-connected system of web-pages. Recent decades have witnessed tremendous growth in the study of such networks. To understand the dynamics and rules behind the formation of networks, researchers try to generate synthetic networks that can mimic real-world networks and a number of models have been proposed from scholars of different domains for this purpose. These models not only help us to understand the evolution of networks but also in prediction of network growth, sampling smaller sized networks, and the formation of communities. They also help us to analyze the behaviour of different processes on these networks such as information diffusion and epidemic spreading.

This thesis contributes to the development of models that can generate synthetic networks with similar properties to real-world networks while also enabling the embedding of ground-truth community structures. We especially focus on two major areas: models for incorporating demographical attributes and models for generating ground truth communities. We analyze the shortcomings of existing models and subsequently propose new models in an attempt to overcome these shortcomings.

The field of network science has become rich as scholars from different domains have contributed to it. However, it comes at the cost of different terminologies and analogies from different domains. We propose a framework to compare and evaluate different network models. The framework allows comparison of two network models regardless of their originating domains. We compare a number of models using the proposed framework to demonstrate its effectiveness.

We also propose a model to generate networks on the basis of social and demographical attributes. We successfully evaluate the model by re-generating publicly available Facebook datasets. The model is generic and easily tunable to generate a variety of networks using demographical and social attributes.

Generation of synthetic networks with the presence of community structures has attracted a lot of attention in recent years. However, the generation of such networks with ground truth communities has not received much attention. The models to generate networks with ground truth communities are important as they can be used to evaluate the performance of community detection algorithms, especially for very large graphs where ground truth is simply not available.

We propose different models to generate networks with the presence of community structures. We discuss the possibilities of converting an existing network generation model into one that is capable of generating ground truth communities. We also empirically analyze the performance of community detection algorithms on one such model. We also perform a detailed study to evaluate the performance of community detection algorithms on networks with different topological properties. We show that existing models use for evaluation of performance of community detection algorithms are not sufficient and current community detection algorithms do not perform well enough on networks with different topological properties.

Dedicate: 

dedicated to my family and my community,
"The Okhai Memon Community",
for their support in pursuing of higher education

Acknowledgement:

First of all, I thank the Almighty God, most beneficent and merciful, for providing me the opportunity and giving me the capability to achieve all of this. Furthermore, this could not be accomplished without the help and guidance of a number of people and I sincerely would like to thank all of them.

I would like to express my greatest gratitude to my supervisor Dr. Faraz Zaidi, without his support and guidance, I would not have achieved this at all. He has been a great mentor throughout my journey and always encouraged to set a high standard for my work.  I am thankful for his continuous assistance, patience, and very valuable time he has provided me.

I would like to thank Dr. Muzaffar Mahmood, Dean, Graduate School of Science and Engineering, PAF – Karachi Institute of Economics and Technology for his support. I am grateful to Dr. Tariq Mahmood, Dr. Khurram Junejo and Dr. Maaz Bin Ahmed for their unconditional support and kind words. I also would like to acknowledge the support of Dr. Arnaud Sallaberry, Dr. Guy Melançon, and Dr. Céline Rozenblat for reviewing my work and providing valuable comments. I would like to extend my thanks to Mr. Baqar Mangrani for his kind services to proofread this thesis.

As rightly said, "Some people leave their mark on your life" – other than my supervisor, there are few more who left their marks on my life. I am greatly indebted to Dr. Muhammad Mohiuddin for leaving the mark of "integrity", to Engr. Parkash Lohana for leaving the mark of "impact on society", and to Dr. Irfan Hyder for leaving the mark of "gratitudeness".

I am also thankful to my colleagues and friends with whom I have worked during my research, especial thanks to Mr. Affan Alam, Mr. Ayub Latif, and last but not least to Mr. Khalid Khan.

I owe a great debt of gratitude to my loving family for their extraordinary support and encouragement all over the years. I have no words to thank them – especially to my father: who always inspired me to strive hard, my mother: who is an inspiration to never give up, my younger brother: who backed me to focus, my wife: who taught me how not to worry, and my lovely daughter: who herself is a symbol of hope - and we all know how important "hope" is for a Ph.D. student.

I am also thankful to my first cousin Mr. Ibrahim Memon for being my mentor since my childhood and being my inspiration to join the field of computer science, my friends Mr. Adnan Memon and Mr. Rehan Ranavaya for lifting my morals throughout this journey.