Modern Graph Theory Algorithms with Python: Harness the power of graph algorithms and real-world network applications using Python
Ebook, 587 pages, 4 hours

Language: English
Publisher: Packt Publishing
Release date: Jun 7, 2024
ISBN: 9781805120179

    Book preview

    Modern Graph Theory Algorithms with Python - Colleen M. Farrelly

    Modern Graph Theory Algorithms with Python

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    The authors acknowledge the use of cutting-edge AI (NightCafe’s Stable Diffusion algorithms) for the figures illustrated in this book. It’s important to note that the content itself has been crafted by the authors and edited by a professional publishing team.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Ali Abidi

    Publishing Product Manager: Yasir Khan

    Book Project Manager: Hemangi Lotlikar

    Senior Editor: Tazeen Shaikh

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Proofreader: Tazeen Shaikh

    Indexer: Subalakshmi Govindhan

    Production Designer: Jyoti Kadam

    DevRel Marketing Coordinator: Nivedita Singh

    First published: June 2024

    Production reference: 1230524

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK.

    ISBN 978-1-80512-789-5

    www.packtpub.com

    Many thanks and much gratitude to Peter Schnable, for his encouragement and guidance during my journey into science and mathematics; for long discussions about herpetology, conservation, and ethics; and for inspiring me to share my knowledge with others and to forge my own path in STEM for social good.

    - Colleen Molloy Farrelly

    To my beloved mother, Ngoy Justine, whose unwavering love and encouragement fueled my passion for knowledge. Throughout the writing of these pages, her memory has been a guiding light, inspiring me to pursue excellence and share the joy of discovery. Though she is no longer with us, her spirit lives on in the words and sentiments expressed within these chapters. With profound gratitude and love, I dedicate this work to the woman whose influence continues to shape my journey.

    To Meda, Divine, and Abigael.

    - Franck Kalala Mutombo

    Foreword

    As the CTO/CIO of life sciences as well as automotive, energy, and high-tech industry-focused companies, I have continuously been challenged with creating meaningful data insights from a great variety of unstructured and semi-structured data sources. A decade ago, my advanced analytics and machine learning journey experienced the biggest push forward when I benefited from one of the most impactful introductions in my career. Ever since I got to work with Colleen Farrelly on our ontology-focused data science requirements when building out a genomics diagnostics platform, my approach to AI has been tremendously elevated. Subsequently, I got to work on Colleen Farrelly’s exceptional book, The Shape of Data, which intrigued me deeply with her unique delivery of how to combine geometry and machine learning-based algorithms and supporting data structures to create powerful topological representations of complex data problems.

    In this equally captivating sequel, Modern Graph Theory Algorithms with Python, Colleen Farrelly and Franck Kalala Mutombo take you to the next level of unleashing the potential of network science by diving deep into specific graph theory approaches to solve a great variety of industry problems ranging from ecological to financial, spatial and temporal sales, and clinical data challenges. It is easy to see why readers and data science practitioners of any level will find each chapter profoundly valuable based on the supporting Python library and code examples along with the underlying mathematical explanations.

    With each turn of a page, I found myself wanting to reinforce the presented learnings by applying well-defined examples to my own business requirements. Being a keen user of graph databases, I very much enjoyed the hands-on practical discussions on specific technologies such as in Chapter 12 to illustrate the relative ease of Python integrations and built-in capabilities of open source solutions that can be leveraged in today’s growing powerful set of available tools.

    This comprehensive book provides just-enough and just-in-time fundamental concepts to enable data scientists and software engineers to greatly elevate their machine learning techniques with directly applicable well-structured scenarios. Each of the consistent problem-solution breakdowns includes key considerations for the required data wrangling, transformation, and modeling aspects.

    Furthermore, this book provides a convincing lead into the relevance of new frontiers such as quantum network science algorithms, neural network architectures as graphs, hierarchical networks, and hypergraphs. With a concise and easy-to-follow thought process, this book provides you with the important context of how to reduce the large volumes of parameterization required for large language models and address the critical aspect of metadata management via hypergraph databases, for example.

    The use of graph theory today is highly relevant to every industry and science domain. Whether the challenge is to provide predictive modeling or simulations or the optimization of business operations or clinical outcomes and many more requirements, this book is an indispensable guide to mastering the complexities of these critical real-world challenges. With all the key insights and GitHub repository examples at your fingertips, you will be transformed instantly into a subject matter expert. The Modern Graph Theory Algorithms with Python exploration is a must-read and thoroughly enjoyable book.

    Michael Giske

    Chairman of Inomo Technologies and Global CIO of B-ON

    Contributors

    About the authors

    Colleen Molloy Farrelly is a chief mathematician, data scientist, and researcher who has expertise in applying math to the biological, medical, social, and physical sciences. She has also authored the book, The Shape of Data. She has mentored, coauthored papers, and worked with people across Latin America, Africa, Europe, and Asia.

    She is based in Miami, Florida in the US and holds a master’s in biostatistics from the University of Miami. She is passionate about educational initiatives in the developing world and speaks at conferences such as Women in Data Science, IEEE conferences, PyData, and Applied Machine Learning Days.

    I want to thank the people who have supported me over the years, especially early on, including John and Nancy Farrelly, Peter Schnable, the Warmus family, Mr. and Mrs. De Jong, the Mayor families, Justin and Christy Moeller, Luke Robinson, and many professors and colleagues throughout my career.

    To all my students and those who will come after me who motivate me to teach and share my knowledge, you can use math and science to change the world for the better.

    Franck Kalala Mutombo is a professor of mathematics at Lubumbashi University and former academic director of AIMS-Senegal. He previously worked in a research position at the University of Strathclyde and AIMS-South Africa in a joint appointment with the University of Cape Town. He holds a PhD in mathematical sciences from the University of Strathclyde, Glasgow, Scotland. His current research considers the impact of network structure on long-range interactions applied to epidemics, diffusion, object clustering, differential geometry of manifolds, finite element methods for PDEs, and data science. Currently, he teaches at the University of Lubumbashi and across the AIMS Network.

    I express gratitude to my supportive network throughout my journey. I’m thankful for friends and professors who’ve contributed to my career. I am indebted to the countless students across Africa and those who will succeed them. Their enthusiasm and curiosity serve as constant reminders of the profound impact that mathematics and science can have on shaping a better world. It is with gratitude that I embrace the opportunity to teach and share knowledge, fostering a community of learners committed to leveraging the transformative potential of these disciplines for the greater good.

    About the reviewer

    Casey Moffatt, with a master’s in applied mathematics and a double bachelor’s in pure mathematics and philosophy, specializes in graph theory, optimization, and computer science. He is proficient in Python and various essential software for graph data science, machine learning, and algorithm development. He is eager to push boundaries in mathematics and computer science. He would like to thank Packt Publishing and contributors for enabling projects like this and to the countless individuals behind open source technologies.

    Table of Contents

    Preface

    Part 1: Introduction to Graphs and Networks with Examples

    1

    What is a Network?

    Technical requirements

    Introduction to graph theory and networks

    Formal definitions

    Creating networks in Python

    Random graphs

    Examples of real-world social networks

    Other types of networks

    Advanced use cases of network science

    Summary

    References

    2

    Wrangling Data into Networks with NetworkX and igraph

    Technical requirements

    Introduction to different data sources

    Social interaction data

    Spatial data

    Temporal data

    Biological networks

    Other types of data

    Wrangling data into networks with igraph

    Social network examples with NetworkX

    Summary

    References

    Part 2: Spatial Data Applications

    3

    Demographic Data

    Technical requirements

    Introduction to demography

    Demographic factors

    Geographic factors

    Homophily in networks

    Francophone Africa music spread

    AIMS Cameroon student network epidemic model

    Summary

    References

    4

    Transportation Data

    Technical requirements

    Introduction to transportation problems

    Paths between stores

    Fuel costs

    Time to deliver goods

    Navigational hazards

    Shortest path applications

    Traveling salesman problem

    Max-flow min-cut algorithm

    Summary

    References

    5

    Ecological Data

    Technical requirements

    Introduction to ecological data

    Exploring methods to track animal populations across geographies

    Exploring methods to capture plant distributions and diseases

    Spectral graph tools

    Clustering ecological populations using spectral graph tools

    Spectral clustering on text notes

    Summary

    References

    Part 3: Temporal Data Applications

    6

    Stock Market Data

    Technical requirements

    Introduction to temporal data

    Stock market applications

    Introduction to centrality metrics

    Application of centrality metrics across time slices

    Extending network metrics for time series analytics

    Summary

    References

    7

    Goods Prices/Sales Data

    Technical requirements

    An introduction to spatiotemporal data

    The Burkina Faso market dataset

    Store sales data

    Analyzing our spatiotemporal datasets

    Summary

    References

    8

    Dynamic Social Networks

    Technical requirements

    Social networks that change over time

    Friendship networks

    Triadic closure

    A deeper dive into spreading on networks

    Dynamic network introduction

    SIR models, Part Two

    Factors influencing spread

    Example with evolving wildlife interaction datasets

    Crocodile network

    Heron network

    Summary

    References

    Part 4: Advanced Applications

    9

    Machine Learning for Networks

    Technical requirements

    Introduction to friendship networks and friendship relational datasets

    Friendship network introduction

    Friendship demographic and school factor dataset

    ML on networks

    Clustering based on student factors

    Clustering based on student factors and network metrics

    Spectral clustering on the friendship network

    DL on networks

    GNN introduction

    Example GNN classifying the Karate Network dataset

    Summary

    References

    10

    Pathway Mining

    Technical requirements

    Introduction to Bayesian networks and causal pathways

    Bayes’ Theorem

    Causal pathways

    Bayesian networks

    Educational pathway example

    Outcomes in education

    Course sequences

    Antecedents to success

    Analyzing course sequencing to find optimal student pathways to graduation

    Introduction to a dataset

    bnlearn analysis

    Structural equation models

    Summary

    References

    11

    Mapping Language Families – an Ontological Approach

    Technical requirements

    What is an ontology?

    Introduction to ontologies

    Representing information as an ontology

    Language families

    Language drift and relationships

    Nilo-Saharan languages

    Mapping language families

    Summary

    References

    12

    Graph Databases

    Introduction to graph databases

    What is a graph database?

    What can you represent in a graph database?

    Querying and modifying data in Neo4j

    Basic query example

    More complicated query examples

    Summary

    References

    13

    Putting It All Together

    Technical requirements

    Introduction to the problem

    Ebola spread in the Democratic Republic of Congo – 2018-2020 outbreak

    Geography and logistics

    Introduction to GEEs

    Mathematics of GEEs

    Our problem and GEE formulation

    Data transformation

    Python wrangling

    GEE input

    Data modeling

    Running the GEE in Python

    Summary

    References

    14

    New Frontiers

    Quantum network science algorithms

    Graph coloring algorithms

    Max flow/min cut

    Neural network architectures as graphs

    Deep learning layers and connections

    Analyzing architectures

    Hierarchical networks

    Higher-order structures and network data

    An example using gene families

    Hypergraphs

    Displaying information

    Metadata

    Summary

    References

    Index

    Other Books You May Enjoy

    Preface

    Hello there! Network science combines the power of analytics with the deep theoretical tools of graph theory to solve difficult problems in data analytics. This empowers researchers and industry engineers/data scientists to analyze data at scale and reframe intractable analytics problems to produce powerful insights into problems and predictions about system behaviors, including biological, physical, and social systems of interest.

    There are many important applications of network science today, including these:

    Social network data

    Spatial data

    Time series data

    Spatiotemporal data

    More advanced data structures, such as ontologies or hypergraphs

    This book gives a brief overview of social network applications and focuses on the cutting edge of network science applications to areas of data science, such as transportation logistics, conservation, public health, linguistics, and education. By the end of your journey, you’ll be able to frame your own data problem within the framework of network science to derive insights and tackle difficult problems in your field.

    We will provide the necessary mathematical background as we dive into practical examples and code related to our work in academia and industry over the past decades, including work on predicting Ebola outbreaks, forecasting food price volatility, modeling genetic and linguistic relationships, and mining social networks for insights into social tie formation. As the world faces food shortages, public health crises, economic inequality, supply chain breakdowns, and environmental crises, network science will play an important role in big data analytics for social good.

    Who this book is for

    This book is for you if you are working with data. To get the most out of the book, you should have some familiarity with Python, particularly the pandas and numpy packages. In addition, some familiarity with data analytics is assumed, though the network science tools and problems we tackle are built from scratch for readers without a background in those problems or methods.

    Network science has a rich history in many scientific disciplines, including epidemiology, biomedical engineering, sociology, genetics, environmental science, particle physics, computer science, and economics. Its foundations in graph theory influence research in many areas of pure and applied mathematics as well. Anyone in the fields of science, technology, engineering, and mathematics can benefit from network science’s toolset and approach to problem-solving.

    What this book covers

    Chapter 1, What Is a Network?, introduces the theoretical concept of a network and provides several examples of networks in real-world applications, including work with random graphs. We’ll also get started with Python’s igraph and NetworkX packages.

    Chapter 2, Wrangling Data into Networks with NetworkX and igraph, builds on Chapter 1 by providing three examples of real-world data that can be formulated as network data and showing how to convert data into network form in Python. We’ll introduce problems involving spatial data, temporal data, and spatiotemporal data and explore how network science can solve these problems by converting the data into network form.

    Chapter 3, Demographic Data, explores two real-world projects using demography data from the developing world to understand network structures and capacity for information/infectious disease spread. We’ll consider the demographic characteristics and network properties of a friend group to see how both types of information can influence disease spread.

    Chapter 4, Transportation Data, provides a real-world example of a transportation network and introduces tools related to minimum paths and network flow. We’ll consider optimal routing and the shortest paths to destinations, including multistop pathways from one location to another.

    Chapter 5, Ecological Data, shows a real-world example of an ecological network and introduces spectral graph theory tools, including spectral clustering and graph Laplacians.

    Chapter 6, Stock Market Data, examines a real-world example of stock market data analysis with network tools, including edge-based centrality measures of volatility. We’ll mine data for tipping points that herald either a period of market growth or a market crash.

    Chapter 7, Goods Prices/Sales Data, provides two real-world examples of commerce data analysis over both space and time with tools previously covered in time series and spatial data applications. We’ll examine sales and pricing trends across time and space to better understand consumer behavior and the impacts of pricing changes.

    Chapter 8, Dynamic Social Networks, introduces a real-world example of social network datasets evolving over time and analyzes their vulnerability to spreading processes, such as epidemics and misinformation flow. We’ll consider factors influencing ecological social networks’ vulnerability to the spread of disease.

    Chapter 9, Machine Learning for Networks, presents a comprehensive description of network-based machine learning and deep learning, including examples with supervised, unsupervised, and semi-supervised learning to understand disease risks within social networks.

    Chapter 10, Pathway Mining, introduces Bayesian networks and mining for causal pathways using an educational data example, where we’ll see how course sequencing and performance influence student outcomes.

    Chapter 11, Mapping Language Families – an Ontological Approach, covers ontologies and maps between ontologies using a linguistic data example from the Nilo-Saharan language family and its lexicon variations.

    Chapter 12, Graph Databases, introduces graph databases with Neo4j, including data from prior chapters and how to query Neo4j with graph tools introduced in prior chapters and Neo4j’s query language. We’ll see how graph databases and network science tools create synergy in data science, as well as efficient data storage solutions.

    Chapter 13, Putting It All Together, ties together material from previous chapters into a final project, analyzing spatiotemporal network data and demographic data from Ituri and North Kivu provinces with generalized estimating equations to understand the evolution of the 2018-2020 Ebola epidemic.

    Chapter 14, New Frontiers, introduces quantum graph algorithms, graph theory for neural network optimization, hierarchical networks, and hypergraphs.
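
    As a small taste of the tools named in the Chapter 4 summary above (shortest paths and max-flow min-cut), here is a minimal sketch using NetworkX. The delivery network, its node names, and the weight and capacity values are invented for illustration and are not taken from the book’s datasets or scripts.

```python
import networkx as nx

# Hypothetical delivery network: "weight" is travel cost, "capacity" is
# how many truckloads a road can carry. All values are made up.
roads = nx.DiGraph()
roads.add_edge("depot", "store_1", weight=2, capacity=3)
roads.add_edge("depot", "store_2", weight=5, capacity=2)
roads.add_edge("store_1", "store_2", weight=1, capacity=1)
roads.add_edge("store_1", "customer", weight=6, capacity=2)
roads.add_edge("store_2", "customer", weight=2, capacity=3)

# Cheapest route from the depot to the customer (Dijkstra on "weight").
route = nx.shortest_path(roads, "depot", "customer", weight="weight")
route_cost = nx.shortest_path_length(roads, "depot", "customer", weight="weight")

# Largest shippable volume, and the bottleneck cut that limits it.
flow_value, flow_by_edge = nx.maximum_flow(
    roads, "depot", "customer", capacity="capacity"
)
cut_value, (source_side, sink_side) = nx.minimum_cut(
    roads, "depot", "customer", capacity="capacity"
)

print(route, route_cost)
print(flow_value, cut_value)
```

    The max-flow value and the min-cut value agree, which is the max-flow min-cut theorem in action on this toy network.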

    To get the most out of this book

    We provide Python scripts and assume some knowledge of Python syntax and basic Python analytics packages (such as NumPy and scikit-learn). We also assume familiarity with basic analytics tasks, such as computing summary statistics and working with different types of data in Python with either NumPy or pandas. Scripts are written for each chapter, with later scripts often depending on earlier scripts in the chapter to build knowledge. Other concepts in Python and in analytics will be introduced conceptually and then with Python code examples.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    You are encouraged to try out the code examples in this book on your real-world data science projects. If you want to delve deeper into graph algorithms and network science, we encourage you to look at the latest research papers on network science topics. Google Scholar and arXiv are two good references for network science methods and application papers.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Modern-Graph-Theory-Algorithms-with-Python. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: This script shows that average subgraph centrality varies between the two subfamily trees, with Greenberg’s average subgraph centrality of 2.478 and Dimmendaal’s average subgraph centrality of 3.276.

    A block of code is set as follows:

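    # compare subgraph centrality of language families
    # (G and G2 are assumed to be NetworkX graphs of the two trees, built earlier)
    import networkx as nx
    import numpy as np

    gs = nx.subgraph_centrality(G)
    print(np.mean(np.array(list(gs.values()))))
    gs2 = nx.subgraph_centrality(G2)
    print(np.mean(np.array(list(gs2.values()))))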

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: When you hover over the Movie DBMS label on the right-hand side of the screen, you’ll see a Start button that launches the connection to this database. Click on Start.

    Tips or important notes

    Appear like this.
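
    The sample code block under the conventions above calls nx.subgraph_centrality on graphs G and G2 that are defined elsewhere in the book. For readers who want to run something right away, here is a minimal self-contained sketch of the same comparison. The two small trees and the helper name mean_subgraph_centrality are illustrative assumptions, not the book’s language-family data or code.

```python
import networkx as nx
import numpy as np


def mean_subgraph_centrality(graph):
    """Average subgraph centrality over all nodes of an undirected graph."""
    scores = nx.subgraph_centrality(graph)  # dict: node -> centrality
    return float(np.mean(list(scores.values())))


# Two hypothetical 5-node trees standing in for G and G2.
chain_tree = nx.path_graph(5)  # nodes linked in a single line
star_tree = nx.star_graph(4)   # one hub joined to four leaves

chain_mean = mean_subgraph_centrality(chain_tree)
star_mean = mean_subgraph_centrality(star_tree)
print(chain_mean)
print(star_mean)
```

    In this toy comparison, the star-shaped tree has the higher average, because its hub takes part in many more closed walks than any node in the chain.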

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you’ve read Modern Graph Theory Algorithms with Python, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

    The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

    Follow these simple steps to get the benefits:

    Scan the QR code or visit the link below

    https://packt.link/free-ebook/9781805127895

    Submit your proof of purchase

    That’s it! We’ll send your free PDF and other benefits directly to your email.

    Part 1: Introduction to Graphs and Networks with Examples

    This part of the book builds
