Analysing Human Mobility in Thailand to improve Epidemiological Models

Human mobility is a fundamental driver of infectious disease dynamics. While pathogens such as the dengue virus are transmitted locally by mosquito vectors, the movement of infected humans connects geographically distant populations, facilitating the spatial spread of infection and influencing the timing and intensity of epidemics. Capturing these mobility patterns accurately is therefore essential for realistic epidemiological modelling. Within the Arbothai project, we have developed a comprehensive, multimodal human mobility dataset for Thailand, explicitly designed to be integrated into spatial models of dengue transmission. The project combines large-scale national transportation data with fine-grained urban mobility information for Bangkok, providing a detailed, multiscale representation of population movement across the country. The figure below gives an overview of intercity bus, train, ferry, and domestic flight connections across Thailand, illustrating the spatial extent and density of the national mobility system.

Summary

The Arbothai project provides a robust, data-driven framework for analysing human mobility in Thailand and integrating it into epidemiological models. By combining multimodal transportation data, network analysis, and origin–destination matrices, the project moves beyond abstract mobility assumptions and grounds disease modelling in real-world movement patterns.

This approach not only enhances our understanding of dengue transmission in Thailand but also offers a scalable methodology for studying mobility-driven disease dynamics in other contexts.

Modelling context

Traditional epidemiological models often rely on simplified assumptions about movement, such as uniform mixing between regions or gravity-type mobility models. While these approaches are useful, they may fail to capture real-world heterogeneities in connectivity, travel intensity, and network structure.

The Arbothai mobility framework aims to overcome these limitations by:

  • grounding mobility assumptions in empirical transportation data,
  • representing movement as networks and origin–destination flows, and
  • producing standardised outputs that can be directly embedded into mechanistic and statistical disease models.

The primary application of this work is dengue transmission modelling in Thailand, but the methodological framework is general and transferable to other diseases and settings. The figure below shows a heat map of the combined intensity of arrivals and departures by province, highlighting strong spatial heterogeneity in national mobility patterns.

A multimodal and multiscale mobility dataset

To capture the full spectrum of human movement, the project integrates multiple transportation modes across different spatial scales.

At the national level, the dataset includes:

  • intercity bus services,
  • long-distance and regional trains,
  • passenger ferries connecting coastal and island regions,
  • domestic air travel.

At the metropolitan level, the focus is on Bangkok, whose dense and complex transit system plays a dominant role in national mobility. The urban data include:

  • elevated and underground metro systems (BTS and MRT),
  • commuter rail lines,
  • the airport rail link,
  • an extensive bus network,
  • river and canal boat services.

Together, these layers capture both long-range interprovincial connectivity and short-range urban movement, allowing models to represent how infections may spread between provinces and then circulate within large population centres.

Data sources and collection strategy

All data are collected from publicly accessible and official sources, selected to maximise coverage, consistency, and reproducibility. Route catalogues, service frequencies, and stop information are extracted from transport platforms and operator data, while aviation flows are obtained from official airport statistics.

Because different sources provide different types of information, a key challenge is data harmonisation. The project addresses this by applying a unified processing pipeline that standardises naming conventions, resolves geographic inconsistencies, and ensures that all transportation modes can be analysed within a common framework.

Importantly, service frequency (e.g. trips per day) is used as a proxy for movement intensity. While this does not directly measure passenger counts, it provides a consistent and widely available indicator of potential mobility flows across all modes.

From routes to networks: data processing and architecture

Raw route listings and schedules are not directly suitable for epidemiological modelling. To bridge this gap, the Arbothai project implements a structured, multi-step processing pipeline.

The pipeline:

  1. extracts route catalogues for each transport mode,
  2. parses route details such as origins, destinations, operators, and service frequencies,
  3. identifies unique geographic locations and geocodes them,
  4. merges route and location data into integrated datasets,
  5. constructs directed, weighted transportation networks,
  6. generates origin–destination matrices at multiple spatial resolutions.

This architecture ensures that every modelling input can be traced back to its original data source, supporting transparency and reproducibility.
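The merge step (step 4) and the traceability it supports can be sketched as follows. This is a minimal illustration and not the project's actual code: the `Route` fields, the `merge_routes_with_locations` helper, the operator labels, and all numbers are hypothetical, and the coordinates are rough values used only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    origin: str
    destination: str
    mode: str
    trips_per_day: float
    source: str  # provenance label, so each record can be traced to its source


def merge_routes_with_locations(routes, locations):
    """Join each route with the province of its two endpoints.

    Routes with an endpoint missing from `locations` are returned separately
    so they can be reviewed rather than silently dropped.
    """
    merged, unresolved = [], []
    for r in routes:
        o = locations.get(r.origin)
        d = locations.get(r.destination)
        if o is None or d is None:
            unresolved.append(r)
            continue
        merged.append({
            "origin": r.origin,
            "destination": r.destination,
            "origin_province": o["province"],
            "destination_province": d["province"],
            "mode": r.mode,
            "trips_per_day": r.trips_per_day,
            "source": r.source,
        })
    return merged, unresolved


# Toy inputs, invented for illustration only.
routes = [
    Route("Bangkok", "Chiang Mai", "bus", 12, "operator_a"),
    Route("Chiang Mai", "Bangkok", "bus", 10, "operator_a"),
    Route("Surat Thani", "Unknown Pier", "ferry", 4, "operator_b"),
]
locations = {
    "Bangkok": {"lat": 13.75, "lon": 100.50, "province": "Bangkok"},
    "Chiang Mai": {"lat": 18.79, "lon": 98.98, "province": "Chiang Mai"},
    "Surat Thani": {"lat": 9.14, "lon": 99.33, "province": "Surat Thani"},
}

merged, unresolved = merge_routes_with_locations(routes, locations)
print(len(merged), "merged;", len(unresolved), "need manual review")
```

Returning the unresolved routes separately is one way to keep geocoding failures visible; a real pipeline may handle them differently.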

Network representation of mobility

Mobility is represented as a set of directed graphs, where:

  • nodes correspond to geographic entities (provinces at the national level, stops at the Bangkok level),
  • edges represent direct transport connections,
  • edge weights reflect service frequency.

This representation captures not only whether two locations are connected, but also the strength and directionality of that connection. Such a structure is essential for understanding asymmetric flows and identifying routes that disproportionately contribute to national connectivity.
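As a rough sketch of this representation, a directed, weighted graph can be stored as a nested dictionary in which parallel services between the same pair of nodes are summed. The `build_network` and `asymmetry` helpers and the trip counts below are assumptions made for illustration, not the project's data structures or data.

```python
def build_network(edges):
    """Build {origin: {destination: trips per day}} from (origin, destination, trips) records."""
    graph = {}
    for origin, destination, trips in edges:
        graph.setdefault(origin, {})
        graph.setdefault(destination, {})  # keep nodes that only receive traffic
        graph[origin][destination] = graph[origin].get(destination, 0.0) + trips
    return graph


def asymmetry(graph, a, b):
    """Difference between the a->b and b->a weights (positive: more service from a to b)."""
    return graph[a].get(b, 0.0) - graph[b].get(a, 0.0)


# Invented service frequencies (trips per day).
edges = [
    ("Bangkok", "Chiang Mai", 12),  # e.g. bus
    ("Bangkok", "Chiang Mai", 3),   # e.g. train, summed with the bus service
    ("Chiang Mai", "Bangkok", 10),
    ("Bangkok", "Phuket", 8),
]

network = build_network(edges)
print(network["Bangkok"])                           # {'Chiang Mai': 15.0, 'Phuket': 8.0}
print(asymmetry(network, "Bangkok", "Chiang Mai"))  # 5.0
```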

Network analysis and centrality metrics

Once the transportation networks are constructed, they are analysed using tools from network science to identify structurally important locations.

Several complementary centrality measures are calculated:

  • Degree centrality identifies highly connected locations.
  • Betweenness centrality highlights provinces that act as bridges between otherwise weakly connected regions.
  • Eigenvector centrality and PageRank identify nodes that are connected to other important nodes, capturing indirect influence within the network.

These metrics reveal strong spatial hierarchies in Thailand’s mobility system. Bangkok consistently emerges as the dominant hub across all measures, reflecting its role as the country’s economic and transportation centre.
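To make two of these measures concrete, the sketch below computes weighted in- and out-degree (strength) and a weighted PageRank by power iteration on a toy network. It is an illustrative implementation under stated assumptions (damping factor 0.85, a fixed number of iterations, invented edge weights); the project may well use a network library instead, and betweenness and eigenvector centrality are omitted for brevity.

```python
def strengths(graph):
    """Return (out_strength, in_strength): total edge weight leaving and entering each node."""
    out_s = {v: float(sum(nbrs.values())) for v, nbrs in graph.items()}
    in_s = {v: 0.0 for v in graph}
    for nbrs in graph.values():
        for w, weight in nbrs.items():
            in_s[w] += weight
    return out_s, in_s


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; every node must appear as a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    out_s, _ = strengths(graph)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_s[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w, weight in graph[v].items():
                new_rank[w] += damping * rank[v] * weight / out_s[v]
        rank = new_rank
    return rank


# Toy network with invented weights (trips per day).
graph = {
    "Bangkok": {"Chiang Mai": 15, "Phuket": 8, "Khon Kaen": 6},
    "Chiang Mai": {"Bangkok": 10},
    "Phuket": {"Bangkok": 8},
    "Khon Kaen": {"Bangkok": 6, "Chiang Mai": 1},
}

out_s, in_s = strengths(graph)
ranks = pagerank(graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node:12s} out={out_s[node]:5.1f} in={in_s[node]:5.1f} pagerank={ranks[node]:.3f}")
```

In this toy network the hub receives most of the rank simply because every other node sends most of its service towards it; the numbers carry no empirical meaning.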

To better understand regional dynamics beyond the capital, additional analyses remove Bangkok from the network. This reveals secondary hubs and alternative transmission pathways that may become critical during Bangkok-centred outbreaks or targeted interventions.

The interplay between these metrics is further explored through a direct comparison of eigenvector centrality and PageRank at the provincial level. This analysis identifies specific provinces that exert a disproportionate influence on national connectivity. To assess the robustness of the system, the impact of sequentially removing these highly central provinces is evaluated. Their removal leads to a rapid fragmentation of the network, which illustrates the critical role of these mobility hubs and highlights their importance as potential targets for localised interventions to mitigate the spread of vector-borne diseases.
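A sequential-removal experiment of this kind can be sketched as follows. The connectivity measure used here, the share of all original nodes that remain in the largest weakly connected component, is an assumption chosen for simplicity; the source does not specify which measure the project uses. The toy network and the removal order are invented.

```python
def largest_component_size(graph, removed):
    """Size of the largest weakly connected component after dropping `removed` nodes."""
    undirected = {v: set() for v in graph if v not in removed}
    for v, nbrs in graph.items():
        for w in nbrs:
            if v in undirected and w in undirected:
                undirected[v].add(w)
                undirected[w].add(v)
    seen, best = set(), 0
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over one component
            v = stack.pop()
            size += 1
            for w in undirected[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def removal_curve(graph, order):
    """Fraction of the original nodes in the largest component after each removal."""
    n = len(graph)
    removed = set()
    curve = [largest_component_size(graph, removed) / n]
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(graph, removed) / n)
    return curve


# Toy network (invented weights); the order stands in for a centrality ranking.
graph = {
    "Bangkok": {"Chiang Mai": 15, "Phuket": 8, "Khon Kaen": 6, "Songkhla": 5},
    "Chiang Mai": {"Bangkok": 10},
    "Phuket": {"Bangkok": 8, "Songkhla": 2},
    "Khon Kaen": {"Bangkok": 6},
    "Songkhla": {"Bangkok": 5},
}

curve = removal_curve(graph, ["Bangkok", "Phuket"])
print(curve)  # [1.0, 0.4, 0.2]
```

Removing the hub first splits this toy network into isolated nodes plus one remaining pair, which mirrors the rapid fragmentation described above on a very small scale.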

Origin–destination matrices as model inputs

The central modelling outputs of the project are origin–destination (OD) matrices, which quantify mobility flows between all pairs of locations.

OD matrices are produced at multiple spatial resolutions:

  • point-to-point (individual stops),
  • sub-district,
  • district,
  • province.

Each matrix entry represents the aggregated intensity of movement from an origin to a destination, across one or more transportation modes. These matrices serve as the primary interface between mobility data and epidemiological models.
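Moving from a fine to a coarse resolution amounts to summing flows over a mapping from locations to zones. The sketch below aggregates a stop-level OD table to province level; the stop identifiers, the `aggregate_od` helper, and the flows are hypothetical and only illustrate the idea.

```python
def aggregate_od(od, zone_of, keep_internal=True):
    """Sum flows keyed by (origin, destination) into flows keyed by (origin zone, destination zone).

    With keep_internal=False, flows that start and end in the same zone are dropped.
    """
    aggregated = {}
    for (origin, destination), flow in od.items():
        key = (zone_of[origin], zone_of[destination])
        if key[0] == key[1] and not keep_internal:
            continue
        aggregated[key] = aggregated.get(key, 0.0) + flow
    return aggregated


# Invented stop-level flows (trips per day) and a stop-to-province lookup.
stop_od = {
    ("bkk_1", "cnx_1"): 12,
    ("bkk_2", "cnx_1"): 3,
    ("cnx_1", "bkk_1"): 10,
    ("bkk_1", "bkk_2"): 40,  # within-Bangkok movement
}
province_of = {"bkk_1": "Bangkok", "bkk_2": "Bangkok", "cnx_1": "Chiang Mai"}

province_od = aggregate_od(stop_od, province_of)
between_only = aggregate_od(stop_od, province_of, keep_internal=False)
print(province_od)
print(between_only)
```

Whether within-zone flows are kept depends on the model: a model that already represents local transmission inside each province may only need the between-province entries.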

Integrating mobility into epidemiological models

In dengue transmission models, OD matrices are used to couple local transmission dynamics across space. Specifically, they allow models to:

  • simulate the importation and exportation of infections between regions,
  • weight interregional transmission by empirically observed connectivity,
  • explore counterfactual scenarios, such as mobility reductions or disruptions to highly central provinces.

Embedding real mobility patterns into the models allows simulated epidemics to reproduce observed spatial spread and timing more accurately, thereby improving both explanatory power and predictive accuracy.
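The coupling idea can be illustrated with a deliberately simplified two-patch model. The sketch below is a human-only, discrete-time SIR model, not a dengue model with mosquito vectors and not the project's model; the parameter values, populations, the `mobility` mixing fraction, and the way the row-normalised OD matrix enters the force of infection are all assumptions made for illustration.

```python
def row_normalise(od):
    """Turn each row of an OD matrix into proportions; empty rows stay at home."""
    result = []
    for i, row in enumerate(od):
        total = sum(row)
        if total == 0:
            result.append([1.0 if j == i else 0.0 for j in range(len(row))])
        else:
            result.append([x / total for x in row])
    return result


def simulate(od, population, infected0, beta=0.3, gamma=0.1, mobility=0.1, days=100):
    """Return the daily number of infected people per patch.

    Residents of patch i make a fraction (1 - mobility) of their contacts at home
    and a fraction `mobility` in other patches, split according to the OD proportions.
    """
    n = len(population)
    P = row_normalise(od)
    C = [[(1 - mobility) * (i == j) + mobility * P[i][j] for j in range(n)]
         for i in range(n)]
    S = [population[i] - infected0[i] for i in range(n)]
    I = list(infected0)
    history = [list(I)]
    for _ in range(days):
        prevalence = [I[j] / population[j] for j in range(n)]
        force = [beta * sum(C[i][j] * prevalence[j] for j in range(n)) for i in range(n)]
        new_inf = [min(S[i], force[i] * S[i]) for i in range(n)]
        new_rec = [gamma * I[i] for i in range(n)]
        S = [S[i] - new_inf[i] for i in range(n)]
        I = [I[i] + new_inf[i] - new_rec[i] for i in range(n)]
        history.append(list(I))
    return history


# Invented inputs: two patches, infection seeded in the first one only.
od = [[0, 15], [10, 0]]
population = [1_000_000.0, 200_000.0]
infected0 = [10.0, 0.0]

baseline = simulate(od, population, infected0, mobility=0.1)
isolated = simulate(od, population, infected0, mobility=0.0)  # counterfactual: no travel
print(max(day[1] for day in baseline), max(day[1] for day in isolated))
```

Setting `mobility` to zero is the simplest counterfactual: the second patch never receives an imported infection, whereas any positive coupling eventually seeds it.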

Data quality, validation, and limitations

Extensive validation procedures, including geographic checks, consistency tests, and cross-referencing with independent sources, help ensure high data quality. Nevertheless, important limitations remain:

  • mobility is static rather than time-varying,
  • service frequency is used instead of true passenger counts, and
  • informal transport modes are not included.

These limitations are explicitly documented and inform ongoing and future work, including the planned integration of alternative mobility data sources and temporally dynamic flows.
