The mobility dataset was developed for the Arbothai project with the aim of understanding human mobility patterns in Thailand for dengue transmission modelling.

The dataset covers multiple transportation modes at both national and metropolitan (Bangkok) scales. National coverage includes bus (8,036 routes), train (386 routes), ferry (177 routes), and domestic flight (434 routes) services. The Bangkok data add detailed urban transit: 369 unique routes across bus, river and canal ferry, and commuter rail (metro and traditional railway) systems.

National-level data were collected online from BusOnlineTicket.co.th (2023 to 2024), complemented by aviation data from Airports of Thailand (Annual Report 2023). For Bangkok’s internal transportation system, route data were collected from TransitBangkok.com, with trip frequencies taken from General Transit Feed Specification (GTFS) data provided by the Ministry of Transport (2024).

The resulting origin-destination (OD) matrices describe mobility flows at resolutions ranging from point-to-point connections to province-level aggregations, and are accompanied by network centrality measures. Validation against multiple sources supports the reliability of the data.

Figure 1: Overview of intercity bus, train, ferry, and domestic flight connections across Thailand, illustrating the spatial extent and density of the national mobility system.

Modelling context

Dengue spreads locally through mosquitoes, but infected travellers can carry the virus to new regions, where local mosquitoes may then transmit it. Standard epidemiological models assume uniform mixing between regions or use gravity models based on population and distance. These approaches may fail to capture actual movement patterns, and therefore the resulting patterns of disease transmission. This project instead uses observed transport data to build networks and flow matrices that can improve the accuracy of disease transmission models.
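
For comparison, the gravity approach mentioned above can be sketched in a few lines. This is the generic textbook form, not code from the project: the function name, the constant k, and the exponents alpha, beta and gamma are illustrative assumptions, and the populations and distances are made up.

```python
def gravity_flow(pop_i, pop_j, distance_km, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity estimate of the flow between two regions.

    The flow grows with both populations and decays with distance.
    All parameter defaults are illustrative assumptions.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return k * (pop_i ** alpha) * (pop_j ** beta) / (distance_km ** gamma)


# Hypothetical regions: same populations, different distances.
flow_near = gravity_flow(1_000_000, 500_000, 100.0)
flow_far = gravity_flow(1_000_000, 500_000, 200.0)
```

With gamma = 2, doubling the distance cuts the estimated flow to a quarter regardless of whether any transport service actually links the two regions, which is the kind of mismatch that observed route data are meant to avoid.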

Figure 2: Top 100 origin-destination pairs by volume. Line thickness and colour (red = high, blue = low) show flow. Bangkok dominates with radial connections. Secondary corridors: Chiang Mai to Chiang Rai (north), Hat Yai to Songkhla (south).

A multimodal and multiscale mobility dataset

To capture the full spectrum of human movement, the project integrates multiple transportation modes across different spatial scales.

At the national level, the dataset includes:

  • intercity bus services,
  • long-distance and regional trains,
  • passenger ferries connecting coastal and island regions,
  • domestic air travel.

At the metropolitan level, the focus is on Bangkok, whose dense and complex transit system plays a dominant role in national mobility. Urban data include:

  • elevated and underground metro systems (BTS and MRT),
  • commuter rail lines,
  • the airport rail link,
  • an extensive bus network,
  • river and canal boat services.

Together, these layers capture both long-range interprovincial connectivity and short-range urban movement, allowing models to represent how infections may spread between provinces and then circulate within large population centres.

Figure 3: Detailed map of Bangkok’s metro, rail, bus, and water‑based transport systems, illustrating fine‑scale urban mobility in the country’s main transportation hub.

Data sources and collection strategy

All data are collected from publicly accessible sources, selected to maximise coverage, consistency, and reproducibility. Route catalogues, service frequencies, and stop information are extracted from transport platforms and operator data, while aviation flows are obtained from official air travel reports.

Because different sources provide different types of information, a key challenge was data harmonisation. The project applies a unified processing pipeline that standardises naming conventions, resolves geographic inconsistencies, and ensures that all transportation modes can be analysed within a common framework.

Importantly, service frequency (e.g. trips per day) is used as a proxy for movement intensity. While this does not directly measure passenger counts, it provides a consistent and widely available indicator of potential mobility flows across all modes.

Routes to networks: data processing and architecture

Raw route listings and schedules are not directly suitable for epidemiological modelling. To bridge this gap, the Arbothai project implements a structured, multi-step processing pipeline.

The pipeline:

  1. extracts route catalogues for each transport mode,
  2. parses route details such as origins, destinations, operators, and service frequencies,
  3. identifies unique geographic locations and geocodes them,
  4. merges route and location data into integrated datasets,
  5. constructs directed, weighted transportation networks,
  6. generates origin-destination matrices at multiple spatial resolutions.

This architecture ensures that every modelling input can be traced back to its original data source, supporting transparency and reproducibility.
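
The later steps of the pipeline (merging routes with geocoded locations and building weighted, directed connections) can be illustrated with a minimal sketch. The stop identifiers, the stop-to-province lookup, the trip counts, and the function name are all hypothetical; the project's actual schema is not described here.

```python
from collections import defaultdict

# Hypothetical parsed routes: (origin stop, destination stop, trips per day).
routes = [
    ("stop_01", "stop_03", 12),
    ("stop_02", "stop_04", 20),
    ("stop_03", "stop_01", 10),
    ("stop_02", "stop_03", 5),
]

# Hypothetical result of geocoding: stop identifier -> province.
stop_province = {
    "stop_01": "Bangkok",
    "stop_02": "Bangkok",
    "stop_03": "Chiang Mai",
    "stop_04": "Chon Buri",
}


def build_province_edges(routes, stop_province):
    """Merge routes with locations and sum trip frequencies per directed province pair."""
    edges = defaultdict(int)
    for origin, destination, trips in routes:
        key = (stop_province[origin], stop_province[destination])
        edges[key] += trips
    return dict(edges)


province_edges = build_province_edges(routes, stop_province)
```

Because direction is kept in the key, the Bangkok to Chiang Mai total (17 trips in this toy input) stays separate from the Chiang Mai to Bangkok total (10 trips).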

Network representation of mobility

Mobility is represented as a set of directed graphs, where:

  • nodes correspond to geographic entities (aggregated to provinces at the national level, districts and sub-districts at the Bangkok level),
  • edges represent direct transport connections,
  • edge weights reflect service frequency.

This representation captures not only whether two locations are connected, but also the strength and directionality of that connection. Such a structure is essential for understanding asymmetric flows and identifying routes that disproportionately contribute to national connectivity.
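
As an illustration of this representation, the sketch below stores a directed, weighted graph as a dictionary of dictionaries and measures how asymmetric a connection is. The edge values and helper names are hypothetical and chosen only for the example.

```python
def build_graph(edges):
    """Build adjacency dicts from a mapping (origin, destination) -> service frequency."""
    graph = {}
    for (origin, destination), weight in edges.items():
        graph.setdefault(origin, {})
        graph.setdefault(destination, {})  # keep nodes that have no outgoing edges
        graph[origin][destination] = graph[origin].get(destination, 0) + weight
    return graph


def out_strength(graph, node):
    """Total service frequency leaving a node."""
    return sum(graph[node].values())


def in_strength(graph, node):
    """Total service frequency arriving at a node."""
    return sum(neighbours.get(node, 0) for neighbours in graph.values())


def asymmetry(graph, a, b):
    """Frequency from a to b minus frequency from b to a."""
    return graph[a].get(b, 0) - graph[b].get(a, 0)


# Hypothetical province-level edges (trips per day).
edges = {
    ("Bangkok", "Chiang Mai"): 12,
    ("Chiang Mai", "Bangkok"): 10,
    ("Bangkok", "Chon Buri"): 20,
}
graph = build_graph(edges)
bangkok_out = out_strength(graph, "Bangkok")
bangkok_in = in_strength(graph, "Bangkok")
imbalance = asymmetry(graph, "Bangkok", "Chiang Mai")
```

An undirected graph would collapse the 12 and 10 trips into a single value and lose the imbalance of 2 that the directed form preserves.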

Figure 4: Force‑directed layouts of bus, train, ferry, flight, and combined networks, showing differences in topology and connectivity between modes.

Network analysis and centrality metrics

Once the transportation networks are constructed, they are analysed using tools from network science to identify structurally important locations.

Several complementary centrality measures are calculated:

  • Degree centrality identifies highly connected locations.
  • Betweenness centrality highlights provinces that act as bridges between otherwise weakly connected regions.
  • Eigenvector centrality and PageRank identify nodes that are connected to other important nodes, capturing indirect influence within the network.
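
Two of these measures can be sketched without external libraries. The code below is a minimal illustration, not the project's implementation: the damping factor of 0.85 and the iteration count are conventional defaults assumed here, and the four-node network is a toy example.

```python
def degree_centrality(graph):
    """In-degree plus out-degree, divided by (n - 1). Assumes at least two nodes."""
    n = len(graph)
    degree = {node: len(neighbours) for node, neighbours in graph.items()}
    for neighbours in graph.values():
        for target in neighbours:
            degree[target] += 1
    return {node: d / (n - 1) for node, d in degree.items()}


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration. Assumes positive edge weights."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            for target, weight in graph[source].items():
                new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Toy star network: every leaf connects only to "Hub", in both directions.
star = {
    "Hub": {"A": 5, "B": 5, "C": 5},
    "A": {"Hub": 5},
    "B": {"Hub": 5},
    "C": {"Hub": 5},
}
degrees = degree_centrality(star)
ranks = pagerank(star)
top_node = max(ranks, key=ranks.get)
```

In this toy network the hub ranks highest on both measures, mirroring on a small scale the pattern a dominant capital would produce.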

These metrics reveal strong spatial hierarchies in Thailand’s mobility system. Bangkok consistently emerges as the dominant hub across all measures, reflecting its role as the country’s economic and transportation centre.

To better understand regional dynamics beyond the capital, additional analyses remove Bangkok from the network. This reveals secondary hubs and alternative transmission pathways that may become critical during Bangkok-centred outbreaks or targeted interventions.
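
A removal analysis of this kind can be sketched as follows. The connectivity measure used here, the size of the largest weakly connected component, is an assumption chosen for illustration, as are the helper names and the toy network.

```python
def remove_node(graph, node):
    """Return a copy of the graph without the node and without edges touching it."""
    return {
        source: {t: w for t, w in neighbours.items() if t != node}
        for source, neighbours in graph.items()
        if source != node
    }


def largest_weak_component(graph):
    """Size of the largest group of nodes connected when edge direction is ignored."""
    undirected = {node: set() for node in graph}
    for source, neighbours in graph.items():
        for target in neighbours:
            undirected[source].add(target)
            undirected.setdefault(target, set()).add(source)
    seen, best = set(), 0
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            size += 1
            for neighbour in undirected[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Toy network: "Hub" links everything; A and B also share a direct link.
network = {
    "Hub": {"A": 5, "B": 5, "C": 5},
    "A": {"Hub": 5, "B": 1},
    "B": {"Hub": 5, "A": 1},
    "C": {"Hub": 5},
}
size_before = largest_weak_component(network)
without_hub = remove_node(network, "Hub")
size_after = largest_weak_component(without_hub)
```

Removing the hub isolates C but leaves A and B connected through their direct link, which is the kind of secondary structure that only becomes visible once the dominant node is taken out.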

Figure 5: Comparison of eigenvector centrality and PageRank at the province level, highlighting locations that play a disproportionately important role in national connectivity.

Figure 6: Impact of sequentially removing highly central provinces on overall network connectivity, illustrating the structural importance of key mobility hubs.

Origin-destination matrices as model inputs

The central modelling outputs of the project are origin-destination (OD) matrices, which quantify mobility flows between all pairs of locations.

OD matrices are produced at multiple spatial resolutions:

  • point-to-point (individual stops),
  • sub-district,
  • district,
  • province.

Each matrix entry represents the aggregated intensity of movement from an origin to a destination, across one or more transportation modes. These matrices serve as the primary interface between mobility data and epidemiological models.
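
Aggregation between resolutions can be sketched as a regrouping of flows by parent unit. The district and province labels, the flow values, and the function names below are hypothetical placeholders.

```python
def aggregate_flows(flows, parent):
    """Sum directed flows after mapping each location to its parent unit."""
    aggregated = {}
    for (origin, destination), weight in flows.items():
        key = (parent[origin], parent[destination])
        aggregated[key] = aggregated.get(key, 0) + weight
    return aggregated


def od_matrix(flows, labels):
    """Arrange flows as a square matrix: rows are origins, columns are destinations."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (origin, destination), weight in flows.items():
        matrix[index[origin]][index[destination]] += weight
    return matrix


# Hypothetical district-level flows (trips per day) and district -> province lookup.
district_flows = {
    ("D1", "D2"): 4,
    ("D2", "D1"): 3,
    ("D1", "D3"): 6,
    ("D3", "D2"): 1,
}
district_to_province = {"D1": "P1", "D2": "P1", "D3": "P2"}

province_flows = aggregate_flows(district_flows, district_to_province)
province_matrix = od_matrix(province_flows, ["P1", "P2"])
```

In this sketch, flows between districts of the same province end up on the diagonal of the province matrix (7 for P1); whether a model keeps or discards that diagonal is a separate modelling decision.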

Figure 7: Heat map representations of OD matrices for selected transportation modes, illustrating the intensity and directionality of interprovincial mobility flows.

Integrating mobility into epidemiological models

In dengue transmission models, OD matrices are used to couple local transmission dynamics across space. Specifically, they allow models to:

  • simulate the importation and exportation of infections between regions,
  • weight interregional transmission by observed connectivity,
  • explore counterfactual scenarios, such as mobility reductions (as during the Covid-19 pandemic) or disruptions to highly central provinces.

Embedding real mobility patterns allows the models to reproduce observed spatial spread and timing more accurately, thereby improving both explanatory power and predictive accuracy.
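
To make the coupling concrete, here is a deliberately simplified sketch. It is not the project's dengue model: it uses a plain SIR structure with no mosquito dynamics, and the parameters beta, gamma and coupling, the populations, and the OD values are all assumed for illustration. Each region's infection pressure mixes its own prevalence with the prevalence at the destinations its residents travel to, weighted by the row-normalised OD matrix.

```python
def row_normalise(od):
    """Convert each OD row into travel proportions (rows with no flow become zeros)."""
    normalised = []
    for row in od:
        total = sum(row)
        normalised.append([x / total if total else 0.0 for x in row])
    return normalised


def step(S, I, R, od, beta=0.3, gamma=0.1, coupling=0.05):
    """Advance a mobility-coupled SIR model by one day."""
    n = len(S)
    P = row_normalise(od)
    N = [S[i] + I[i] + R[i] for i in range(n)]
    prevalence = [I[i] / N[i] if N[i] else 0.0 for i in range(n)]
    new_S, new_I, new_R = [], [], []
    for i in range(n):
        # Prevalence that residents of region i encounter at their destinations.
        travel_prev = sum(P[i][j] * prevalence[j] for j in range(n))
        force = beta * ((1 - coupling) * prevalence[i] + coupling * travel_prev)
        infections = force * S[i]
        recoveries = gamma * I[i]
        new_S.append(S[i] - infections)
        new_I.append(I[i] + infections - recoveries)
        new_R.append(R[i] + recoveries)
    return new_S, new_I, new_R


def simulate(S, I, R, od, days, **params):
    """Run the model for a number of days and return the final state."""
    for _ in range(days):
        S, I, R = step(S, I, R, od, **params)
    return S, I, R


# Two hypothetical regions; the outbreak is seeded in region 0 only.
od = [[0, 10], [10, 0]]
S0, I0, R0 = [9990.0, 10000.0], [10.0, 0.0], [0.0, 0.0]

baseline = simulate(S0, I0, R0, od, days=30)
isolated = simulate(S0, I0, R0, od, days=30, coupling=0.0)  # counterfactual: no travel
```

Setting coupling to zero is a crude stand-in for a complete mobility shutdown: region 1 then never receives an infection, whereas in the baseline run it does.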

Data quality, validation, and limitations

Extensive validation procedures ensure high data quality, including geographic checks, consistency tests, and cross-referencing with independent sources. Nevertheless, important limitations remain:

  • mobility is static rather than time-varying,
  • service frequency is used instead of true passenger counts,
  • informal transport modes are not included.

These limitations are explicitly documented and inform ongoing and future work, including the planned integration of alternative mobility data sources and temporally dynamic flows.
