SODAS Data Sprint

📡🚢🚢🚢📡

Jonas Skjold Raaschou-Pedersen

2025-02-28

Data

  • AIS data from the Danish Maritime Authority
  • We provide 4.5 years of processed data
  • The data is hosted on Tresorit; see the e-mail for the link and password
  • Check out the data sprint’s repo at https://github.com/jsr-p/sodas-data-sprint

Zipped CSV files vs. Parquet

  • 4.5 years of zipped AIS data (CSV) amount to 821.6 GB
  • The same data in Parquet format amounts to 524.7 GB, about 36% less
Year    Zip (GB)   Parquet (GB)
2021    205.2      112.6
2022    199.0      109.7
2023    186.2      116.2
2024    207.1      167.7
2025    24.1       18.5

Final data set sizes in GB

  • Resample the data to get each ship’s position every x minutes
  • Resampling to 15m, 30m and 1h intervals yields even smaller data sets
    • 15m: 4.38 GB in total
    • 30m: 1.95 GB in total
    • 1h: 1.05 GB in total
  • 4.5 years of AIS data shrink from 821.6 GB (zipped CSV) to about 1 GB (hourly Parquet)
Year    15m     30m     1h
2021    1.08    0.48    0.25
2022    1.10    0.48    0.26
2023    0.96    0.45    0.24
2024    1.13    0.50    0.27
2025    0.11    0.04    0.03

Reading data

import polars as pl
# Download the data from Tresorit first; then it's plug and play
df = pl.read_parquet("data/aisdk-2024-1h.parquet")
print(f"Shape of data: {df.shape}")
df.head(2)
Shape of data: (26953310, 26)
Column                          dtype         Row 1                     Row 2
MMSI                            str           "205246000"               "205246000"
# Timestamp                     datetime[μs]  2024-01-01 00:00:00       2024-01-01 01:00:00
Type of mobile                  str           "Class A"                 "Class A"
Latitude                        f64           56.702297                 56.702277
Longitude                       f64           8.219783                  8.219765
Navigational status             str           "Under way using engine"  "Under way using engine"
ROT                             f64           0.0                       0.0
SOG                             f64           0.0                       0.0
COG                             f64           199.0                     26.0
Heading                         f64           227.0                     227.0
IMO                             str           "Unknown"                 "9215969"
Callsign                        str           "Unknown"                 "OPUF"
Name                            str           "Z510 DENNIS"             "Z510 DENNIS"
Ship type                       str           "Undefined"               "Fishing"
Cargo type                      str           null                      null
Width                           f64           9.0                       9.0
Length                          f64           38.0                      38.0
Type of position fixing device  str           "Undefined"               "GPS"
Draught                         f64           null                      null
Destination                     str           "Unknown"                 "FISHING GROUNDS"
ETA                             datetime[μs]  null                      null
Data source type                str           "AIS"                     "AIS"
A                               f64           10.0                      10.0
B                               f64           28.0                      28.0
C                               f64           4.0                       4.0
D                               f64           5.0                       5.0

Inspect 2024

Inspect 2021-2025

Tracing 🦅 every hour

Tracing 🦅 every 15 minutes

Tracing 🦅 every hour (2023)

🔌 + 🦅