Loading a trillion rows of weather data into TimescaleDB
- What are we even doing?
- The insert statement
- The copy statement
- Tools
- Tweaking Postgres settings
- So what's the best method?
- Appendices
- Footnotes

What are we even doing?

Why build a weather data warehouse? I think it would be cool to have historical weather data from around the world to analyze for signals of the climate change we've already had, rather than speculating about potential future change. If we had a huge weather data warehouse we could query it to figure out whether Jakarta is actually warmer or stormier these days, and exactly how it is warmer (heat waves, winter highs, etc.).
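To make that concrete, a question like "is Jakarta warmer now than in the 1940s?" maps to a simple aggregate query. This is only a sketch: the table `weather_hourly` and its columns are hypothetical stand-ins for whatever schema the warehouse ends up with.

```sql
-- Hypothetical schema: weather_hourly(time timestamptz, city text, temperature double precision)
-- Compare Jakarta's mean hourly temperature in the 2010s against the 1940s.
SELECT
  avg(temperature) FILTER (WHERE time >= '2010-01-01' AND time < '2020-01-01')
    - avg(temperature) FILTER (WHERE time >= '1940-01-01' AND time < '1950-01-01')
    AS warming_degrees
FROM weather_hourly
WHERE city = 'Jakarta';
```

Refinements of the same shape (grouping by month, taking daily maxima) would answer the "exactly how" questions about heat waves and winter highs.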
Hourly data stretching back to 1940 is 727,080 snapshots in time for each variable: temperature, precipitation, cloud cover, wind speed, and so on.

At first glance pg_bulkload looks much faster, but that is because by default it bypasses the shared buffers and skips WAL logging, so data recovery after a crash may not be possible. timescaledb-parallel-copy takes neither shortcut and does things more safely. That gap may not always hold, but inserting into a regular table and then converting it to a hypertable, migrating the data in the process, will probably always be slower: the conversion/migration step is not fast and appears to be single-threaded.
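The slow path described above is plain SQL plus TimescaleDB's `create_hypertable` call with its data-migration option enabled. A minimal sketch, with an illustrative table name and file path:

```sql
-- Load into an ordinary table first...
CREATE TABLE weather_hourly (
  time        timestamptz NOT NULL,
  city        text,
  temperature double precision
);
COPY weather_hourly FROM '/tmp/weather.csv' WITH (FORMAT csv);

-- ...then convert it to a hypertable, moving the existing rows into chunks.
-- migrate_data => TRUE is the conversion/migration step that is slow and
-- appears to run single-threaded.
SELECT create_hypertable('weather_hourly', 'time', migrate_data => TRUE);
```

Creating the hypertable first and bulk-loading into it directly avoids that migration pass entirely, which is why the tools above load into a hypertable from the start.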