Distributed DuckDB Instance

(github.com)

25 points | by citguru 7 hours ago

4 comments

  • herpderperator
    6 hours ago
    Does this help with DuckDB concurrency? My main gripe with DuckDB is that you can't write to it from multiple processes at the same time. If you open the database in write mode with one process, you cannot modify it at all from another process without the first process completely releasing it. In fact, you cannot even read from it from another process.

    So if you typically use a file-backed DuckDB database in one process and want to quickly modify something in that database using the DuckDB CLI (like you might connect SequelPro or DBeaver to make changes to a DB while your main application is 'using' it), then it complains that it's already in use and thus locked.

    This is unlike SQLite, which supports and handles this in a thread-safe manner out of the box. I know it's DuckDB's explicit design decision[0], but it would be amazing if DuckDB could behave more like SQLite in this way.

    [0] https://duckdb.org/docs/current/connect/concurrency
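To make the SQLite side of this comparison concrete, here is a minimal sketch using only Python's standard library. It assumes a throwaway database file and a hypothetical `notes` table (neither comes from the thread), and it illustrates only the SQLite behavior described above: a second process writes to the file while the first process still holds its connection open. It is not a statement about how DuckDB or this project handles locking.

```python
import sqlite3
import subprocess
import sys


def write_from_two_processes(db_path):
    """Write to one SQLite file from two processes; return authors in insert order."""
    # "Main application" process: open the file and keep the connection open.
    main_conn = sqlite3.connect(db_path)
    main_conn.execute("CREATE TABLE IF NOT EXISTS notes (author TEXT, body TEXT)")
    main_conn.execute("INSERT INTO notes VALUES ('app', 'written by the main process')")
    main_conn.commit()  # SQLite releases its write lock once the transaction ends

    # Second process (stand-in for a CLI or GUI client): open the same file
    # and write to it while main_conn is still open.
    child_code = (
        "import sqlite3, sys\n"
        "conn = sqlite3.connect(sys.argv[1])\n"
        "conn.execute(\"INSERT INTO notes VALUES ('cli', 'written by a second process')\")\n"
        "conn.commit()\n"
        "conn.close()\n"
    )
    subprocess.run([sys.executable, "-c", child_code, db_path], check=True)

    # The still-open main connection sees the row the other process committed.
    rows = main_conn.execute("SELECT author FROM notes ORDER BY rowid").fetchall()
    main_conn.close()
    return [author for (author,) in rows]
```

Note that SQLite still serializes writers: if the first process were in the middle of an uncommitted write transaction, the second process would wait and eventually fail with a "database is locked" error. The difference the comment points at is that the lock is held per transaction rather than for as long as the file is open.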

  • nehalem
    6 hours ago
    I have a deep appreciation for DuckDB, but I am afraid the confluence of brilliant ideas makes it ever more complicated to adopt, and DuckLake is another example of this trend.

    When I look at SQLite I see a clear message: a database in a file. I think DuckDB is that, too. But it’s also an analytics engine like Polars, works with other DB engines, supports Parquet, comes with a UI, and has two separate warehouse ideas which both deviate from DuckDB’s core ideas.

    Yes, DuckLake and MotherDuck are separate entities, but they are still part of the ecosystem.

  • Lucasoato
    6 hours ago
    Last week I sent my first PR to DuckDB to support Iceberg views in catalogs like Polaris! Let’s hope for the best :)

  • citguru
    7 hours ago
    This is an attempt to replicate MotherDuck's differential storage and implement hybrid query execution on DuckDB.
    • zurfer
      6 hours ago
      As someone working in the field, I have to admit that I'm not familiar with the term differential storage, nor do I really understand what hybrid execution means. Maybe you could describe both from a simple technical point of view and explain what benefits they have for me as a user?