distributed database systems

Published on 30/01/2024

Distributed database systems are systems composed of several machines into a network. Usually the data is partitioned across the machines and the execution of the query can be distributed in the network.

Execution approaches

There is two main approaches to execute the query are the following :

pushing query to data : send the query to the machines storing the data
pull data to query : send the data to the machine executing the query

The approach 1 seems to be more efficient in general, but there are cases where 2 is the only approach available e.g. : data integration.

System architecture

Two main options:

shared nothing where each part of the data is stored in at most one machine, each machine is efficient to apply filter on its part
shared disk where storage is fully distributed (object stores: S3, Garage).

The shared nothing approach should have better performance, since the filters are applied before the data transfer. On the other hand, shared disk is more resilient to machine failure and more scalable.

Execution approaches

System architecture

Author