Distributed database systems
Distributed database systems are composed of several machines connected in a network. Usually the data is partitioned across the machines, and the execution of a query can be distributed over the network.
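To make the idea of partitioning concrete, here is a minimal sketch. It assumes one simple scheme (the row key modulo the number of machines), which the notes do not prescribe; the function name `partition_rows` and the sample rows are hypothetical.

```python
def partition_rows(rows: list[dict], key: str, n_machines: int) -> list[list[dict]]:
    """Assign each row to one machine using key % n_machines (an assumed scheme)."""
    partitions: list[list[dict]] = [[] for _ in range(n_machines)]
    for row in rows:
        partitions[row[key] % n_machines].append(row)
    return partitions


# Hypothetical table of six rows spread over three machines.
rows = [{"id": i} for i in range(6)]
parts = partition_rows(rows, "id", 3)
```

With this scheme every row lives on exactly one machine, so a query over the whole table has to involve all of them.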
Execution approaches
There are two main approaches to executing a query:
- push the query to the data: send the query to the machines storing the data
- pull the data to the query: send the data to the machine executing the query
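The first approach (pushing the query to the data) is generally more efficient, since it usually moves less data over the network, but there are cases where the second (pulling the data to the query) is the only one available, e.g. data integration.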
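The difference between the two approaches can be illustrated with a small in-memory simulation. This is only a sketch: the `NODES` dictionary, the function names and the "rows transferred" counter are assumptions made for illustration, not part of any real system.

```python
from typing import Callable

Row = dict[str, int]

# Hypothetical cluster: each node stores one partition of the table.
NODES: dict[str, list[Row]] = {
    "node_a": [{"id": 1, "price": 5}, {"id": 2, "price": 50}],
    "node_b": [{"id": 3, "price": 70}, {"id": 4, "price": 8}],
}


def push_query_to_data(predicate: Callable[[Row], bool]) -> tuple[list[Row], int]:
    """Each node filters its own rows; only the matches are transferred."""
    result: list[Row] = []
    transferred = 0
    for rows in NODES.values():
        local_matches = [r for r in rows if predicate(r)]  # runs on the storage node
        transferred += len(local_matches)
        result.extend(local_matches)
    return result, transferred


def pull_data_to_query(predicate: Callable[[Row], bool]) -> tuple[list[Row], int]:
    """All rows are transferred to the querying machine, which then filters."""
    fetched: list[Row] = []
    transferred = 0
    for rows in NODES.values():
        fetched.extend(rows)  # the whole partition crosses the network
        transferred += len(rows)
    return [r for r in fetched if predicate(r)], transferred


def is_expensive(row: Row) -> bool:
    return row["price"] > 20


pushed_rows, pushed_cost = push_query_to_data(is_expensive)
pulled_rows, pulled_cost = pull_data_to_query(is_expensive)
```

Both functions return the same rows, but in this example pushing the query transfers 2 rows while pulling the data transfers all 4. When the storage machines cannot run the filter themselves, only the second function is applicable.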
System architecture
Two main options:
- shared nothing: each part of the data is stored on at most one machine, and each machine can efficiently apply filters to its own part
- shared disk: storage is fully distributed and shared by all machines (e.g. object stores such as S3 or Garage)
The shared nothing approach should perform better, since filters are applied locally, before any data is transferred. On the other hand, shared disk is more resilient to machine failures and more scalable.
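The trade-off above can be sketched with two toy classes. This is a simplified model under stated assumptions: the shared nothing cluster has no replication (each partition lives on one node, as described above), the shared store always returns whole objects, and the class names, the `transferred` counter and the sample data are hypothetical.

```python
from typing import Callable

Predicate = Callable[[int], bool]


class SharedNothing:
    """Each node owns one partition on local storage (no replication here)."""

    def __init__(self, partitions: dict[str, list[int]]) -> None:
        self.partitions = partitions
        self.alive = set(partitions)
        self.transferred = 0  # number of values sent over the network

    def fail(self, node: str) -> None:
        self.alive.discard(node)

    def scan(self, predicate: Predicate) -> list[int]:
        result: list[int] = []
        for node in sorted(self.alive):
            # The filter runs on the node itself; only matches are sent.
            matches = [v for v in self.partitions[node] if predicate(v)]
            self.transferred += len(matches)
            result.extend(matches)
        return result


class SharedDisk:
    """Stateless workers read every object from a shared store."""

    def __init__(self, objects: dict[str, list[int]], workers: list[str]) -> None:
        self.objects = objects
        self.alive = set(workers)
        self.transferred = 0

    def fail(self, worker: str) -> None:
        self.alive.discard(worker)

    def scan(self, predicate: Predicate) -> list[int]:
        if not self.alive:
            raise RuntimeError("no worker left to run the query")
        result: list[int] = []
        for key in sorted(self.objects):
            fetched = self.objects[key]  # the whole object is transferred
            self.transferred += len(fetched)
            result.extend(v for v in fetched if predicate(v))
        return result


data = {"p1": [1, 12, 30], "p2": [4, 25, 40]}


def is_large(value: int) -> bool:
    return value > 10


sn = SharedNothing(data)
sd = SharedDisk(data, workers=["w1", "w2"])

full_sn, sn_cost = sn.scan(is_large), sn.transferred
full_sd, sd_cost = sd.scan(is_large), sd.transferred

# One machine fails in each cluster.
sn.fail("p1")
sd.fail("w1")
after_failure_sn = sn.scan(is_large)
after_failure_sd = sd.scan(is_large)
```

Before the failure both clusters return the same four values, but the shared nothing scan transfers 4 values against 6 for shared disk. After the failure the shared nothing scan loses the values stored on the failed node, while the remaining shared disk worker still returns the full result.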