distributed database systems

Distributed database systems are systems composed of several machines into a network. Usually the data is partitioned across the machines and the execution of the query can be distributed in the network.

Execution approaches

There is two main approaches to execute the query are the following :

  1. pushing query to data : send the query to the machines storing the data
  2. pull data to query : send the data to the machine executing the query

The approach 1 seems to be more efficient in general, but there are cases where 2 is the only approach available e.g. : data integration.

System architecture

Two main options:

  1. shared nothing where each part of the data is stored in at most one machine, each machine is efficient to apply filter on its part
  2. shared disk where storage is fully distributed (object stores: S3, Garage).

The shared nothing approach should have better performance, since the filters are applied before the data transfer. On the other hand, shared disk is more resilient to machine failure and more scalable.

This post accepts webmentions. Do you have the URL to your post?

Otherwise, send your comment on my service.

Or interact from the fediverse with your username:

fediverse logo Share on the Fediverse