When it comes to database replication, what is the use of global transaction identifiers? Why do we need to prevent concurrency across the servers? How is that prevention achieved exactly?
I tried to read the documentation at http://dev.mysql.com/doc/refman/5.7/en/replication-gtids.html but still could not understand it clearly. This may sound very basic but I would really appreciate if someone can explain the concepts to me.
Best How To :
The reason for the Global Transaction ID is to allow a MySQL slave to know if it has applied a given transaction or not, to keep things in sync between Master and Slave. It can also be used for restarting a slave if a connection goes down, again to know the point in time. Without using GTIDs, replication must be controlled based on the position in a given binary transaction log file (bin log). This is much harder to manage than the GTID method.
A master is the only server that is typically written to, so that slaves merely rebuild a copy of the master by applying each transaction in sequence.
It is also important to understand that MySQL replication can run in one of 3 modes:
- Statement-based: Each SQL statement is logged to the binlog and replicated as a statement to the slave. This can be in some cases ambiguous at the slave causing the data to not match exactly. (Most of the time it is fine for common uses).
- Row-based: In this mode MySQL replicates the actual data changes to each table, with a "before" and "after" picture of each row, which is fully accurate. This can result in a much larger binlog, for example if you have a bulk update query, like: UPDATE t1 SET c1 = 'a' WHERE c2 = 'b'.
- Mixed: In this mode, MySQL will use a mix of statement-based and row-based logging in the binlog.
I only mention the modes of replication, because it is mentioned in the doc you referenced that Row-based is the recommended option if you are using GTIDs.
There is another option called Master-Master replication, where you can write to two masters (each acting as a slave for the other), but this requires a special configuration to ensure that the data written to each master is unique. It is much trickier to manage than a typical Master/Slave setup.
Therefore, the prevention of writes to a Slave is something that you must ensure from your application for a typical replication process to function correctly. It is fine to read from a Slave, but you should not write to it. Note that the Slave can be behind the Master if you are using it for reads, so it is best to perform queries for things that can be behind the Master (like reports that are not critical up to the second or millisecond). You can ensure no writes to the Slave by making your common application user a read-only user for the Slave server, and a read-write user for the Master.