I am experimenting with a Cassandra cluster on Azure. It took me more than a week to get the setup done and working. It's working now, but I think I misunderstood one concept of Cassandra: how is data stored?
I was under the impression that whenever I upload data, the same data is available on all machines. For example, if I upload a 10 MB file to a cluster of 4 nodes (2 of them seeds), all 4 nodes would each consume 10 MB. I guess I am wrong about this.
I just created 4 nodes and uploaded around 47 MB of blob data (2 + 5 + 20 + 20 MB), but when I check the status, I see the following:
    --  Address   Load      Tokens  Owns (effective)  Host ID                               Rack
    UN  10.1.2.5  28.32 MB  256     47.5%             xxxxxxxx-eb9a-46fb-8213-c7487074d9a8  rc1
    UN  10.1.2.4  27.14 MB  256     51.3%             xxxxxxxx-11ed-41c6-be8b-a912e54b1ccf  rc1
    UN  10.1.2.7  25.09 MB  256     50.1%             xxxxxxxx-9e73-410a-b1bf-5bfd15138625  rc2
    UN  10.1.2.6  23.32 MB  256     51.2%             xxxxxxxx-d132-49b6-8eda-4459391d12e4  rc2
By the way, the replication factor for the tables is 2. The load changes slightly every couple of minutes, but I can download the data and it is as expected.
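As a rough sanity check on these numbers, here is a small back-of-the-envelope sketch in Python. It assumes that each piece of data is stored on as many nodes as the replication factor and that replicas are spread evenly across the cluster. It ignores compression, commit logs, system tables, and any other overhead, and the function name is just a hypothetical helper, so treat the result as an approximation rather than a description of how Cassandra actually accounts for disk usage.

```python
# Rough estimate only: ignores compression, commit logs, and system tables.
def expected_load_per_node_mb(raw_mb, replication_factor, node_count):
    """Average per-node load if replicas are spread evenly across the nodes."""
    return raw_mb * replication_factor / node_count


raw_mb = 2 + 5 + 20 + 20                      # uploaded blob data: 47 MB
replication_factor = 2                        # as configured for the tables
observed_mb = [28.32, 27.14, 25.09, 23.32]    # Load column from the status output

per_node = expected_load_per_node_mb(raw_mb, replication_factor, len(observed_mb))
expected_total = raw_mb * replication_factor
observed_total = sum(observed_mb)

print(f"expected per node:      {per_node:.2f} MB")       # 23.50 MB
print(f"expected cluster total: {expected_total} MB")     # 94 MB
print(f"observed cluster total: {observed_total:.2f} MB") # 103.87 MB
```

Under those assumptions, a replication factor of 2 on 4 nodes would put roughly 23.5 MB on each node instead of a full 47 MB copy everywhere, which is in the same range as the 23 to 28 MB reported above. The "Owns (effective)" column also adds up to about 200%, which would be consistent with every piece of data being held by two nodes.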
Sorry if this comes across as lazy. I have been googling the setup for a week and would really appreciate it if you could help me understand this, or at least point me to a proper link.