I'm doing some k-means clustering:
Regardless of how many clusters I choose to use, the percentage of point variability does not change:
Here's how I am plotting my data:
    # Prepare Data
    mydata <- read.csv("~/student-mat.csv", sep=";")

    # Let's only grab the numeric columns
    mydata <- mydata[,c("age","Medu","Fedu","traveltime","studytime","failures","fam
    mydata <- na.omit(mydata) # listwise deletion of missing
    mydata <- scale(mydata) # standardize variables

    library(ggplot2)

    # K-Means Clustering with 5 clusters
    fit <- kmeans(mydata, 5) # to change number of clusters, I change the "5"

    # Cluster Plot against 1st 2 principal components
    # vary parameters for most readable graph
    library(cluster)
    clusplot(mydata, fit$cluster, color=TRUE, shade=TRUE, labels=0, lines=0)
How can I change the percentage of point variability?