I have tried to reproduce the results from the PCA tutorial on here (PCA-tutorial) but I've got some problems.
From what I understand I am following the steps to apply PCA as they should be. But my results are not similar with the ones in the tutorial (or maybe they are and I can't interpret them right?). With n_components=4 I obtain the following graph n_components4. I am probably missing something somewhere, I've also added the code I have so far.
My second problem is about annotating the points in the graph, I have the labels and I want each point to get the corresponding label. I've tried some things but with no success so far.
I've also added the data set, I have it saved as CSV:
,Cheese,Carcass meat,Other meat,Fish,Fats and oils,Sugars,Fresh potatoes,Fresh Veg,Other Veg,Processed potatoes,Processed Veg,Fresh fruit,Cereals,Beverages,Soft drinks,Alcoholic drinks,Confectionery England,105,245,685,147,193,156,720,253,488,198,360,1102,1472,57,1374,375,54 Wales,103,227,803,160,235,175,874,265,570,203,365,1137,1582,73,1256,475,64 Scotland,103,242,750,122,184,147,566,171,418,220,337,957,1462,53,1572,458,62 NIreland,66,267,586,93,209,139,1033,143,355,187,334,674,1494,47,1506,135,41
So any thoughts on either of those problems?
import pandas as pd import matplotlib.pyplot as plt from sklearn import decomposition demo_df = pd.read_csv('uk_food_data.csv') demo_df.set_index('Unnamed: 0', inplace=True) target_names = demo_df.index tran_ne = demo_df.T pca = decomposition.PCA(n_components=4) comps = pca.fit(tran_ne).transform(tran_ne) plt.scatter(comps[0,:], comps[1, :]) plt.title("PCA Analysis UK Food"); plt.xlabel("PC1"); plt.ylabel("PC2"); plt.grid(); plt.savefig('PCA_UK_Food.png', dpi=125)