I build dynamic websites whose structure is stored hierarchically in the database (my own CMS). I use the adjacency list model to manage these database tables (PHP and MySQL through PDO).
I noticed that Google is indexing pages that it should not.
An example of a tree structure used for navigation:
home
about us
products
    productgroup 1
    productgroup 2
contact
    support
    sales
Imagine this structure in a pull-down menu with links to the pages. When I select products -> productgroup 1, I get a URL like www.domain.com/products/productgroup-1. The CMS pulls the data from the database based on the last URI element (productgroup-1, a slug version of the title) and shows it in my template. I only look up that last element, not the full path (I should, I know).
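To make the "I should query all elements" part concrete, here is a rough sketch of what checking the whole path could look like. It is not my current code: the column names (id, parent_id, slug), the sample rows and the function name resolvePath are all made up for illustration, and the rows are a plain array standing in for what PDO would return from the adjacency table.

```php
<?php
// Hypothetical rows, shaped like PDO::FETCH_ASSOC results from an adjacency
// table with assumed columns id, parent_id and slug.
$pages = [
    ['id' => 1, 'parent_id' => null, 'slug' => 'home'],
    ['id' => 2, 'parent_id' => null, 'slug' => 'products'],
    ['id' => 3, 'parent_id' => 2,    'slug' => 'productgroup-1'],
    ['id' => 4, 'parent_id' => 2,    'slug' => 'productgroup-2'],
];

/**
 * Walk the URI from left to right and return the id of the page it points to,
 * or null when any segment is not a child of the segment before it.
 */
function resolvePath(array $pages, string $uri): ?int
{
    // Index the rows as "parent/slug" => id so each segment is one lookup.
    // Top-level pages (parent_id NULL) are filed under the label "root".
    $index = [];
    foreach ($pages as $row) {
        $index[($row['parent_id'] ?? 'root') . '/' . $row['slug']] = (int) $row['id'];
    }

    // Split the URI and drop empty pieces caused by leading or trailing slashes.
    $segments = array_filter(explode('/', $uri), fn ($s) => $s !== '');

    $parent = 'root';
    $id = null;
    foreach ($segments as $slug) {
        $key = $parent . '/' . $slug;
        if (!isset($index[$key])) {
            return null; // this segment does not exist under the previous one
        }
        $id = $index[$key];
        $parent = $id;
    }
    return $id;
}

var_dump(resolvePath($pages, '/products/productgroup-1'));                // int(3)
var_dump(resolvePath($pages, '/productgroup-1'));                         // NULL
var_dump(resolvePath($pages, '/products/productgroup-1/productgroup-2')); // NULL
```

With a check like this, a URL only resolves when every segment sits under the right parent, so the template would be rendered for exactly one URL per page and everything else could be answered with a 404.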
So far so good. Google is indexing this page as expected:
But when I look in Google Webmaster Tools, I see a lot of indexed pages that return 404s, like:
And so forth.
These pages are empty and have no link in the navigation structure.
I have designed my structure so that these pages return a 404 error. Webmaster Tools confirms this, but Google keeps indexing these pages. I know I can use robots.txt to keep Google's search bot from indexing certain URLs. Is there another way to do this? Should I return a 403 instead of a 404?
I am in the dark here.