I'm working on a scraping preparation function for a site where pages of results link to product pages. The function has default maximums for the number of results pages to crawl and the number of items to collect per page, to prevent a simple mistake from crawling far more than intended.
Here's what I have so far. Does the way I'm implementing the maximums with the for loops make sense? Is there a more "pythonic" way? I'm coming at this purely as a learning exercise. Thanks.
import requests
from bs4 import BeautifulSoup, SoupStrainer

def my_crawler(url, max_pages=1, max_items=1):
    for page_number in range(1, max_pages + 1):
        # Build each page's URL from the base so page numbers don't accumulate
        page_url = url + str(page_number)
        source_code = requests.get(page_url).text
        # Parse only the product-tag elements to keep the soup small
        products = SoupStrainer(class_='productTags')
        soup = BeautifulSoup(source_code, 'html.parser', parse_only=products)
        for item_number, a in enumerate(soup.find_all('a')):
            print(str(item_number) + ': ' + a['href'])
            # Stop after max_items links on each page
            if item_number == max_items - 1:
                break

my_crawler('http://www.thesite.com/productResults.aspx?&No=')
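One thing I tried while thinking about the "pythonic" question: itertools.islice can cap the inner loop without the manual break, since it stops the iterator after max_items elements. This is just a sketch of the inner loop under the same assumptions as above (the same soup and max_items):

from itertools import islice

# Sketch: cap the links taken per page with islice instead of a manual break
for item_number, a in enumerate(islice(soup.find_all('a'), max_items)):
    print(str(item_number) + ': ' + a['href'])

I'm not sure whether that reads better or worse than the explicit break, but it keeps the "stop after N items" logic in one place.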