I'm scraping Reddit usernames using Python and I'm trying to extract the username from a URL. The URL looks like this:
This is my code:
start = url.find('https://www.reddit.com/user/') + 28
end = url.find('?', start)
end2 = url.find("/", start)
return url[start:end] and url[start:end2] and url[start:]
The first part works, but removing the question mark and forward slash doesn't. Maybe I'm using the "and" keyword wrong? As a result, I sometimes get something like this:
I know I can use the API, but I'd like to learn how to do it without it. I've also heard about regular expressions, but aren't they pretty slow?
Best How To :
You could use a regular expression:
>>> s = "https://www.reddit.com/user/ExampleUser/comments/"
>>> import re
>>> re.search(r'https://www\.reddit\.com/user/([^/?]+)', s).group(1)
'ExampleUser'
[^/?]+ is a negated character class that matches any character except / or ?, one or more times.
() is a capturing group around the negated character class; it captures the characters matched by that class, which is what .group(1) returns. The captured characters can also be referred to later through a back-reference (like \1, which refers to group index 1).
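To make the difference between the whole match and the captured group concrete, here is a small sketch. The three sample URLs are made up for illustration (they cover the bare, "/" and "?" endings mentioned in the question), and the dots in the pattern are escaped so that they only match a literal ".".

```python
import re

# Dots are escaped so they match a literal "." rather than any character.
PATTERN = re.compile(r'https://www\.reddit\.com/user/([^/?]+)')

# Hypothetical sample URLs covering the three endings from the question.
urls = [
    "https://www.reddit.com/user/ExampleUser",
    "https://www.reddit.com/user/ExampleUser/comments/",
    "https://www.reddit.com/user/ExampleUser?sort=top",
]

for url in urls:
    match = PATTERN.search(url)
    # group(0) is the whole match, group(1) is only the captured username.
    print(match.group(0), "->", match.group(1))
```

In all three cases group(1) stops at the first "/" or "?" after the username, because the character class refuses to match either of them.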
Or, by defining a separate function:
>>> def extract_username(url):
...     return re.search(r'https://www\.reddit\.com/user/([^/?]+)', url).group(1)
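As for the "and" keyword in the question: an expression like a and b and c returns the first falsy operand, or the last operand if all are truthy, so with three non-empty slices it simply returns url[start:]. In addition, find() returns -1 when the character is missing, which makes url[start:end] silently drop the last character. If you want to avoid regular expressions, one possible approach is sketched below using only string methods. The names PREFIX and extract_username_plain are made up for this example, and returning None when the prefix is missing is an assumption about the behaviour you want (the regex one-liner above would instead raise AttributeError, because re.search returns None when nothing matches).

```python
PREFIX = 'https://www.reddit.com/user/'  # assumed fixed prefix, as in the question


def extract_username_plain(url):
    """Return the username that follows PREFIX, or None if it cannot be found."""
    start = url.find(PREFIX)
    if start == -1:
        return None
    rest = url[start + len(PREFIX):]
    # Cut at the first "/" and then at the first "?"; what remains is the
    # text before whichever of the two separators came first.
    for sep in ('/', '?'):
        rest = rest.split(sep, 1)[0]
    # An empty string means there was no username after the prefix.
    return rest or None


print(extract_username_plain('https://www.reddit.com/user/ExampleUser/comments/'))
print(extract_username_plain('https://www.reddit.com/user/ExampleUser?sort=top'))
print(extract_username_plain('https://example.com/'))
```

On speed: for a short pattern like this one the cost of the regular expression is unlikely to matter next to the network time of scraping, but if in doubt you can compare both versions on your own URLs with the timeit module.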