I am building a RESTful API. The only problem I have is how to do the authentication, since I want a stateless approach where the only information the server has is in the request itself.
So I thought I would look at how the big boys do it.
I see that most services issue users/applications a token, which is then sent on each subsequent request. For example, Twitter and GitHub use OAuth2 and issue a bearer token. So far, so good - stateless, clean and simple:
$ curl -H "Authorization: token OAUTH-TOKEN" https://api.github.com/xyz
However, I have a question: do I store that OAUTH-TOKEN in my database to verify the user ... and if so, how?
(Edited to clarify question)
Let's say this is my database table:
user | token
abc | 123
xyz | 789
The first user wants to make an API request using their token. They know their token is "123", so they run:
curl -H "Authorization: Bearer 123" https://myapi.com
That's all the information my API has to go on, so it looks up WHERE token = "123", and finds out it's user "abc". Simple. All good. Response returned.
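For concreteness, here is a minimal sketch of that plain-text lookup in Python with an in-memory SQLite table. The table name `api_tokens` and the function `user_for_header` are made up for illustration; only the two example rows come from the table above.

```python
import sqlite3

# In-memory table mirroring the example rows above (tokens stored as plain text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_tokens (user TEXT NOT NULL, token TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO api_tokens VALUES (?, ?)",
    [("abc", "123"), ("xyz", "789")],
)


def user_for_header(authorization_header):
    """Return the user owning the bearer token, or None if missing/unknown."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    # Parameterised query: the token is the only thing the request gives us.
    row = conn.execute(
        "SELECT user FROM api_tokens WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None


print(user_for_header("Bearer 123"))
```

The server keeps no session state here; every request is resolved from the header alone.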
Ideally I want my table to be like that (simple, no overhead) so my question really was: is it a bad idea to store the tokens in the database like that?
(I guess I've got into the habit of thinking this is bad just from dealing with normal email/password rows.)
So then I thought, OK, let's say I do need to hash those tokens in my table: how would I then look up the row? That is where your final question about the lookup on the hashed value comes in. I assumed there is a chance of a collision: if two tokens had the same hash and you looked up on the hashed value alone, you wouldn't know which user had made the request, surely?
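As a rough sketch of what a lookup on the hashed value alone could look like, assuming the tokens are long random strings (not short ones like "123") and an unsalted SHA-256 digest is stored in place of the token. The names `api_tokens`, `token_sha256`, `store_token` and `find_user` are hypothetical. Because the digest is deterministic it can be indexed and queried directly, and for a 256-bit hash an accidental collision between two tokens is not a practical concern; the primary-key constraint would reject a duplicate in any case.

```python
import hashlib
import secrets
import sqlite3


def token_digest(token):
    """Deterministic SHA-256 hex digest, used as the lookup key."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_tokens (user TEXT NOT NULL, token_sha256 TEXT PRIMARY KEY)"
)


def store_token(user):
    """Generate a random token, store only its digest, return the token once."""
    token = secrets.token_urlsafe(32)
    conn.execute("INSERT INTO api_tokens VALUES (?, ?)", (user, token_digest(token)))
    return token


def find_user(token):
    """Hash the presented token and look the row up by that digest."""
    row = conn.execute(
        "SELECT user FROM api_tokens WHERE token_sha256 = ?",
        (token_digest(token),),
    ).fetchone()
    return row[0] if row else None
```

The per-request cost in this sketch is one SHA-256 computation plus the same indexed lookup as before.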
Which brought me on to the question of what additional value to send in order to identify the row. Just as you need both an email and a password to identify a row, not just a password, I wondered what the equivalent would be for an API request. But yes, the simplest solutions are the best, and I think that simply passing that identifying value along with the token does solve the problem neatly.
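My understanding of the "pass an identifier along with the token" idea, as a hedged sketch: the value handed to the client has the form `<id>.<secret>`, the id finds the row, and only a hash of the secret is stored and compared. The `TOKENS` dict, the dot-separated format, and the names `issue_id_token` and `verify_id_token` are my own assumptions for illustration, not something taken from GitHub or Twitter.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store: token id -> (user, SHA-256 hex digest of the secret).
TOKENS = {}


def issue_id_token(user):
    """Create a token of the form '<id>.<secret>'; only the digest is stored."""
    token_id = secrets.token_hex(8)
    secret = secrets.token_urlsafe(32)  # URL-safe alphabet, so it contains no '.'
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    TOKENS[token_id] = (user, digest)
    return f"{token_id}.{secret}"


def verify_id_token(token):
    """Find the row by id, then compare digests in constant time."""
    token_id, sep, secret = token.partition(".")
    record = TOKENS.get(token_id)
    if not sep or record is None:
        return None
    user, stored_digest = record
    candidate = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return user if hmac.compare_digest(candidate, stored_digest) else None
```

The client would still send a single header value, e.g. `Authorization: Bearer <id>.<secret>`, so the request stays as simple as before.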
So really you've answered the "how would I identify the row if I do need to store the tokens hashed" question.
The only question that remains is "Do I even need to store them hashed - and incur that overhead?"