Since your initial code is already in Python, and it makes little difference whether you write MapReduce (MR) jobs in Python or Java, option (3) should be the best one to pursue for your scenario. You might also like to explore libraries such as mrjob (https://github.com/Yelp/mrjob), which make it easier to write MR jobs...
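To give a feel for how little code an MR job needs in Python, here is a minimal sketch using only the standard library. Word count is just an illustrative task, not your actual problem, and the names `mapper`, `reducer` and `run_local` are my own; `run_local` only simulates the shuffle step in a single process so you can try the logic without a cluster. As far as I know, mrjob has you write a similar pair of mapper/reducer generators as methods on a job class and then takes care of running them on Hadoop.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all the counts that were emitted for a single word."""
    yield word, sum(counts)


def run_local(lines):
    """Simulate map -> shuffle -> reduce in one process (no Hadoop involved)."""
    # Shuffle step: group every value emitted by the mappers under its key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)

    # Reduce step: call the reducer once per key, in sorted key order.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_local(["the quick brown fox", "the lazy dog"]))
```

Once the mapper and reducer behave correctly on a small local sample like this, moving them into whichever runner you pick should mostly be a matter of wiring up input and output.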