Turning a negative into a positive.

When I was coding up views.py for HashPix, I found that the Instagram API would sometimes barf at me with an APINotAllowedError exception.

API not allowed? What?! How could that be?

I was using the official python-instagram library! Why would it have a method that raises an APINotAllowedError? What was even more frustrating was that the error wasn't due to a faulty network, and none of my usual test searches made the API barf up the unexpected APINotAllowedError either.

Was it random, then? No, it couldn’t be. Computers are, by design, deterministic. Something had to be wrong with the code, and it drove me mad trying to figure out what it was. Was I in breach of some unknown protocol? Did I initialize the connection parameters wrong? For a few minutes I didn’t know what to do. So, like a diligent newbie, I pasted print statements (and on occasion an import ipdb; ipdb.set_trace()) into the code and tried to pinpoint exactly where I was screwing up.

Nothing.

And then I noticed a strange hashtag (WARNING: NSFW link) in the logs.

A brief explanation before I continue with the story. One of the first people to try out the app was my darling wife. On that particular day, she was researching sewing methods because she wanted to stitch a skirt all by herself and, naturally, she searched for the hashtag #skirt to check out a few designs.

The Negative

I swear I don’t know how or why, but during one of my test searches, the word ‘up’ got prepended to her search term – ‘skirt’. The result, as you can plainly see, was clearly NSFW. Kinda amusing, but one hundred percent N-S-F-W.

Was Instagram actually blocking API requests that searched for pics tagged with hashtags that they deemed NSFW? I tested my hypothesis with a few more NSFW terms (purely out of scientific curiosity, I assure you) and found that the hypothesis, indeed, held true!

Brilliant. Now that I had found what was causing the exception, all I had to do was catch the APINotAllowedError, log it and let the whole search attempt fail silently, right?

Nope.

The Positive

I was actually thankful that Instagram had decided to block API requests for hashtags they deemed NSFW. In doing so, they had given me a way to auto-classify hashtags as NSFW or SFW! I reasoned that if Instagram refused to show me pics for a hashtag, I could simply mark it as NSFW on my end and convey the same to the user!

And that is exactly what I did.

Turning the Negative into a Positive
By default, I treat all hashtags as safe. I assign nsfw=False to each hashtag and check it against a list of commonly recognized NSFW tags – no stemming, no NLTK, just a plain ol’ list at the moment:

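import nsfw  # assumed local module that defines NSFW_LIST  

def check_nsfw(tag):  
    # Plain membership test against the hand-maintained list; `in` already returns a bool  
    return tag in nsfw.NSFW_LIST  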

(You can hazard a guess if you want, or maybe I’ll upload a gist of the list nsfw.NSFW_LIST to GitHub one of these days. For science, of course.)
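If the list grows, a set gives constant-time membership tests, and normalizing the tag first means '#Skirt' and 'skirt' are treated alike. Here is a minimal sketch of that variation, assuming tags arrive as plain strings; NSFW_TAGS, normalize_tag and the placeholder entries are hypothetical stand-ins for nsfw.NSFW_LIST, not its real contents:

```python
# Hypothetical stand-in for nsfw.NSFW_LIST; the real list's contents are not shown here.
NSFW_TAGS = frozenset({"exampletag1", "exampletag2"})


def normalize_tag(tag):
    """Strip whitespace and any leading '#', then lowercase, so '#Foo ' and 'foo' match."""
    return tag.strip().lstrip("#").lower()


def check_nsfw(tag):
    """Return True if the normalized tag appears on the hand-maintained list."""
    return normalize_tag(tag) in NSFW_TAGS


print(check_nsfw("#ExampleTag1"))  # True
print(check_nsfw("skirt"))  # False
```

Lowercasing is a design choice here, on the assumption that hashtags should match regardless of case; drop it if exact matching is what you want.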

Armed with this new knowledge, I caught all the exceptions thrown by the Instagram API, collected their error messages in a list called instagram_errors, and then modified my code to do this:

nsfw = any("APINotAllowedError" in err for err in instagram_errors)  # did Instagram refuse the tag?  
nsfw = nsfw or check_nsfw(hashtag_searched_by_user)  # fall back to the hand-kept list  

The first line checks whether Instagram has returned the sneaky APINotAllowedError in the errors list. The second line is a two-part boolean expression: the first half simply carries over the result of the previous line, and the second half is a safety net that checks the tag against my own list, just in case. That’s it.
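To see the two lines in context, here is a self-contained sketch. It assumes each caught error was stored as a string containing the Instagram error type, which is what the `"APINotAllowedError" in ...` test implies. FakeInstagramError, fake_api_call, classify_hashtag, the error message text and both tag sets are hypothetical stand-ins, not part of python-instagram or the real app:

```python
class FakeInstagramError(Exception):
    """Hypothetical stand-in for the exception the real client raises."""


# Placeholder entries; not the author's real list.
NSFW_LIST = {"exampletag"}
# Tags this fake API refuses, imitating Instagram's APINotAllowedError response.
BLOCKED_BY_API = {"blockedtag"}


def check_nsfw(tag):
    return tag in NSFW_LIST


def fake_api_call(tag):
    """Pretend tag search: raises for blocked tags, otherwise returns dummy media."""
    if tag in BLOCKED_BY_API:
        raise FakeInstagramError("APINotAllowedError: tag search refused (made-up message)")
    return ["media-1", "media-2"]


def classify_hashtag(tag):
    instagram_errors = []  # string forms of every error raised during the search
    media = []
    try:
        media = fake_api_call(tag)
    except FakeInstagramError as exc:
        instagram_errors.append(str(exc))
    # Same logic as the two lines above
    nsfw = any("APINotAllowedError" in err for err in instagram_errors)
    nsfw = nsfw or check_nsfw(tag)
    return media, nsfw


print(classify_hashtag("blockedtag"))  # ([], True)
print(classify_hashtag("exampletag"))  # (['media-1', 'media-2'], True)
print(classify_hashtag("skirt"))  # (['media-1', 'media-2'], False)
```

Because `or` short-circuits, the list lookup only runs when Instagram did not already refuse the tag.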

Thanks to Instagram’s sneaky rejections, I now have a semi-foolproof way of automatically identifying whether a hashtag someone searches for is liable to return NSFW pics. Of course, the system isn’t infallible: a few ambiguous tags (WARNING: EACH OF THOSE LINKS IS NSFW!) occasionally slip through the cracks. But I have since grown wiser and given myself a way to tag them as NSFW manually in the admin.
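One way to fold a manual admin flag into the same decision is to let an explicit human setting win over both automatic signals. This is a minimal sketch, assuming a hypothetical `manual_flags` dict that maps tags to True/False as set in the admin; the real app's model and field names aren't shown in this post:

```python
def is_nsfw(tag, api_refused, manual_flags, nsfw_list):
    """Decide whether a hashtag is NSFW.

    A manual admin setting (True or False) takes precedence; otherwise
    the tag is NSFW if Instagram refused it or it is on the hand-kept list.
    """
    if tag in manual_flags:
        return manual_flags[tag]
    return api_refused or tag in nsfw_list


manual = {"ambiguoustag": True, "falsealarm": False}
print(is_nsfw("ambiguoustag", False, manual, set()))  # True
print(is_nsfw("falsealarm", True, manual, set()))  # False
print(is_nsfw("skirt", False, manual, set()))  # False
```

This sketch also lets the admin clear a false positive, which goes beyond what the post describes; drop the False entries if only manual NSFW tagging is needed.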

DISCLAIMER

I am in no way judging anyone who searches for these hashtags. Heck, I don’t even know who searched for what! I don’t have a visitor tracking script. I do have a session cookie; I use it so that you can compile images you like into an album anonymously, but there’s no tracking info associated with it – I just don’t have the space.

I just thought the whole thing was funny, and it proved to be an awesome learning experience for me, so I figured I should share it here. I hope it didn’t offend you. :)

PS: I DO track outbound clicks anonymously on this blog. Just thought you should know. :P