Another day, another failed caption

Waste the rest of your day with yet another awful Microsoft image detection bot

“Called the CaptionBot, the tool looks at the image you’ve uploaded to describe what it sees in the picture. It’s similar to what Microsoft demonstrated in its Seeing AI video, during which a bot helps a blind person describe what’s around him. But if this AI is to [be] believed, we’re rather worried for that blind Microsoft developer.” – TheNextWeb

Let’s run all the pictures from TheNextWeb’s article through the Clarifai API to see what results we get, shall we?

Source: TheNextWeb

Animals vs. Food vs. A.I.

Chihuahua or muffin? Internet goes crazy for animals v food trend

The hallmark of any great visual recognition A.I. is its ability to distinguish animals from food, obviously. Clarifai weighs in at an impressive 95.8% accuracy (138 of 144 images) when put to this very scientific and adorable test:

Puppy or Bagel? Clarifai Score: 16/16

Labradoodle or Fried Chicken? Clarifai Score: 16/16

Chihuahua or Muffin? Clarifai Score: 14/16

Sheepdog or Mop? Clarifai Score: 15/16

Shrew or Kiwi? Clarifai Score: 16/16

Kitten or Ice Cream? Clarifai Score: 16/16

Parrot or Guacamole? Clarifai Score: 14/16

Shiba Inu or Marshmallow? Clarifai Score: 16/16

Barn Owl or Apple? Clarifai Score: 15/16
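For the curious, here's how the headline number falls out of the scores above. This minimal sketch assumes, as the "/16" scores suggest, that each grid has 16 images and that the 95.8% figure is pooled across all nine grids; the names `SCORES`, `IMAGES_PER_TEST`, and `overall_accuracy` are just illustrative.

```python
# Per-grid scores exactly as listed above (correct answers out of 16 images each).
SCORES = {
    "Puppy or Bagel": 16,
    "Labradoodle or Fried Chicken": 16,
    "Chihuahua or Muffin": 14,
    "Sheepdog or Mop": 15,
    "Shrew or Kiwi": 16,
    "Kitten or Ice Cream": 16,
    "Parrot or Guacamole": 14,
    "Shiba Inu or Marshmallow": 16,
    "Barn Owl or Apple": 15,
}
IMAGES_PER_TEST = 16  # assumption: every grid contains 16 images


def overall_accuracy(scores: dict, per_test: int) -> float:
    """Pool all grids together: total correct divided by total images."""
    correct = sum(scores.values())
    total = per_test * len(scores)
    return correct / total


acc = overall_accuracy(SCORES, IMAGES_PER_TEST)
print(f"{sum(SCORES.values())}/{IMAGES_PER_TEST * len(SCORES)} = {acc:.1%}")
# prints "138/144 = 95.8%"
```

Pooling weights every image equally; since all grids are the same size, averaging the nine per-grid accuracies gives the same result.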

If you want to test Chihuahua or Muffin for yourself, we made a handy web app for you to try!
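If you'd rather script the test, here's a rough sketch of sending one image URL to Clarifai's v2 REST predict endpoint using only Python's standard library. The endpoint path, the `Authorization: Key …` header, and the response layout (`outputs[0].data.concepts`, each concept carrying a `name` and a `value`) reflect our reading of the v2 API and may differ from Clarifai's current docs. `YOUR_API_KEY`, `YOUR_MODEL_ID`, and the helper names `build_predict_request` and `top_concepts` are hypothetical placeholders.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: v2 predict endpoint shape; check Clarifai's current API docs.
API_BASE = "https://api.clarifai.com/v2/models/{model_id}/outputs"


def build_predict_request(api_key: str, model_id: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking a model to tag one image URL."""
    body = {"inputs": [{"data": {"image": {"url": image_url}}}]}
    return urllib.request.Request(
        API_BASE.format(model_id=model_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def top_concepts(response: dict, k: int = 5) -> list[tuple[str, float]]:
    """Return the k highest-scoring (name, value) pairs from a predict response."""
    concepts = response["outputs"][0]["data"]["concepts"]
    ranked = sorted(concepts, key=lambda c: c["value"], reverse=True)
    return [(c["name"], c["value"]) for c in ranked[:k]]


if __name__ == "__main__":
    req = build_predict_request("YOUR_API_KEY", "YOUR_MODEL_ID", "https://example.com/muffin.jpg")
    print(req.get_method(), req.full_url)
    # Sending it needs a real key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(top_concepts(json.load(resp)))
```

Building the request separately from the network call lets you inspect it before sending anything; uncomment the `urlopen` lines once you have a real key.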

I am not really confident, but …

CaptionBot is another AI from Microsoft, and it’s not doing so hot either

OK, we’ve been playing around with CaptionBot since it came out last week, and it looks like Microsoft deliberately left out recognition of gorillas, apes, and monkeys to avoid a repeat of the Google Photos fiasco, in which Google’s app mislabelled black people as gorillas.

Take a look at the evidence:


Seriously, a black hat, Microsoft? Two giraffes near a tree? A cat wearing a tie??? We get it – image recognition is hard. No one wants to pull a Google Photos. But does that make it OK to forgo teaching your model an entire concept?

Look at Clarifai’s image recognition results for the exact same images:


Clarifai demonstrates that teaching visual recognition to be smart IS possible. Instead of omitting concepts that are difficult to teach computers, let’s find ways to make our technology smarter!