What is CAPTCHA ("I'm not a robot")?
Have you ever been surfing the internet and come across one of these boxes that say "I'm not a robot"? You check the box and go on your way. But how the heck does this box know whether you're a robot or not, and why does it matter?
Well, to answer that, we actually have to start with these. They're called CAPTCHAs: Completely Automated Public Turing test to tell Computers and Humans Apart. They were invented in 2003 by Luis von Ahn and his team of researchers at Carnegie Mellon University. The whole point of these distorted pieces of text was to stop spam on the internet, for example by preventing scalpers from writing a computer program that buys every ticket in a fraction of a second. They worked because humans could read the distorted text but computers and bots couldn't. [You shall not pass!] So if you want to stop bots from buying concert tickets or setting up email addresses, you just have to make filling out a CAPTCHA part of the process.
Fast forward, and millions of CAPTCHAs were being solved every single day by internet users, and von Ahn started to think: can we do something useful with all this great power? [And the answer to that is, yes, and this is what we're doing now.] So they decided to use that brainpower to digitize every single physical book we have. The way to do that is to take real physical books, scan them, and then use optical character recognition software to translate the words into digital text. What they did was take any words that were too hard for the computer to decipher and upload them into the reCAPTCHA database. So going forward, instead of showing random distorted text, CAPTCHAs started to show words from books that computers couldn't read. When enough people solving these CAPTCHAs typed the same word for a given piece of text, that word would be confirmed and uploaded to an ebook database. Von Ahn called this project reCAPTCHA.
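To make that "enough people typed the same word" step concrete, here is a minimal Python sketch of a consensus check. The function name confirm_word and the thresholds min_votes and min_agreement are made up for illustration; the video does not say how many matching answers reCAPTCHA actually required.

```python
from collections import Counter


def confirm_word(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if there is no consensus yet.

    min_votes and min_agreement are hypothetical thresholds, not
    reCAPTCHA's real values.
    """
    # Ignore case and surrounding whitespace so "Morning " matches "morning".
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough people have answered yet

    # Find the most frequently typed word and how often it was typed.
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word  # confirmed: would be added to the ebook text
    return None  # people disagree too much; keep showing the word


print(confirm_word(["morning", "Morning ", "morning", "mourning"]))
print(confirm_word(["upon", "upon"]))
```

With these assumed thresholds, the first call confirms the word because three of four answers agree, and the second returns None because only two people have answered so far.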
Their slogan was "stop spam, read books". At this point, a hundred million reCAPTCHAs were being solved every day, the equivalent of 2.5 million books a year. So Google was like: "let's acquire reCAPTCHA". And they did, in 2009, and they used that brainpower to digitize all the New York Times archives since the 1800s, as well as all of Google Books. And when they ran out of those, Google started giving people street numbers from Google Street View to help label Google Maps.
So everything worked out happily ever after. But not really, because there were a couple of problems. The first is that even though reCAPTCHAs worked, they weren't very accessible, so blind people had a much harder time filling out forms and signing up for things on the internet. So they made audio reCAPTCHAs as well, which sound like this: [Six] [Four] [Zero] [Nine] But regardless, reCAPTCHAs remained a burden for people with dyslexia, poor hearing, poor sight, and other sensory impairments.
The other problem was that paid services started popping up that solved CAPTCHAs for you. These services worked by taking your CAPTCHAs and shipping them off to CAPTCHA farms in third-world countries, where workers were paid dirt cheap to solve them and ship the answers back to you, the client.
And the last problem, which is perhaps the most important, was that computer vision technology was becoming so good that bots were starting to solve these CAPTCHAs and get through. So engineers got to thinking: "why not make CAPTCHAs harder to solve?" They gave CAPTCHAs more twists and turns, added some noise, and threw in random lines, but as time went on the technology caught up and bots were once again getting through. So Google decided to do some research, and they found that humans got these complex, distorted CAPTCHAs right only about 33 percent of the time, while Google's own advanced computer vision technology was getting them right 99.8 percent of the time.
Shoot, that computer vision technology was on a [WHOLE 'NOTHER LEVEL]. So Google decided to change things. They got rid of the distorted-text CAPTCHA and came up with this, and they called it "No CAPTCHA reCAPTCHA".
When you click it, it sends an HTTP request to Google with a whole bunch of useful information: things like your IP address, your country, and a timestamp, plus information from your browser, such as the way you moved your cursor just moments before clicking the checkbox, how you were scrolling the page before the click, the time interval between different browser events, and many other variables that Google keeps secret. All these signals are then processed by a machine learning risk analysis engine at Google, and most of the time the engine can tell the difference between a human and a bot from that information alone. But if the risk analysis engine still isn't sure, which happens for a small percentage of users, they'll have to complete an additional challenge: an image recognition CAPTCHA. Something like picking all images with a storefront, or picking all the sections of an image that show a street sign. And if you prove that you're a human once this way, then chances are Google's engine will remember. Next time, after clicking that checkbox, you'll be able to pass right through with ease.
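The flow just described (collect signals, score the risk, then either pass the user or fall back to an image challenge) can be sketched in a few lines of Python. This is only a toy illustration: Google keeps its real signals and model secret, and the real engine is a machine learning system, not the hand-written rule below. Every field name, weight, and threshold here is an assumption made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ClickSignals:
    """Hypothetical signals sent along with the checkbox click."""
    cursor_path_points: int       # cursor positions recorded before the click
    seconds_on_page: float        # time between page load and the click
    scrolled_before_click: bool   # did the user scroll before clicking?
    passed_challenge_before: bool # has this browser passed a challenge earlier?


def risk_score(s: ClickSignals) -> int:
    """Return a made-up risk score from 0 (human-like) to 100 (bot-like)."""
    score = 0
    if s.cursor_path_points < 5:
        score += 40  # almost no mouse movement before the click
    if s.seconds_on_page < 1.0:
        score += 30  # clicked implausibly fast
    if not s.scrolled_before_click:
        score += 10
    if not s.passed_challenge_before:
        score += 20  # the engine has no earlier proof of a human
    return score


def decide(s: ClickSignals, challenge_threshold: int = 50) -> str:
    """Pass low-risk clicks; send uncertain ones to an image challenge."""
    return "image_challenge" if risk_score(s) >= challenge_threshold else "pass"


human = ClickSignals(120, 8.5, True, True)
bot = ClickSignals(0, 0.2, False, False)
print(decide(human), decide(bot))
```

The passed_challenge_before field stands in for the "Google's engine will remember" part: in this toy version, having solved a challenge earlier simply lowers the score on the next visit.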
Thanks