
What If Siri Could See?

It is human nature to wonder what the future will look like. How will the evolving technology of today impact society in the long run? In the '80s, we channeled that imagination into movies such as The Terminator and Short Circuit, films that personified machines, specifically robots. In these films, computers didn't just interact with humans; they exemplified true cognition, an understanding of context, through their ability to answer complex questions and even read the emotions of their human counterparts.

This depiction of the future motivated society to innovate, ever striving to build the machine those movies portrayed: one that looks, acts, and thinks like a human. That vision reflects what we believe technology has the potential to become - humanlike. So we continue to develop new ways to engage with computers, and in doing so, we are actually teaching computers to engage with us. This goes beyond teaching them to react to commands and act accordingly; it means helping them to think for themselves, with a true understanding.

We do this by teaching computers the way we teach children. From a young age, kids learn to use each of their senses to make inferences about the world, and from this they develop a true understanding of people, places, and objects. Since computers don't have senses of their own, we need to provide them. This means further developing technologies such as voice recognition, the ability for computers to hear, and image recognition, the ability for computers to see. These two "senses" allow computers to keep learning about the world around them in order to gain understanding. With these advancements, computers have the potential to do extraordinary things for mankind.

VOICE RECOGNITION: I SAY JUMP, YOU SAY HOW HIGH

Computers are very good at completing tasks by following a set of instructions down to the last detail. This is why we have "bugs": computers can't figure out what we intend for them to do; they just do exactly what we instruct them to. As a result, companies ran into many issues in the early '90s when they tried to realize the idea of voice recognition. They were trying to bridge the communication gap between humans and computers, opening a new channel of communication, speech, but they had trouble programming computers to understand what humans meant by what they were saying.

For example, when humans said to early software, "Do you recognize speech?" computers would hear, "Did you wreck a nice beach?" The smallest differences in our inflections and voices easily confused the software, and correcting it became a painstaking process. Consider the words their, there, and they're: even though they sound the same, they have totally different meanings. This is what computers struggled to comprehend and decipher.
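To make the problem concrete, here is a minimal sketch of how a recognizer can use surrounding words to choose between acoustically similar transcriptions. The bigram scores below are invented purely for illustration; real systems learn these statistics from enormous text corpora.

```python
# Toy sketch: picking between acoustically similar transcriptions using
# context. The word-pair scores are invented for illustration only.

BIGRAM_SCORES = {
    ("do", "you"): 0.8,
    ("did", "you"): 0.8,
    ("you", "recognize"): 0.7,
    ("recognize", "speech"): 0.9,
    ("you", "wreck"): 0.1,
    ("wreck", "a"): 0.2,
    ("a", "nice"): 0.6,
    ("nice", "beach"): 0.5,
}

def sentence_score(words):
    """Score a candidate transcription by how plausible its word pairs are."""
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= BIGRAM_SCORES.get(pair, 0.01)  # unseen pairs are unlikely
    return score

candidates = [
    "do you recognize speech".split(),
    "did you wreck a nice beach".split(),
]

# Choose the transcription whose word sequence is most plausible.
best = max(candidates, key=sentence_score)
print(" ".join(best))  # -> "do you recognize speech"
```

Both candidates may sound nearly identical to a microphone; only the statistics of how words follow one another tell the software which one a human would actually say.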

These early attempts served as building blocks. It wasn't until Siri that a computer finally displayed the kind of voice recognition we had previously seen only in the movies. Siri's creators achieved this through a combination of AI and voice recognition. Although Siri does make some mistakes, she has the ability to recognize speech and react appropriately to human commands. If you ask Siri, "What's the weather?" she responds with, "It is 80 degrees."

Her ability to genuinely understand the context of situations comes from that combination of voice recognition and AI. Although the technology isn't flawless, she understands what we are trying to tell her most of the time. She shows that it's not about recognizing bits of speech, but about understanding the meaning of what we are communicating.
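As a rough illustration of that second step, here is a toy sketch of mapping an already-transcribed utterance to an intent. The intent names and keyword lists are invented for this example; a real assistant uses trained statistical models rather than keyword lookups.

```python
# Minimal sketch of mapping a transcribed utterance to an intent.
# Intents and keywords are invented for illustration; real assistants
# learn this mapping from data instead of hand-written word lists.

INTENTS = {
    "get_weather": {"weather", "temperature", "forecast", "rain"},
    "set_alarm":   {"alarm", "wake", "remind"},
    "play_music":  {"play", "music", "song"},
}

def classify_intent(utterance: str) -> str:
    words = set(utterance.lower().replace("?", "").split())
    # Pick the intent whose keyword set overlaps the utterance the most.
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent

print(classify_intent("What's the weather?"))  # -> get_weather
```

The point is the division of labor: recognition turns sound into words, and a second layer turns words into a meaning the machine can act on.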

IMAGE RECOGNITION: A CAT IS A CAT, A DOG IS A DOG

It is often claimed that humans retain 80% of what we see, a significant amount compared with the 20% of information that resonates with us when we read and the 10% that sticks after we hear something. Whatever the exact figures, roughly 90% of all information sent to the brain is said to be visual, meaning that most of our perception is visual.

The brain is a complex, yet beautifully simple organ. It contains a vast network of connections between neurons that we can model mathematically. The brain can also learn to recognize inputs from virtually any source, a property called neuroplasticity. What we can draw from this is that networks of neurons can "learn" almost anything, and the same mathematics can be simulated inside a computer. This process teaches computers to make predictions, make mistakes, and correct them, which has been extremely helpful in giving them the ability to see. As a result, the image recognition software we have today allows computers to identify objects in a photo.
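Here is a minimal sketch of that "make mistakes and correct them" loop: a single artificial neuron trained by gradient descent to separate two clusters of points. The data and learning rate are arbitrary, chosen only to show the mechanism.

```python
import numpy as np

# A single artificial neuron learning by trial and error: it guesses,
# measures how wrong it was, and nudges its weights to shrink the error.
# Data and hyperparameters are arbitrary, for illustration only.

rng = np.random.default_rng(0)

# Two clusters of 2-D points: class 0 near (0, 0), class 1 near (2, 2).
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = np.zeros(2)  # connection weights
b = 0.0          # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    pred = sigmoid(X @ w + b)            # the neuron's current guesses
    error = pred - y                     # how wrong each guess is
    w -= 0.1 * (X.T @ error) / len(y)    # correct the weights
    b -= 0.1 * error.mean()              # correct the bias

accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"accuracy after training: {accuracy:.0%}")  # ~100% on this toy data
```

Modern vision systems stack millions of such units into deep networks, but the principle is the same: predict, measure the mistake, correct, repeat.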

With this capability, computers can technically see and are increasingly gaining the ability to understand images. This detail is the difference between a computer seeing an apple and identifying it as a Golden Delicious, or seeing a plant and identifying it as Aloe. It takes surface-level recognition and adds the ability to infer from specific details, drawing a conclusion about what something is based on context.
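For a sense of how accessible the "seeing" half already is, here is a sketch of labeling a photo with an off-the-shelf pretrained network via the torchvision library (assuming a recent version). The image path is a placeholder, and the model only knows the roughly 1,000 broad categories it was trained on; fine-grained distinctions like "Golden Delicious" require further, specialized training.

```python
import torch
from torchvision import models
from torchvision.models import ResNet18_Weights
from PIL import Image

# Sketch: labeling a photo with a pretrained network. "photo.jpg" is a
# placeholder path. The model covers ~1,000 everyday categories, not
# fine-grained ones like apple cultivars.

weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize as the model expects

image = Image.open("photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

top = probs.argmax().item()
print(weights.meta["categories"][top], f"{probs[top].item():.1%}")
```

Recognition like this is the easy part now; the hard part, and the subject of this article, is the understanding layered on top of it.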

The future of this technology is depicted in the movie Her, in which a man falls in love with his virtual assistant and, through his phone's camera, is able to show her his surroundings. This helps her interpret the situations he's in, picture how things work, and gain a better understanding of the world. What can be learned from Her is that a major component of computers truly understanding the world and continuously learning is the ability to see. With that ability, there is no need to type queries or give verbal commands to personal assistants. Machines would be able to visualize what was going on around them, gather information, and act accordingly.

TRUE COGNITION: MACHINES THINK FOR THEMSELVES

Visual cognition, computer vision that truly understands images rather than merely recognizing them, is bridging the gap between humans and technology. Imagine wearable computers that can see and interpret the world. The technology could be embedded in devices such as smart contact lenses. The lenses could automatically recognize everything you look at and instantly provide information: the breed of a dog passing you on the street, medical advice for a rash you contracted after hiking, or the recipe for homemade cookies as you walk through the grocery store.

This technology adds the one crucial element missing from the connected devices of today - the ability not only to see, but to understand. With it, Roombas of the future automatically identify and vacuum dirt piles without mapping the whole room, garden probes alert humans when their vegetable garden is ready to be picked or at the earliest stages of disease, and virtual assistants call for a parent's attention when their child wanders away in a store. In the future, your home will be able to tell beforehand who your visitors are. Based on an understanding of the way certain individuals dress, virtual assistants will know whether you just received a package from UPS or your next-door neighbor is coming to say hello.

THE FUTURE OF MACHINES: INDEPENDENT LEARNING

Technology is advancing daily, and with each new innovation we see a change in human behavior and, ultimately, in society as a whole. The world is becoming more and more connected. Computers no longer live only on your desk, but in every room of your house, in your pocket, and even on your wrist.

As with every piece of assistive technology created throughout history, machines with visual cognition would free humans to focus their time and attention on other things: devoting energy to productivity in other areas of life while machines take care of smaller projects and duties.

Biological evolution happens over the course of millennia; computing power, as Moore's law observes, roughly doubles every two years, so computers evolve at an ever increasing, exponential rate. Once we start to merge human and artificial intelligence, evolution will begin to move at the pace of technology, outpacing organic evolution. Consider the things humans can do now that are changing the world, such as grafting healthy tissue on top of diseased tissue to make hearts beat again. Machines of the future will be able to learn such procedures and perform them more efficiently and precisely than a human ever could. And as machines learn, they will be able to suggest better alternatives to the procedure and contribute to medical advances, drawing on a nearly endless wealth of resources.
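A quick back-of-the-envelope calculation shows how fast that doubling compounds, taking the common two-year doubling period at face value:

```python
# Back-of-the-envelope: doubling every two years compounds quickly.
for years in (2, 10, 20, 40):
    factor = 2 ** (years / 2)
    print(f"after {years:2d} years: ~{factor:,.0f}x the computing power")
```

Twenty years of doubling yields roughly a thousandfold increase, and forty years roughly a millionfold, which is the gap this section draws between technological and organic evolution.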

One of the reasons artificial intelligence is called artificial is that an element is missing, one that limits the technology -- visual cognition. If machines could see with full understanding, they would be able to exhibit the human traits we portray in movies. Humans would no longer need to input commands using a keyboard or verbalize situations to help machines understand; computers would interpret their surroundings on their own. Technology would learn over time to understand the world and think for itself. Ultimately, technology would engage with humans in a whole new way and provide assistance in an entirely different form. This is true machine cognition.