Directional Sound With RICOH THETA 360° Video Conferencing

Contest Winner Shotoku Tamago


The winners of the RICOH THETA x IoT Developers Contest were announced on Nov 7. There were 13 total winners, in multiple categories.

The grand prize winner was Shotoku Tamago, which developed an omnidirectional microphone that lets a live 360° video stream automatically point to the person who is talking.

From their description:

Existing video chat applications often cater to the needs of one-to-one chat only. It is often difficult and awkward to show many people at the same time in a meeting room.

Using Shotoku Tamago, you can display the remote video in real time, covering the wide area of the meeting room at once using RICOH THETA S, and show the speaker automatically by identifying the person (sound source) using a microphone array consisting of multiple microphones.

Help from the RICOH THETA Unofficial Guide

Here at the RICOH THETA Unofficial Guide, we looked at Shotoku Tamago to try to get more details than the official release provides. Our site has many resources and featured projects for working with RICOH THETA and the Open Spherical Camera API. The site is informational, free, and open to all.

We do not have the Shotoku Tamago "egg" microphone hardware and so have not tested the setup ourselves yet. We are not affiliated with Infocom Corporation, but if you are interested in more details, please contact @jcasman on the RICOH THETA Unofficial Guide site, and we will try to help as much as possible.

360 video conferencing with Shotoku Tamago


Main Features

  • The video automatically pans to the person talking

  • When there are multiple speakers at the same time, a series of smaller sub windows will open up for each speaker

  • It's possible to omit constant noise from a particular direction

  • It's possible to pan the video feed manually, by overriding automatic mode
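The features above can be read as a small selection rule. The following Python sketch is only an illustration under stated assumptions, not Infocom's implementation: the `SoundSource` type, the "loudest source becomes the main view" rule, and the behavior when the view is pinned are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SoundSource:
    azimuth_deg: float  # direction of the sound in degrees (hypothetical convention)
    power: float        # relative loudness; larger means louder


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def choose_views(
    sources: List[SoundSource],
    muted: List[Tuple[float, float]],
    pinned_azimuth: Optional[float] = None,
) -> Tuple[Optional[float], List[float]]:
    """Return (main_azimuth, sub_window_azimuths).

    `muted` holds (center_deg, half_width_deg) sectors to ignore, standing in
    for the "omit constant noise from a particular direction" feature.
    """
    # Drop every source that falls inside a muted sector.
    active = [
        s for s in sources
        if not any(angular_distance(s.azimuth_deg, c) <= w for c, w in muted)
    ]
    # Assumption: louder sources get priority.
    active.sort(key=lambda s: s.power, reverse=True)
    azimuths = [s.azimuth_deg for s in active]

    if pinned_azimuth is not None:
        # Manual override: the main view stays where the user put it,
        # and every active speaker gets a sub window (an assumption).
        return pinned_azimuth, azimuths
    if not azimuths:
        return None, []
    # Automatic mode: loudest speaker is the main view, the rest are sub windows.
    return azimuths[0], azimuths[1:]
```

For example, with speakers at 10° and -90° and a muted sector around 180°, a third source at 170° is ignored, the louder of the two remaining speakers becomes the main view, and the other gets a sub window.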

Main Technologies Implemented

  • The Honda Research Institute Japan Audition for Robots with Kyoto University (HARK) robot audition technology

  • WebRTC for real-time transmission and reception of 360-degree video and voice

  • RICOH THETA API to control the 360 video feed

  • Three.js

System Requirements

  • OS: Windows 7 (32/64-bit), Windows 8 (64-bit), or Windows 10 (32-bit)

  • Memory: 4 GB or more

  • Google Chrome 52 or higher

  • Python v2.7.10 or higher

  • Node.js v0.10.29

  • HARK for Windows v2.2.0.7

    • HARK Designer, a browser-based configuration tool for HARK network files, can be used without installation.

Extra Notes

  • Infocom's "microphone array" egg is on sale and will be available by mid-December. The "TAMAGO-03" can be ordered online for 29,800 yen (approx. $270) plus 7,000 yen (approx. $63) shipping. The order form in English is here: https://www.sifi.co.jp/en/contact

  • Microsoft's Kinect can be used to approximate the hardware egg part of the project

  • The name Shotoku Tamago is a play on words, combining the name of a famous Japanese statesman and the word "egg." Sort of like naming the product after one of the US founding fathers, known to be good at communicating and bringing people together, approximately like "Thomas Jeffers-egg."

Video Overview and Explanation


YouTube description of 360 video conferencing


A two-minute video overview has been provided, but it is in Japanese. It's relatively easy to follow along, and to help, we've translated everything in the video.

Toda: Hello.
Kaneko: Hello. I was told there is an interesting app in use today.
Kaneko: Oh, yeah? Mr. Toda, are you the only one in this meeting? Where is Mr. Takagi?
Takagi: Here I am!
Kobayashi: Mr. Kaneko, I am here, too!

Kaneko: Oh, the screen moves to the person speaking.
Kaneko: What happens when multiple people speak at the same time?
All 3: This is what happens when we all speak at the same time.
Kaneko: Oh, I see. Small windows are displayed on the top.

Kaneko: By the way, what do I do, if I want to see a person who is not speaking?
Toda: If you don't want the camera to move automatically, you can fix the position by pushing the "pin button" at the bottom.
Toda: While doing so, drag the screen with the mouse in the direction you want to see.
Kaneko: Ah, like this.
Kaneko: I see.
Kaneko: Hey! Mr. Ohta, there you are!

Explanation of how this works

  • This egg-shaped device is called a "microphone array." It contains multiple microphones. [red arrow points to microphone]

  • This makes it possible to capture spatial information about sound.

  • It is possible to substitute something like Microsoft's Kinect, which is also equipped with a microphone array.

  • Place a THETA nearby and record in 360 degrees.
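To give a feel for how several microphones yield spatial information, here is a minimal Python sketch that estimates an arrival angle from the delay between two microphones. It is a deliberately simplified stand-in (a brute-force cross-correlation on two channels), not the localization that HARK performs. The function names, the sample rate, and the microphone spacing in the usage note are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which b best matches a delayed copy of a.

    A positive result means the sound reached microphone b later than a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate lag.
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag_samples: int, sample_rate_hz: float, mic_spacing_m: float) -> float:
    """Convert a delay between two microphones into an angle from broadside.

    0 degrees means the source is equally far from both microphones.
    """
    delay_s = lag_samples / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))
```

With an assumed 16 kHz sample rate and microphones 0.1 m apart, a two-sample delay corresponds to an angle of roughly 25 degrees. A single pair of microphones cannot tell front from back, which is one reason an array uses more than two microphones.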


First, using this app, determine the direction of the sound based on the 360-degree recorded sound and image information.

  • Send the sound information, with its direction specified, to the viewer along with the 360-degree image using P2P communications. [viewer: red, P2P communications: green, sender: blue]

  • Using this app, the viewer is shown the images that come from the same direction as the sound.

  • When there is a constant sound, that becomes the main image, and the other sounds and images are shown on sub screens.

  • Furthermore, the focus of the display can be changed based on the viewer's desire.
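The step of showing the image that comes from the same direction as the sound comes down to angle-to-pixel arithmetic. The Python sketch below illustrates that arithmetic on a flat equirectangular frame. It is an assumption-laden example rather than the project's code (the project lists Three.js for its rendering): the function names are hypothetical, and it assumes that azimuth 0 maps to the center column, that positive angles are to the right, and that the field of view is narrower than 360 degrees.

```python
from typing import List, Tuple


def azimuth_to_column(azimuth_deg: float, frame_width: int) -> int:
    """Map an azimuth in degrees to a pixel column of an equirectangular frame.

    Assumed convention: azimuth 0 is the center column, positive angles are
    to the right, and the frame spans the full 360 degrees.
    """
    wrapped = (azimuth_deg + 180.0) % 360.0  # now in the range [0, 360)
    return int(wrapped / 360.0 * frame_width) % frame_width


def crop_columns(azimuth_deg: float, fov_deg: float, frame_width: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges covering the requested view.

    Two ranges are returned when the view reaches or crosses the
    left/right seam of the frame.
    """
    half = int(fov_deg / 360.0 * frame_width / 2)
    center = azimuth_to_column(azimuth_deg, frame_width)
    start = (center - half) % frame_width
    end = (center + half) % frame_width
    if start < end:
        return [(start, end)]
    # The view wraps around the seam, so it is split into two pieces.
    return [(start, frame_width), (0, end)]
```

On a 1920-pixel-wide frame, a speaker straight ahead (azimuth 0) with a 90-degree view maps to columns 720 to 1200, while a speaker directly behind the camera (azimuth 180) needs two pieces, one at each edge of the frame.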

Usage

  • Two-way web meeting between point A and point B

  • Conversations with family living apart

  • You can discover your own special way to use Shotoku Tamago


Credits:
Infocom Inc. Technology Planning Office
https://lab.infocom.co.jp/2016/08/theta.html
BGM : MusMus
Figure Illustration: Designed by Freepik and distributed by Flaticon

For more information please see: