
The Chipper Services

This describes the interactions with Anki’s automatic speech recognition (ASR) server. The audio recorded after a "Hey Vector" is sent to the servers for processing. The servers send back a response in the form of an intent: a code and a structure that together represent an action to carry out in response to the spoken request, query, or statement. The intent may represent the action requested, an answer to a query, or an action that emotionally responds to what was said. The intent structures are described in another page.

Common Elements

The enumerations and structures in this section are common to many commands.

Enumerations

AudioEncoding

IntentService

LanguageCode

RobotMode

Structures

The following structures are present in the Go code, but their use is not known.

Weather Location

The WeatherLocation structure has the following fields:

Table: JSON Parameters for the weather location structure

| Field | Type | Description |
| --- | --- | --- |
| city | string | |
| country | string | |
| state | string | |
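
Since the use of this structure is not known, the following is only a minimal sketch of how its three string fields could be read from JSON. The class and function names are hypothetical, and defaulting missing fields to an empty string is an assumption, not observed behaviour.

```python
import json
from dataclasses import dataclass


@dataclass
class WeatherLocation:
    # All three fields are listed as strings in the table above.
    city: str = ""
    country: str = ""
    state: str = ""


def weather_location_from_json(text: str) -> WeatherLocation:
    """Decode a JSON object into a WeatherLocation.

    Missing fields fall back to an empty string (an assumption).
    """
    data = json.loads(text)
    return WeatherLocation(
        city=data.get("city", ""),
        country=data.get("country", ""),
        state=data.get("state", ""),
    )
```

Calling `weather_location_from_json('{"city": "San Francisco"}')` would give a location with `city` set and the other two fields empty.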

Commands and Responses

Unknown

These messages appear in the logs, but they do not match the generated gRPC protobuf definitions in the Go code.

Request

The request sent to the server has the following fields:

Table: Parameters for ASR request

| Field | Type | Description |
| --- | --- | --- |
| session | string | An opaque hexadecimal string; its exact meaning is not known. |
| type | string | e.g. "streamOpen" |

It is not clear what follows the stream open. Is the audio uploaded as a file, or streamed live?

Response

The server response message has the following fields:

Table: Parameters for ASR response

| Field | Type | Description |
| --- | --- | --- |
| intent | string | The type of intent |
| metadata | string | This can be an empty string, or a string of colon-delimited parameters. It often has the pattern "text: unquoted-string confidence: float handler: LEX", where "text:" is followed by the transcription of the spoken text and "confidence:" by a floating point number representing how confident the speech-to-text engine is in the transcription. |
| parameters | JSON string | A string containing the JSON serialization of the intent parameters. |
| type | string | e.g. "result" |
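
The metadata pattern and the JSON-encoded parameters string described above can be unpacked with a small parser. This is a minimal sketch: it assumes that `text`, `confidence`, and `handler` are the only keys and that each appears as a `key:` token. The function names `parse_metadata` and `parse_response` are hypothetical.

```python
import json
import re

# Keys seen in the metadata pattern; treating this list as complete is an assumption.
_KEYS = ("text", "confidence", "handler")
_KEY_RE = re.compile(r"\b(" + "|".join(_KEYS) + r"):\s*")


def parse_metadata(metadata: str) -> dict:
    """Split 'text: ... confidence: ... handler: ...' into a dict."""
    result = {}
    matches = list(_KEY_RE.finditer(metadata))
    for i, m in enumerate(matches):
        # A value runs until the next key token, or to the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(metadata)
        result[m.group(1)] = metadata[m.end():end].strip()
    if "confidence" in result:
        try:
            result["confidence"] = float(result["confidence"])
        except ValueError:
            pass  # keep the raw string if it is not a valid float
    return result


def parse_response(message: dict) -> dict:
    """Decode the string-encoded fields of a response message."""
    params = message.get("parameters") or "{}"  # empty string means no parameters
    return {
        "intent": message.get("intent", ""),
        "type": message.get("type", ""),
        "metadata": parse_metadata(message.get("metadata", "")),
        "parameters": json.loads(params),
    }
```

Because the transcription is unquoted, a spoken phrase that itself contains a token such as "confidence:" would be split incorrectly; the sketch does not try to handle that case.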

Streaming Connection Check

Request

The StreamingConnectionCheckRequest request message has the following fields:

Table: JSON Parameters for the streaming connection check request

| Field | Type | Description |
| --- | --- | --- |
| app_key | | |
| audio_per_request | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| session | | |
| total_audio_ms | int | |

Response

The ConnectionCheckResponse response message has the following fields:

Table: JSON Parameters for the connection check response

Field Type Description
frames_received A count?
status Status

Streaming Intent

This is used to stream the recorded audio to the server, which responds with an intent.

Request

The StreamingIntentRequest request message has the following fields:

Table: JSON Parameters for the streaming intent request

| Field | Type | Description |
| --- | --- | --- |
| app_key | | |
| audio_encoding | AudioEncoding | Probably opus or ogg |
| boot_id | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| input_service | | |
| language_code | LanguageCode | |
| mode | RobotMode | |
| save_audio | bool | |
| session | | |
| single_utterance | | |
| skip_das | bool | |
| speech_only | bool | |

Response

The IntentResponse response message has the following fields:

Table: JSON Parameters for the intent response

| Field | Type | Description |
| --- | --- | --- |
| audio_id | | |
| device_id | | Probably the robot's ESN. |
| intent_result | IntentResult | |
| is_final | bool | |
| mode | RobotMode | |
| session | | |
| speech_result | SpeechResult | |

The IntentResult structure has the following fields:

Table: JSON Parameters for the intent result structure

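| Field | Type | Description |
| --- | --- | --- |
| action | | |
| all_parameters_present | bool | |
| has_context | bool | |
| intent_confidence | float | |
| kgresponse | | |
| parameters | | |
| query_text | | |
| service | | |
| speech_confidence | float | |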

The SpeechResult structure has the following fields:

Table: JSON Parameters for the speech result structure

| Field | Type | Description |
| --- | --- | --- |
| is_final | bool | |
| transcript | string | |
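
To show how the nested IntentResult and SpeechResult structures fit inside an IntentResponse, here is a minimal sketch that decodes a JSON-style dictionary into Python dataclasses. Only fields whose types are given in the tables above are modelled, plus `action`, whose string type is an assumption. The helper names `_pick` and `decode_intent_response` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SpeechResult:
    transcript: str = ""
    is_final: bool = False


@dataclass
class IntentResult:
    action: str = ""  # type assumed; the table gives none
    intent_confidence: float = 0.0
    speech_confidence: float = 0.0
    all_parameters_present: bool = False
    has_context: bool = False


@dataclass
class IntentResponse:
    is_final: bool = False
    intent_result: Optional[IntentResult] = None
    speech_result: Optional[SpeechResult] = None


def _pick(cls, data):
    """Build cls from the keys it declares; ignore the rest. None stays None."""
    if data is None:
        return None
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in names})


def decode_intent_response(message: dict) -> IntentResponse:
    """Decode the typed parts of an intent response; untyped fields are skipped."""
    return IntentResponse(
        is_final=bool(message.get("is_final", False)),
        intent_result=_pick(IntentResult, message.get("intent_result")),
        speech_result=_pick(SpeechResult, message.get("speech_result")),
    )
```

Fields with no known type, such as `kgresponse` or `audio_id`, are dropped rather than guessed at. A caller would likely check `is_final` before acting on `intent_result`, though that ordering is an assumption.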

Streaming Knowledge Graph

This is used to query the knowledge graph on the server. Note: I’m not convinced that Vector uses this. It may be part of the server's internal workings that was left in Vector's vic-cloud.

Request

The StreamingKnowledgeGraphRequest request message has the following fields:

Table: JSON Parameters for the streaming knowledge graph request

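| Field | Type | Description |
| --- | --- | --- |
| app_key | | |
| audio_encoding | AudioEncoding | Probably opus or ogg |
| boot_id | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| language_code | LanguageCode | |
| save_audio | | |
| skip_das | bool | |
| timezone | | |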

Response

The KnowledgeGraphResponse response message has the following fields:

Table: JSON Parameters for the streaming knowledge graph response

| Field | Type | Description |
| --- | --- | --- |
| audio_id | | |
| command_type | | |
| device_id | | Probably the robot's ESN. |
| domains_used | | |
| query_text | | |
| session | | |
| spoken_text | | |
| text_input | | |

Text

Note: I'm not convinced that Vector uses this. It may be part of the server's internal workings that was left in Vector's vic-cloud.

Request

The TextRequest request message has the following fields:

Table: JSON Parameters for the text request

| Field | Type | Description |
| --- | --- | --- |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| intent_service | IntentService | |
| language_code | LanguageCode | |
| mode | RobotMode | |
| session | | |
| skip_das | bool | |