The Chipper Services¶

This describes the interactions with Anki’s automatic speech recognition (ASR) server. The audio captured after a "Hey Vector" is sent to the servers for processing. The servers send back a response in the form of an intent: a code and a structure that represent an action to carry out in response to the spoken request, query, or statement. The intent may represent the action requested, an answer to a query, or an action that responds emotionally to what was said. The intent structures are described on another page.

Common Elements¶

The enumerations and structures in this section are common to many commands.

Enumerations¶

AudioEncoding¶

IntentService¶

LanguageCode¶

RobotMode¶

Structures¶

The following structures are present in the Go code, but their use is not known.

Weather Location¶

The WeatherLocation structure has the following fields:

Table: JSON Parameters for the weather location structure

Field	Type	Description
city	string
country	string
state	string
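
Since the use of this structure is not known, the following is only a minimal sketch of how the three fields in the table could be modelled and decoded from JSON in Go. The Go type name, the helper `DecodeWeatherLocation`, and the sample values are assumptions made for illustration; only the JSON field names come from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WeatherLocation mirrors the three string fields listed in the table.
type WeatherLocation struct {
	City    string `json:"city"`
	Country string `json:"country"`
	State   string `json:"state"`
}

// DecodeWeatherLocation (hypothetical helper) parses a JSON object
// into a WeatherLocation.
func DecodeWeatherLocation(data []byte) (WeatherLocation, error) {
	var loc WeatherLocation
	err := json.Unmarshal(data, &loc)
	return loc, err
}

func main() {
	// The sample values are made up for illustration.
	raw := []byte(`{"city":"San Francisco","country":"US","state":"California"}`)
	loc, err := DecodeWeatherLocation(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s, %s, %s\n", loc.City, loc.State, loc.Country)
}
```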

Commands and Responses¶

Unknown¶

These messages appear in the logs, but they do not match the generated gRPC protobuf definitions in the Go code.

Request¶

The request sent to the server has the following fields:

Table: Parameters for ASR request

Field	Type	Description
session	string	A string of hexadecimal digits; its meaning is not known
type	string	e.g. "streamOpen"

It is not known what follows the stream open: the audio may be uploaded as a file, or it may be streamed live.

Response¶

The server response message has the following fields:

Table: Parameters for ASR response

Field	Type	Description
intent	string	The type of intent
metadata	string	Either an empty string or a string of colon-delimited parameters. It often follows the pattern "text: unquoted-string confidence: float handler: LEX", where "text:" is followed by the transcription of the spoken text and "confidence:" by a floating-point number giving the speech-to-text engine's confidence in that transcription.
parameters	JSON string	This is a string containing the JSON serialization of the intent parameters.
type	string	e.g. "result"
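
The response carries two strings that need a second parsing step: `metadata` with its "text: ... confidence: ... handler: ..." pattern, and `parameters`, which holds JSON inside a string. The sketch below shows one way to unpack both in Go. It rests on several assumptions: the keys appear in the order given in the table, the transcription itself contains none of the key names, and the parameters decode to a JSON object. The names `Response`, `Metadata`, `ParseMetadata`, and `DecodeParameters`, as well as the sample message, are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Response mirrors the four string fields in the response table.
type Response struct {
	Intent     string `json:"intent"`
	Metadata   string `json:"metadata"`
	Parameters string `json:"parameters"`
	Type       string `json:"type"`
}

// Metadata holds the values pulled out of the metadata string.
type Metadata struct {
	Text       string
	Confidence float64
	Handler    string
}

// metadataKeys lists the keys in the order the pattern gives them (assumed fixed).
var metadataKeys = []string{"text:", "confidence:", "handler:"}

// ParseMetadata splits "text: <words> confidence: <float> handler: <name>".
// Keys that are absent leave the matching field at its zero value.
func ParseMetadata(s string) (Metadata, error) {
	var m Metadata
	for i, key := range metadataKeys {
		start := strings.Index(s, key)
		if start < 0 {
			continue
		}
		start += len(key)
		// The value runs up to the nearest later key, or to the end of the string.
		end := len(s)
		for _, next := range metadataKeys[i+1:] {
			if j := strings.Index(s[start:], next); j >= 0 && start+j < end {
				end = start + j
			}
		}
		value := strings.TrimSpace(s[start:end])
		switch key {
		case "text:":
			m.Text = value
		case "confidence:":
			f, err := strconv.ParseFloat(value, 64)
			if err != nil {
				return m, fmt.Errorf("bad confidence %q: %w", value, err)
			}
			m.Confidence = f
		case "handler:":
			m.Handler = value
		}
	}
	return m, nil
}

// DecodeParameters unpacks the JSON document carried inside the parameters
// string. Treating it as a JSON object is an assumption.
func DecodeParameters(r Response) (map[string]interface{}, error) {
	params := map[string]interface{}{}
	if r.Parameters == "" {
		return params, nil
	}
	err := json.Unmarshal([]byte(r.Parameters), &params)
	return params, err
}

func main() {
	// A made-up message in the shape of the response table.
	raw := `{"intent":"example_intent","metadata":"text: what time is it confidence: 0.875 handler: LEX","parameters":"{\"key\":\"value\"}","type":"result"}`
	var r Response
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	meta, err := ParseMetadata(r.Metadata)
	if err != nil {
		fmt.Println("metadata failed:", err)
		return
	}
	params, err := DecodeParameters(r)
	if err != nil {
		fmt.Println("parameters failed:", err)
		return
	}
	fmt.Println(meta.Text, meta.Confidence, meta.Handler, params)
}
```

Searching for the known keys, rather than splitting on every colon, keeps a multi-word transcription intact. It would still misparse a transcription that happens to contain one of the key names.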

Streaming Connection Check¶

Request¶

The StreamingConnectionCheckRequest request message has the following fields:

Table: JSON Parameters for the streaming connection check request

Field	Type	Description
app_key
audio_per_request
device_id		Probably the robot's ESN.
firmware_version
input_audio
session
total_audio_ms	int

Response¶

The ConnectionCheckResponse response message has the following fields:

Table: JSON Parameters for the connection check response

Field	Type	Description
frames_received		Possibly a count of the audio frames received.
status	Status

Streaming Intent¶

This is probably used to send the spoken audio to the server and get back the recognized intent; the exact role on the server is TBD.

Request¶

The StreamingIntentRequest request message has the following fields:

Table: JSON Parameters for the streaming intent request

Field	Type	Description
app_key
audio_encoding	AudioEncoding	Probably opus or ogg
boot_id
device_id		Probably the robot's ESN.
firmware_version
input_audio
input_service
language_code	LanguageCode
mode	RobotMode
save_audio	bool
session
single_utterance
skip_das	bool
speech_only	bool

Response¶

The IntentResponse response message has the following fields:

Table: JSON Parameters for the intent response

Field	Type	Description
audio_id
device_id		Probably the robot's ESN.
intent_result	IntentResult
is_final	bool
mode	RobotMode
session
speech_result	SpeechResult

The IntentResult structure has the following fields:

Table: JSON Parameters for the intent result structure

Field	Type	Description
action
all_parameters_present	bool
has_context	bool
intent_confidence	float
kgresponse
parameters
query_text
service
speech_confidence	float

The SpeechResult structure has the following fields:

Table: JSON Parameters for the speech result structure

Field	Type	Description
is_final	bool
transcript	string
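
The three tables above nest into one another, which the following Go sketch tries to make concrete by decoding a JSON-encoded intent response. It is an illustration under stated assumptions, not the generated protobuf code: fields whose type the tables leave blank are kept as `json.RawMessage` so that no type is guessed, `float` is decoded as `float64`, and the helper names and the sample message are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpeechResult follows the speech result table.
type SpeechResult struct {
	IsFinal    bool   `json:"is_final"`
	Transcript string `json:"transcript"`
}

// IntentResult follows the intent result table. Fields with no documented
// type stay as raw JSON rather than being given a guessed type.
type IntentResult struct {
	Action               json.RawMessage `json:"action"`
	AllParametersPresent bool            `json:"all_parameters_present"`
	HasContext           bool            `json:"has_context"`
	IntentConfidence     float64         `json:"intent_confidence"`
	KGResponse           json.RawMessage `json:"kgresponse"`
	Parameters           json.RawMessage `json:"parameters"`
	QueryText            json.RawMessage `json:"query_text"`
	Service              json.RawMessage `json:"service"`
	SpeechConfidence     float64         `json:"speech_confidence"`
}

// IntentResponse follows the intent response table.
type IntentResponse struct {
	AudioID      json.RawMessage `json:"audio_id"`
	DeviceID     json.RawMessage `json:"device_id"`
	IntentResult *IntentResult   `json:"intent_result"`
	IsFinal      bool            `json:"is_final"`
	Mode         json.RawMessage `json:"mode"`
	Session      json.RawMessage `json:"session"`
	SpeechResult *SpeechResult   `json:"speech_result"`
}

// DecodeIntentResponse (hypothetical helper) parses one JSON-encoded response.
func DecodeIntentResponse(data []byte) (IntentResponse, error) {
	var r IntentResponse
	err := json.Unmarshal(data, &r)
	return r, err
}

// Transcript returns the transcript when a speech result is present.
func Transcript(r IntentResponse) (string, bool) {
	if r.SpeechResult == nil {
		return "", false
	}
	return r.SpeechResult.Transcript, true
}

func main() {
	// A made-up message in the shape of the tables above.
	raw := []byte(`{"is_final":true,"intent_result":{"intent_confidence":0.5,"speech_confidence":0.75,"all_parameters_present":true},"speech_result":{"is_final":true,"transcript":"hello"}}`)
	r, err := DecodeIntentResponse(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if text, ok := Transcript(r); ok {
		fmt.Println("transcript:", text, "final:", r.IsFinal)
	}
}
```

Using pointers for the nested structures lets a caller tell an absent `intent_result` or `speech_result` apart from one that is present but empty.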

Streaming Knowledge Graph¶

This is used to query the knowledge graph on the server. Note: I’m not convinced that Vector uses this. It may be part of the server's internal workings that was left behind in Vector's vic-cloud.

Request¶

The StreamingKnowledgeGraphRequest request message has the following fields:

Table: JSON Parameters for the streaming knowledge graph request

Field	Type	Description
app_key
audio_encoding	AudioEncoding	Probably opus or ogg
boot_id
device_id		Probably the robot's ESN.
firmware_version
input_audio
language_code	LanguageCode
save_audio
skip_das	bool
timezone

Response¶

The KnowledgeGraphResponse response message has the following fields:

Table: JSON Parameters for the streaming knowledge graph response

Field	Type	Description
audio_id
command_type
device_id		Probably the robot's ESN.
domains_used
query_text
session
spoken_text
text_input

Text¶

Note: I'm not convinced that Vector uses this. It may be part of the server's internal workings that was left behind in Vector's vic-cloud.

Request¶

The TextRequest request message has the following fields:

Table: JSON Parameters for the text request

Field	Type	Description
device_id		Probably the robot's ESN.
firmware_version
intent_service	IntentService
language_code	LanguageCode
mode	RobotMode
session
skip_das	bool