The Chipper Services
This describes the interactions with Anki's automatic speech recognition (ASR) server. The audio that follows "Hey Vector" is sent to the servers for processing, and the servers send back a response in the form of an intent. An intent is a code and a structure that represents the action to carry out in response to the spoken request, query, or statement; it may represent the action requested, an answer to a query, or an action that emotionally responds to what was said. The intent structures are described on another page.
Common Elements
The enumerations and structures in this section are common to many commands.
Enumerations
AudioEncoding
IntentService
LanguageCode
RobotMode
Structures
The following structures are present in the Go code, but their use is not known.
Weather Location
The WeatherLocation structure has the following fields:
Table: JSON Parameters for the weather location structure
Field | Type | Description |
---|---|---|
city | string | |
country | string | |
state | string | |
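A minimal Go sketch of decoding this structure, assuming it is carried as a JSON object with the keys in the table. The helper `parseWeatherLocation` and the sample values are hypothetical, and the real generated type in vic-cloud may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WeatherLocation mirrors the table above. The JSON keys come from the
// table; treating every field as a plain string is an assumption.
type WeatherLocation struct {
	City    string `json:"city"`
	Country string `json:"country"`
	State   string `json:"state"`
}

// parseWeatherLocation (hypothetical helper) decodes one JSON object.
func parseWeatherLocation(data []byte) (WeatherLocation, error) {
	var loc WeatherLocation
	err := json.Unmarshal(data, &loc)
	return loc, err
}

func main() {
	// Made-up input for illustration only.
	loc, err := parseWeatherLocation([]byte(`{"city":"Portland","state":"OR","country":"US"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(loc.City, loc.State, loc.Country)
}
```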
Commands and Responses
Unknown
These messages appear in the logs, but they do not appear to match the gRPC protobuf messages generated in the Go code.
Request
The request sent to the server has the following fields:
Table: Parameters for ASR request
Field | Type | Description |
---|---|---|
session | string | A hexadecimal string of unknown purpose. |
type | string | e.g. "streamOpen" |
It is not known where the streamOpen request goes, or whether the audio is uploaded as a file or streamed live.
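The logged request can be sketched in Go as below. This assumes the message is plain JSON with only these two fields; `ASRRequest`, `newStreamOpen`, and the session value are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ASRRequest (hypothetical) models only the two fields seen in the logs.
type ASRRequest struct {
	Session string `json:"session"` // hex string of unknown purpose
	Type    string `json:"type"`    // e.g. "streamOpen"
}

// newStreamOpen builds a "streamOpen" request for the given session.
func newStreamOpen(session string) ([]byte, error) {
	return json.Marshal(ASRRequest{Session: session, Type: "streamOpen"})
}

func main() {
	msg, err := newStreamOpen("0a1b2c3d") // placeholder session value
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```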
Response
The server response message has the following fields:
Table: Parameters for ASR response
Field | Type | Description |
---|---|---|
intent | string | The type of intent |
metadata | string | May be an empty string, or a string of colon-delimited parameters. It often has the pattern "text: unquoted-string confidence: float handler: LEX", where "text:" is followed by the transcription of the spoken text and "confidence:" by a floating-point number giving how confident the speech-to-text engine is in that transcription. |
parameters | JSON string | This is a string containing the JSON serialization of the intent parameters. |
type | string | e.g. "result" |
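Because `metadata` is a loosely formatted string and `parameters` is JSON wrapped inside a string, a client has to unpack both. The Go sketch below assumes the text/confidence/handler pattern described in the table; `parseMetadata` and `decodeParameters` are hypothetical helpers, not vic-cloud functions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Metadata holds the parts of the "metadata" field of an ASR response.
type Metadata struct {
	Text       string  // transcription of the spoken text
	Confidence float64 // speech-to-text confidence
	Handler    string  // e.g. "LEX"
}

// parseMetadata splits "text: <transcript> confidence: <float> handler: <name>".
// An empty string yields a zero Metadata. The last "confidence:" is used so a
// transcript that itself contains "confidence:" still parses.
func parseMetadata(s string) (Metadata, error) {
	var m Metadata
	s = strings.TrimSpace(s)
	if s == "" {
		return m, nil
	}
	if !strings.HasPrefix(s, "text:") {
		return m, fmt.Errorf("unexpected metadata %q", s)
	}
	s = strings.TrimPrefix(s, "text:")
	i := strings.LastIndex(s, "confidence:")
	if i < 0 {
		return m, fmt.Errorf("no confidence in metadata %q", s)
	}
	m.Text = strings.TrimSpace(s[:i])
	conf, handler, _ := strings.Cut(s[i+len("confidence:"):], "handler:")
	c, err := strconv.ParseFloat(strings.TrimSpace(conf), 64)
	if err != nil {
		return m, err
	}
	m.Confidence = c
	m.Handler = strings.TrimSpace(handler)
	return m, nil
}

// decodeParameters unpacks the "parameters" field (JSON serialized as a string).
func decodeParameters(p string) (map[string]any, error) {
	params := map[string]any{}
	if p == "" {
		return params, nil
	}
	if err := json.Unmarshal([]byte(p), &params); err != nil {
		return nil, err
	}
	return params, nil
}

func main() {
	m, err := parseMetadata("text: what time is it confidence: 0.93 handler: LEX")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %.2f %s\n", m.Text, m.Confidence, m.Handler)
}
```

Splitting on the last "confidence:" rather than the first is a small defensive choice; whether real transcripts ever contain that token is unknown.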
Streaming Connection Check
Request
The StreamingConnectionCheckRequest request message has the following fields:
Table: JSON Parameters for the streaming connection check request
Field | Type | Description |
---|---|---|
app_key | | |
audio_per_request | | |
device_id | | Probably the robot's ESN. |
firmware_version | | |
input_audio | | |
session | | |
total_audio_ms | int | |
Response
The ConnectionCheckResponse response message has the following fields:
Table: JSON Parameters for the connection check response
Field | Type | Description |
---|---|---|
frames_received | | Possibly a count of the audio frames received. |
status | Status | |
Streaming Intent
This appears to be the request that streams the spoken audio to the server to obtain an intent; its exact use is TBD.
Request
The StreamingIntentRequest request message has the following fields:
Table: JSON Parameters for the streaming intent request
Field | Type | Description |
---|---|---|
app_key | | |
audio_encoding | AudioEncoding | Probably Opus or Ogg |
boot_id | | |
device_id | | Probably the robot's ESN. |
firmware_version | | |
input_audio | | |
input_service | | |
language_code | LanguageCode | |
mode | RobotMode | |
save_audio | bool | |
session | | |
single_utterance | | |
skip_das | bool | |
speech_only | bool | |
Response
The IntentResponse response message has the following fields:
Table: JSON Parameters for the intent response
Field | Type | Description |
---|---|---|
audio_id | | |
device_id | | Probably the robot's ESN. |
intent_result | IntentResult | |
is_final | bool | |
mode | RobotMode | |
session | | |
speech_result | SpeechResult | |
The IntentResult structure has the following fields:
Table: JSON Parameters for the intent result structure
Field | Type | Description |
---|---|---|
action | | |
all_parameters_present | bool | |
has_context | bool | |
intent_confidence | float | |
kgresponse | | |
parameters | | |
query_text | | |
service | | |
speech_confidence | float | |
The SpeechResult structure has the following fields:
Table: JSON Parameters for the speech result structure
Field | Type | Description |
---|---|---|
is_final | bool | |
transcript | string | |
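The intent response and its two nested structures can be sketched together in Go. Fields whose types are unknown (`kgresponse`, `parameters`) are kept as `json.RawMessage` so they can be inspected later without guessing a shape, and the pointer fields stay nil when a nested structure is absent. The type names follow the tables; `summarize` and the other field types are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type SpeechResult struct {
	IsFinal    bool   `json:"is_final"`
	Transcript string `json:"transcript"`
}

type IntentResult struct {
	Action               string          `json:"action"` // assumed string
	AllParametersPresent bool            `json:"all_parameters_present"`
	HasContext           bool            `json:"has_context"`
	IntentConfidence     float64         `json:"intent_confidence"`
	KGResponse           json.RawMessage `json:"kgresponse"` // type unknown; kept raw
	Parameters           json.RawMessage `json:"parameters"` // type unknown; kept raw
	QueryText            string          `json:"query_text"` // assumed string
	Service              string          `json:"service"`    // assumed string
	SpeechConfidence     float64         `json:"speech_confidence"`
}

type IntentResponse struct {
	AudioID      string        `json:"audio_id"` // assumed string
	DeviceID     string        `json:"device_id"`
	IntentResult *IntentResult `json:"intent_result"`
	IsFinal      bool          `json:"is_final"`
	Mode         string        `json:"mode"` // RobotMode undocumented; string assumed
	Session      string        `json:"session"`
	SpeechResult *SpeechResult `json:"speech_result"`
}

// summarize (hypothetical) returns the transcript and the intent action,
// using empty strings when either nested structure is missing.
func summarize(data []byte) (transcript, action string, err error) {
	var r IntentResponse
	if err = json.Unmarshal(data, &r); err != nil {
		return "", "", err
	}
	if r.SpeechResult != nil {
		transcript = r.SpeechResult.Transcript
	}
	if r.IntentResult != nil {
		action = r.IntentResult.Action
	}
	return transcript, action, nil
}

func main() {
	// Made-up response for illustration only.
	t, a, err := summarize([]byte(`{"intent_result":{"action":"example_action"},"speech_result":{"transcript":"hello"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(t, a)
}
```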
Streaming Knowledge Graph
This is used to query the knowledge graph on the server. Note: I'm not convinced that Vector uses this; it may be part of the server's internal workings that was left in Vector's vic-cloud.
Request
The StreamingKnowledgeGraphRequest request message has the following fields:
Table: JSON Parameters for the streaming knowledge graph request
Field | Type | Description |
---|---|---|
app_key | | |
audio_encoding | AudioEncoding | Probably Opus or Ogg |
boot_id | | |
device_id | | Probably the robot's ESN. |
firmware_version | | |
input_audio | | |
language_code | LanguageCode | |
save_audio | | |
skip_das | bool | |
timezone | | |
Response
The KnowledgeGraphResponse response message has the following fields:
Table: JSON Parameters for the streaming knowledge graph response
Field | Type | Description |
---|---|---|
audio_id | | |
command_type | | |
device_id | | Probably the robot's ESN. |
domains_used | | |
query_text | | |
session | | |
spoken_text | | |
text_input | | |
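A Go sketch of both knowledge graph messages, under stated assumptions: JSON keys from the tables, guessed types for the blank fields, and strings standing in for the undocumented enumerations. Treating `domains_used` as a list of strings is a guess based only on its plural name, and `decodeKGResponse` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StreamingKnowledgeGraphRequest is a hypothetical mirror of the request table.
type StreamingKnowledgeGraphRequest struct {
	AppKey          string `json:"app_key"`
	AudioEncoding   string `json:"audio_encoding"` // AudioEncoding undocumented; string assumed
	BootID          string `json:"boot_id"`
	DeviceID        string `json:"device_id"` // probably the robot's ESN
	FirmwareVersion string `json:"firmware_version"`
	InputAudio      []byte `json:"input_audio"`   // assumed raw audio bytes
	LanguageCode    string `json:"language_code"` // LanguageCode undocumented; string assumed
	SaveAudio       bool   `json:"save_audio"`    // assumed bool
	SkipDas         bool   `json:"skip_das"`
	Timezone        string `json:"timezone"` // assumed string
}

// KnowledgeGraphResponse is a hypothetical mirror of the response table.
type KnowledgeGraphResponse struct {
	AudioID     string   `json:"audio_id"`
	CommandType string   `json:"command_type"`
	DeviceID    string   `json:"device_id"`
	DomainsUsed []string `json:"domains_used"` // assumed list of names
	QueryText   string   `json:"query_text"`
	Session     string   `json:"session"`
	SpokenText  string   `json:"spoken_text"`
	TextInput   string   `json:"text_input"`
}

// decodeKGResponse decodes one response message.
func decodeKGResponse(data []byte) (KnowledgeGraphResponse, error) {
	var r KnowledgeGraphResponse
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// Made-up response for illustration only.
	resp, err := decodeKGResponse([]byte(`{"spoken_text":"hello","domains_used":["example"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.SpokenText, resp.DomainsUsed)
}
```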
Text
Note: I'm not convinced that Vector uses this; it may be part of the server's internal workings that was left in Vector's vic-cloud.
Request
The TextRequest request message has the following fields:
Table: JSON Parameters for the text request
Field | Type | Description |
---|---|---|
device_id | | Probably the robot's ESN. |
firmware_version | | |
intent_service | IntentService | |
language_code | LanguageCode | |
mode | RobotMode | |
session | | |
skip_das | bool | |