The Chipper Services
This describes the interactions with Anki's automatic speech recognition (ASR) server. The audio following "Hey Vector" is sent to the servers for processing, and the servers send back a response in the form of an intent. An intent is a code and a structure that represent the action to carry out in response to the spoken request, query, or statement: it may represent the requested action, an answer to a query, or an action that responds emotionally to what was said. The intent structures are described on another page.
Common Elements
The enumerations and structures in this section are common to many commands.
Enumerations
AudioEncoding
IntentService
LanguageCode
RobotMode
Structures
The following structures are present in the Go code, but their use is not known.
Weather Location
The WeatherLocation structure has the following fields:
Table: JSON Parameters for the weather location structure
| Field | Type | Description |
|---|---|---|
| city | string | |
| country | string | |
| state | string | |
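Since the use of WeatherLocation is not known, the Python sketch below only mirrors its three documented string fields. The method names `from_json` and `to_json` are hypothetical (they are not taken from the Go code), and the example values are made up.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class WeatherLocation:
    """Mirrors the three documented WeatherLocation fields."""
    city: str = ""
    country: str = ""
    state: str = ""

    @classmethod
    def from_json(cls, text: str) -> "WeatherLocation":
        data = json.loads(text)
        # Keep only the documented keys so unexpected extras are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_json(self) -> str:
        return json.dumps({"city": self.city, "country": self.country, "state": self.state})


# Hypothetical example values.
loc = WeatherLocation.from_json('{"city": "Pittsburgh", "state": "PA", "country": "US"}')
print(loc.to_json())
```

Missing fields default to empty strings, matching the "string with no value" reading of an absent field.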
Commands and Responses
Unknown
These messages appear in the logs, but they do not seem to match the generated gRPC protobuf code in the Go source.
Request
The request sent to the server has the following fields:
Table: Parameters for ASR request
| Field | Type | Description |
|---|---|---|
| session | string | A string of hexadecimal characters |
| type | string | e.g. "streamOpen" |
It is not known where the stream opened by "streamOpen" goes, or whether the audio is uploaded as a file or streamed live.
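The request seen in the logs can be reproduced as a small JSON object with the two documented fields. In this sketch the helper names are hypothetical, and generating the session as 16 random bytes of hex is only a placeholder: the real length and origin of the hex string are not known.

```python
import json
import secrets


def new_session_id() -> str:
    # Hypothetical: the logs show a hex string, but its real length and
    # origin are unknown, so 16 random bytes is only a placeholder.
    return secrets.token_hex(16)


def stream_open_message(session: str) -> str:
    """Serialize the ASR request seen in the logs: a session plus a type."""
    return json.dumps({"session": session, "type": "streamOpen"})


print(stream_open_message(new_session_id()))
```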
Response
The server response message has the following fields:
Table: Parameters for ASR response
| Field | Type | Description |
|---|---|---|
| intent | string | The type of intent |
| metadata | string | This can be an empty string or a string of colon-delimited parameters. It often has the pattern "text: unquoted-string confidence: float handler: LEX", where "text:" is followed by the transcription of the spoken text, and "confidence:" by a floating-point number representing how confident the speech-to-text engine is in the transcription. |
| parameters | JSON string | This is a string containing the JSON serialization of the intent parameters. |
| type | string | e.g. "result" |
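The response fields can be unpacked in two steps: the `metadata` string is split using the pattern described in the table, and the `parameters` string is decoded as JSON. This is a sketch, assuming the metadata always follows the "text: ... confidence: ... handler: ..." order; any other layout is returned untouched under a hypothetical `raw` key. The function names and the sample message are made up.

```python
import json
import re
from typing import Any

# Assumed layout, taken from the pattern seen in the logs:
#   "text: <unquoted transcription> confidence: <float> handler: <name>"
_METADATA_RE = re.compile(
    r"^text:\s*(?P<text>.*?)\s+"
    r"confidence:\s*(?P<confidence>[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s+"
    r"handler:\s*(?P<handler>\S+)\s*$"
)


def parse_metadata(metadata: str) -> dict[str, Any]:
    """Split the metadata string into its text, confidence and handler parts."""
    if not metadata:
        return {}  # the server may send an empty string
    m = _METADATA_RE.match(metadata)
    if m is None:
        return {"raw": metadata}  # unrecognized layout: keep the original text
    return {
        "text": m.group("text"),
        "confidence": float(m.group("confidence")),
        "handler": m.group("handler"),
    }


def parse_asr_response(message: str) -> dict[str, Any]:
    """Decode an ASR response; 'parameters' is itself a JSON-encoded string."""
    msg = json.loads(message)
    params = msg.get("parameters") or ""
    return {
        "intent": msg.get("intent"),
        "type": msg.get("type"),
        "metadata": parse_metadata(msg.get("metadata") or ""),
        "parameters": json.loads(params) if params else {},
    }


# Hypothetical example message; the intent name and parameters are made up.
sample = json.dumps({
    "intent": "intent_example",
    "metadata": "text: what time is it confidence: 0.91 handler: LEX",
    "parameters": json.dumps({"example_key": "example_value"}),
    "type": "result",
})
print(parse_asr_response(sample))
```

Because the transcription is unquoted, the lazy `text` group stops at the first " confidence:" it finds; a transcript that itself contains that word followed by a colon would be split incorrectly.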
Streaming Connection Check
Request
The StreamingConnectionCheckRequest request message has the following fields:
Table: JSON Parameters for the streaming connection check request
| Field | Type | Description |
|---|---|---|
| app_key | | |
| audio_per_request | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| session | | |
| total_audio_ms | int | |
Response
The ConnectionCheckResponse response message has the following fields:
Table: JSON Parameters for the connection check response
| Field | Type | Description |
|---|---|---|
| frames_received | | Possibly a count. |
| status | Status | |
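A sketch of the connection check messages as Python dataclasses. Only `total_audio_ms` has a documented type; the other field types, the base64 encoding of `input_audio` in JSON, and leaving unset fields out of the message are assumptions, as are the method names and example values.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class StreamingConnectionCheckRequest:
    # Only total_audio_ms has a documented type (int); the other types here
    # are assumptions made for this sketch.
    app_key: Optional[str] = None
    audio_per_request: Optional[int] = None
    device_id: Optional[str] = None  # probably the robot's ESN
    firmware_version: Optional[str] = None
    input_audio: Optional[bytes] = None
    session: Optional[str] = None
    total_audio_ms: Optional[int] = None

    def to_json(self) -> str:
        out: dict[str, Any] = {}
        for key, value in asdict(self).items():
            if value is None:
                continue  # leave unset fields out of the message
            if isinstance(value, bytes):
                # Assumption: raw audio is carried as base64 text in JSON.
                value = base64.b64encode(value).decode("ascii")
            out[key] = value
        return json.dumps(out)


@dataclass
class ConnectionCheckResponse:
    frames_received: Optional[int] = None  # possibly a count; unconfirmed
    status: Any = None  # the Status type is not documented, so keep it raw

    @classmethod
    def from_json(cls, text: str) -> "ConnectionCheckResponse":
        data = json.loads(text)
        return cls(frames_received=data.get("frames_received"), status=data.get("status"))


req = StreamingConnectionCheckRequest(
    device_id="hypothetical-esn", total_audio_ms=500, input_audio=b"\x00\x01"
)
print(req.to_json())
```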
Streaming Intent
What this is used for on the server is TBD.
Request
The StreamingIntentRequest request message has the following fields:
Table: JSON Parameters for the streaming intent request
| Field | Type | Description |
|---|---|---|
| app_key | | |
| audio_encoding | AudioEncoding | Probably Opus or Ogg |
| boot_id | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| input_service | | |
| language_code | LanguageCode | |
| mode | RobotMode | |
| save_audio | bool | |
| session | | |
| single_utterance | | |
| skip_das | bool | |
| speech_only | bool | |
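How the audio is split across StreamingIntentRequest messages is not documented here. The sketch below assumes the recording is sent in fixed-size chunks, with each message repeating `session` and `device_id`; the function name and the 1024-byte chunk size are hypothetical, and the other request fields are left out.

```python
from typing import Iterator


def streaming_intent_requests(
    audio: bytes, session: str, device_id: str, chunk_size: int = 1024
) -> Iterator[dict]:
    """Yield one StreamingIntentRequest-shaped dict per slice of audio.

    Hypothetical: the chunk size, and the choice to repeat session and
    device_id in every message, are assumptions, not observed behavior.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield {
            "session": session,
            "device_id": device_id,  # probably the robot's ESN
            "input_audio": audio[offset:offset + chunk_size],
        }


for msg in streaming_intent_requests(b"\x00" * 2500, "example-session", "example-esn"):
    print(len(msg["input_audio"]))
```

A generator fits a streaming call: each message can be handed to the transport as soon as its audio is available, instead of buffering the whole utterance first.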
Response
The IntentResponse response message has the following fields:
Table: JSON Parameters for the intent response
| Field | Type | Description |
|---|---|---|
| audio_id | | |
| device_id | | Probably the robot's ESN. |
| intent_result | IntentResult | |
| is_final | bool | |
| mode | RobotMode | |
| session | | |
| speech_result | SpeechResult | |
The IntentResult structure has the following fields:
Table: JSON Parameters for the intent result structure
| Field | Type | Description |
|---|---|---|
| action | | |
| all_parameters_present | bool | |
| has_context | bool | |
| intent_confidence | float | |
| kgresponse | | |
| parameters | | |
| query_text | | |
| service | | |
| speech_confidence | float | |
The SpeechResult structure has the following fields:
Table: JSON Parameters for the speech result structure
| Field | Type | Description |
|---|---|---|
| is_final | bool | |
| transcript | string | |
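The three response structures nest: an IntentResponse carries a SpeechResult and an IntentResult. The sketch below collects the transcript and the final intent from a sequence of already-decoded responses (plain dicts), assuming that partial responses may precede the one marked `is_final`. It models only a subset of the IntentResult fields, and the function name and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SpeechResult:
    is_final: bool
    transcript: str


@dataclass
class IntentResult:
    # A subset of the documented IntentResult fields.
    action: Optional[str]
    intent_confidence: float
    speech_confidence: float
    query_text: Optional[str]


def final_intent(
    responses: Iterable[dict],
) -> tuple[Optional[SpeechResult], Optional[IntentResult]]:
    """Return the latest SpeechResult and the IntentResult of the final response.

    Assumption: partial responses (is_final false) may arrive first and only
    update the transcript; the response with is_final true ends the exchange.
    """
    speech: Optional[SpeechResult] = None
    for msg in responses:
        sr = msg.get("speech_result")
        if sr is not None:
            speech = SpeechResult(bool(sr.get("is_final", False)), sr.get("transcript", ""))
        if msg.get("is_final"):
            ir = msg.get("intent_result") or {}
            return speech, IntentResult(
                action=ir.get("action"),
                intent_confidence=float(ir.get("intent_confidence", 0.0)),
                speech_confidence=float(ir.get("speech_confidence", 0.0)),
                query_text=ir.get("query_text"),
            )
    return speech, None  # the stream ended without a final response


# Hypothetical stream; the action name and values are made up.
stream = [
    {"is_final": False, "speech_result": {"is_final": False, "transcript": "what time"}},
    {
        "is_final": True,
        "speech_result": {"is_final": True, "transcript": "what time is it"},
        "intent_result": {"action": "example_action", "intent_confidence": 0.9,
                          "speech_confidence": 0.8, "query_text": "what time is it"},
    },
]
print(final_intent(stream))
```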
Streaming Knowledge Graph
This is used to query the knowledge graph on the server. Note: I'm not convinced that Vector uses this; it may be part of the server's internal workings that was left in Vector's vic-cloud.
Request
The StreamingKnowledgeGraphRequest request message has the following fields:
Table: JSON Parameters for the streaming knowledge graph request
| Field | Type | Description |
|---|---|---|
| app_key | | |
| audio_encoding | AudioEncoding | Probably Opus or Ogg |
| boot_id | | |
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| input_audio | | |
| language_code | LanguageCode | |
| save_audio | | |
| skip_das | bool | |
| timezone | | |
Response
The KnowledgeGraphResponse response message has the following fields:
Table: JSON Parameters for the streaming knowledge graph response
| Field | Type | Description |
|---|---|---|
| audio_id | | |
| command_type | | |
| device_id | | Probably the robot's ESN. |
| domains_used | | |
| query_text | | |
| session | | |
| spoken_text | | |
| text_input | | |
Text
Note: I'm not convinced that Vector uses this; it may be part of the server's internal workings that was left in Vector's vic-cloud.
Request
The TextRequest request message has the following fields:
Table: JSON Parameters for the text request
| Field | Type | Description |
|---|---|---|
| device_id | | Probably the robot's ESN. |
| firmware_version | | |
| intent_service | IntentService | |
| language_code | LanguageCode | |
| mode | RobotMode | |
| session | | |
| skip_das | bool | |