Through Google Assistant, Google Actions are how you “Build custom conversational experiences using Google Assistant’s voice and visual APIs. Take users on journeys through your product, using Assistant’s natural language understanding (NLU) capabilities and robust developer tools.”
via Conversation Actions Docs
In other words, an action is a conversational experience: it's how you build voice commands like, “Ok Google, am I pretty?”
We became interested in it because:
The HOT NEW agency Oberman & Mallinger wanted us to build one for their client and was curious about pricing and difficulty level.
What problem does Google Actions solve?
We are moving toward a world where voice is an activator, away from text and keyboard. Google Actions is one tool set for building voice-activated and voice-controlled applications; Alexa Skills would be another. Voice-activated apps are accessible to people who struggle with text, like people learning to read, people with dyslexia, or just multitaskers.
If you’re on Android or have a Google Home, you already have Google Assistant; if you are on another platform you can download the Google Assistant app.
You can then explore over 1 million actions (an "action" is what Google calls a conversational experience) built by Google and third-party developers.
For example, there is one called "Can I wear shorts today?" 👇
We broke down how "Can I wear shorts today?" likely works to illustrate the fundamentals of action building.
Calling Google Assistant
To wake up Google Assistant, you need to say a "trigger phrase", "hot word", or "wake word"; these terms all refer to how you get the assistant to start listening.
We looked into customizing this, which turns out to be surprisingly contentious: it is a highly requested feature but also highly precarious, because you don't want the assistant listening when you haven't summoned it.
For this example, "Ok Google" is the default trigger phrase.
Calling an action
"Talk to" is how google assistant now knows you are requesting an action.
The main invocation is the name of your Google action, which in this case is: Can I wear shorts today?
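To make the pieces concrete, the full utterance can be thought of as wake word + trigger phrase + invocation name. This is only an illustrative sketch in TypeScript; the assistant's real parsing pipeline is far more sophisticated than string splitting:

```typescript
// Rough sketch of how a full invocation decomposes.
// The real assistant does speech recognition and NLU, not string replace.
const utterance = "Ok Google, talk to Can I wear shorts today";

const wakeWord = "Ok Google";      // wakes the assistant
const triggerPhrase = "talk to";   // signals that an action is being requested
const invocationName = utterance.replace(`${wakeWord}, ${triggerPhrase} `, "");

console.log(invocationName); // "Can I wear shorts today"
```

Everything after "talk to" is treated as the action's name, which is why invocation names need to be distinctive and easy to say.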
Now we are in the location scene. A scene is a building block of a conversation model. As the docs say, “Scenes represent individual states of your conversation and their main purpose is to organize your conversation into logical chunks, execute tasks, and return prompts to users.”
Our guess is that the location scene consists of: a prompt asking for the user's location, the user's response, a call to a weather API with that location, and the logic that decides whether the weather permits shorts. Together, that would be the scene.
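The action's actual code isn't public, but the decision step of such a scene might boil down to a small function like this sketch. The 18°C threshold and the `Forecast` shape are our own assumptions, not anything from the real action:

```typescript
// Hypothetical sketch of the "shorts" decision a location scene might run
// after calling a weather API. Threshold and field names are assumptions.
interface Forecast {
  highCelsius: number;
  raining: boolean;
}

function canWearShorts(forecast: Forecast): string {
  if (forecast.raining) {
    return "Probably not, it's going to rain.";
  }
  return forecast.highCelsius >= 18
    ? "Yes, shorts weather!"
    : "No, better wear pants.";
}

console.log(canWearShorts({ highCelsius: 24, raining: false }));
// "Yes, shorts weather!"
```

The scene would then return that string to the user as a prompt, closing the loop from question to answer.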
Intents capture the intention of the user, i.e. the user input. In this case, the app responds with "I need to know your location." You say yes. That yes response is your intent.
It is ideal to add training phrases to the intent: not just "yes" but also "ya", "yup", "duh".
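In Actions Builder you list training phrases in the console rather than in code, and Google's NLU generalizes from them. As a rough mental model only, you can sketch the idea as matching user input against the phrase list (real matching is much fuzzier than this exact lookup):

```typescript
// Sketch of intent matching: several training phrases map to one "yes" intent.
// Actions Builder's NLU does fuzzy matching; this toy version is exact-match only.
const yesTrainingPhrases = ["yes", "ya", "yup", "duh"];

function matchesYesIntent(userInput: string): boolean {
  return yesTrainingPhrases.includes(userInput.trim().toLowerCase());
}

console.log(matchesYesIntent("Yup")); // true
console.log(matchesYesIntent("no"));  // false
```

The more varied the training phrases, the better the NLU handles the different ways real users actually say yes.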
Ok, I think we're good on jargon for now. This information is enough to get you started on deploying a simple action.