In an age of ever-advancing technology, voice-activated AI assistants have become an integral part of our daily lives. From setting reminders to answering questions, these virtual companions simplify tasks and provide valuable information. In this blog post, we’ll explore how to create your own voice-activated AI assistant using Python and various libraries.
Setting Up Your Environment
Before diving into the technical details, let’s ensure your development environment is properly configured. Here’s a step-by-step guide to get you started:
Install Python: If you haven’t already, visit python.org to download and install the latest Python version.
Create a Folder: Create a dedicated folder for your AI assistant project. We’ll name it “ai assistant” for this tutorial.
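Virtual Environment: To keep your project dependencies isolated, it’s good practice to create a virtual environment.
Open a terminal in the project folder and create the virtual environment with the following command (replace python3.11 with the command for the Python version you installed, for example python or python3):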
python3.11 -m venv aienv
Activate the Virtual Environment: Activate the virtual environment by running the appropriate command based on your operating system:
On macOS/Linux:
source ./aienv/bin/activate
On Windows:
.\aienv\Scripts\activate
Interpreter Setup: Configure your code editor (e.g., Visual Studio Code) to use the Python interpreter from the virtual environment. This ensures that your code runs in the isolated environment.
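As a quick sanity check for the steps above, the following sketch reports which interpreter is running and whether it belongs to a virtual environment. It relies only on the standard library; in_virtualenv is a hypothetical helper name, not part of the tutorial's code.

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False while you are working on this project, the editor or terminal is probably still using the system-wide interpreter.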
With your environment set up, we’re ready to start building our AI assistant.
Install the Dependencies
With the virtual environment active, install the libraries used in this tutorial:
pip3 install gtts pygame speechRecognition pyaudio openai python-dotenv
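To confirm the installation worked, a small check like the one below can help. It is only a sketch: missing_modules is a hypothetical helper, and the list uses the import names, which differ from some of the pip package names.

```python
import importlib.util

# Import names differ from some pip package names:
# speechRecognition -> speech_recognition, python-dotenv -> dotenv.
REQUIRED_MODULES = ["gtts", "pygame", "speech_recognition", "pyaudio", "openai", "dotenv"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

Using find_spec avoids actually importing the packages, so the check stays fast and does not initialize any audio devices.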
Speech Recognition and Text-to-Speech
Using gTTS for Text-to-Speech
The first step in creating a voice-activated AI assistant is to make your AI assistant speak. We’ll use the “gTTS” (Google Text-to-Speech) library to achieve this. It allows us to convert text into speech and save it as an audio file. Here’s a snippet of code to speak text using gTTS:
from gtts import gTTS
import pygame
import os
import time

AUDIO_FILE = "./audio/temp.mp3"

# Initialize pygame mixer
pygame.mixer.init()

# Function to convert text to speech and play the sound
def Say(text):
    # Make sure the output folder exists before saving
    os.makedirs(os.path.dirname(AUDIO_FILE), exist_ok=True)
    # Create a gTTS object (gTTS offers a normal/slow toggle, not rate or pitch control)
    voice = gTTS(text=text, slow=False)
    voice.save(AUDIO_FILE)
    print("AI: ", text)
    # Load and play the audio file with pygame mixer
    pygame.mixer.music.load(AUDIO_FILE)
    pygame.mixer.music.play()
    # Wait for the audio to finish playing without spinning the CPU
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)
    # Release the file before deleting it (needed on Windows; requires pygame 2.0+)
    pygame.mixer.music.unload()
    os.remove(AUDIO_FILE)
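Say writes to a fixed path, ./audio/temp.mp3. One possible variation, sketched here with only the standard library and a hypothetical helper named new_temp_mp3, is to let the operating system pick a unique temporary file. The returned path could then be passed to voice.save() and pygame.mixer.music.load() in place of the fixed path.

```python
import os
import tempfile

def new_temp_mp3():
    """Create an empty, uniquely named .mp3 file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so another library can write to it and play it back by path.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as handle:
        return handle.name

path = new_temp_mp3()
print("Temporary audio file:", path)
os.remove(path)  # clean up once playback has finished
```

This avoids depending on an audio folder in the working directory, at the cost of having to remember to delete the file yourself.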
Implementing Speech Recognition
Now that we can make our AI assistant speak, let’s enable it to listen and recognize our voice commands. For this, we’ll use the “speech_recognition” library, which provides convenient access to various speech recognition engines, including the Google Web Speech API. Here’s how you can set up speech recognition:
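import speech_recognition as sr

def Listen():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 2
        # Wait for speech to start, then record at most 5 seconds
        audio = r.listen(source, timeout=None, phrase_time_limit=5)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language="en-in")
        query = query.lower().strip()
        if query:
            print(f"You: {query}")
            return query
    except sr.UnknownValueError:
        # The audio was captured but no speech could be understood
        print("Sorry, I could not understand that.")
    except sr.RequestError as e:
        # The recognition service could not be reached
        print(f"Speech recognition request failed: {e}")
    return None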
This code opens the microphone, captures your voice input, and recognizes it using Google’s speech recognition service. If it recognizes speech, it returns the query in lowercase; otherwise, it reports the problem and returns None, so the caller can listen again.
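One way to handle a failed recognition is a small retry wrapper. The sketch below assumes the listening function returns None (or an empty string) when nothing was recognized; listen_with_retries and max_attempts are hypothetical names introduced for illustration.

```python
def listen_with_retries(listen, max_attempts=3):
    """Call listen() until it returns a non-empty query or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        query = listen()
        if query:
            return query
        print(f"Attempt {attempt} of {max_attempts}: nothing recognized.")
    return None

# Stand-in for Listen() so the sketch runs without a microphone.
fake_results = iter([None, "", "what time is it"])
print(listen_with_retries(lambda: next(fake_results)))
```

In the assistant itself you would pass the real function, as in listen_with_retries(Listen).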
OpenAI API Key
Head over to https://platform.openai.com/account/api-keys and create an API key. Make sure to verify your phone number to receive the $5 in free credit, then paste the key into a .env file in your project folder:
OPENAI_API_KEY=<YOUR_API_KEY>
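A missing or placeholder key tends to surface later as a confusing API error, so it can be worth checking for it up front. The following is a minimal sketch, assuming the key has been loaded into the process environment (for example by python-dotenv); require_api_key is a hypothetical helper.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from env, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key or key == "<YOUR_API_KEY>":
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return key

# Demonstration with a plain dict instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Also remember to keep the .env file out of version control so the key is never published.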
Generating AI Responses
With speech recognition and text-to-speech in place, the final piece of the puzzle is having your AI assistant respond intelligently to your queries. To accomplish this, we’ll leverage one of OpenAI’s GPT language models, which generate human-like text based on a prompt.
Here’s a function that generates AI responses through OpenAI’s completions endpoint:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

# Create a client with your OpenAI API key (openai package version 1.0 or later)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def Response(user_input):
    try:
        # text-davinci-003 has been retired by OpenAI; gpt-3.5-turbo-instruct
        # is its replacement on the completions endpoint
        response = client.completions.create(
            model="gpt-3.5-turbo-instruct",
            prompt=f"You are a personal chat AI assistant. Your prompt: {user_input}",
            max_tokens=50,
            temperature=0.7,
        )
        return response.choices[0].text.strip()
    except Exception as e:
        print(f"An error occurred while generating a response: {e}")
        return "An error occurred while generating a response."
The Response function sends your query to the model, which then generates a contextually relevant response.
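Because max_tokens is set to 50, a reply can be cut off in the middle of a sentence, which sounds odd when spoken aloud. One simple heuristic, shown here as a sketch with a hypothetical helper named trim_to_last_sentence, is to drop any trailing fragment before passing the text to Say. It only looks for ., ! and ?, so it will misjudge text such as decimal numbers or abbreviations.

```python
def trim_to_last_sentence(text):
    """Drop a trailing sentence fragment, keeping text up to the last ., ! or ?."""
    text = text.strip()
    last_end = max(text.rfind(mark) for mark in ".!?")
    if last_end == -1:
        return text  # no sentence boundary found; leave the text unchanged
    return text[: last_end + 1]

print(trim_to_last_sentence("It is sunny today. Tomorrow there will be"))
```

Raising max_tokens is the other obvious option, at the cost of longer spoken replies and more token usage.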
Putting It All Together
Now that we have all the components in place, it’s time to assemble them into a fully functional AI assistant. Here’s how you can use these functions in your main program:
if __name__ == "__main__":
    while True:
        query = Listen()
        if query == "exit":
            # Leave the loop and end the program
            break
        elif query:
            Say(Response(query))
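The same loop can also be written so that the listening, responding, and speaking functions are passed in as arguments, which makes it possible to try the control flow without a microphone, speakers, or an API key. This is only a sketch: run_assistant is a hypothetical name, and the extra exit words "quit" and "stop" are assumptions beyond the "exit" command used above.

```python
def run_assistant(listen, respond, say, exit_words=("exit", "quit", "stop")):
    """Listen, respond, and speak in a loop until an exit word is heard."""
    while True:
        query = listen()
        if not query:
            continue  # nothing recognized; listen again
        if query in exit_words:
            break
        say(respond(query))

# Stand-ins so the sketch runs without a microphone, speakers, or API key.
queries = iter(["hello", None, "exit"])
run_assistant(
    listen=lambda: next(queries),
    respond=lambda q: f"You said: {q}",
    say=print,
)
```

In the real program the call would be run_assistant(Listen, Response, Say).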
Building a voice-activated AI assistant with Python is an exciting project that demonstrates the power of speech recognition, text-to-speech conversion, and AI language models. With the provided code and guidance, you have the foundation to create your own AI assistant tailored to your needs.