Programming meets Voice Recording: A Python Guide for Playing and Capturing Sound

Machine learning seems cool and interesting when you see the model in action. What many people don’t reveal is that training and optimizing one is tiresome, especially when you have to tune the hyperparameters. The training process could go on for a couple of minutes and waiting every time for the model to finish learning the data gets boring fast. But there should be a solution to this. What if we program our script so that we get some notification as soon as the process is over?

This is exactly what we are going to do today. This post shows how to record audio with Python and how to play it back. We can use playback to alert ourselves when a program has crossed a particular “checkpoint”.

First, we will look at the Python libraries for audio; later, we will build a small Python application that plays an audio file as an alert.


Python Libraries for Audio

1. winsound

winsound is one of the simplest Python libraries for sound. It is part of the standard library on Windows (and only on Windows), and it supports only .wav files. Below is the code to play a simple WAV file:

import winsound

filename = 'myfile.wav'
winsound.PlaySound(filename, winsound.SND_FILENAME)

Apart from this, the library also allows you to play a simple beep at a particular frequency for a given time. This is what I used while training my machine learning model as a notification sound to alert me when the training process is completed:

import winsound

winsound.Beep(1000, 100)  # Beep at 1000 Hz for 100 ms
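
Because winsound cannot be imported outside Windows, a script that beeps this way fails on Linux or macOS. Below is a minimal sketch of a fallback, assuming a terminal bell is an acceptable substitute on other systems. The helper name beep_or_bell is hypothetical and not part of any library:

```python
import sys

try:
    import winsound  # available on Windows only
except ImportError:
    winsound = None


def beep_or_bell(frequency_hz=1000, duration_ms=100):
    """Beep through winsound if available, else ring the terminal bell.

    Returns the name of the backend that was used.
    """
    if winsound is not None:
        # winsound.Beep accepts frequencies from 37 to 32767 Hz
        winsound.Beep(frequency_hz, duration_ms)
        return "winsound"
    # ASCII BEL character; whether it is audible depends on the terminal
    sys.stdout.write("\a")
    sys.stdout.flush()
    return "terminal-bell"
```

Calling beep_or_bell() at the end of a training script gives the same kind of alert on any platform, although the terminal bell may be silent if the terminal has it disabled.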

2. playsound

Unlike winsound, the playsound library can also play MP3 files, along with WAV and other formats. Here’s a small example of using the library:

from playsound import playsound

def play_sound(file_path):
    try:
        playsound(file_path)
    except Exception as e:
        print(f"Error playing the sound: {e}")

# Replace 'your_sound_file.mp3' with the path to your sound file
sound_file_path = 'your_sound_file.mp3'
play_sound(sound_file_path)

Here, a function play_sound is defined. It takes a file path as an argument and attempts to play the sound using the playsound function. The try block is used to handle any exceptions that might occur during the sound playback. If an exception occurs, it prints an error message along with the exception details.

3. simpleaudio

simpleaudio is a Python library designed for playing audio (it does not record). It provides a simple and easy-to-use interface for working with sound in your Python applications. The library is specifically focused on simplicity and cross-platform compatibility.

One outstanding characteristic of the simpleaudio library is that it allows asynchronous playback, enabling you to play sounds in the background while your program continues with other tasks. Furthermore, it also offers a function that ensures that the program waits for the sound to finish playing before moving on. Below is a small program that shows its implementation:

import simpleaudio as sa

def play_sound(file_path):
    try:
        wave_obj = sa.WaveObject.from_wave_file(file_path)
        play_obj = wave_obj.play()
        play_obj.wait_done()
    except Exception as e:
        print(f"Error playing the sound: {e}")

# Replace 'your_sound_file.wav' with the path to your sound file
sound_file_path = 'your_sound_file.wav'
play_sound(sound_file_path)

In this example, the play_sound function loads a WAV file and plays it using simpleaudio. The wait_done() method ensures that the program waits for the sound to finish playing before moving on.
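
The asynchronous side of this can be sketched as follows: play() returns immediately, so other work can run before wait_done() is called. This is only an illustration; the helper name play_while_working is hypothetical, and it accepts any object with a play() method so that the idea can be tried without a sound card:

```python
def play_while_working(wave_obj, task):
    """Start playback, run task() while the sound plays, then wait for it."""
    play_obj = wave_obj.play()  # returns immediately; audio plays in the background
    result = task()             # runs while the sound is still playing
    play_obj.wait_done()        # block until playback has finished
    return result


# Usage with simpleaudio (needs the library and a real WAV file):
#   import simpleaudio as sa
#   wave_obj = sa.WaveObject.from_wave_file('your_sound_file.wav')
#   total = play_while_working(wave_obj, lambda: sum(range(1_000_000)))
```

If the program should not wait at all, the wait_done() call can be dropped, but the sound is then cut off if the script exits before playback ends.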

4. pyaudio

Until now, we have only come across libraries used to play audio, but pyaudio lets us take our program a step further. With pyaudio, you can not only play existing audio but also record it using a laptop’s built-in microphone or any external microphone.

import pyaudio
import wave

def record_audio(file_path, duration=5, channels=1, sample_rate=44100, chunk_size=1024, input_device_index=None):
    # Initialize PyAudio (input_device_index=None selects the default input device)
    audio = pyaudio.PyAudio()

    # Set recording parameters
    format = pyaudio.paInt16
    stream = audio.open(format=format,
                        channels=channels,
                        rate=sample_rate,
                        input=True,
                        frames_per_buffer=chunk_size,
                        input_device_index=input_device_index)

    print("Recording...")

    frames = []
    for i in range(int(sample_rate / chunk_size * duration)):
        data = stream.read(chunk_size)
        frames.append(data)

    print("Recording complete.")

    # Stop and close the stream
    stream.stop_stream()
    stream.close()
    # Look up the sample width (in bytes) before releasing PyAudio
    sample_width = audio.get_sample_size(format)
    audio.terminate()

    # Save the recorded audio to a WAV file
    with wave.open(file_path, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))

# Set the duration of the recording in seconds
recording_duration = 5
# Set the desired file path for saving the recording
output_file_path = 'output_recording.wav'

# Record audio and save to file
record_audio(output_file_path, duration=recording_duration)

The code above uses the pyaudio library to record audio from an input device for a specified duration and saves the recording to a WAV file. The record_audio function handles the whole process: it initializes PyAudio, opens an audio stream with the given parameters, and captures the audio data in chunks. Set input_device_index to the index of the desired input device, or to None to use the system’s default input device. After the recording is complete, the audio stream is stopped and closed, and the PyAudio instance is terminated. The recorded audio frames are then saved to a WAV file using the wave module.
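
To find out which index belongs to which microphone, the devices can be listed first. A minimal sketch, assuming a pyaudio.PyAudio instance is passed in; the helper name list_input_devices is hypothetical, while get_device_count and get_device_info_by_index are PyAudio methods:

```python
def list_input_devices(audio):
    """Return (index, name) pairs for devices that can record.

    `audio` is expected to be a pyaudio.PyAudio instance.
    """
    devices = []
    for index in range(audio.get_device_count()):
        info = audio.get_device_info_by_index(index)
        # Devices with no input channels are output-only (e.g. speakers)
        if info.get("maxInputChannels", 0) > 0:
            devices.append((index, info["name"]))
    return devices


# Usage:
#   import pyaudio
#   audio = pyaudio.PyAudio()
#   for index, name in list_input_devices(audio):
#       print(index, name)
#   audio.terminate()
```

Any index printed this way can be used as the input device index when opening the stream.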

The example showcases how to set recording parameters, such as duration and file path, and demonstrates the overall process of capturing and storing audio with pyaudio. Adjustments to the parameters can be made to suit specific recording requirements.
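
One detail worth checking when adjusting these parameters is the loop count int(sample_rate / chunk_size * duration), which rounds down, so the recording is slightly shorter than requested. The arithmetic can be worked through in plain Python; the function name recording_plan is hypothetical, and a sample width of 2 bytes is assumed because paInt16 is a 16-bit format:

```python
def recording_plan(duration=5, sample_rate=44100, chunk_size=1024,
                   channels=1, sample_width=2):
    """Return (chunks, frames, seconds, bytes) for a chunked recording."""
    # Same expression as the recording loop: partial chunks are dropped
    num_chunks = int(sample_rate / chunk_size * duration)
    total_frames = num_chunks * chunk_size
    actual_seconds = total_frames / sample_rate
    # Each frame holds one sample per channel
    total_bytes = total_frames * channels * sample_width
    return num_chunks, total_frames, actual_seconds, total_bytes


chunks, frames, seconds, size = recording_plan()
print(chunks, frames, round(seconds, 3), size)  # 215 220160 4.992 440320
```

With the default settings, a 5-second request yields 215 chunks, or about 4.992 seconds of audio and roughly 430 KiB of sample data.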

Using winsound to Create Alerts

Let’s create a simple machine learning program using Python and the winsound library (so this example runs on Windows) to notify you when the training process is completed. In this example, we’ll use a placeholder model (scikit-learn’s DummyClassifier, a baseline that ignores the input features) and trigger a notification sound using winsound when the training is done.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
import winsound

def notify_training_completion():
    # Replace 'notification_sound.wav' with the path to your sound file
    sound_file_path = 'notification_sound.wav'
    winsound.PlaySound(sound_file_path, winsound.SND_FILENAME)
    
def train_machine_learning_model():
    # Load the Iris dataset
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

    # Create a DummyClassifier (replace this with your actual model)
    model = DummyClassifier(strategy="most_frequent")
    
    # Train the model
    model.fit(X_train, y_train)

    # Make predictions on the test set
    predictions = model.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, predictions)
    print(f"Model Accuracy: {accuracy}")

    # Notify with a sound when training is completed
    notify_training_completion()

# Run the machine learning program
train_machine_learning_model()

In this example, the train_machine_learning_model function loads the Iris dataset, splits it into training and testing sets, creates a DummyClassifier, trains the model, makes predictions, and evaluates the accuracy. Once training and evaluation are done, it calls the notify_training_completion function, which uses winsound to play a notification sound (replace ‘notification_sound.wav’ with the path to your actual sound file).
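
If no WAV file is at hand, one can be generated with the standard library alone. This is a rough sketch, not part of the original program: the function name write_tone and its default values are assumptions, and it writes a mono, 16-bit sine tone:

```python
import math
import struct
import wave


def write_tone(file_path, frequency_hz=880, duration_s=0.5,
               sample_rate=44100, volume=0.5):
    """Write a mono 16-bit sine tone to file_path; return the frame count."""
    num_frames = int(sample_rate * duration_s)
    amplitude = int(32767 * volume)  # 32767 is the largest 16-bit sample value
    samples = (
        int(amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(num_frames)
    )
    # Pack each sample as a little-endian signed 16-bit integer
    frames = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(file_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return num_frames
```

Calling write_tone('notification_sound.wav') once creates a half-second tone that the notification function can then play.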

Feel free to replace the DummyClassifier with your actual machine learning model and customize the notification sound file according to your preferences.
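
For longer scripts, the same idea can be wrapped so that the alert also fires when training crashes halfway. A minimal sketch using a context manager; the name notify_when_done is hypothetical, and the notifier can be any function that takes no arguments, such as notify_training_completion:

```python
import time
from contextlib import contextmanager


@contextmanager
def notify_when_done(notifier, label="task"):
    """Run the enclosed block, then call notifier() even if it raised."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} finished after {elapsed:.1f} s")
        notifier()


# Usage:
#   with notify_when_done(notify_training_completion, "training"):
#       model.fit(X_train, y_train)
```

Because the notifier is called in the finally clause, you hear the sound whether the block ends normally or with an exception, and the exception still propagates afterwards.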

Conclusion

So, let’s wrap it up! First up, we have winsound – a simple lib that handles .wav files. Want sounds of beeps, bells, and a round of applause? Then playsound is the library you’d like to consider installing. The library goes beyond .wav and also allows MP3 and other file formats. Next, simpleaudio is the library that not only plays sounds, but also allows you to control the playback. Last but not least, pyaudio takes the stage. Not only does it play audio, but it also records it. (Just don’t record your karaoke audio, the code will go berserk with that voice).

As a fun example, we built a small machine learning model and used the winsound library to get notified when the training process is complete. Happy coding! 😉

