Automated labelling and processing of WAV files to create datasets

Hello everyone,
I’m embarking on a project that is, in principle, a bit intimidating (for me, of course). I’m mapping the city where I live with recordings made at different landmarks. With them I want to check the biotic and anthropogenic levels of each place in order to carry out an ecoacoustic study.
I need a workflow for audio tagging and processing. This is very important for the later data processing, but I lack solid knowledge of the subject.
I had a bit of a ‘talk’ with my friend ‘Copilot’, but after one morning I gave up on that route because the results weren’t satisfactory. Has anyone here worked on this kind of thing and is willing to share their experience and way of working?
Attached are some attempts at a script to do this, but they didn’t work. Thank you very much.

import os
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import torchaudio

annotations_file = 'path/to/annotations.csv'  # placeholder path, same as the one used further down

class AudioDataset(Dataset):
    def __init__(self, annotations_file, audio_dir, transform=None):
        self.annotations = pd.read_csv(annotations_file)
        self.audio_dir = audio_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        audio_sample_path = os.path.join(self.audio_dir, self.annotations.iloc[index, 0])
        label = self.annotations.iloc[index, 1]
        waveform, sample_rate = torchaudio.load(audio_sample_path)
        if self.transform:
            waveform = self.transform(waveform)
        return waveform, label

# Assuming you already have a trained model called 'modelo'
# and a transform function called 'transformacion_audio'

# Load the unlabelled dataset
dataset_no_etiquetado = AudioDataset('path/to/annotations.csv', 'path/to/audio_dir', transform=transformacion_audio)
dataloader = DataLoader(dataset_no_etiquetado, batch_size=1, shuffle=False)

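# Label the new sounds
modelo.eval()
predicciones = []
with torch.no_grad():
    for waveform, _ in dataloader:
        outputs = modelo(waveform)
        _, predicted_labels = torch.max(outputs, 1)
        # Store the predicted class index (batch_size=1, so one value per batch)
        predicciones.append(predicted_labels.item())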

# Save the predictions to a CSV file
nuevas_etiquetas = pd.DataFrame(predicciones, columns=['Etiqueta'])
nuevas_etiquetas.to_csv('nuevas_etiquetas.csv', index=False)

Thank you very much

I would recommend trying out fastxtend, as it’s built on top of both fastai and torchaudio and provides item and batch Transforms that you can use with fastai’s DataBlock.

I’m not an expert (I’ve only just started using fastxtend), but looking at the code you provided, I think you could do something like the following pseudo-code:

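# Pseudo-code: the import paths below are assumed, so check the fastai and fastxtend docs
from fastai.vision.all import *
from fastxtend.audio.all import *

df = pd.read_csv(annotations_file)

auds = DataBlock(
    blocks=(AudioBlock, CategoryBlock),
    # ColReader prepends pref as a plain string, so audio_dir should end with "/"
    get_x=ColReader("filename", pref=audio_dir),
    get_y=ColReader("label"),
)

dls = auds.dataloaders(df)

# arch is a placeholder for the architecture you choose
learn = vision_learner(dls, arch, n_in=1)
learn.fine_tune(12, 0.01)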
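
Both your Dataset class and the pseudo-code above read the file names and labels from a CSV, so one possible first step is to generate that file automatically instead of typing it by hand. Here is a minimal sketch using only the Python standard library. It assumes a hypothetical folder layout of audio_dir/<label>/<recording>.wav, where the sub-folder name (for example the landmark) is used as the label. The function name build_annotations and that layout are my own assumptions, not part of fastai, fastxtend or torchaudio. The column names "filename" and "label" are chosen to match the ColReader calls above.

```python
import csv
from pathlib import Path


def build_annotations(audio_dir, out_csv):
    """Write a CSV with 'filename' and 'label' columns, one row per .wav file.

    Assumes a hypothetical layout audio_dir/<label>/<recording>.wav, where the
    first sub-folder name is used as the label.
    """
    audio_dir = Path(audio_dir)
    rows = []
    for wav in sorted(audio_dir.rglob("*")):
        # Keep only regular files with a .wav extension (any letter case)
        if not (wav.is_file() and wav.suffix.lower() == ".wav"):
            continue
        rel = wav.relative_to(audio_dir)
        # Files directly inside audio_dir have no label folder, so skip them
        if len(rel.parts) < 2:
            continue
        rows.append({"filename": rel.as_posix(), "label": rel.parts[0]})

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling build_annotations("path/to/audio_dir", "path/to/annotations.csv") would give a CSV whose first column is the path relative to audio_dir and whose second column is the label, which is the order your __getitem__ expects. If the recordings are not organised into folders yet, the labels have to come from somewhere else (file name patterns, a spreadsheet, manual listening), and this sketch would need adapting.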

If you can provide the dataset and a Kaggle/Colab notebook, I can look into it more closely.