I’m generally following the Kaggle notebook “Iterate like a grandmaster!” to build a model where we input a phrase and it returns the song(s) of a specific artist with the same general meaning/context.
I’ve found a CSV dataset with song name and lyric columns; the lyric column contains the full lyrics of each song.
At one point I’m doing:
inps = "Unnamed: 0","Artist","Album", "Year", "Date"
tok_ds = ds.map(tok_func, batched=True, remove_columns=inps+('inputs'))
And I’m getting the error:
TypeError: can only concatenate tuple (not "str") to tuple
Do you think I should do any kind of preprocessing on that big lyrics column?
What can I do about this error?
Thanks and I appreciate any comments.
Hello,
I see where you’re heading! It looks like there’s an issue with how the input columns are specified.
To address the error TypeError: can only concatenate tuple (not "str") to tuple, you need to make sure that you’re concatenating tuples correctly. The error comes from trying to concatenate a string ('inputs') to a tuple (inps). To fix this, you can convert 'inputs' into a tuple before concatenating.
Here’s how you can modify your code:
inps = ("Unnamed: 0","Artist","Album", "Year", "Date")
tok_ds = ds.map(tok_func, batched=True, remove_columns=inps + ('inputs',))
Notice how I wrapped 'inputs' in parentheses with a trailing comma to make it a one-element tuple: ('inputs',). This way, the concatenation will work correctly.
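If you’d rather avoid the tuple arithmetic altogether, as far as I know the remove_columns argument in Hugging Face Datasets also accepts a plain list of column names, so a list works just as well. A minimal sketch, reusing the ds and tok_func from your code:

# Sketch: pass remove_columns as a list instead of a tuple
inps = ["Unnamed: 0", "Artist", "Album", "Year", "Date"]
tok_ds = ds.map(tok_func, batched=True, remove_columns=inps + ["inputs"])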
Regarding preprocessing the lyrics column, it’s often helpful to clean and preprocess text data before using it in your model. You can consider the following preprocessing steps:
- Lowercasing: Convert all text to lowercase.
- Removing punctuation: Remove or replace punctuation marks.
- Tokenization: Split text into individual words or tokens.
- Stopwords removal: Remove common words that may not be useful for your analysis (e.g., “and”, “the”).
- Lemmatization: Convert words to their base form (e.g., “running” to “run”).
Here’s a simple example using Python’s nltk library for text preprocessing:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
def preprocess_lyrics(lyrics):
    # Lowercase
    lyrics = lyrics.lower()
    # Remove punctuation
    lyrics = lyrics.translate(str.maketrans("", "", string.punctuation))
    # Tokenize
    tokens = word_tokenize(lyrics)
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # Lemmatize
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return " ".join(tokens)
You can then apply this preprocessing function to your lyrics column.
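For example, if your dataset is a Hugging Face Dataset and the lyrics column is called "Lyric" (that column name is just an assumption, so adjust it to whatever your CSV actually uses), you could clean every row before tokenizing:

# Sketch: clean the lyrics column before running tok_func
# "Lyric" is an assumed column name - replace it with your actual one
ds = ds.map(lambda row: {"Lyric": preprocess_lyrics(row["Lyric"])})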
I hope this helps!
Best regards,
Dora