I’m loosely following the Kaggle notebook “Iterate like a grandmaster!” to build a model that takes a phrase as input and returns the song(s) by a specific artist that share the same general meaning or context.
I’ve found a CSV dataset with song name and lyric columns. The lyric column contains the full lyrics of each song.
Hello,
I see where you’re heading! It looks like there’s an issue with how the input columns are specified.
The error TypeError: can only concatenate tuple (not “str”) to tuple is raised because a string ('inputs') is being concatenated to a tuple (inps). Python only allows a tuple to be concatenated with another tuple, so 'inputs' has to become a one-element tuple first.
Write it as ('inputs',), so the expression becomes inps + ('inputs',). Note that the trailing comma, not the parentheses alone, is what makes it a tuple: ('inputs') without the comma is still just a string.
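As a minimal sketch of the fix, assuming inps holds your column names (the names song_name and lyric below are placeholders, not taken from your notebook):

```python
# Hypothetical column names; replace with the ones in your dataset
inps = ("song_name", "lyric")

# inps + 'inputs' would raise:
#   TypeError: can only concatenate tuple (not "str") to tuple
try:
    inps + 'inputs'
    failed = False
except TypeError:
    failed = True

# The trailing comma makes ('inputs',) a one-element tuple
cols = inps + ('inputs',)
print(cols)
```

The resulting tuple can then be passed wherever the notebook expects the list of columns, for example as the remove_columns argument if that is where the error came from.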
Regarding preprocessing the lyrics column, it’s often helpful to clean and preprocess text data before using it in your model. You can consider the following preprocessing steps:
Lowercasing: Convert all text to lowercase.
Removing punctuation: Remove or replace punctuation marks.
Tokenization: Split text into individual words or tokens.
Stopwords removal: Remove common words that may not be useful for your analysis (e.g., “and”, “the”).
Lemmatization: Convert words to their base form (e.g., “running” to “run”).
Here’s a simple example using Python’s nltk library for text preprocessing:
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download the required NLTK resources (only needed once)
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases use this for word_tokenize
nltk.download('stopwords')
nltk.download('wordnet')

# Build these once instead of on every call or for every token
STOP_WORDS = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_lyrics(lyrics):
    # Lowercase
    lyrics = lyrics.lower()
    # Remove punctuation
    lyrics = lyrics.translate(str.maketrans("", "", string.punctuation))
    # Tokenize
    tokens = word_tokenize(lyrics)
    # Remove stopwords
    tokens = [word for word in tokens if word not in STOP_WORDS]
    # Lemmatize (treats each word as a noun by default)
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return " ".join(tokens)
You can then apply this preprocessing function to your lyrics column.
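Here is a rough sketch of that last step, assuming the CSV is loaded with pandas and the lyrics column is called lyric (both are assumptions, so adjust them to your file). A simplified stand-in named simple_clean is used in place of preprocess_lyrics so the snippet runs without the NLTK downloads:

```python
import string

import pandas as pd

def simple_clean(lyrics):
    # Stand-in for preprocess_lyrics: lowercase and strip punctuation only
    return lyrics.lower().translate(str.maketrans("", "", string.punctuation))

# Hypothetical data; in practice you would use pd.read_csv on your own file
df = pd.DataFrame({
    "song_name": ["Song A", "Song B"],
    "lyric": ["Hello, World!", None],
})

# Missing lyrics would break .lower(), so fill them with an empty string first
df["clean_lyric"] = df["lyric"].fillna("").apply(simple_clean)
print(df["clean_lyric"].tolist())
```

Swapping simple_clean for preprocess_lyrics in the apply call should give the full pipeline. Writing to a new column keeps the original lyrics available for comparison.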
I hope this helps!
Best regards,
Dora