What algorithm to use for finding artists/bands in text and differentiating between artists that share the same name


Asked 5 years, 1 month ago

Modified 5 years, 1 month ago

Viewed 39 times

Here's the data I have:

- Text from articles on various music blogs and music news sites (title, summary, full content, and sometimes tags).
- Proper nouns extracted with several NLP/NER tools (NLTK, spaCy, and Stanford NER). I gave each proper noun a score based on how many times it appeared and how many of the tools recognized it as a proper noun. None of these tools is very accurate on my data by itself.
- For each proper noun, the results of querying MusicBrainz for artists with that name. (MusicBrainz has a lot of data that may be helpful: aliases, discography, associations with other artists.)
- Any links in the article to Spotify, YouTube, etc., along with the song name and artist for each link.
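
The proper-noun scoring described above (mention count combined with agreement between tools) could be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the formula actually used: the function name `score_proper_nouns`, the lowercase normalization, and the choice to multiply frequency by the fraction of agreeing tools are all hypothetical, and the input is assumed to be the mentions already extracted by each tool.

```python
from collections import Counter


def score_proper_nouns(tool_outputs):
    """Score candidate proper nouns by frequency and cross-tool agreement.

    tool_outputs: dict mapping a tool name to the list of proper-noun
    mentions that tool extracted from one article (assumed input shape).
    Returns a dict mapping each normalized name to a float score.
    """
    # Count mentions per tool, normalizing case and whitespace (assumption).
    per_tool = {
        tool: Counter(m.strip().lower() for m in mentions)
        for tool, mentions in tool_outputs.items()
    }
    names = set().union(*per_tool.values())

    scores = {}
    for name in names:
        # Frequency: the highest mention count any single tool reported.
        frequency = max(counts[name] for counts in per_tool.values())
        # Agreement: how many tools saw the name at least once.
        agreement = sum(1 for counts in per_tool.values() if name in counts)
        # Hypothetical combination: frequency weighted by agreement ratio.
        scores[name] = frequency * agreement / len(per_tool)
    return scores


if __name__ == "__main__":
    example = {
        "nltk": ["Beach House", "Baltimore", "Beach House"],
        "spacy": ["Beach House", "Sub Pop"],
        "stanford": ["Beach House", "Baltimore"],
    }
    for name, score in sorted(score_proper_nouns(example).items()):
        print(name, round(score, 3))
```

A name seen often and by every tool ranks highest, while a name reported by a single tool is penalized; the weighting itself would need tuning against the manually tagged data.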

I have three goals:

1. Determine which proper nouns are artists.
2. For artists that share the same name, determine which one the text is referring to (based on MusicBrainz data).
3. Determine whether the artist is important to the article or was only mentioned briefly.
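
To make the second goal concrete, here is a minimal sketch of one possible disambiguation signal: comparing the song titles found in an article's Spotify/YouTube links against each same-named candidate's track titles. Everything here is an assumption for illustration: the function name `pick_candidate`, the candidate dict shape (`id`, `track_titles`), and the placeholder data are hypothetical and do not reflect the MusicBrainz API's actual response format.

```python
def pick_candidate(candidates, linked_titles):
    """Pick the same-named artist whose tracks best match the linked songs.

    candidates: list of dicts with an 'id' and a list of 'track_titles'
    (hypothetical shape, assumed to be prepared from discography data).
    linked_titles: song titles taken from links found in the article.
    Returns the id of the best match, or None if no title overlaps.
    """
    wanted = {title.strip().lower() for title in linked_titles}
    best_id, best_overlap = None, 0
    for candidate in candidates:
        titles = {t.strip().lower() for t in candidate["track_titles"]}
        overlap = len(wanted & titles)
        # Keep the candidate with strictly more matching titles so far.
        if overlap > best_overlap:
            best_id, best_overlap = candidate["id"], overlap
    return best_id


if __name__ == "__main__":
    # Placeholder candidates, not real artists or real identifiers.
    same_name_artists = [
        {"id": "candidate-a", "track_titles": ["First Song", "Second Song"]},
        {"id": "candidate-b", "track_titles": ["Other Tune"]},
    ]
    print(pick_candidate(same_name_artists, ["second song "]))
```

Returning `None` when nothing overlaps leaves room for other evidence (aliases, associated artists) to decide; this overlap count would be only one feature among several.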

I have manually tagged some of the data with the correct output for the above 3 goals.

How would you go about this? Which algorithms do you think would be best for these goals?
Is there a semi-supervised learning approach I could use to reduce the amount of manual tagging I need to do?

asked Mar 10, 2020 at 18:36

Matt C

