$\begingroup$

Here's the data I have:

  1. Text from articles from various music blogs & music news sites (title, summary, full content, and sometimes tags).

  2. I used a few different NLP/NER tools (NLTK, spaCy, and Stanford NER) to find the proper nouns in the text, and gave each proper noun a score based on how many times it appeared and how many of the tools recognized it as a proper noun. None of these tools is very accurate on my data by itself.

  3. For each proper noun, I queried MusicBrainz to find artists with that name. (MusicBrainz has a lot of data that may be helpful: aliases, discography, associations with other artists.)

  4. Any links in the article to Spotify, YouTube, etc., along with the song name and artist for each link.
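
To make the scoring in item 2 concrete, here is a minimal sketch of one way such a score could be computed. The function name `score_proper_nouns`, the input layout (tool name mapped to the list of mentions it found), and the formula (highest per-tool mention count multiplied by the number of tools that agree) are all illustrative assumptions, not the exact formula used on the real data.

```python
from collections import Counter


def score_proper_nouns(tool_outputs):
    """Combine proper-noun mentions from several NER tools into one score.

    tool_outputs: dict mapping a tool name to the list of proper-noun
    mentions that tool found in one article (hypothetical layout).
    Returns a dict mapping each normalized noun to its score.
    """
    mention_counts = Counter()   # highest mention count reported by any tool
    tools_agreeing = Counter()   # number of tools that found the noun at all

    for mentions in tool_outputs.values():
        # Normalize case and whitespace so "Radiohead" and "radiohead" match.
        per_tool = Counter(m.strip().lower() for m in mentions)
        for noun, count in per_tool.items():
            mention_counts[noun] = max(mention_counts[noun], count)
            tools_agreeing[noun] += 1

    # Assumed formula: frequency weighted by how many tools agree.
    return {
        noun: mention_counts[noun] * tools_agreeing[noun]
        for noun in mention_counts
    }


example = {
    "nltk": ["Radiohead", "Radiohead", "London"],
    "spacy": ["Radiohead", "Thom Yorke"],
    "stanford": ["radiohead", "Thom Yorke", "Radiohead"],
}
scores = score_proper_nouns(example)
```

Here a noun that every tool finds several times outranks one that a single tool finds once, which is the intuition behind the score; the exact weighting is open to tuning.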

I have three goals:

  1. Determine which proper nouns are artists.
  2. For artists that share the same name, determine which one the text is referring to (based on the MusicBrainz data).
  3. Determine whether each artist is important to the article or is only mentioned in passing.

I have manually tagged some of the data with the correct output for the three goals above.

How would you go about this? Which algorithms do you think would be best for these goals?
Is there any semi-supervised learning approach I could use to reduce the amount of manual tagging I need to do?

$\endgroup$
