$\begingroup$

I have a huge dataset of Indian residents with three fields (last name, first name, date of birth), and I need to match the records by similarity.

The matching is fuzzy. The data looks like this (names are fictitious for the example):

last name, first name, date of birth
John;Doe;01-01-2003
Doe;John;01-01-2003
John Doe;;01-01-2003

I've had some success with the comparison in principle, using the Levenshtein distance.
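A minimal sketch of this kind of comparison, assuming plain Python with no external library. The function names `levenshtein` and `similarity` are hypothetical and only for illustration, and normalizing by the longer string's length is one common choice, not necessarily the one used here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Comparing the fields as given scores `John;Doe` against `Doe;John` poorly, because plain edit distance is sensitive to field order.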

Now the question of how to encode the data for a neural network has come up. The dataset is large and I plan to use embeddings, but I don't have a dictionary of names.

What should be done in that case? Is there any other method to implement encoding?

$\endgroup$
    $\begingroup$@kkk kkk - This doesn't answer your main question, but you can further match your data by creating rules. For example, using initials or checking to see if the first and last names have been swapped.$\endgroup$
    – nwaldo
    Commented May 14, 2024 at 12:31
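The rules suggested in the comment (swapped first and last names, initials) could be sketched roughly as follows. This is only an illustration under stated assumptions: records are `(last, first, dob)` tuples, dates are compared as exact strings, and the helper names `name_tokens` and `same_person_by_rules` are hypothetical.

```python
def name_tokens(last: str, first: str) -> list:
    """Pool both name fields into lowercase tokens, sorted so order is ignored."""
    raw = (last + " " + first).lower().split()
    tokens = [t.strip(".") for t in raw]     # "J." -> "j"
    return sorted(t for t in tokens if t)    # drop empty leftovers


def tokens_agree(x: str, y: str) -> bool:
    """Tokens agree if equal, or if one is a single-letter initial of the other."""
    if x == y:
        return True
    if len(x) == 1:
        return y.startswith(x)
    if len(y) == 1:
        return x.startswith(y)
    return False


def same_person_by_rules(rec_a, rec_b) -> bool:
    """True when the dates of birth are equal and the names agree token by token."""
    if rec_a[2] != rec_b[2]:
        return False
    a = name_tokens(rec_a[0], rec_a[1])
    b = name_tokens(rec_b[0], rec_b[1])
    return len(a) == len(b) and all(tokens_agree(x, y) for x, y in zip(a, b))
```

With these rules the three sample rows from the question all match each other. The sorted pairing is a heuristic: an initial can line up with the wrong token when several names share a first letter, so a fuzzy comparison is still needed for misspellings.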
