Fuzzy Name Matching with Machine Learning. Input data encoding

I have a very large dataset of Indian residents (last name, first name, date of birth), and I need to match the records by similarity.

The matching is fuzzy. The data looks like this (the names are made up for the example):

last name, first name, date of birth
John;Doe;01-01-2003
Doe;John;01-01-2003
John Doe;;01-01-2003

The comparison itself works reasonably well in principle; I'm using the Levenshtein algorithm.
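For reference, a minimal sketch of this kind of comparison, not the exact code from the question. It assumes records are compared on a lowercased key built from the name fields; `record_key` and `similarity` are hypothetical helper names introduced only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def record_key(last: str, first: str, dob: str) -> str:
    """Hypothetical helper: lowercase and join the non-empty name fields."""
    name = " ".join(p.strip().lower() for p in (last, first) if p.strip())
    return f"{name}|{dob}"


rows = ["John;Doe;01-01-2003", "Doe;John;01-01-2003", "John Doe;;01-01-2003"]
keys = [record_key(*row.split(";")) for row in rows]
for other in keys[1:]:
    print(keys[0], "vs", other, "->", round(similarity(keys[0], other), 2))
```

With this key, `John;Doe` and `John Doe;` compare as identical, while the swapped record `Doe;John` does not, so field order has to be handled separately.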

Now I need to decide how to encode the data for the neural network. The dataset is large and I plan to use embeddings, but I don't have a dictionary of names.

What should I do in this case? Is there another way to encode the input?

edited Jun 17, 2024 at 3:53

nwaldo

asked May 14, 2024 at 7:09

ккк ккк

@kkk kkk - This doesn't answer your main question, but you can further match your data by creating rules. For example, using initials or checking to see if the first and last names have been swapped.
– nwaldo
Commented May 14, 2024 at 12:31
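The rules suggested in the comment could be sketched roughly as follows. This is only an illustration under stated assumptions: names are split on whitespace, a one-letter token (optionally followed by a period) is treated as an initial, and the function names are hypothetical. Pairing tokens after sorting is a heuristic and can misalign when several tokens share a first letter.

```python
def name_tokens(last: str, first: str) -> list:
    """Lowercase tokens from both name fields, sorted so field order is ignored."""
    return sorted(f"{last} {first}".lower().split())


def same_name_ignoring_order(rec_a, rec_b) -> bool:
    """True if both records contain the same name tokens in any field order."""
    return name_tokens(*rec_a) == name_tokens(*rec_b)


def token_matches(x: str, y: str) -> bool:
    """Tokens match if equal, or if one is a single-letter initial of the other."""
    if x == y:
        return True
    short, long_ = sorted((x.rstrip("."), y.rstrip(".")), key=len)
    return len(short) == 1 and long_.startswith(short)


def initials_match(rec_a, rec_b) -> bool:
    """Pair up sorted tokens one-to-one, allowing initials (heuristic)."""
    a, b = name_tokens(*rec_a), name_tokens(*rec_b)
    return len(a) == len(b) and all(token_matches(x, y) for x, y in zip(a, b))


# Records are (last name, first name) pairs taken from the example data.
print(same_name_ignoring_order(("John", "Doe"), ("Doe", "John")))
print(same_name_ignoring_order(("John Doe", ""), ("Doe", "John")))
print(initials_match(("Doe", "J."), ("Doe", "John")))
```

Rules like these can run before or alongside the edit-distance comparison, so that swapped or abbreviated names are not penalised as spelling differences.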
