Serializing tabular data in ruby -- is map, flatten, hash the correct approach?

Question

I wanted a hash that lets me reference Japanese syllables by their romanized names. In hindsight I could have searched for an existing one-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:

    katakana:
      v_eng: a i u e o
      v_jap: ア イ ウ エ オ
      K: カ キ ク ケ コ
      S: サ シ ス セ ソ
      T: タ チ ツ テ ト
      N: ナ ニ ヌ ネ ノ
      H: ハ ヒ フ ヘ ホ
      M: マ ミ ム メ モ
      Y: ヤ _ ユ _ ヨ
      R: ラ リ ル レ ロ
      W: ワ ヰ _ ヱ ヲ
    hiragana:
      v_eng: a i u e o
      v_jap: あ い う え お
      k: か き く け こ
      s: さ し す せ そ
      t: た ち つ て と
      n: な に ぬ ね の
      h: は ひ ふ へ ほ
      m: ま み む め も
      y: や _ ゆ _ よ
      r: ら り る れ ろ
      w: わ ゐ _ ゑ を
      nn: ん _ _ _ _
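For orientation, here is a minimal sketch of what this data looks like once loaded. It assumes each syllabary is a YAML mapping from a row label to a space-separated string, and it uses an inline excerpt (the `sample` string is a stand-in for japanese.dic, not the real file):

```ruby
require 'yaml'

# Inline excerpt standing in for japanese.dic (an assumption for illustration).
sample = <<~YAML
  katakana:
    v_eng: a i u e o
    v_jap: ア イ ウ エ オ
    K: カ キ ク ケ コ
YAML

data = YAML.load(sample)

# Every row arrives as one string, so it has to be split into cells.
vowels = data['katakana']['v_eng'].split
k_row  = data['katakana']['K'].split

p vowels
p k_row
```

Each row is a plain string after loading, which is why the function below calls split on every row before zipping it with the vowels.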

I was able to create the serializing function, syllabarys():

    #!/usr/bin/env ruby
    require 'yaml'

    def syllabarys
      @syllabarys ||= lambda{
        raw_data = YAML.load_file 'japanese.dic'
        syllabary_names = ['katakana','hiragana']
        a = syllabary_names.map{|syllabary|
          syllabary_data = raw_data[syllabary]
          veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split
          vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat
          #jp row strings by en consonants:
          jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}
          #jp row arrays by en consonants:
          jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]
          #en vowels with jp row arrays by en consonants:
          evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat
          #jp syllables by en syllables:
          #outer map provides en consonant to inner map
          #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
          #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
          jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]
          #remove forgotten syllables:
          jp_by_en.select{|en_syl,jp_syl|jp_syl != '_'}
        }
        Hash[*syllabary_names.zip(a).flatten(1)]
      }.call
    end

And it returns the desired hash:

{"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}

This task was a good exercise in index-free coding, but I notice my code exhibits a recurring pattern of map|zip->flatten->Hash. I'd like to know whether this is a normal pattern in Ruby or whether there's a better way of serializing tabular data.
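To make the pattern concrete, here is a small sketch of it in isolation. The arrays `keys` and `values` are made up for illustration and are not taken from japanese.dic:

```ruby
keys   = %w[a i u]
values = %w[ア イ ウ]

# The recurring pattern: zip into pairs, flatten, then splat into Hash[].
splatted = Hash[*keys.zip(values).flatten]

# Hash[] also accepts the array of pairs directly, so the flatten and
# the splat can both be dropped without changing the result.
direct = Hash[keys.zip(values)]

p splatted
p direct
```

Both forms build the same hash here. The flatten variant is the more fragile of the two: a full flatten would also unpack values that are themselves arrays, which is why the code above has to reach for flatten(1) in places.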

Dogbert · Accepted Answer · 2014-02-01 15:29:35Z

Feedback:

Instead of ||= lambda { ... }.call, you can use ||= begin ... end
Instead of Hash[*arr.flatten(1)] you can call arr.to_h on an array of [key, value] pairs (Array#to_h, available from Ruby 2.1)
You don't need to convert everything to a hash if you just want to use it for .map later -- [[1, 2], [3, 4]].map { |k, v| k + v } #=>[3, 7]
Instead of .map { ... }.flatten(1), you can use .flat_map { ... }
If you won't be using a variable in a block, you can use _ instead, like .map { |key, _| key }
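The points above can be sketched on toy data as follows. The method name `vowels` and the `rows` hash are hypothetical and exist only to show each idiom; they are not part of the original code:

```ruby
# begin/end memoizes the block's value without wrapping it in a lambda.
def vowels
  @vowels ||= begin
    en = %w[a i u e o]
    jp = %w[ア イ ウ エ オ]
    en.zip(jp).to_h # Array#to_h turns an array of pairs into a hash
  end
end

# An array of pairs destructures in a block just like a hash would.
sums = [[1, 2], [3, 4]].map { |k, v| k + v }

# flat_map does the work of map { ... }.flatten(1) in one step.
rows  = { "K" => %w[カ キ], "S" => %w[サ シ] }
pairs = rows.flat_map { |con, syls| syls.map { |syl| [con, syl] } }

# _ marks a block parameter that is deliberately unused.
consonants = rows.map { |key, _| key }

p vowels, sums, pairs, consonants
```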

I rewrote the code the way I would write it today.

    require 'yaml'

    def do_it(raw)
      map = raw["v_eng"].split.zip(raw["v_jap"].split)
      raw.select do |k, _|
        k.size == 1
      end.flat_map do |pre, japs|
        map.zip(japs.split).map do |(post, _), jap|
          [pre + post, jap] unless jap == '_'
        end.compact
      end.to_h
    end

    want = {"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"},
            "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}

    raw = YAML.load_file 'japanese.dic'

    p do_it(raw["katakana"]) == want["katakana"] #=> true
    p do_it(raw["hiragana"]) == want["hiragana"] #=> true
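As a usage sketch, the per-syllabary function can be mapped over both names to get back the nested hash that the original syllabarys method returned. The inline data below is a shortened stand-in for japanese.dic (an assumption for illustration), and `do_it` is repeated so the snippet runs on its own:

```ruby
require 'yaml'

# Same logic as do_it in the answer, repeated to keep this self-contained.
def do_it(raw)
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  raw.select { |k, _| k.size == 1 }.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end

# Shortened stand-in for japanese.dic, not the real file.
sample = YAML.load(<<~DATA)
  katakana:
    v_eng: a i u e o
    v_jap: ア イ ウ エ オ
    K: カ キ ク ケ コ
    Y: ヤ _ ユ _ ヨ
  hiragana:
    v_eng: a i u e o
    v_jap: あ い う え お
    k: か き く け こ
    nn: ん _ _ _ _
DATA

# One [name, hash] pair per syllabary, then to_h for the outer hash.
syllabaries = %w[katakana hiragana].map { |name| [name, do_it(sample[name])] }.to_h

p syllabaries
```

Note that the k.size == 1 filter skips the two-letter nn row, just as the regular expression in the question does, and the '_' placeholders are dropped before the pairs reach to_h.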

Hope that helps. Let me know if you want any other clarification(s) in a comment below.
