I wanted a hash that lets me reference Japanese syllables by their romanized names. In hindsight I could have searched for an existing single-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:
    katakana:
      v_eng: a i u e o
      v_jap: ア イ ウ エ オ
      K: カ キ ク ケ コ
      S: サ シ ス セ ソ
      T: タ チ ツ テ ト
      N: ナ ニ ヌ ネ ノ
      H: ハ ヒ フ ヘ ホ
      M: マ ミ ム メ モ
      Y: ヤ _ ユ _ ヨ
      R: ラ リ ル レ ロ
      W: ワ ヰ _ ヱ ヲ
    hiragana:
      v_eng: a i u e o
      v_jap: あ い う え お
      k: か き く け こ
      s: さ し す せ そ
      t: た ち つ て と
      n: な に ぬ ね の
      h: は ひ ふ へ ほ
      m: ま み む め も
      y: や _ ゆ _ よ
      r: ら り る れ ろ
      w: わ ゐ _ ゑ を
      nn: ん _ _ _ _
I was able to create the serializing function, syllabarys():
    #!/usr/bin/env ruby
    require 'yaml'

    def syllabarys
      @syllabarys ||= lambda{
        raw_data = YAML.load_file 'japanese.dic'
        syllabary_names = ['katakana','hiragana']
        a = syllabary_names.map{|syllabary|
          syllabary_data = raw_data[syllabary]
          veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split
          vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat
          #jp row strings by en consonants:
          jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}
          #jp row arrays by en consonants:
          jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]
          #en vowels with jp row arrays by en consonants:
          evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat
          #jp syllables by en syllables:
          #outer map provides en consonant to inner map
          #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
          #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
          jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]
          #remove forgotten syllables:
          jp_by_en.select{|en_syl,jp_syl|jp_syl != '_'}
        }
        Hash[*syllabary_names.zip(a).flatten(1)]
      }.call
    end
And it returns the desired hash:
{"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}
This task was a good exercise in index-free coding, but I notice my code exhibits a recurring pattern of map|zip -> flatten -> Hash. I'd like to know whether this is a normal pattern in Ruby or whether there's a better way of serializing tabular data.
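To make the pattern concrete, here it is reduced to a single row (the Y row of the katakana table), next to the alternatives I'm aware of. This is only a sketch: the variable names are illustrative and don't come from the function, and `Array#to_h` assumes Ruby 2.1 or newer.

```ruby
# One row of the table, as it appears in the YAML file.
vowels = 'a i u e o'.split
row    = 'ヤ _ ユ _ ヨ'.split

# The recurring pattern: zip -> flatten -> splat into Hash[].
splatted = Hash[*vowels.zip(row).flatten]

# Hash[] also accepts an array of [key, value] pairs directly,
# so the flatten and the splat can both be dropped.
paired = Hash[vowels.zip(row)]

# Array#to_h (assumption: Ruby 2.1+) does the same without Hash[].
converted = vowels.zip(row).to_h

# each_with_object builds the final mapping in one pass and
# skips the '_' placeholders along the way.
ya_row = vowels.zip(row).each_with_object({}) do |(vowel, kana), acc|
  acc["Y#{vowel}"] = kana unless kana == '_'
end
```

All three of `splatted`, `paired` and `converted` hold the same vowel-to-kana hash, so the open question is mostly which form reads best, and whether building the result directly (as in `ya_row`) is preferable to chaining conversions.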