
Module:data consistency check/documentation

From Wiktionary

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Discrepancies detected:

  • Code: EL.. Saw name: Ecclesiastical ලතින්. Expected name: Ecclesiastical Latin.
  • Code: LL.. Saw name: Late ලතින්. Expected name: Late Latin.
  • Code: ML.. Saw name: Medieval ලතින්. Expected name: Medieval Latin.
  • Code: VL.. Saw name: Vulgar ලතින්. Expected name: Vulgar Latin.
  • Code: abs. Saw name: Ambonese මැලේ. Expected name: Ambonese Malay.
  • Code: acw. Saw name: Hijazi අරාබි. Expected name: Hijazi Arabic.
  • Code: acy. Saw name: Cypriot අරාබි. Expected name: Cypriot Arabic.
  • Code: ady. Saw name: Adyghe. Expected name: West Circassian.
  • Code: aeb. Saw name: Tunisian අරාබි. Expected name: Tunisian Arabic.
  • Code: afb. Saw name: Gulf අරාබි. Expected name: Gulf Arabic.
  • Code: ajg. Saw name: Adja. Expected name: Aja (West Africa).
  • Code: ajp. Saw name: South Levantine අරාබි. Expected name: South Levantine Arabic.
  • Code: apc. Saw name: North Levantine අරාබි. Expected name: North Levantine Arabic.
  • Code: ary. Saw name: Moroccan අරාබි. Expected name: Moroccan Arabic.
  • Code: arz. Saw name: ඊජිප්තු අරාබි. Expected name: Egyptian Arabic.
  • Code: ayl. Saw name: Libyan අරාබි. Expected name: Libyan Arabic.
  • Code: bcl. Saw name: Bikol Central. Expected name: Central Bikol.
  • Code: bln. Saw name: Southern Catanduanes Bicolano. Expected name: Southern Catanduanes Bikol.
  • Code: br. Saw name: Breton. Expected name: බ්‍රෙටන්.
  • Code: bto. Saw name: Iriga Bicolano. Expected name: Rinconada Bikol.
  • Code: cjo. Saw name: Ashéninka Pajonal. Expected name: Pajonal Ashéninka.
  • Code: cmn-ear. Saw name: Early මැන්ඩරීන්. Expected name: Early Mandarin.
  • Code: cts. Saw name: Northern Catanduanes Bicolano. Expected name: Northern Catanduanes Bikol.
  • Code: cy. Saw name: Welsh. Expected name: වේල්ස.
  • Code: dra-okn. Saw name: Old කන්නඩ. Expected name: Old Kannada.
  • Code: dum. Saw name: Middle ඕලන්ද. Expected name: Middle Dutch.
  • Code: enm. Saw name: Middle ඉංග්‍රීසි. Expected name: මධ්‍යකාලීන ඉංග්‍රීසි.
  • Code: euq-pro. Saw name: Proto-බාස්ක්. Expected name: ප්‍රොටෝ-බාස්ක්.
  • Code: fbl. Saw name: West Albay Bikol. Expected name: West Miraya Bikol.
  • Code: fr-CA. Saw name: Canadian ප්‍රංශ. Expected name: Canadian French.
  • Code: frk. Saw name: Proto-West ජර්මානුic. Expected name: ප්‍රොටෝ-බටහිර ජර්මානු.
  • Code: frm. Saw name: Middle ප්‍රංශ. Expected name: මධ්‍යකාලීන ප්‍රංශ.
  • Code: fro-nor. Saw name: Old Northern ප්‍රංශ. Expected name: Old Northern French.
  • Code: gd. Saw name: Scottish Gaelic. Expected name: ස්කොට්ස් ගේලික්.
  • Code: gem. Saw name: ජර්මානුic. Expected name: ජර්මානු.
  • Code: gem-pro. Saw name: Proto-ජර්මානුic. Expected name: ප්‍රොටෝ-ජර්මානු.
  • Code: gim. Saw name: Gimi (Goroka). Expected name: Gimi (Papuan).
  • Code: gkm. Saw name: Byzantine ග්‍රීක. Expected name: Byzantine Greek.
  • Code: gmh. Saw name: Middle High ජර්මානු. Expected name: Middle High German.
  • Code: gml. Saw name: Middle Low ජර්මානු. Expected name: Middle Low German.
  • Code: gmq. Saw name: North ජර්මානුic. Expected name: North Germanic.
  • Code: gmq-mno. Saw name: Middle නෝර්වීජියානු. Expected name: Middle Norwegian.
  • Code: gmq-oda. Saw name: Old ඩෙන්මාර්ක. Expected name: Old Danish.
  • Code: gmq-osw. Saw name: Old ස්වීඩන්. Expected name: Old Swedish.
  • Code: gmw-ecg. Saw name: East Central ජර්මානු. Expected name: East Central German.
  • Code: gmw-jdt. Saw name: Jersey ඕලන්ද. Expected name: Jersey Dutch.
  • Code: gmw-pro. Saw name: Proto-West ජර්මානුic. Expected name: ප්‍රොටෝ-බටහිර ජර්මානු.
  • Code: gmy. Saw name: Mycenaean ග්‍රීක. Expected name: Mycenaean Greek.
  • Code: gn. Saw name: Guaraní. Expected name: Guarani.
  • Code: goh. Saw name: Old High ජර්මානු. Expected name: Old High German.
  • Code: grk-mar. Saw name: Mariupol ග්‍රීක. Expected name: Mariupol Greek.
  • Code: gsw. Saw name: Alemannic ජර්මානු. Expected name: Alemannic German.
  • Code: gug. Saw name: Paraguayan Guaraní. Expected name: Paraguayan Guarani.
  • Code: gun. Saw name: Mbyá Guaraní. Expected name: Mbya Guarani.
  • Code: gv. Saw name: Manx. Expected name: මැන්ක්ස්.
  • Code: hmn-pro. Saw name: Proto-Hmong. Expected name: Proto-Hmongic.
  • Code: idb. Saw name: Indo-පෘතුගීසි. Expected name: Indo-Portuguese.
  • Code: inc-ash. Saw name: Ashokan ප්‍රාකෘත. Expected name: අශෝක ප්‍රාකෘත.
  • Code: itc-ola. Saw name: Old ලතින්. Expected name: පුරාතන ලතින්.
  • Code: itc-pro. Saw name: Proto-Italic. Expected name: ප්‍රොටෝ-ඉතාලිකා.
  • Code: kaw. Saw name: Old ජාවා. Expected name: Old Javanese.
  • Code: kbd. Saw name: Kabardian. Expected name: East Circassian.
  • Code: kw. Saw name: Cornish. Expected name: කෝනිෂ්.
  • Code: kxd. Saw name: Brunei මැලේ. Expected name: Brunei Malay.
  • Code: la-ecc. Saw name: Ecclesiastical ලතින්. Expected name: Ecclesiastical Latin.
  • Code: la-lat. Saw name: Late ලතින්. Expected name: Late Latin.
  • Code: la-med. Saw name: Medieval ලතින්. Expected name: Medieval Latin.
  • Code: la-vul. Saw name: Vulgar ලතින්. Expected name: Vulgar Latin.
  • Code: lng. Saw name: Old High ජර්මානු. Expected name: Old High German.
  • Code: ltc. Saw name: Middle චීන. Expected name: Middle Chinese.
  • Code: men. Saw name: Mende. Expected name: Mende (Sierra Leone).
  • Code: meo. Saw name: Kedah මැලේ. Expected name: Kedah Malay.
  • Code: mga. Saw name: Middle අයිරිෂ්. Expected name: Middle Irish.
  • Code: mi. Saw name: Maori. Expected name: Māori.
  • Code: ml. Saw name: මැලේalam. Expected name: මලයාලම්.
  • Code: ms-cla. Saw name: Classical මැලේ. Expected name: Classical Malay.
  • Code: ms-old. Saw name: Old මැලේ. Expected name: Old Malay.
  • Code: nb. Saw name: නෝර්වීජියානු Bokmål. Expected name: Norwegian Bokmål.
  • Code: nds. Saw name: Low ජර්මානු. Expected name: Low German.
  • Code: nds-de. Saw name: ජර්මානු Low ජර්මානු. Expected name: German Low German.
  • Code: nds-nl. Saw name: ඕලන්ද Low Saxon. Expected name: Dutch Low Saxon.
  • Code: nn. Saw name: නෝර්වීජියානු Nynorsk. Expected name: Norwegian Nynorsk.
  • Code: nod. Saw name: Northern තායි. Expected name: Northern Thai.
  • Code: obr. Saw name: Old බුරුම. Expected name: Old Burmese.
  • Code: och. Saw name: Old චීන. Expected name: Old Chinese.
  • Code: odt. Saw name: Old ඕලන්ද. Expected name: Old Dutch.
  • Code: oge. Saw name: Old ජෝර්ජියානු. Expected name: Old Georgian.
  • Code: ohu. Saw name: Old හංගේරියානු. Expected name: Old Hungarian.
  • Code: ojp. Saw name: Old ජපන්. Expected name: Old Japanese.
  • Code: okm. Saw name: Middle කොරියානු. Expected name: Middle Korean.
  • Code: oko. Saw name: Old කොරියානු. Expected name: Old Korean.
  • Code: osp. Saw name: Old ස්පාඤ්ඤ. Expected name: පුරාතන ස්පාඤ්ඤ.
  • Code: ota. Saw name: Ottoman තුර්කි. Expected name: Ottoman Turkish.
  • Code: pal. Saw name: Middle පර්සියානු. Expected name: මධ්‍යකාලීන පර්සියානු.
  • Code: pdc. Saw name: Pennsylvania ජර්මානු. Expected name: Pennsylvania German.
  • Code: peo. Saw name: Old පර්සියානු. Expected name: Old Persian.
  • Code: phl. Saw name: Phalura. Expected name: Palula.
  • Code: plu. Saw name: පාලිkur. Expected name: Palikur.
  • Code: poz-cet-pro. Saw name: Proto-Central-Eastern මැලේo-Polynesian. Expected name: Proto-Central-Eastern Malayo-Polynesian.
  • Code: poz-mcm-pro. Saw name: Proto-මැලේo-Chamic. Expected name: Proto-Malayo-Chamic.
  • Code: poz-mly-pro. Saw name: Proto-මැලේic. Expected name: Proto-Malayic.
  • Code: poz-msa-pro. Saw name: Proto-මැලේo-Sumbawan. Expected name: Proto-Malayo-Sumbawan.
  • Code: poz-pro. Saw name: Proto-මැලේo-Polynesian. Expected name: Proto-Malayo-Polynesian.
  • Code: pqe-pro. Saw name: Proto-Eastern මැලේo-Polynesian. Expected name: Proto-Eastern Malayo-Polynesian.
  • Code: rbl. Saw name: Miraya Bikol. Expected name: East Miraya Bikol.
  • Code: rm. Saw name: Romansch. Expected name: Romansh.
  • Code: rmf. Saw name: Kalo ෆින්ලන්ත Romani. Expected name: Kalo Finnish Romani.
  • Code: rmg. Saw name: Traveller නෝර්වීජියානු. Expected name: Traveller Norwegian.
  • Code: roa-opt. Saw name: Old Galician-පෘතුගීසි. Expected name: Old Galician-Portuguese.
  • Code: ruo. Saw name: Istro-රුමේනියානු. Expected name: Istro-Romanian.
  • Code: ruq. Saw name: Megleno-රුමේනියානු. Expected name: Megleno-Romanian.
  • Code: sa-ved. Saw name: Vedic සංස්කෘත. Expected name: Vedic Sanskrit.
  • Code: sga. Saw name: Old අයිරිෂ්. Expected name: Old Irish.
  • Code: sit. Saw name: Sino-ටිබෙට්. Expected name: Sino-Tibetan.
  • Code: sit-pro. Saw name: Proto-Sino-ටිබෙට්. Expected name: Proto-Sino-Tibetan.
  • Code: sou. Saw name: Southern තායි. Expected name: Southern Thai.
  • Code: tbq-lob-pro. Saw name: Proto-Lolo-බුරුම. Expected name: Proto-Lolo-Burmese.
  • Code: tbw. Saw name: Tagbanwa. Expected name: Aborlan Tagbanwa.
  • Code: trk-oat. Saw name: Old Anatolian තුර්කි. Expected name: Old Anatolian Turkish.
  • Code: xaa. Saw name: Andalusian අරාබි. Expected name: Andalusian Arabic.
  • Code: xcl. Saw name: Old ආමේනියානු. Expected name: Old Armenian.
  • Code: yai. Saw name: Yagnobi. Expected name: Yaghnobi.
  • Code: yrk. Saw name: Tundra Nenets. Expected name: Nenets.
  • Code: zle-mbe. Saw name: Middle බෙලරුසියානු. Expected name: Middle Belarusian.
  • Code: zlw-ocs. Saw name: Old චෙක්. Expected name: Old Czech.
  • Code: zlw-opl. Saw name: Old පෝලන්ත. Expected name: Old Polish.
  • The canonical name පුරාතන ලතින් (itc-ola) is missing.
  • Old Latin, the canonical name for the code itc-ola, is wrong; it should be පුරාතන ලතින්.
  • The canonical name බ්‍රසීල පෘතුගීසි (pt-BR) is missing.
  • Brazilian Portuguese, the canonical name for the code pt-BR, is wrong; it should be බ්‍රසීල පෘතුගීසි.
  • West Circassian (ady) has its canonical name ("West Circassian") repeated in the table of aliases.
  • Magi (gkd) has its canonical name ("Magi") repeated in the table of aliases.
  • Lele (New Guinea) (lle) has its canonical name ("Lele (New Guinea)") repeated in the table of aliases.
  • The sort_key field in the data table for Lak (lbe) specifies the module Module:lbe-sortkey, which does not exist.
  • The sort_key field in the data table for Laboya (lmy) specifies the module Module:lmy-sortkey, which does not exist.
  • The translit field in the data table for Limbu (lif) specifies the module Module:lif-translit, which does not exist.
  • The translit field in the data table for Laki (lki) specifies the module Module:lki-translit, which does not exist.
  • Palula (phl) has its canonical name ("Palula") repeated in the table of aliases.
  • The translit field in the data table for Pengo (peg) specifies the module Module:kxv-translit, which does not exist.
  • The sort_key field in the data table for Udmurt (udm) specifies the module Module:udm-sortkey, which does not exist.
  • The sort_key field in the data table for Ulch (ulc) specifies the module Module:ulc-sortkey, which does not exist.
  • The sort_key field in the data table for Ubykh (uby) specifies the module Module:uby-sortkey, which does not exist.
  • Ura (New Guinea) (uro) has its canonical name ("Ura (New Guinea)") repeated in the table of aliases.
  • The translit field in the data table for Kamassian (xas) specifies the module Module:xas-translit, which does not exist.
  • The translit field in the data table for Chuvan (xcv) specifies the module Module:xcv-translit, which does not exist.
  • Koromu (xes) has its canonical name ("Koromu") repeated in the table of aliases.

Checks performed


For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
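The cross-module checks above can be sketched as follows. This is not the module's own code, which is Lua and reads the Lua data tables directly. It is a minimal Python sketch that makes several assumptions:

- Each data module has been loaded into a dict mapping codes to field tables.
- Languages use fields 1 and 2 for the canonical name and Wikidata ID.
- Families and etymology-only languages use canonicalName and wikidata_item.

The function names, message texts and sample data are all hypothetical.

```python
import re
from collections import Counter

# Wikidata item IDs: a positive integer, or "Q" followed by decimal digits.
QID = re.compile(r"Q[0-9]+")


def valid_wikidata_id(value):
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject it
        return False
    if isinstance(value, int):
        return value > 0
    return isinstance(value, str) and QID.fullmatch(value) is not None


def check_shared(modules):
    """Run the cross-module checks. `modules` maps a label to a {code: data} dict."""
    problems = []
    # Codes must be unique across all modules.
    first_seen = {}
    for label, table in modules.items():
        for code in table:
            if code in first_seen:
                problems.append(f"Code {code} is defined in both {first_seen[code]} and {label}.")
            else:
                first_seen[code] = label

    for table in modules.values():
        for code, data in table.items():
            # Languages keep the canonical name in field 1; the others use canonicalName.
            name = data.get(1, data.get("canonicalName"))
            others = data.get("otherNames")
            if others is not None and not isinstance(others, list):
                problems.append(f"otherNames for {code} is not an array.")
            elif others:
                if name in others:
                    problems.append(f"{name} ({code}) has its canonical name repeated in otherNames.")
                for other, count in Counter(others).items():
                    if count > 1:
                        problems.append(f"{other!r} appears {count} times in otherNames for {code}.")
            # Languages keep the Wikidata ID in field 2; the others use wikidata_item.
            qid = data.get(2, data.get("wikidata_item"))
            if qid is not None and not valid_wikidata_id(qid):
                problems.append(f"{code} has an invalid Wikidata item ID: {qid!r}.")
    return problems


# Hypothetical sample data, not taken from Wiktionary.
modules = {
    "languages": {
        "en": {1: "English", 2: 1860, "otherNames": ["Modern English"]},
        "ang": {1: "Old English", 2: "Q42365", "otherNames": ["Anglo-Saxon", "Anglo-Saxon"]},
    },
    "families": {"gem": {"canonicalName": "Germanic", "otherNames": ["Germanic"]}},
    "etymology languages": {
        "en-GB": {"canonicalName": "British English", "parent": "en", "wikidata_item": "7979"},
        "ang": {"canonicalName": "Old English", "parent": "en"},
    },
}

for problem in check_shared(modules):
    print(problem)
```

The sample data is built so that each of the four checks fires once: a code clash, a repeated other name, a canonical name listed among the other names, and an invalid Wikidata ID.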

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standard_chars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys other than these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standard_chars", "translit", "override_translit", "link_tr".
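Several of these per-language checks can be illustrated with a minimal Python sketch, again modelling the Lua data as dicts. The submodule names data2, data3/<first letter> and datax are an assumption about how the Module:languages data is split; they are not stated on this page. The function names and messages are hypothetical. Family, script and ancestor validity are left out because they need the other modules' data.

```python
import re

ALLOWED_KEYS = {1, 2, 3, "entry_name", "sort_key", "display", "otherNames",
                "aliases", "varieties", "type", "scripts", "ancestors",
                "wikimedia_codes", "wikipedia_article", "standard_chars",
                "translit", "override_translit", "link_tr"}
VALID_TYPES = {"regular", "reconstructed", "appendix-constructed"}


def expected_submodule(code):
    # ASSUMPTION (hypothetical naming): two-letter codes in "data2",
    # three-letter codes split by first letter, everything else in "datax".
    if re.fullmatch(r"[a-z]{2}", code):
        return "data2"
    if re.fullmatch(r"[a-z]{3}", code):
        return "data3/" + code[0]
    return "datax"


def check_replacement_table(field, value):
    # sort_key may be a module name (string); entry_name must always be a table.
    if field == "sort_key" and isinstance(value, str):
        return []
    if not isinstance(value, dict):
        return [f"{field} must be a table."]
    has_pair = "from" in value and "to" in value
    has_remove = isinstance(value.get("remove_diacritics"), str)
    if not (has_pair or has_remove):
        return [f"{field} needs from/to arrays or a remove_diacritics string."]
    if has_pair and len(value["from"]) < len(value["to"]):
        return [f"{field}: the from array is shorter than the to array."]
    return []


def check_language(code, data, submodule):
    problems = []
    if submodule != expected_submodule(code):
        problems.append(f"{code} belongs in {expected_submodule(code)}, not {submodule}.")
    if not data.get(1):
        problems.append(f"{code} has no canonical name.")
    if "type" in data and data["type"] not in VALID_TYPES:
        problems.append(f"{code} has an unrecognised type {data['type']!r}.")
    for field in ("entry_name", "sort_key"):
        if field in data:
            problems += [f"{code}: {msg}" for msg in check_replacement_table(field, data[field])]
    # override_translit needs a transliteration module to do the overriding.
    if data.get("override_translit") and not data.get("translit"):
        problems.append(f"{code} sets override_translit without translit.")
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append(f"{code} has link_tr set to something other than true.")
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"{code} has an unexpected data key {key!r}.")
    return problems


# Hypothetical example: one clean entry and one that breaks several rules.
print(check_language("la", {1: "Latin", 2: 397, 3: "itc",
                            "entry_name": {"from": ["ā", "ē"], "to": ["a", "e"]}}, "data2"))
bad = {1: "Example", "type": "extinct", "entry_name": {"from": ["a"], "to": ["b", "c"]},
       "override_translit": True, "link_tr": 1, "colour": "red"}
for problem in check_language("xyz", bad, "data2"):
    print(problem)
```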

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These conditions are not checked here because, if they are not met, module errors will quickly crop up in entries, provided that Module:utilities attempts to generate a sort key for a category pertaining to the language in question or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
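The requirement that these two lookup modules contain exactly the codes and canonical names in the data submodules amounts to comparing mappings in both directions. The following Python sketch assumes both lookup modules have been loaded as plain dicts. The parameter names and messages are hypothetical.

```python
def compare_name_tables(data_submodules, code_to_name, name_to_code):
    """Check both lookup tables against the data submodules (canonical name in field 1)."""
    expected = {}
    for table in data_submodules:
        for code, data in table.items():
            expected[code] = data[1]
    expected_names = {name: code for code, name in expected.items()}

    problems = []
    # Every code must map to its canonical name, and no extra codes may appear.
    for code in sorted(expected.keys() | code_to_name.keys()):
        if code not in code_to_name:
            problems.append(f"code-to-name: {code} is missing.")
        elif code not in expected:
            problems.append(f"code-to-name: {code} is not in any data submodule.")
        elif code_to_name[code] != expected[code]:
            problems.append(f"code-to-name: {code} maps to {code_to_name[code]!r}, "
                            f"expected {expected[code]!r}.")
    # Every canonical name must map back to its code, and no extra names may appear.
    for name in sorted(expected_names.keys() | name_to_code.keys()):
        if name not in name_to_code:
            problems.append(f"name-to-code: {name!r} is missing.")
        elif name not in expected_names:
            problems.append(f"name-to-code: {name!r} is not a canonical name in the data.")
        elif name_to_code[name] != expected_names[name]:
            problems.append(f"name-to-code: {name!r} maps to {name_to_code[name]}, "
                            f"expected {expected_names[name]}.")
    return problems


# Hypothetical sample data.
data_submodules = [
    {"br": {1: "Breton"}, "cy": {1: "Welsh"}},
    {"ang": {1: "Old English"}},
]
code_to_name = {"br": "Breton", "cy": "Cymraeg", "xx": "Nonexistent"}
name_to_code = {"Breton": "br", "Welsh": "cy"}

for problem in compare_name_tables(data_submodules, code_to_name, name_to_code):
    print(problem)
```

Taking the union of keys on both sides catches both missing entries and extra ones ("and no more") in a single pass per table.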

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys other than these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
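A sketch of the etymology-language checks follows, using the same hypothetical Python model, where languages, families and etym_languages are dicts keyed by code. It does not attempt the advisory part of the ancestors rule (that the etymology language should also be listed as the ancestor of a regular language).

```python
ETYM_KEYS = {"canonicalName", "otherNames", "parent", "ancestors",
             "wikipedia_article", "wikidata_item"}


def check_etym_language(code, data, languages, families, etym_languages):
    problems = []
    if not data.get("canonicalName"):
        problems.append(f"{code} has no canonicalName.")
    # parent may be a language, a family or another etymology-only language.
    parent = data.get("parent")
    if parent is None:
        problems.append(f"{code} has no parent.")
    elif parent not in languages and parent not in families and parent not in etym_languages:
        problems.append(f"{code} has an invalid parent {parent!r}.")
    # ancestors may only name languages or etymology-only languages, not families.
    ancestors = data.get("ancestors")
    if ancestors is not None:
        if not isinstance(ancestors, list):
            problems.append(f"ancestors for {code} is not an array.")
        else:
            for ancestor in ancestors:
                if ancestor not in languages and ancestor not in etym_languages:
                    problems.append(f"{code} has an invalid ancestor {ancestor!r}.")
    for key in data:
        if key not in ETYM_KEYS:
            problems.append(f"{code} has an unexpected data key {key!r}.")
    return problems


# Hypothetical sample data.
languages = {"la": {1: "Latin"}, "en": {1: "English"}}
families = {"itc": {"canonicalName": "Italic"}}
etym_languages = {
    "la-vul": {"canonicalName": "Vulgar Latin", "parent": "la", "ancestors": ["la"]},
    "xx-bad": {"canonicalName": "Bad", "parent": "zz", "ancestors": ["itc"], "family": "itc"},
}

for code, data in etym_languages.items():
    for problem in check_etym_language(code, data, languages, families, etym_languages):
        print(problem)
```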

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
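The rule that every family must have at least one member can be checked by first collecting every family code that a language (field 3 or family) or another family points to. The following is a hypothetical Python sketch in the same dict-based model; the function name and messages are illustrative only.

```python
FAMILY_KEYS = {"canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item"}


def check_families(families, languages):
    # Collect every family code that some language or subfamily points to.
    has_members = set()
    for data in languages.values():
        family = data.get(3, data.get("family"))
        if family is not None:
            has_members.add(family)
    for code, data in families.items():
        parent = data.get("family")
        if parent is not None and parent != code:  # a self-reference is not a member
            has_members.add(parent)

    problems = []
    seen_names = {}
    for code, data in families.items():
        name = data.get("canonicalName")
        if not name:
            problems.append(f"{code} has no canonicalName.")
        elif name in seen_names:
            problems.append(f"{code} and {seen_names[name]} share the canonical name {name!r}.")
        else:
            seen_names[name] = code
        parent = data.get("family")
        if parent is not None and parent not in families:
            problems.append(f"{code} has an invalid family {parent!r}.")
        if code not in has_members:
            problems.append(f"{code} has no languages or subfamilies.")
        for key in data:
            if key not in FAMILY_KEYS:
                problems.append(f"{code} has an unexpected data key {key!r}.")
    return problems


# Hypothetical sample data: "xxx" duplicates a name and has no members.
families = {
    "gem": {"canonicalName": "Germanic", "family": "ine"},
    "ine": {"canonicalName": "Indo-European"},
    "gmw": {"canonicalName": "West Germanic", "family": "gem"},
    "xxx": {"canonicalName": "Germanic"},
}
languages = {"en": {1: "English", 3: "gmw"}}

for problem in check_families(families, languages):
    print(problem)
```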

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
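A Python sketch of the script checks follows, under the same assumptions. It only tests that characters is present. Checking that "[" .. characters .. "]" is a valid Lua pattern needs Lua's own pattern engine, which this sketch does not reproduce. The function name and messages are hypothetical.

```python
SCRIPT_KEYS = {"canonicalName", "otherNames", "parent", "systems",
               "wikipedia_article", "characters", "direction"}


def check_scripts(scripts, languages):
    # Every script code listed by some language (field 4 or scripts).
    used = set()
    for data in languages.values():
        used.update(data.get(4, data.get("scripts")) or [])

    problems = []
    for code, data in scripts.items():
        if not data.get("canonicalName"):
            problems.append(f"{code} has no canonicalName.")
        if code not in used:
            problems.append(f"{code} is not used by any language.")
        # Presence only: validating the Lua pattern is out of scope here.
        if not data.get("characters"):
            problems.append(f"{code} has no characters pattern.")
        for key in data:
            if key not in SCRIPT_KEYS:
                problems.append(f"{code} has an unexpected data key {key!r}.")
    return problems


# Hypothetical sample data.
scripts = {
    "Latn": {"canonicalName": "Latin", "characters": "a-zA-Z"},
    "Runr": {"canonicalName": "Runic", "characters": "ᚠ-ᛸ"},
    "Xxxx": {"characters": "", "colour": "red"},
}
languages = {
    "en": {1: "English", 4: ["Latn"]},
    "ang": {1: "Old English", "scripts": ["Latn", "Runr"]},
}

for problem in check_scripts(scripts, languages):
    print(problem)
```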
Retrieved from "https://si.wiktionary.org/w/index.php?title=Module:data_consistency_check/documentation&oldid=228605"