Wiktionary talk:About Japanese
Updates needed for {{ja-readings}}?
I've been filling in the yomi of a number of kanji entries lately, and I've run into some structural limitations of the {{ja-readings}} template. For single kanji used for verbs, the kun'yomi section in particular can become ridiculously large and visually messy, as can be seen at 結#Japanese. I've been trying to use sane wiki markup so anyone coming after me can more easily see what's going on, like the following:
====Readings====
* {{ja-readings
| on=<!--
-->[[けつ]] (''[[ketsu]]''), <!--
-->[[けい]] (''[[kei]]'')
| kanyoon=<!--
-->[[結]] ([[けち]] ''[[kechi]]'') to win an [[archery]] competition; to claim undecided territory in the [[endgame]] of [[go#Etymology_2|go]], <!--
-->[[結する]] ([[けっする]], ''[[kessuru]]'') to become [[constipated]]; to [[tie up]] or [[conclude]] an [[argument]] or stated position, <!--
-->[[結す]] ([[けっす]], ''[[kessu]]'') alternate for 結する
| kun=<!--
-->[[結ぶ]] ([[むすぶ]] ''[[musubu]]''), <!--
-->[[結び]] ([[むすび]] ''[[musubi]]''), <!--
-->[[結ばる]] ([[むすばる]] ''[[musubaru]]''), <!--
-->[[結ばわる]] ([[むすばわる]] ''[[musubawaru]]''), <!--
-->[[結ぼる]] ([[むすぼる]] ''[[musuboru]]''), <!--
-->[[結ぼうる]] ([[むすぼうる]] ''[[musubōru]]''), <!--
-->[[結ぼれる]] ([[むすぼれる]] ''[[musuboreru]]''), <!--
-->[[結う]] ([[ゆう]] ''[[yuu]]''), <!--
-->[[結い]] ([[ゆい]] ''[[yui]]''), <!--
-->[[結わう]] ([[ゆわう]] ''[[yuwau]]''), <!--
-->[[結わえる]] ([[ゆわえる]] ''[[yuwaeru]]''), <!--
-->[[結える]] ([[いわえる]] ''[[iwaeru]]'') alternate for 結わえる, <!--
-->[[結く]] ([[いわく]] ''[[iwaku]]'') alternate for 結わえる, <!--
-->[[結く]] ([[すく]] ''[[suku]]'') to [[knit]] a [[net]], <!--
-->[[結なす]] ([[かたなす]] ''[[katanasu]]'') to gather or tie together into one bunch, <!--
-->[[結ねる]] ([[かたねる]] ''[[kataneru]]'') to bind together; to open and read out the content of official documents, <!--
-->[[結ぬ]] ([[かたぬ]] ''[[katanu]]'') alternate for 結ねる
| nanori=
}}
The 結#Japanese example is plug ugly, and hard to read, but all of the information there is proper to include as best I can tell, and does indeed belong in the list of kun'yomi. What I'd like is for the {{ja-readings}} template to show readings in a bulleted list, for a cleaner presentation and easier usability.
Instead of this:
Readings
... I'd rather see something like this:
Readings
On:
Kun:
Ideally, the template would also allow folks to input multiple readings with each on its own line, as in the 結#Japanese code sample above but minus the crutch of <!-- --> HTML comments --- but that's probably asking too much, given what I've seen of template syntax (yech!).
I'm hoping there's someone reading this page who has the requisite template expertise to implement this change. If I hear nothing in, say, a week or two, I may have a go at making the change myself. :) -- Cheers, Eiríkr Útlendi | Tala við mig 23:12, 1 February 2011 (UTC)
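For illustration only, the bulleted layout asked for above can be sketched outside the template language. The following is a minimal Python sketch, not the actual {{ja-readings}} code; the function name render_readings and the (kana, romaji) pair format are assumptions invented for this example.

```python
def render_readings(on=(), kun=(), nanori=()):
    """Return wikitext that lists every reading on its own bulleted line.

    Each argument is a sequence of (kana, romaji) pairs. This is a
    hypothetical helper, not part of any existing Wiktionary template.
    """
    groups = [("On", on), ("Kun", kun), ("Nanori", nanori)]
    lines = []
    for label, readings in groups:
        if not readings:
            continue  # skip empty groups, e.g. a blank nanori= parameter
        lines.append(f"* '''{label}''':")
        for kana, romaji in readings:
            lines.append(f"** [[{kana}]] (''[[{romaji}]]'')")
    return "\n".join(lines)


print(render_readings(
    on=[("けつ", "ketsu"), ("けい", "kei")],
    kun=[("むすぶ", "musubu"), ("ゆう", "yuu")],
))
```

Keeping one reading per input item is what removes the need for the comment crutch: the line breaks live in the data, and the renderer decides how to lay them out.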
- There's been no comment for the last half-year, so I'll start looking into changing the template. -- Ta, Eiríkr Útlendi | Tala við mig 16:25, 4 August 2011 (UTC)
- Turns out the template is locked. I've posted on Template_talk:ja-readings#Formatting_when_the_list_of_yomi_gets_long in an attempt to get some momentum going. -- Eiríkr Útlendi | Tala við mig 17:16, 4 August 2011 (UTC)
- I've replied there. - -sche (discuss) 02:14, 7 September 2011 (UTC)
- @Eirikr I'm sure everyone would agree that your proposed format would be a vast improvement to what the ja-readings template gives us. I've read the comments on the template page and it sounds like the change you are proposing to this template will not be forthcoming for various reasons which I can't understand, but is there anything that should prevent us from reformatting these without a template, as you have above? 馬太阿房 (talk) 16:35, 21 April 2017 (UTC)
romanizing -suru verbs
I was wondering what the word is on how to romanize -suru verbs like 勉強する, that is, as benkyō suru or benkyōsuru. According to the supplied example, 監督する, there is a space, as there is for 勉強する. I would assume there should be one since there is a space for -na adjectives as well. On the other hand, when I casually looked at a number of other type-3 verbs, all of them had no spaces. Maybe I missed something but I couldn't find anything that explicitly says if there should be a space or not. I don't have a preference one way or the other, but it seems to me that the dictionary ought to be consistent, so is it safe to assume that the entries without spaces should be edited to include them? thanks! Haplology 16:29, 10 February 2011 (UTC)
- A bit late in replying, but I'd put my 2p on including the space. This makes it clear that the core part of the word (the bit in kanji) is distinct. After all, only the する part conjugates, and both the core part and する are indeed distinct words unto themselves. A number of Japanese publications I've seen that use romaji will leave out the space, but I think this is primarily in reflection of the lack of spaces in Japanese writing. Latin-alphabet writing needs spaces for clearer visual parsing, in part as we don't have the nice kanji-vs.-kana visual distinction to rely upon. -- Cheers, Eiríkr Útlendi | Tala við mig 16:22, 4 August 2011 (UTC)
- Really old thread that probably doesn't matter anymore but I for one like to attach -suru w/o a space to enforce the idea that 勉強する as a whole is a verb... —suzukaze (t・c) 04:24, 28 August 2016 (UTC)
Ateji and rare readings
A question for the group, here --
Is there any consensus on how best to handle nonstandard ateji or otherwise rare pronunciations?
- One case in point is 矢#Japanese, which lists ちかう and つらねる as readings. I cannot find these readings in any dictionary I have to hand, but in a thread over at RFV, User:Haplology describes finding these readings listed as rare alternates (意読). I can kind of see how someone might opt to use 矢 to spell these words, nonstandard as it is.
- Another example is 神#Kanji, which includes the reading たましい. I've only ever seen たましい spelled in kanji as either 魂 or (more rarely) 魄, but I could imagine 神 being used instead as an 意読.
So, do we remove such rarities? Do we keep them, but mark them? If so, how? Is there some sort of threshold for frequency of use before we include an 意読 for a particular kanji word?
Any insight appreciated. -- Eiríkr Útlendi | Tala við mig 16:14, 4 August 2011 (UTC)
Lemma forms for keiyōdōshi
Can anyone elucidate the reasoning behind including the な on the end for keiyōdōshi lemma forms? This な is essentially a particle, and is in no way integral to the word, as can be seen by swapping this for に to create the adverbial, or for だ to create the terminal. It would seem to make much more sense to use the root form of a keiyōdōshi, i.e. the form without the な, as the lemma -- as, indeed, do all other dictionaries that I'm aware of. -- Eiríkr Útlendi | Tala við mig 20:19, 22 August 2011 (UTC)
- I agree. Haplology 16:34, 23 August 2011 (UTC)
Work needed on Template:ja-na
Please have a look at Template_talk:ja-na#Redesign needed to deal with adjectives that have no kanji and respond as appropriate. I am happy to implement the changes myself, so feel free to give your opinion even if you aren't up on template syntax. -- TIA, Eiríkr Útlendi | Tala við mig 17:36, 25 August 2011 (UTC)
Work Needed
(Copied over from WT:Beer_parlour#WT:About_Japanese)
Following comments in various other threads, it appears that the WT:AJA page needs some work. The issues I'm immediately aware of:
- Quasi-adjectives (な adjectives): WT:AJA insists on including the な in the headword, which does not appear to be the current consensus.
- の adjectives: WT:AJA does not include any clear guidelines for these. (Relatedly, {{ja-adj}} doesn't include any way of handling these either.)
- Suru compound verbs: WT:AJA calls for using the {{ja-suru}} template. However, する is a standalone verb, so including the する conjugation on each and every compound verb page seems excessive.
- {{ja-kanjitab}}: WT:AJA describes including this under an === Etymology === section if there is one, but including under the main == Japanese == section produces largely identical results, unless there are multiple etymology sections, in which case repeating the kanjitab seems excessive.
- The Transliteration subpage could also use some work, particularly with regard to spacing and what constitutes a single word in Japanese (i.e., particles should be separate, suru should be separate, etc. etc.).
- 連体詞: WT:AJA states that this should be given a POS of "prefix", but that is really not what these words are -- a prefix is part of a word, whereas 連体詞 are clearly standalone words. They are less prefixes and more like true adjectives, in that they must precede a noun.
- Single-kanji entries: WT:AJA has no clear instructions on how to specify okurigana in kun'yomi listings, nor any clear instructions on how to format these to link to verb forms. For instance, 食 shows one way of clarifying okurigana and linking to kanji+okurigana entries, but is a bit visually messy; ja:食#日本語 looks a bit cleaner with the use of hyphens to show the break between the kanji and the okurigana, and this roughly matches the format I've most often seen in dead-tree dictionaries, but the entry doesn't link to any kanji+okurigana entries, just to the hiragana entries; and 飲 doesn't show okurigana or link to any kanji+okurigana entries.
This post is really just meant to get the ball rolling. Many of these changes listed above are a departure from what WT:AJA currently says, so I'm hoping to spark a bit of discussion before making any edits. -- TIA, Eiríkr Útlendi | Tala við mig 17:41, 6 September 2011 (UTC)
- Regarding your first point: you're proposing to remove な from the address of the page, not just the headword, correct? (You're proposing to move 浅はかな to 浅はか, and to change the headword from 浅はかな (な-na declension, hiragana あさはかな, romaji asahaka na) to 浅はか (な-na declension, hiragana あさはか, romaji asahaka)?) Do any other changes need to be made to quasi-adjective entries? For example, do the declension tables need to be modified? I'm trying to ascertain how difficult it would be to make the change by bot. It seems it would be simple (move the page and eg change "な|rom" and " na}}" to "|rom" and "}}"), and you could write a bot or ask one of our technically-skilled editors to write one for you. The only comments I've seen in discussions of this subject have supported removal of the な, so I would say there's consensus for the change.
- Regarding の adjectives: can you give an example of one?
- Regarding Suru compound verbs: is there any harm in giving the conjugation? On de. and en.Wikt, we give eg the conjugation of anhalten and zurückhalten, even though it is merely the conjugation of halten + an/zurück. The code to generate the conjugation table appears to use only information that is already elsewhere in the entry, so including the template seems not to require the creator of an entry to look up any more information than (s)he has already had to look up to determine the page title and write the {{ja-verb}} headword line. I would keep the conjugation tables in all of the entries.
- In a later point, you seem to suggest considering suru a separate word. Would you propose deleting the Suru verbs as SOP at that point?
- Isn't [[:ja:食#日本語]] an interwiki link to ja.Wikt? What did you mean to write? - -sche (discuss) 01:42, 7 September 2011 (UTC)
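The text substitution described in the first point above (change "な|rom" and " na}}" to "|rom" and "}}", and drop the な from the page title) can be sketched as follows. This is a rough Python illustration under stated assumptions, not a working bot: the function names are hypothetical, no pages are fetched or moved, and it assumes the headword template always lists hira= immediately before rom= as in the 浅はかな example.

```python
def strip_na_from_headword(wikitext: str) -> str:
    """Apply the two literal replacements suggested in the discussion.

    Assumes the kana parameter ends in な directly before "|rom" and the
    romaji parameter ends in " na" directly before "}}".
    """
    return wikitext.replace("な|rom", "|rom").replace(" na}}", "}}")


def strip_na_from_title(title: str) -> str:
    """Return the page title without a trailing な, if there is one."""
    return title[:-1] if title.endswith("な") else title


old = "{{ja-adj|k|decl=な|hira=あさはかな|rom=asahaka na}}"
print(strip_na_from_title("浅はかな"))
print(strip_na_from_headword(old))
```

Note that decl=な is left alone because it is followed by "|hira", not "|rom". A blind replace like this could still touch unrelated text on a page, so a real run would want to restrict itself to the headword line and be reviewed by hand.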
- Hello -sche, I've taken the liberty of changing the bullets in your reply to numbers for easier reference. My correspondingly numbered replies below:
- Yes, the main lemma entry should be the form without the な - so 浅はか would be the main page, and 浅はかな would mostly just point to 浅はか, much as any other entry for a conjugated word form mostly just points to the main headword. As far as I can tell, the only changes needed would be to the headwords and related minutiae; it would probably be bot-able. Moving from [quasi]+な to just [quasi] would be the easiest option. I don't think declension tables need any changing at all; in fact, they're partly what got me thinking about the change, since they include the adjectival な forms, but also the adverbial に forms, among others, making a lemma with no following particle the more natural place to put such information. Moreover, all other dictionaries I've ever used do not include the な on the end in any headwords.
- Do you know of any good resource or tutorial pages in the MediaWiki universe here that describe how to make a bot?
- Just off the top of my head (entries I've worked on recently), の adjective examples include 鄙 (とひと) and でぶでぶ. Conjugation would be mostly the same as for な adjectives, but I'd have to go through my references to tell you the exact differences.
- No harm in including the する conjugation. There are simply *so many* more of these types of verbs than there are of any one type of verb in German or English that things start to get kind of silly with the repetition, but no, there's no real harm in having it.
- And yes, する is a standalone verb in its own right, which simply means "to do", so by that measure, [noun]+する pages would indeed be SOP. However, it is important to be able to note which nouns can be used in verbal ways. From an aesthetic perspective, it'd be much more graceful to include [noun]+する information right on the [noun] page, and sending the user to the する page for information on how that verb is conjugated. That's perhaps too much to bother with for a bot, though, I'm not sure.
FWIW, other Japanese dictionaries (either JA-JA or JA<>EN) list just the [noun] entries, and mark within them whether the noun can take する -- there are no [noun]+する headwords in any other dictionary that I've ever seen.
- The [[ja:食#日本語]] bit is indeed a link to the Japanese Wiktionary, specifically to the 日本語 (Japanese) heading on the 食 page. That was intended to provide an example of how the JA WT folks are formatting their entries with regard to okurigana - something that we don't have any official policy or plan for.
- Hope this helps explain things. -- Cheers, Eiríkr Útlendi | Tala við mig 05:47, 7 September 2011 (UTC)
- Thanks for the clarifications!
- Wikipedia has w:Wikipedia:Creating a bot. I myself know little about bots.
- Editing {{ja-adj}} to handle の adjectives seems to be the simplest of these issues (because the template requires relatively few parameters and displays relatively little information, for example no declined forms). I think the only change that needs to be made is to make the template accept "no" (and の?) as an answer to "decl=", and display "の-no declension"... right? I think you could go ahead and make that improvement to the template; we may still lack a template like {{ja-suru}} that produces the conjugated forms, but because many entries lack conjugation sections, I do not think it is necessary to design a の-conjugation-template before updating the headword-line template.
- I'd like to keep the definition-lines currently in the [noun]+する entries, because they do vary in form/meaning at least slightly (失礼する = "to be rude", but 旅行する = "to travel, to make a journey"). I do like the idea of listing such information in the noun entries (indeed, even if the compounds are kept!) — perhaps like this or this?
- Oh, sorry; I thought you meant [[:ja:食#日本語]] and {{l|ja|食}} were alternative ways of linking to entries! I misunderstood (and still do not understand, ha) that issue. - -sche (discuss) 07:48, 7 September 2011 (UTC)
- Hello once more, before I go to sleep --
- I see on your user page that you speak German, but perhaps you also read Japanese? I'm not at all sure whether I should write out the romaji as well, but I certainly don't want to 失礼する if you might need romaji. :) -- Eiríkr Útlendi | Tala við mig 08:10, 7 September 2011 (UTC)
- And about the ja wikt and en wikt bits, that was just about contrasting how the en.wikt entry for 食 looks for the on'yomi and kun'yomi versus how the ja.wikt entry looks. The ja entry clearly delineates where the kanji pronunciation ends and the okurigana begin, whereas the en entry doesn't -- which is a bit of a failing. -- Cheers, er, Tschüß, Eiríkr Útlendi | Tala við mig 08:10, 7 September 2011 (UTC)
- I agree with the consensus on these changes to WT:AJA. There are a couple of other questions I want to add:
- Is there some way we can indicate that an adverb takes the particle -と (-to)? It is so common that perhaps it ought to be in the headword template, but I don't think there's a field for it in ja-pos.
- I don't have a preference either way but it would be nice if AJA were clear about how to format counters, specifically if they take a hyphen, like -匹, or if they have none. It says "e.g., -本", which looks like 本 plus a hyphen at first glance, but the link itself has no hyphen. At least it should be rewritten for clarity.
- Speaking of bots, could we make a bot to add or fix hidx? It's completely mechanical and uncontroversial, but hard for newbies to pick up and easy for anyone to forget. I've noticed that there's a lot of variation in how it's used.
- Thanks Haplogy 13:35, 9 September 2011 (UTC)
- Hello,
- Thanks to Eiríkr for pointing me to this discussion. If no one objects, I'd like to get the ball rolling on changing the な-type adjectives with this change to WT:AJA#Quasi-adjectives:
== Quasi-adjectives ==
The main entry for quasi-adjectives should be in the 'plain' or 'root' form:
=== Adjective ===
{{ja-adj|k|decl=な|hira=(kana)|rom=(romaji)}}
E.g. 平安 (heian) has a level 3 section like this:
=== Adjective ===
{{ja-adj|k|decl=な|hira=へいあん|rom=heian}}
This should be followed by the definition(s), and then the declension table using template {{ja-na}}.
Note that the “plain form” in this case is also a noun. This should not be a problem; just as bet is both a verb and a noun, 平安 is both a noun and an adjective.
- Does this look good? (Sorry the formatting is awkward, I wanted it all to be in that grey box thing.) -MichaelLau 01:58, 10 September 2011 (UTC)
- Looks good to me. Haplogy 05:19, 11 September 2011 (UTC)
Note that the “plain form” in this case is also a noun for certain words. This should not be a problem; just as bet is both a verb and a noun, 平安 is both a noun and an adjective.
- And then there's also the various ways in which they conjugate - some take な and some take の to become adjectives, some take に and others take と to become adverbs - which we need to build into the template (the な・に format is already built in). A few oddballs appear to do both in one way or another, such as 常・恒, for which I can find examples of use as an adjective with both な and の.
- Food for thought, anyway. I'm glad this discussion is happening. What would folks say to one of us creating a copy of the current version of WT:AJA, maybe by creating a new page at WT:About_Japanese/Draft or somewhere similar and just copying the content of WT:AJA over, and then we can start collaboratively editing the draft version? -- Eiríkr Útlendi | Tala við mig 07:24, 11 September 2011 (UTC)
- I made the /draft page. I think there are probably a lot of ways to change this to make it easier to navigate also. For instance, people should know whether they are interested in contributing to classical Japanese, so all those sections on classical Japanese can be extracted and made their own page or section without cluttering up the page for everyone else. -MichaelLau 14:26, 12 September 2011 (UTC)
- I made what I think is a minor change: replacing [[lemma]]: with {{ja-def|lemma}} in the links under section 3.1 Non-lemma forms. Haplogy 14:46, 12 September 2011 (UTC)
- Brilliant, thank you Michael and Haplology! I'm creating the Wiktionary_talk:About_Japanese/Draft page to discuss edits to the draft. -- Cheers, Eiríkr Útlendi | Tala við mig 16:53, 12 September 2011 (UTC)
link to separate characters in the headword
Wiktionary:Feedback#.E7.AB.AF.E6.9C.AB could be implemented by adding a head= parameter to the Japanese templates, and then setting head=端末. We should keep the box, because it displays the kanji in a large, legible font, but is there a reason not to also link them in the headword? (Oh, maybe blue/red font is harder to read, especially if one character is blue and the other is red.) - -sche (discuss) 19:41, 10 October 2011 (UTC)
- I think it would be more appropriate for multipart terms (see my change to ロシア連邦), not for each character (kanji or kana), but that's just IMHO. I changed ジェシカ because it looked ugly to me. We do have a kanji box; adding the same in the header would be redundant, again, IMHO. --Anatoli 23:23, 10 October 2011 (UTC)
~々
How should we format words made up of two identical kanji, like 次々? "~々" is the only one you can find outside of dictionaries, hence probably more likely to be searched for. On the other hand, "次次" is the real word, in a sense, and all the dictionaries that I can find list these words as "次次" rather than "次々". I'm leaning toward making "次次" the lemma entry, and listing "次々" as an alternative form using {{alternative form of}}. The link at tsugitsugi would point to 次次. What does everyone else think? Haplology 05:27, 11 December 2011 (UTC)
- I prefer 次々, as people search for the most common spelling, not the "correct" one. I wouldn’t write 次次, because it just looks wrong. (次次回 jijikai is okay though I prefer 次々回.) If we follow paper dictionaries, we should have つぎつぎ as main entry, but it is not the case here on Wiktionary. — TAKASUGI Shinji (talk) 00:42, 12 December 2011 (UTC)
- I'm with Takasugi-san here that the lemma should be under the most common rendering, 次々 in this case. That said, I think we should also have a 次次 entry, pointing back to 次々, in the interests of completeness and in case anyone does look up the doubled form. -- Eiríkr Útlendi │ Tala við mig 17:18, 7 March 2012 (UTC)
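Whichever spelling ends up as the lemma, the doubled form can be derived mechanically from the 々 form, which would make it easy to generate the pointer entries. A minimal Python sketch follows, assuming 々 always repeats exactly the one character before it; the function name is hypothetical, and multi-character repetition is not handled.

```python
ITERATION_MARK = "々"


def expand_iteration_marks(term: str) -> str:
    """Replace each 々 with a copy of the character immediately before it.

    A 々 with nothing in front of it is left unchanged.
    """
    out = []
    for ch in term:
        if ch == ITERATION_MARK and out:
            out.append(out[-1])  # repeat the previous character
        else:
            out.append(ch)
    return "".join(out)


print(expand_iteration_marks("次々"))
print(expand_iteration_marks("次々回"))
```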
Fullwidth alphabet letters and digits, and halfwidth katakana
Atitarev and I had a discussion about fullwidth digits in User talk:Atitarev#Fullwidth digits. As I have explained there, fullwidth digits, namely ０, １, ２, ３, ４, ５, ６, ７, ８, and ９, are considered obsolete by the Unicode standard, and I don’t think we should use them for main entries. What do you think? — TAKASUGI Shinji (talk) 10:20, 16 March 2012 (UTC)
- I certainly don't see much utility in having these, since they're only a typographical mechanism for displaying the Arabic numerals in double-byte fonts. I would just recommend deleting them, except I know we have other WT pages for single characters. Maybe this is something to bring up in the WT:Beer parlor? -- Eiríkr Útlendi │ Tala við mig 14:56, 16 March 2012 (UTC)
- The ten character pages I listed above are all right, because they explain Unicode information. What Atitarev and I talked about is which is good for the main entry of 十日 written with Arabic numerals, the halfwidth 10日 or the fullwidth １０日. I think we should use the former naturally. — TAKASUGI Shinji (talk) 15:25, 16 March 2012 (UTC)
- Ah, I'm with you now. I agree that half-width (i.e. single-byte) numerals should be used instead of full-width (i.e. double-byte). -- Eiríkr Útlendi │ Tala við mig 19:27, 16 March 2012 (UTC)
- If display is a concern, as Anatoli suggests, it is possible to put text like "10日" in a template that uses an appropriately monospace font, just as Hebrew text is put into a template so that it can be displayed in an appropriately legible font. - -sche (discuss) 18:37, 16 March 2012 (UTC)
- Talk:CD is related to this issue. A discussion archived on that page reached the decision that Japanese words which are spelt in Latin script should be spelt in "regular" Latin letters, not in fullwidth ones, hence the Japanese word for a compact disc is CD, not ＣＤ. - -sche (discuss) 00:58, 28 July 2013 (UTC)
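As a practical aside, folding fullwidth digits and Latin letters to their ordinary forms (and halfwidth katakana to ordinary katakana) is what Unicode compatibility normalization does. A minimal Python sketch using the standard unicodedata module follows; the function name is an assumption for illustration. Be aware that NFKC also rewrites other compatibility characters (circled numbers, squared units and so on), so a narrower mapping may be preferable for actual page titles.

```python
import unicodedata


def to_standard_width(text: str) -> str:
    """Fold fullwidth ASCII and halfwidth katakana to their standard forms.

    Uses NFKC normalization, which affects other compatibility
    characters as well, not only width variants.
    """
    return unicodedata.normalize("NFKC", text)


print(to_standard_width("１０日"))   # fullwidth digits
print(to_standard_width("ＣＤ"))     # fullwidth Latin letters
print(to_standard_width("ｶﾀｶﾅ"))    # halfwidth katakana
```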
On ja-kanjitab
Please see & comment: Template talk:ja-kanjitab#Links to Translingual. Thanks! --Μετάknowledgediscuss/deeds 12:44, 6 September 2012 (UTC)
Alternative readings header
There are several entries in the category Category:Entries with non-standard headers with the header "Alternative readings". Should the header be changed, or is that header OK (in which case, remove the cleanup template and inform Liliana)? - -sche (discuss) 21:49, 29 September 2012 (UTC)
- Fixed びょう and byō, will look at the others. These weird headers appear to be due to Wiktionary:About_Japanese#Non-lemma_forms, an obsolete section that JA editors have (so far as I can tell) been ignoring for a while, but that we also haven't gotten around to fixing. That section recommends use of things that have been obsolete for a while. Also discussing deletion of {{ja-kanji reading}}, closely tied to these non-standard headers, at WT:RFDO#Template:ja-kanji_reading. -- Eiríkr Útlendi │ Tala við mig 06:41, 12 February 2013 (UTC)
Romaji entries
Does anyone object to changing romaji entries as per Wiktionary:Beer_parlour/2013/February#Stripping_extra_info_from_Japanese_romaji?
==Japanese==
===Romanization===
{{ja-romaji|hira=りゅうと|kata=リュート}}
# {{ja-def|隆と}} stylishly
# {{ja-def|リュート}} a lute {{gloss|the musical instrument}}
--Anatoli (обсудить/вклад) 04:16, 27 February 2013 (UTC)
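Since the stripped-down romaji layout above is entirely regular, such entries could in principle be generated from a list of target spellings and glosses. A minimal Python sketch follows; the template and parameter names are copied from the sample above, while the function name and input format are assumptions made up for illustration.

```python
def romaji_entry(hira: str, kata: str, senses) -> str:
    """Build the wikitext of a romaji entry in the proposed layout.

    senses is a sequence of (target_entry, gloss) pairs. This is a
    hypothetical helper, not an existing bot or module.
    """
    lines = [
        "==Japanese==",
        "===Romanization===",
        "{{ja-romaji|hira=%s|kata=%s}}" % (hira, kata),
    ]
    for target, gloss in senses:
        lines.append("# {{ja-def|%s}} %s" % (target, gloss))
    return "\n".join(lines)


print(romaji_entry(
    "りゅうと",
    "リュート",
    [("隆と", "stylishly"),
     ("リュート", "a lute {{gloss|the musical instrument}}")],
))
```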
Canned usage note for katakana in science
How would everyone feel about a template like {{kata-bio}} that goes something like this?
As with all names of plants and animals, the katakana form of this term is always preferred in scientific contexts.
Feel free to reword this.
I think a note like this would be appropriate for any entry which is the name of a plant or animal. Since it's the same message each time, it would be nice to have it written in the best possible way, both for substance and for style. As for substance, I'm not sure if medical doctors treat katakana the same way--please add details if you know. There is a cluster of entries from long ago which have a similar usage note which was evidently copy-pasted between them (and its style could have been improved in my opinion.) --Haplology (talk) 03:19, 27 March 2013 (UTC)
- A late reply: we do have {{U:ja:biology}}. This generates content like the following:
As with many terms that name organisms, this term is often spelled in katakana, especially in biological contexts (where katakana is customary).
... or ...
As with many terms that name organisms, this term is often spelled in katakana, especially in biological contexts (where katakana is customary), as サンプル.
Ordering etym sections in multi-etym JA entries
Hello anyone watching this page. I've recently found myself working on more JA entries with multiple etym sections, giving rise to the question of how to order the different sections.
- For entries with both kun'yomi and on'yomi, which should come first?
- ⇒ My sense is that maybe on'yomi should come first, since these are often listed first in JA kanji dictionaries. On the other hand, on'yomi are essentially borrowings from Chinese (for the most part), so perhaps the kun'yomi should come first as the native Japanese derivations?
- Among the on'yomi etyms, which should come first?
- ⇒ My thought here is chronological. If we have goon, that comes first, then kan'on, then tōon, then sōon.
- Among the kun'yomi etyms, which should come first?
- ⇒ Here, I'm less certain.
- One instinct is to also list these chronologically, starting from the oldest forms. See the kun'yomi etyms in this version of the 仮名 entry for one example. The oldest reading karina is listed first, then the derived kanna reading, then the derived kana reading.
- But then again, perhaps we should start with the most common reading?
- And if we start with the most common, do we list the rest in order of most-used?
- Or do we list the rest chronologically?
I'm interested in any constructive feedback. Our current state is basically willy-nilly, which is starting to bother me. A more standard policy would be preferable. ‑‑ Eiríkr Útlendi │ Tala við mig 20:08, 6 March 2015 (UTC)
@Atitarev, Haplology, TAKASUGI Shinji, Wyang, エリック・キィ, Tsukuyone, Nibiko, Umbreon126, Kc kennylau Ping!
- (Ping didn't work). In my opinion, most common senses and readings should come first, regardless of yomi. I personally hate the chronological order, which causes problems with translations, for example @truth. --Anatoli T. (обсудить/вклад) 05:47, 13 March 2015 (UTC)
- (Ping absolutely didn't work) I agree with Anatoli. —suzukaze (t・c) 04:28, 28 August 2016 (UTC)
- @Eirikr: you probably know now, but anyway this is important: you must write your signature in order for ping to work (no signature). — TAKASUGI Shinji (talk) 05:04, 11 March 2018 (UTC)
{{kanji}} has been non-existent for just under 10 years now. I think this page needs an overhaul. —suzukaze (t・c) 10:28, 22 January 2017 (UTC)
User who merits blocking / nuking on sight
Context-dependent sort key
@Atitarev, Eirikr, Haplology, Suzukaze-c: I proposed a new sort key function in meta:2017 Community Wishlist Survey/Wiktionary#Context-dependent sort key, with no reaction yet. It will be easier to apply different sort keys for Japanese and Chinese entries in the same page. I’m not sure how English Wiktionary handles conflicting sort keys, though. What do you guys think? — TAKASUGI Shinji (talk) 09:46, 18 November 2017 (UTC)
- I think it is a good idea. I think it could also cut down on redundant module calls (@Erutuon once mentioned that it was silly how Module:vi-sortkey is executed multiple times for a single entry, IIRC). —suzukaze (t・c) 09:50, 18 November 2017 (UTC)
- Some time ago, when I was considering posting a request on Phabricator for my sorting idea, I encountered a request that was already up and someone was working on it. I'll have to dig it up. It was similar to this ("Allow collation to be specified per category"). Aha, it's here ("Support collation by a certain locale (sorting order of characters)"). Has a lot of technical discussion, which I skimmed at the time.
- Among the ideas I recall was having multiple sortkeys connected to languages on each page, and each category has some magic word that determines which language's sortkey it should use, or which language's sort order. I think there was something about using functions created by Unicode to create language-specific sortkeys or to implement sort orders in some fashion. I might be misremembering stuff. The task is now closed, whatever that means.
- Anyway, I think the idea of tying categories to specific languages in the server and using multiple sortkeys or server-implemented sort orders sounds like it would solve the problem of CJKV on a single page. You could use a {{DEFAULTSORT:}} type magic word, but it would specify sorting for a specific language's categories, or leave it to the server. I wonder if they're working on that. — Eru·tuon 10:27, 18 November 2017 (UTC)
- @TAKASUGI Shinji: I think it's a good idea. Please note that Chinese entries are no longer sorted by pinyin (which is only relevant to Mandarin, anyway) but by radicals. User:Wyang may tell you more about how sorting is done in Module:zh and Module:zh/data/sortkey. I believe language-specific sorting can be done inside Wiktionary by headword modules, which is already the case for Chinese and various other languages that require sorting different from the default, but I'm not a Lua guru. --Anatoli T. (обсудить/вклад) 10:58, 18 November 2017 (UTC)
- Chinese sorting is now done automatically by Module:zh-sortkey. Japanese could definitely benefit from a sorting cleanup - it's silly to enter the same sortkey multiple times. If the SECTIONSORT functionality (which gives a sortkey for subsequent text until the next SECTIONSORT is encountered) were to be actualised, it may be technically feasible for the SECTIONSORT keys to be generated automatically, every time the {{ja-pron}} template is called, thus leaving no trace of the sortkey in the entry code at all and making the sorting even more intelligent. Wyang (talk) 12:25, 18 November 2017 (UTC)
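To make the SECTIONSORT idea above concrete, here is a minimal Python sketch of the scoping rule only: a key applies to every later category link until the next key is met. The `{{SECTIONSORT:...}}` syntax, the function name and the sample keys are all assumptions made up for illustration; nothing like this exists in MediaWiki as far as this thread establishes.

```python
import re

# Hypothetical magic word: {{SECTIONSORT:key}} (an assumption, not a MediaWiki feature).
SECTIONSORT = re.compile(r"\{\{SECTIONSORT:(.*?)\}\}")
# Ordinary category link, with an optional manual sortkey after the pipe.
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]")


def assign_sortkeys(lines, default_key):
    """Return (category, sortkey) pairs for a page given as a list of lines.

    A SECTIONSORT line sets the key for all following categories until the
    next SECTIONSORT line; a manual key on a category link still wins.
    """
    current = default_key
    result = []
    for line in lines:
        match = SECTIONSORT.search(line)
        if match:
            current = match.group(1)
            continue
        for category, manual in CATEGORY.findall(line):
            result.append((category, manual if manual else current))
    return result


# Illustrative page; the keys are invented for the example.
page = [
    "==Chinese==",
    "{{SECTIONSORT:犬00}}",
    "[[Category:Chinese nouns]]",
    "==Japanese==",
    "{{SECTIONSORT:いぬ}}",
    "[[Category:Japanese nouns]]",
    "[[Category:Japanese lemmas|けん]]",
]
pairs = assign_sortkeys(page, default_key="犬")
```

The sketch gives each section exactly one automatic key, which is the property the proposal relies on.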
- I read the post finally. I don't think sortkeys should be tied to sections. For instance, the Chinese section often contains categories for many Chinese varieties in addition to (written) Chinese. Each variety needs a different sortkey, and usually there are multiple categories for a single Chinese variety: Mandarin lemmas and Mandarin nouns in 犬 (quǎn), for instance. A SECTIONSORT would only be able to give one sortkey. So, for example, if SECTIONSORT were the radical-stroke sortkey used in Chinese categories, the Chinese categories would not need manual sortkeys, but the categories for Mandarin, Wu, Cantonese, Min Nan, and so on would.
- What would work well for Chinese sections is to tie sortkeys and categories to language codes. That is, each language in a page gets a sortkey, and the category looks for the sortkey of a particular language and uses it. (Sortkeys could still be specified manually for an individual category, of course.) Then, the Mandarin categories could use pinyin sortkeys, Chinese categories use radical-stroke sortkeys, and so on, and each of these could be specified only once on the page. This would require two magic words: a language-specific DEFAULTSORT-type magic word in the entry, and a magic word in the category page (which could be added automatically by Module:category tree) specifying which language's sortkey the category should use, if it is present. So to make up names for the magic words, the entry could contain {{LANGSORT:cmn|pinyin}} and {{LANGSORT:zh|radical stroke}}, and then the Mandarin categories could contain {{SORTLANG:cmn}} and the Chinese categories {{SORTLANG:zh}} to tell the server to use the cmn or the zh sortkey respectively. — Eru·tuon 22:54, 18 November 2017 (UTC)
- The Chinese situation is kind of abnormal though. I think that Chinese+radical could be the section default, and Mandarin/Cantonese, etc. could have classic manual sortkeys in the [[Category:_______ lemmas|_______]] style. —suzukaze (t・c) 23:05, 18 November 2017 (UTC)
- Well, yes. That's what {{zh-pron}} currently does. My more theoretical concern is that really sortkeys pertain to categories, not to sections. Sections don't have sortkeys; they have a language to which most categories used in the section pertain. But more practically, there are also categories that are not specific to the language and perhaps shouldn't use a language-specific sortkey (Tea room). — Eru·tuon 00:21, 19 November 2017 (UTC)
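For comparison, the language-keyed variant (the made-up LANGSORT / SORTLANG pair) can be sketched in a few lines of Python. This only models the lookup rule described above; the function name, the fallback to the page title and the sample key values are assumptions for illustration.

```python
def resolve_sortkey(page_sortkeys, category_lang, page_title, manual_key=None):
    """Pick the sortkey a category would use for one page.

    page_sortkeys: language code -> key, as declared once per page
                   (the hypothetical {{LANGSORT:code|key}}).
    category_lang: the language the category asks for
                   (the hypothetical {{SORTLANG:code}}).
    A manual key on the individual category link overrides everything;
    if the page declares no key for that language, fall back to the title.
    """
    if manual_key is not None:
        return manual_key
    return page_sortkeys.get(category_lang, page_title)


# Illustrative declarations for the page 犬; the key values are invented.
declared = {"cmn": "quǎn", "zh": "犬00"}

mandarin_key = resolve_sortkey(declared, "cmn", "犬")      # pinyin key
chinese_key = resolve_sortkey(declared, "zh", "犬")        # radical-stroke key
other_key = resolve_sortkey(declared, "mul", "犬")         # no declaration: title
manual = resolve_sortkey(declared, "cmn", "犬", "quan3")   # manual override
```

Each key is declared once per page and every category of that language picks it up, which is the difference from a per-section key.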
- Many years ago, I posted somewhere at a Meta site regarding the unusability of sortkey functionality for Japanese entries, inasmuch as 1) a single Japanese entry spelling often needs multiple sortings based on different phonetic realizations, and 2) at the time (and I suspect still) the Mediawiki software handled multiple sortkeys for a single category specification on a single page by ignoring all but the last sortkey. My post (ah, here it is) got no reply at all. That was 5.5 years ago. ご参考まで. ‑‑ Eiríkr Útlendi │Tala við mig 04:16, 20 November 2017 (UTC)
- Thank you. Meta is not a popular place for discussion. As you point out, it is good if multiple sort keys are possible for Japanese (ex. fr:Catégorie:Homographes non homophones en japonais). Is there any language that needs multiple sort keys other than Japanese? Some Chinese characters can be categorized in more than one radical, depending on dictionaries. — TAKASUGI Shinji (talk) 00:12, 22 November 2017 (UTC)
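The limitation described above (only the last sortkey surviving when one page adds the same category several times) can be illustrated with a small Python sketch. It models the behaviour as reported in this thread and has not been checked against the MediaWiki source; the function names and the sample data are assumptions.

```python
def last_wins(category_links):
    """One key per (page, category): later links overwrite earlier ones."""
    stored = {}
    for category, sortkey in category_links:
        stored[category] = sortkey
    return stored


def keep_all(category_links):
    """What Japanese entries would need: every key kept per category."""
    stored = {}
    for category, sortkey in category_links:
        stored.setdefault(category, []).append(sortkey)
    return stored


# 今日 has (at least) the readings きょう and こんにち, so one spelling
# wants two positions in the same category listing.
links = [("Japanese nouns", "きょう"), ("Japanese nouns", "こんにち")]

single = last_wins(links)   # the きょう key is lost
multiple = keep_all(links)  # both keys are kept
```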
Middle Japanese
Which language code do I give to words marked as "Middle Japanese" in literature? Crom daba (talk) 18:49, 15 December 2017 (UTC)
- @Crom daba: Unfortunately, no such language code exists, at least in the ISO standard. There's OJP for Old Japanese, usually regarded as ending around 800 CE or so, and then there's JA for everything after that -- which is an awfully big bucket. I'm open to the creation of such a code. Middle Japanese, sometimes a.k.a. Classical Japanese, is different enough from the modern language to warrant different treatment here -- differing usage, conjugation patterns, etc. That said, any such initiative should probably get hashed out in the Beer Parlor first. :) ‑‑ Eiríkr Útlendi │Tala við mig 00:14, 16 December 2017 (UTC)
- I just want to know the proper way (the Wiktionary way) to mention the Japanese words given here (if you could find them in their original script I would be very thankful). I'm not suggesting that we introduce Middle Japanese; we don't have a need for it. Crom daba (talk) 00:26, 16 December 2017 (UTC)
Conjugation table
I always find those Japanese conjugation tables prescriptive and dated. For example, in a situation where English speakers would use a bare imperative in a friendly manner, Japanese speakers will use a non-polite requestive (して, 食べて, etc.) or a non-polite “advisory” imperative (しな, 食べな, etc.), but you don’t see them anywhere. Similarly, る before n very often becomes ん, as in 分かんない, but you see only 分からない.
I have created a list of forms we should have in a conjugation table: Wiktionary talk:About Japanese/Conjugation. What do you think? — TAKASUGI Shinji (talk) 12:58, 10 March 2018 (UTC)
- I completely support this. —suzukaze (t・c) 05:08, 11 March 2018 (UTC)
- Thanks, Suzukaze-c. @Atitarev, Eirikr, Haplology, Wyang, エリック・キィ: what do you think of modernizing Japanese conjugation tables? The list of forms must reflect community opinions as much as possible. — TAKASUGI Shinji (talk) 15:23, 13 March 2018 (UTC)
- @TAKASUGI Shinji I generally support your idea, but we can put a qualifier denoting dated, colloquial, etc. in each cell. I have felt a gap between a model Japanese and what I, as one of the native Japanese speakers, actually speak, hear and see today!--エリック・キィ (talk) 15:52, 13 March 2018 (UTC)
- @TAKASUGI Shinji I agree with Eryk: I generally support this idea, with some adjustments -- more contextual data as above, and some terminology tweaks. For instance, the 連用形 in the sample tables is listed as the "Conjunctive", whereas I learned the 連用形 as the "continuative", and the て-form as the "conjunctive". I also don't understand how this will look in the end -- is the sample page intended for inclusion as-is on the WT:AJA page? Or will the relevant rows be extracted and recombined in a conjugation table specific to each verb form (such as, all the rows for 書く will be recombined into a single table, and that will go on the 書く page)? ‑‑ Eiríkr Útlendi │Tala við mig 18:26, 13 March 2018 (UTC)
- Wow, great job, Shinji. ぬ and ねえ forms are included. Is it worth including ず forms as well? Anatoli T. (обсудить/вклад) 20:50, 13 March 2018 (UTC)
- @Eirikr: That was just an error, and I fixed it. Everyone can modify Wiktionary talk:About Japanese/Conjugation freely. I’d like to show the relevant forms in each verb entry and the entire table in Wiktionary:About Japanese and Appendix:Japanese verbs. Layout is to be discussed.
- @Atitarev ず is really archaic as a sentence-final form, but ずに is still common in literary Japanese. We can show both or only the latter. — TAKASUGI Shinji (talk) 00:00, 14 March 2018 (UTC)
- I think both show up often enough to merit inclusion. I also bump into -ざる endings in fixed phrases, such as -ざるを得ない, that kind of thing. ‑‑ Eiríkr Útlendi │Tala við mig 00:11, 14 March 2018 (UTC)
- Support. Wyang (talk) 01:34, 14 March 2018 (UTC)
- Yes, support and include both ず and ずに forms with proper labels. Is ねえ really "vulgar" or sloppy/dialectal or something else? :) --Anatoli T. (обсудить/вклад) 02:06, 14 March 2018 (UTC)
- There are many conservative people who don’t like the adjective-final /eː/ especially if the speaker is a woman. [2] — TAKASUGI Shinji (talk) 23:39, 14 March 2018 (UTC)
- I agree. It's interesting that some mispronunciations (might be a wrong word here) in Japanese make words sound impolite or even vulgar. てめぇ (temē) must sound much worse than てまえ (temae). --Anatoli T. (обсудить/вклад) 23:56, 14 March 2018 (UTC)
- (I really, really like this table. I need to express my support a second time. —Suzukaze-c◇◇ 09:26, 25 January 2019 (UTC))
@TAKASUGI Shinji, Atitarev, Suzukaze-c What about giving both 学校文法 and 日本語教育文法 info, like this?
- Traditional description (Japanese school grammar)
Conjugation type | 語幹 Stem | 語尾 Ending |
---|---|---|
五段活用 Five-grade conjugation | はし (hashi) | る (ru) |
- ???
Conjugation type | 語幹 Stem | 語尾 Ending |
---|---|---|
グループⅠ(子音語幹動詞・ウ動詞) Group I (consonant stem, -u verb) | /hasir-/ (hashir-) | /-u/ (-u) |
--Dine2016 (talk) 15:15, 29 September 2019 (UTC)
- +1; I have also tried to do this in previous attempts to make a new conjugation template (if I understand correctly). —Suzukaze-c◇◇ 17:33, 29 September 2019 (UTC)
- That’s good but we don’t need Japanese words like 五段活用 and グループ Ⅰ. — TAKASUGI Shinji (talk) 23:00, 29 September 2019 (UTC)
- Agreed that we probably don't need the 五段活用 and グループ I Japanese labels in the table -- this is intended for an English-reading audience, after all. :) Also, my impression is that the "group" nomenclature for talking about Japanese verb conjugation patterns is mostly used in English-language materials, so having that in Japanese seems a bit odd.
- I'm reminded that Ainu uses special smaller kana to represent final consonants. Using these, for example, hashir- could be spelled as ハシㇼ- in kana. @Shinji, I'm curious if you (or anyone else) have encountered such kana spellings for Japanese? ‑‑ Eiríkr Útlendi │Tala við mig 19:55, 4 October 2019 (UTC)
- The small kana are specifically for the Ainu language. — TAKASUGI Shinji (talk) 08:24, 5 October 2019 (UTC)
- Hello, just wanting to share the related proposal: Wiktionary:Beer parlour/2024/November#Proposal: Adopting Inflectional Tables Based on Modern Morphological Views for Japanese
- Looking forward to your thoughts! Σ>―(〃°ω°〃)♡→L.C.D.(-{に〇〇する}-) 21:30, 19 November 2024 (UTC)
字音語素
Some monolingual Japanese dictionaries include the so-called 字音語素 or 字音語の造語成分 along with words. The current practice on the wiki seems to list the definitions and compounds of single kanji in the "Kanji" section, but there seems to be currently no standard way to indicate which pronunciations apply to which definitions for cases like 悪 (アク・オ) or 楽 (ラク・ガク). Also, when there is only one etymology, the practice of putting "Kanji" and "Alternative forms", "Pronunciation", "Noun" headers on the same L3 level seems a little odd to me. Any ideas on how 字音語素 should be presented? --Dine2016 (talk) 04:03, 27 April 2018 (UTC)
- It's been solved - use the Affix POS. --Dine2016 (talk) 16:42, 7 October 2018 (UTC)
Japanese grammar terminology
Do you think we need to unify the terminology of Japanese grammar? For example, 未然形 is translated as “irrealis or incomplete form” in running text but “imperfective” in the verb conjugation table. And 五段活用 is “godan” in the verb headword-line but “type 1” in the category name (though it's “Group I” in textbooks like Minna no Nihongo). Personally, I prefer terms like “consonant-stem”, “vowel-stem”, “infinitive” or “gerund” over traditional grammar (aka Hashimoto grammar or school grammar) terms like “five-grade”, “monograde”, “continuative form” or “te form”, but anything that can be settled on is OK.
I suggest creating a template called {{ja-term}} and using it for grammatical terms. For example, {{ja-term|infinitive}} could display “infinitive”.
(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Fumiko Take, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4): --Dine2016 (talk) 16:42, 7 October 2018 (UTC)
- (I'm glad that I have two hands.) On the one hand, I absolutely agree that some consistent set of labels is desirable. On the other hand, I don't think there is a consistent and widely adopted set of labels, either across Japanese sources or English-medium Japanese grammars. I guess my bottom line is that I would be happy to have a set of labels for use in Wiktionary. Cnilep (talk) 23:41, 7 October 2018 (UTC)
- From what I've seen, the larger part of the inconsistency is due to so many different writers using their own set of labels. This is likely due to the way that Japanese grammar does not conform well to English-language labels. For instance, I quite dislike either label gerund or infinitive, presumably mentioned above in reference to the 連用形 (ren'yōkei), as both English-language labels point to grammatical constructs that don't quite exist in Japanese, while also failing to express the function of the actual form in Japanese. The label "gerund" in English would fit for utterances such as "I like walking", whereas the same statement in Japanese -- 歩くのが好き -- doesn't use the ren'yōkei at all, but rather the dictionary or plain form (or what have you). Meanwhile, although the label "infinitive" would fit a statement such as "I like to walk", which then also broadly fits the Japanese 歩くのが好き and uses the dictionary or plain form in both languages, the Japanese again doesn't use the ren'yōkei.
- There's also the problem of historical context. Modern grammars describing classical and older Japanese generally call the -e- verb stems the 已然形 (izenkei), often glossed as realis in contrast to the irrealis or 未然形 (mizenkei). However, the modern language uses this form differently, leading to a change in labels even in Japanese, where the -e- verb stems are instead called the 仮定形 (kateikei).
- I mostly agree with Cnilep [except that I have seldom run into divergent labels in Japanese-language grammars, aside from historical terms such as 既然言 (kizengen) or 将然言 (shōzengen)]. We should coordinate on a standardized set of labels, and also make sure to build in some way for users accustomed to other common labels to find out how our labels correspond to theirs. ‑‑ Eiríkr Útlendi │Tala við mig 19:33, 8 October 2018 (UTC)
- @Eirikr: Thanks for your reply. I currently have copies of three Japanese grammars by western linguists – A Reference Grammar of Japanese by Samuel E. Martin, A History of the Japanese Language by Bjarke Frellesvig, and A Descriptive Grammar of Early Old Japanese Prose by John R. Bentley – all of which use the labels “infinitive” and “gerund”. I have no objections to calling them “continuative” and “conjunctive”, though. What do you think of the other names of modern verb forms in Frellesvig (2010)?
Finite | |
---|---|
Nonpast | kaku |
Past | kaita |
Volitional | kakoo |
Imperative | kake |
(Past conjectural) | (kaitaroo) |
Non-finite | |
Infinitive | kaki |
Gerund | kaite |
Conditional-1 | kaitara |
Representative | kaitari |
Conditional-2 | kaitewa |
Provisional | kakeba |
Concessive | kaitemo |
Note: I agree that the conditional-2 (kaitewa) and the concessive (kaitemo) should be removed and treated as kaite + wa/mo, which is also how Martin (1975) treats them. By the way, what about renaming the conditional(-1) and the provisional as something like “-tara conditional” and “-ba conditional”? |
- One reason I don't like the traditional description of Japanese grammar is its paradigm of verbs, which is unsuitable for Modern Japanese. It unnecessarily keeps the distinction between 終止形 and 連体形, while failing to point out the present/future tense they have acquired due to the rise of past -ta (from stative -tar-). The variant of 未然形 which ends in オ段 is also a back-formation, an artifact of writing, rather than a true stem. Another disadvantage of traditional grammar is the segmentation of verbs like 読む and 食べる into よ・む and た・べる; it is better to posit stems such as yom- and tabe-, respectively. (Can 二段 verbs like 尋ぬ be described as varying between tadune- and tadunu-?) For this reason, I still prefer “linguistic” terms such as “(regular) consonant-stem” over “five-grade”, which is based on a clumsy analysis of verb forms hindered by the moraic orthography. Another reason I don't like traditional Japanese grammar is its classification of 付属語. For example, it lumps から・と and て・ば both as 接続助詞, while romanization will show their difference: yonda kara and yomu to, vs yonde and yomeba. --Dine2016 (talk) 16:11, 9 October 2018 (UTC)
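As an illustration of the stem-based analysis argued for above, here is a minimal Python sketch that derives the forms in the Frellesvig-style table from a phonemic stem (kak-, yom-, tabe-). The function name and the onbin table layout are my own assumptions; it uses phonemic romanization (hasir-, not hashir-), writes long o as oo as in that table, and deliberately ignores irregular verbs and exceptions such as ik- (itta).

```python
VOWELS = "aiueo"

# Outcome of the stem-final consonant before -ta/-te (onbin):
# consonant -> (replacement, whether the suffix voices to -da/-de)
ONBIN = {
    "k": ("i", False),   # kak-   -> kaita
    "g": ("i", True),    # oyog-  -> oyoida
    "s": ("si", False),  # hanas- -> hanasita
    "t": ("t", False),   # mat-   -> matta
    "r": ("t", False),   # hasir- -> hasitta
    "w": ("t", False),   # kaw-   -> katta
    "n": ("n", True),    # sin-   -> sinda
    "b": ("n", True),    # yob-   -> yonda
    "m": ("n", True),    # yom-   -> yonda
}


def inflect(stem):
    """Return a dict of regular forms for a consonant or vowel stem."""
    if stem[-1] in VOWELS:
        # Vowel stem (tabe-): suffixes attach directly.
        t_base, voiced = stem, False
        forms = {
            "nonpast": stem + "ru",
            "volitional": stem + "yoo",
            "imperative": stem + "ro",
            "infinitive": stem,
            "provisional": stem + "reba",
        }
    else:
        # Consonant stem (kak-): vowel-initial suffixes, onbin before -ta/-te.
        replacement, voiced = ONBIN[stem[-1]]
        t_base = stem[:-1] + replacement
        # Stem-final w surfaces only before a, so drop it for these suffixes.
        base = stem[:-1] if stem[-1] == "w" else stem
        forms = {
            "nonpast": base + "u",
            "volitional": base + "oo",
            "imperative": base + "e",
            "infinitive": base + "i",
            "provisional": base + "eba",
        }
    t = "d" if voiced else "t"
    forms["past"] = t_base + t + "a"
    forms["gerund"] = t_base + t + "e"
    forms["conditional"] = forms["past"] + "ra"
    forms["representative"] = forms["past"] + "ri"
    return forms


kaku = inflect("kak")
yomu = inflect("yom")
taberu = inflect("tabe")
```

With stems as input, the kana-based split into よ・む or た・べる never has to be made; the whole paradigm follows from the stem plus a short onbin table.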
@Eirikr I found this paper: 日本語教育の文法体系と寺村秀夫 : 活用の場合, which states that:
Also, Bernard Bloch's conjugation table is as follows. [...] Hypothetical (Provisional) (kak-eba, mi-reba, etc.) Hypothetical (Conditional) (kaitara, mi-tara, etc.) Participial (Infinitive) (kak-i, mi-ø, etc.) Participial (Gerund) (kaite, mi-te, etc.) Participial (Alternative) (kaitari, mi-tari, etc.)
It seems that the terms "infinitive" and "gerund" are indeed in use, and the distinction between "provisional" and "conditional" is established, in English scholarship. On the other hand, the same paper states that
Note that in Basic Japanese (Osaka University of Foreign Studies, 1967), which Teramura himself helped compile, the Conjunctive form (連用形) is introduced as a conjugated form, and [...]
This is also the terminology used in Bentley (2001), in which the six "forms" in traditional grammar are listed as imperfect, conjunctive, conclusive, attributive, evidential, and imperative. (From a western point of view, it's incorrect to call them "forms": the mizenkei is a stem, and the ren'yōkei is either an inflected form or a stem depending on how it is used. Also 書かない should be considered one word instead of two, as shown by romanization.) --Dine2016 (talk) 03:35, 12 October 2018 (UTC)
- What do you mean by evidential? In linguistics evidential is a totally different thing (SIL). The six traditional “forms” (mizen, ren’yō, shūshi, rentai, katei/izen, meirei) can’t explain modern Japanese morphology well and it is misleading to treat mizen and katei as real forms. Traditional Japanese grammarians tried to explain Ancient Japanese and Modern Japanese in a unified frame because of diglossia at that time, but now let’s just make it clear that they are two different languages. — TAKASUGI Shinji (talk) 22:53, 12 October 2018 (UTC)
- @TAKASUGI Shinji: Thanks for your reply. “Evidential” is the label for the izenkei in Bentley (2001):
The traditional label for the evidential literally means ‘already thus’ (izenkei), or what can be translated as the perfective (the state of completion). This label is misleading, and since this conjugation usually implies evidence of a condition, a provision, or a concession (Martin 1988:229, 556-7, 785), I have chosen the label ‘evidential’.
- A History of the Japanese Language by Bjarke Frellesvig uses the label “exclamatory” for the same form, and A Reference Grammar of Japanese by Samuel E. Martin uses “literary concessive” for the same form optionally suffixed with -do.
- By the way, I totally agree that the traditional analysis of Japanese grammar (aka Hashimoto grammar or school grammar) is unsuitable for Modern Japanese. That is why I suggest a clean break from traditional grammar terms like “irrealis”, “continuative” and “conjunctive”. --Dine2016 (talk) 00:48, 13 October 2018 (UTC)
- In a discussion above (#Conjugation table) I have listed up almost all the forms in Modern Japanese in Wiktionary talk:About Japanese/Conjugation. They need to be properly named. — TAKASUGI Shinji (talk) 01:13, 13 October 2018 (UTC)
Old Japanese
Please join in the discussion at Wiktionary talk:About Old Japanese. —Μετάknowledgediscuss/deeds 05:27, 6 February 2019 (UTC)
Japanese linking template
- Discussion moved from User talk:Suzukaze-c.
Hi. I'm considering making a Japanese counterpart of {{zh-l}}. Which format do you think would be better?
or
or
--Dine2016 (talk) 12:37, 8 February 2019 (UTC)
- I personally like the third one. It's also similar to {{ko-l}} and {{vi-l}}. —Suzukaze-c◇◇ 16:01, 8 February 2019 (UTC)
- Thanks. Given that Eirikr has been inactive for a while, what additional parameters do you think will be helpful besides |tr=, |gloss=, |lit= and |note=? I have once seen the elaborate format 青 (ao, historically awo) in etymology sections, which suggests the new format あお (青, ao, historically あを, awo), but I'm not sure whether such a format is desirable. Looking at the English etymologies on the wiki, it seems that Old English, Middle English and modern English are never lumped together. A compound formed in Old English is treated as “From Middle English ab, from Old English αβ, from α + β. Surface analysis A + B.” and never “From A (historically α) + B (historically β)” or anachronistically “From A + B”. Given that Old, Middle and modern Japanese have different orthography, I think Japanese should be handled in the same way. --Dine2016 (talk) 04:06, 9 February 2019 (UTC)
- I agree with your ideas. —Suzukaze-c◇◇ 03:04, 13 February 2019 (UTC)
- Same here. I agree that orthography for Middle and Old Japanese has to be separated. Take a look at Category:Middle Vietnamese lemmas. By the way, good job on the new {{ja-see-kango}}. KevinUp (talk) 12:25, 13 February 2019 (UTC)
- @Dine2016 I prefer the third one as well. So will this be implemented in {{ja-l}} or another separate template? User:Poketalker has been using {{m|ja|漢字|tr=kanji}} for some time so maybe we can have {{ja-m|漢字|かんじ|[[gloss]]}} instead. Not sure if this is going to break {{ja-l}} because so many combinations are possible for that template. Maybe the gloss can be entered using |gloss= in {{ja-l}}? KevinUp (talk) 08:01, 13 February 2019 (UTC)
- @KevinUp: Thanks for the reply. My suggestion is to extend {{ja-l}} in a way that does not break existing usages, and make a {{ja-lx}} which works like {{zh-l}}. The former template only formats its arguments and generates nothing else, while the latter template can support auto-completion such as {{ja-lx|太陽}} → 太陽 (たいよう, taiyō). The latter would be very tricky to implement so I suggest doing the former first. --Dine2016 (talk) 08:48, 13 February 2019 (UTC)
- @Dine2016: Sounds good to me. Some possible combinations for the new format that I could think of (to verify its output):
- Perhaps this conversation ought to be moved to Template talk:ja-l. KevinUp (talk) 12:25, 13 February 2019 (UTC)
- @Dine2016, KevinUp: I wrote Module:User:Suzukaze-c/jpx-links instead of doing other important things —Suzukaze-c◇◇ 05:14, 21 February 2019 (UTC)
- Good job on the template. For the CSS part, I hope we can make the Japanese script in running text a little bigger (but not too big like the headwords), and use Meiryo instead of MS PGothic on Windows, as the former is optimized for ClearType while the latter embeds bitmaps and does not look good. I remember there is a way to reduce the vertical space of Meiryo, which is used on some Vocaloid-related wikis on Wikia.
- Unfortunately, I've lost interest in Japanese once I realized there was no way to eliminate all duplication of information in the source code of Japanese entries. The two “final bosses” which made it impossible, I think, would be: (1) the repetition of the reading in headword templates and (2) the repetition of the inflection type in the headword template and the inflection table. Chinese was able to eliminate the major repetitions because Unified Chinese moved the romanizations to the pronunciation template and there was no inflection. Japanese was not so lucky (although we can follow the French Wiktionary's handling of inflection to eliminate the second problem). --Dine2016 (talk) 11:02, 21 February 2019 (UTC)
- re: CSS: See also User:Suzukaze-c/sandbox#2, I guess. Maybe I should ask for interface administrator rights? Perhaps we could remove the inline CSS from {{ja-r}} and {{ja-usex}} as well. —Suzukaze-c◇◇ 17:44, 21 February 2019 (UTC)
- re: vocaloid wikia: probably me lmao which one
- re: repetition: I have wondered if allowing global variables (currently not allowed) would help with this sort of problem. —Suzukaze-c◇◇ 17:28, 21 February 2019 (UTC)
- re: vocaloid wikia: ah, sorry, it was an ugly hack. --Dine2016 (talk) 07:55, 26 February 2019 (UTC)
- yeah but it's my ugly hack ;) —Suzukaze-c◇◇ 08:07, 26 February 2019 (UTC)
- ah sorry again (*/ω\*) I seldom hang around the English-language ACG scene
- I wonder if Unified Japanese could justify omitting (1) the romanizations in headword templates and (2) the inflection tables. --Dine2016 (talk) 09:51, 26 February 2019 (UTC)
- it's totally an ugly hack, i'm just teasing you (´ε` )
- Your idea reminds me that Module:th-headword reads the content of an entry's own page. Maybe that could be a source of inspiration.
- Also, what do you think of User:Suzukaze-c/p/ja#Japanese {entry format reform} as my idea of "unified Japanese"? (There's certainly a lot of redundancy regarding definitions and such, but I don't think things would be any better if we split ja.) —Suzukaze-c◇◇ 04:32, 27 February 2019 (UTC)
- @Suzukaze-c I just discovered an example of “same kanji, same modern kana, different historical kana”: 法律 (ほうりつ < はふりつ, hōritsu < fafuritu, ほうりつ < ほふりつ, hōritsu < fofuritu). Hope it's useful. --Dine2016 (talk) 16:39, 4 March 2019 (UTC)
- @Suzukaze-c Does your template support the alternative format {{jpx-m|変化:へんか}} in addition to {{jpx-m|変化|へんか}}? If this is supported, we can
- code templates like {{ja-syn}} and {{ja-synonym}} in the simplest way ({{jpx-m|{{{1}}}}}), and
- use them like {{ja-syn|変わる:かわる|変化:へんか|チェンジ}} and {{ja-synonym|変化:へんか|[[change]]}}
- Furthermore, if automatic fetching of the reading is implemented so that {{ja-m|変化}} yields 変化 (へんか, henka), we can further
- simplify the usage to {{ja-syn|変わる|変化|チェンジ}} and {{ja-synonym|変化|change}}
- to get the same results. --Dine2016 (talk) 05:39, 6 March 2019 (UTC)
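The “term:reading” argument format asked about above is simple to parse. Here is a hedged Python sketch of how a single argument could be split and displayed; the function names and the output layout are assumptions for illustration, not what {{jpx-m}} or any existing module does, and romanization and automatic fetching of readings are left out.

```python
def parse_link_arg(arg):
    """Split 'term:reading' into (term, reading); reading is None if absent."""
    term, separator, reading = arg.partition(":")
    return (term, reading if separator else None)


def format_link(arg):
    """Render one argument as 'term (reading)', or just 'term' without a reading."""
    term, reading = parse_link_arg(arg)
    return f"{term} ({reading})" if reading else term


def format_synonyms(*args):
    """What a wrapper in the style of ja-syn might display for several arguments."""
    return ", ".join(format_link(arg) for arg in args)


line = format_synonyms("変わる:かわる", "変化:へんか", "チェンジ")
```

Because each argument carries its own reading, a wrapper template only has to pass its arguments through one by one, which is the point of the “simplest way” above.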
re: unified Japanese
I think Etymologies 1–3 could be grouped like
- Etymology 1
- Pronunciation
- Verb
ける (transitive) // no need of romaji or conjugation type as it changes over time, one of the advantages of unified Japanese :)
- …
- Conjugation
Inflected forms of ける [godan] in Modern Japanese | ||
---|---|---|
Inflection | Hiragana | Romanization |
Stems | ||
Basic stem | ker- | |
a- stem (未然形*) | けら | kera- |
onbin stem (音便形) | けっ | keQ- |
e- stem (仮定形*) | けれ | kere- |
Finite forms | ||
Nonpast (終止形*/連体形*/基本形/ル形) | ける | keru |
Past (過去形/タ形) | けった | ketta |
Volitional (意向形/推量形) | けろう | kerō |
Imperative (命令形*) | けれ | kere |
Non-finite forms | ||
Infinitive (連用形*) | けり | keri |
Gerund (て形) | けって | kette |
Conditional | けったら | kettara |
Representative | けったり | kettari |
Provisional (ば形/条件形) | ければ | kereba |
Key constructions | ||
Passive (受身) | けられる (kerareru, stem ker-are-, ichidan conjugation) | |
… | ||
The grammatical framework largely follows Frellesvig (2010) * Inflected forms in school grammar |
Inflected forms of ける [shimo ichidan] in Late Middle Japanese | |
---|---|
Inflection | Phonemic |
Stems | |
Basic stem (未然形*/連用形*) | ke- |
e- stem (已然形*) | kere- |
Finite forms | |
Nonpast (終止形*/連体形*) | keru |
Past | keta |
Intentional | kyoozuru |
Volitional | kyoo |
Past conjectural | ketarɔɔ |
Imperative (命令形*) | kei ~ keyo |
Non-finite forms | |
Infinitive | ke |
Gerund | kete |
Conditional | keba (~ ketewa) |
Provisional | kereba |
Concessive | keredomo ~ ketemo |
Past conditional | ketara(ba) |
Past provisional | ketareba |
Past concessive | ketaredomo |
Intentional provisional | kyoozureba |
Intentional concessive | kyoozuredomo |
Key constructions | |
… | |
The grammatical framework largely follows Frellesvig (2010) * Inflected forms in school grammar |
Inflected forms of ける [shimo ichidan] in Early Middle Japanese | ||
---|---|---|
… |
--Dine2016 (talk) 09:47, 27 February 2019 (UTC)
- Intriguing. I like how adding a new conjugation table for older stages is reminiscent of our (very convenient) current approach for Chinese (adding pronunciation to {{zh-pron}}). Would we still use romaji for historical forms in etymology sections? —Suzukaze-c◇◇ 06:05, 1 March 2019 (UTC)
- Um… if you're citing a word in a specific stage of the language, a transcription appropriate to the stage can be added. For example, the topic marker of Old Japanese is 波 (pa), the one of Early Middle Japanese is は (fa), and the ones of Late Middle Japanese and Modern Japanese are は (wa). On the other hand, if the stage is unknown, you can just use the kana spelling and refer to the topic marker as は, which is similar to how you cite Chinese characters rather than words using {{zh-l|*...}}. In the latter situation, you sometimes need to choose between old and new orthography (あを vs あお), or classical and modern forms (あり vs ある), but printed dictionaries have the same problem.
- For synonym sections, maybe we can group the words by stage, effectively constituting a historical thesaurus like the 三省堂 現代語古語類語辞典 and the Historical Thesaurus of English? --Dine2016 (talk) 10:46, 1 March 2019 (UTC)
- Um… if you're citing a word in a specific stage of the language, a transcription appropriate to the stage can be added. For example, the topic marker of Old Japanese is 波 (pa), the one of Early Middle Japanese is は (fa), and the ones of Late Middle Japanese and Modern Japanese is は (wa). On the other hand, if the stage is unknown, you can just use the kana spelling and refer to the topic marker as は, which is similar to how you cite Chinese characters rather than words using
- Developing Dine2016's idea, I made a blueprint of the new template.
- In this draft, phonetic Japanese text is spelled in katakana, following academic custom in Japanese phonology; there are downsteps as well as upsteps, for some dialects and older varieties; the difference between nasal and stop consonants in the ガ行 (ga-row) exists as a difference between dialects, and /ŋ/ is indicated by the 半濁点 (handakuten, ゜) over the letters; as examples of romanised spellings of Middle Japanese, those of the Nippo Jisho are given. Feel free to use this as a tentative plan.--荒巻モロゾフ (talk) 10:38, 10 April 2019 (UTC)
- @荒巻モロゾフ: Very nice. How do you think data should be input? (what would the wikitext/template syntax look like?) —Suzukaze-c◇◇ 05:16, 12 April 2019 (UTC)
- It's desirable to input IPA directly, because the diversity of Japanese dialectal phonology is too great for the output to be generated automatically. However, for the phonetic katakana with accent annotation, it's hard to write HTML code directly. From this paper[3], I picked up a symbolic system that the template can use for conversion.
Symbol | Legend | Example |
---|---|---|
[ | Upstep between the moras | ア[ア → アア |
[[ | Rising in the next mora | [[ア → ア |
] | Downstep between the moras | ア]ア → アア |
]] | Falling in the previous mora | ア]] → ア |
! | Loose descent between the moras without downstep | ア!ア → アア |
% | Loose ascent between the moras without upstep | ア%ア → アア |
- Example:
- /ꜜhàrû ꜜnàtté núkúnátté kíꜜtàkàrà ꜜkjǒːꜜwà ꜜùmí íkóká/
]ハ%ル]] ]ナッ%テ %ヌクナッテ %キ]タカラ ]キョ%ー]ワ ]ウ%ミ %イコカ
- → ハル ナッテ ヌクナッテ キタカラ キョーワ ウミ イコカ
- 春なって温なって来たから今日は海行こか。
- Haru natte nuku natte kita kara kyō wa umi iko ka.
- As it has become spring and is getting warmer, let's go to the sea today.
- Could you make a conversion program from this?--荒巻モロゾフ (talk) 10:21, 23 April 2019 (UTC)
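A conversion program along these lines could start from a small tokenizer. The following is only a minimal Python sketch, not a working template or module: the names `SYMBOLS` and `parse_accent_notation` are hypothetical, the symbol meanings are taken from the legend above, and the step that renders the marks as HTML is left out.

```python
import re

# Symbol meanings follow the legend in the discussion above.
# The names used here are hypothetical, for illustration only.
SYMBOLS = {
    "[[": "rising in next mora",
    "]]": "falling in previous mora",
    "[": "upstep",
    "]": "downstep",
    "!": "loose descent",
    "%": "loose ascent",
}

# Two-character symbols come first so "[[" is not read as two "[".
SYMBOL_RE = re.compile(r"\[\[|\]\]|\[|\]|!|%")


def parse_accent_notation(annotated):
    """Split annotated katakana into plain text and a list of marks.

    Each mark is (offset, name), where offset is the position in the
    plain text at which the symbol occurred.
    """
    pieces = []
    marks = []
    plain_length = 0
    position = 0
    for match in SYMBOL_RE.finditer(annotated):
        chunk = annotated[position:match.start()]
        pieces.append(chunk)
        plain_length += len(chunk)
        marks.append((plain_length, SYMBOLS[match.group()]))
        position = match.end()
    pieces.append(annotated[position:])
    return "".join(pieces), marks
```

Applied to the example sentence, the plain-text part of the result is the unannotated katakana line shown above; a renderer would then turn each mark into whatever markup is chosen for display.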
Sino-Japanese etymologies
There are currently several ways to express Sino-Japanese etymologies:
- From Middle Chinese XXX (YYY, “ZZZ”)
- From {{etyl|zhx|ja}} XXX
- From Chinese XXX
These could be unified into one or more general forms, served via a single template:
- Sino-Japanese word from XXX (“ZZZ”)
- Sino-Japanese word from XXX (YYY, “ZZZ”) (is Middle Chinese transliteration required?)
- Sino-Japanese word from XXX, read as kan'on
- (Japanese-coined) Sino-Japanese word from AAA and BBB
--115.27.198.88 22:01, 8 April 2019 (UTC)
- Some thoughts.
- Most on'yomi terms in Japanese are old borrowings from Middle Chinese, hence the ltc language code.
- Some terms were borrowed from modern Chinese.
- → Ideally, these should explicitly state which variety they came from -- Mandarin (cmn), Cantonese (yue), etc.
- That said, some terms only borrowed the spelling from Chinese and use the Japanese (though Chinese-derived) on'yomi. These might account for some of the zh instances.
- Some entries haven't been updated in many years, which is where most (perhaps all?) of the vague zhx comes from.
- Personally, I think it's useful to include the Middle Chinese reading to show where things started from, and then also show the phonological changes after the term arrived in Japanese. For instance, 長#Etymology_1 shows how this started as Middle Chinese /ʈɨɐŋX/, becoming in turn Japanese /tjau/ → /t͡ɕjau/ → /t͡ɕɔː/ → /t͡ɕoː/. This kind of phonological development is part of the history of the term and can be quite interesting, showing how the sounds of Chinese and Japanese have diverged over time.
- There are also not a few Japanese terms that originated in Middle Chinese with one sense, and then got repurposed during the Meiji period with a different or altered sense. Such terms include 世界, 社会, 自由, etc. I'm not sure if that class of terms would fit into the proposed template? ‑‑ Eiríkr Útlendi │Tala við mig 00:43, 9 April 2019 (UTC)
- @Eirikr: Yes, thanks, please take a look at 再見, which has two etymologies, both Chinese: Middle Chinese and modern Mandarin. The modern Mandarin might need citations :) Thanks to User:Justinrleung for improving. --Anatoli T. (обсудить/вклад) 01:15, 9 April 2019 (UTC)
- Note this section only concerns words using regular on’yomi, not including things like 再見#Etymology_2.--115.27.198.88 12:46, 9 April 2019 (UTC)
The template {{etyl}} is being phased out. I have no preference as regards the discussion above. Cnilep (talk) 06:02, 23 May 2020 (UTC)
Why is there a pronoun header?
What's the point in having it? They act fully like any other noun or am I missing something? Korn [kʰũːɘ̃n] (talk) 08:59, 9 April 2019 (UTC)
- Meh. Japanese sources also list 代名詞. As another example of distinctions made by native speakers, there's nothing particular about よう's grammatical functioning to set it apart from a 形容動詞, but Japanese sources consistently list it as a 助動詞. Having the "pronoun" label is also arguably useful for cross-language comparisons; without it, someone's bound to claim that Japanese "doesn't have pronouns", which is silly. ‑‑ Eiríkr Útlendi │Tala við mig 20:58, 11 April 2019 (UTC)
Romanization question
How should the following terms be romanized?
Marlin Setia1 (talk) 11:24, 11 April 2019 (UTC)
- My preference and personal practice is with spaces. By comparison, one doesn't write flatofone'shand. ‑‑ Eiríkr Útlendi │Tala við mig 20:59, 11 April 2019 (UTC)
- 手の平 is one word, so it should be romanized without spaces. Most of the time it's possible to tell from the accent whether something is a single word or not. In this case, there is only one accent for the whole string (ténohira or tenóhira), so it counts as a word. If it were a compound expression, each component would retain its own accent (*té no híra). The same is true for 言の葉, which can be pronounced kotonoha but also kotonóha, the second pronunciation making it clear that we're dealing with a single word. Sartma (talk) 08:43, 20 September 2021 (UTC)
- See the longer response at Wiktionary_talk:About_Japanese#Re-thinking_the_pronunciation_section. ‑‑ Eiríkr Útlendi │Tala við mig 01:07, 23 September 2021 (UTC)
"Neoclassical pronunciations"
There is a modern pronunciation of Classical Japanese where 買ふ is pronounced kō, different from Early Middle Japanese kafu, and different from Modern Japanese (the spoken language) where it's kau.
What is that modern pronunciation called? --Backinstadiums (talk) 13:38, 11 July 2019 (UTC)
- As an example, in the Jewel Voice Broadcast (audio), 惟ふに (3:00) is pronounced omō ni, and 失ふ (3:54) is pronounced ushinō. --Dine2016 (talk) 15:04, 11 July 2019 (UTC)
- If we (the EN Wiktionary community) are to come up with a label, I think "neoclassical" makes sense and clearly captures what's going on.
- That said, I'm sure we're not the first to discuss this phenomenon, and someone has probably come up with a label for this elsewhere. I'm not familiar with any such verbiage, however, so we must either hope that someone else chimes in who does know, or do some more research to find out for ourselves. ‑‑ Eiríkr Útlendi │Tala við mig 16:11, 11 July 2019 (UTC)
@Eirikr: According to Prof. Victor Mair,
From my colleague Linda Chance, who is a specialist on Classical Chinese: The technical term for this is ハ行転呼音・はぎょうてんこおん.
It refers to the fact that from sometime in the Heian period the "ha" line changed to the same pronunciation as the "wa" line, but the "ha" line spellings continued in use. (Interesting examples--if you write these in modern Japanese with 'u' for 'fu,' 惟うに is still pronounced omō ni, but 失う becomes "ushinau" (except in some dialects.) This "modern pronunciation" is potentially centuries old. We read classical texts this way because we can't retrieve that original early Heian pronunciation. --Backinstadiums (talk) 14:17, 12 July 2019 (UTC)
- @Backinstadiums, that refers to a specific phonological development earlier in the language's history. Unfortunately, that is not the Japanese term for "neoclassical pronunciation", which includes phenomena such as where modern /au/ and 1603 /ɔː/ is pronounced instead as /oː/. ‑‑ Eiríkr Útlendi │Tala við mig 04:44, 14 July 2019 (UTC)
- @Eirikr: Please send an email to Prof Mair explaining it: vmair@sas.upenn.edu --Backinstadiums (talk) 11:15, 14 July 2019 (UTC)
- @Backinstadiums:, why? I'm honestly curious why you think I should. Also, who is Professor Mair? And why would Linda Chance, a specialist in Classical Chinese, be considered an expert on the Japanese-language terms used to describe the neoclassical Japanese pronunciation in evidence in the mid-1900s Jewel Voice Broadcast?
- (Honestly not intending insult or aggression. Your response just confuses me.) ‑‑ Eiríkr Útlendi │Tala við mig 04:38, 15 July 2019 (UTC)
- @Eirikr: According to David Lurie:
The technical term for these changes is tenko-on 転呼音, but they are not applied consistently in words where the first mora ends in 'a.' I don't know if there is a specific term for those exceptions. --Backinstadiums (talk) 23:04, 15 July 2019 (UTC)
- @Backinstadiums: who is David Lurie? I wouldn't expect a South African photographer to have much to say about Japanese linguistic terminology...?
- I note that the term 転呼音 (tenko-on) refers generally to the phenomenon of sound shift, or more specifically to the sounds that have so shifted. This Japanese term could apply just as well to describe how English don't you becomes doncha in informal speech in certain dialects. While "sound shift" or tenko-on is a useful way of describing the change from Old Japanese readings through to modern Japanese, it doesn't capture the specific "neoclassical pronunciation" sense at issue at the start of this thread. ‑‑ Eiríkr Útlendi │Tala við mig 23:33, 15 July 2019 (UTC)
- In Standard Japanese, verbs don't undergo vowel fusion in their inflection. Forms like 買う (kō) are found in some dialects.--荒巻モロゾフ (talk) 23:05, 14 July 2019 (UTC)
- Actually, it is only the -au and -ou verbs which do not cause vowel fusion in the dictionary form. 食う can be alternatively pronounced クー, 言う has stems alternating between い~ and ゆ~, and 酔う has the stem changed from え~ to よ~ in all forms. --Dine2016 (talk) 14:09, 15 August 2019 (UTC)
Another example of this "neoclassical pronunciation" is fossilized words like 逢瀬 (ōse) and 逢魔が時 (ōmagatoki) as well as names beginning with 逢 (ō). Jisho.org search --Dine2016 (talk) 13:57, 15 August 2019 (UTC)
- @Eirikr: Interestingly, this video (0:26) shows that 給う was pronounced タモー even in an otherwise 口語 text. --Dine2016 (talk) 04:03, 20 October 2019 (UTC)
Add HSK level of the Hanzi characters
The Japanese section shows the 常用漢字 level of the Kanji, so I'd like to propose adding the HSK level of the hanzi too. Yet, I have encountered some objection that levels from other tests would also have to be added, to which I replied that in any case, most tests classify the same characters in the same levels, so only a group of characters would have two different levels at most.
Why do kanji only show the levels of 常用漢字? Secondly, where should I propose adding HSK levels for hanzi? --Backinstadiums (talk) 01:15, 17 July 2019 (UTC)
- @Backinstadiums: I'm not familiar with what "HSK" means, but since you're using the term hanzi, I infer that you're talking about Chinese and the Hanyu Shuiping Kaoshi, so presumably you should strike up a thread at Wiktionary talk:About Chinese. ‑‑ Eiríkr Útlendi │Tala við mig 20:28, 17 July 2019 (UTC)
- @Eirikr: Why do kanji only show the levels of 常用漢字? --Backinstadiums (talk) 23:15, 17 July 2019 (UTC)
- @Backinstadiums: I'm not certain, but if I had to guess, I'd say that it's because the 常用 levels tell us something about expected literacy in Japanese, as defined by the Japanese government. This also tells us something about how likely we are to encounter a given word or reading, since I believe that major publications like newspapers generally limit their main content to just the 常用漢字. I'm not sure what other levels would be useful or appropriate to show. I suppose an argument could be made that the 常用 levels are more "encyclopedic" information, but then again, it's information about the word itself, which would seem to be relevant to Wiktionary content. ‑‑ Eiríkr Útlendi │Tala við mig 00:13, 18 July 2019 (UTC)
I wonder who has "deprecated" inclusion of kana (haven't checked the history but I disagree). I have always provided unlinked kana in translations at the English Wiktionary. It has been my practice for many years. --Anatoli T. (обсудить/вклад) 01:32, 23 August 2019 (UTC)
- Drawing attention: (Notifying Eirikr, TAKASUGI Shinji, Nibiko, Suzukaze-c, Dine2016, Poketalker, Cnilep, Britannic124, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!):
- Also, I think kana and most definitely rōmaji should be unlinked in translations - 環境 (かんきょう, kankyō), not 環境 (かんきょう, kankyō).
- I should also point out that language-specific templates are always more advanced than generic ones. We should strive to use kana to automatically romanise Japanese, just as {{ja-r}} and {{ja-x}} do, e.g. 環境 (kankyō) (with or without furigana). The generic {{t}} or {{t+}} can't do that. It's all the more important to include kana. --Anatoli T. (обсудить/вклад) 05:07, 23 August 2019 (UTC)
- I prefer 環境 (kankyō) or just 環境 (kankyō) to the redundant 環境 (かんきょう, kankyō), but non-native speakers might have different opinions. — TAKASUGI Shinji (talk) 08:00, 23 August 2019 (UTC)
- @Anatoli, that would be me who last updated the page. As I noted in the edit comment, "+long-overdue rewrite to match current best practice". I hadn't seen any recent edits by you that added such formatted links, so I apologize for missing your modus operandi. The links I've seen added by other experienced JA editors have used the format described in the page, that is, {{t|ja|環境|tr=kankyō}}. Similarly, I haven't seen any experienced editors adding any other links, such as using {{m}} or {{l}}, that include kana in the tr= parameter. My recollections of past discussions about this issue were all of a general theme to no longer include kana in the tr= parameter; such discussions gave rise to the current {{ja-r}} template, for instance.
- In response to your points:
- For transliterated text as links, I wholly agree that this is sub-optimal, and I tried to write the update to indicate that transliterations should not be links. If that was unclear, we should edit the page again to clarify this point.
- For translation tables using {{t}} or {{t+}}, etc., I disagree that these should include kana, for several reasons.
- For starters, this is the EN Wikt, and readers are not expected to read anything other than English, which is written using the Latin alphabet. We don't include non-Latin-alphabet phonetic guides for any other language's listings in translation tables: links consist of the linked text to the term itself, plus optionally a Latin-alphabet phonetic guide. I believe that Japanese listings in translation tables should be consistent and follow this same format.
- Additionally, kana only provide a phonetic guide, so including both kana and romaji in the tr= parameter is redundant. IFF we are including kana as part of {{ja-r}}, I have no strong objection: 1) this doesn't display the kana visually within the same parenthetical transliteration guide, 2) we generate the romaji from the kana, so we're not duplicating information in the wikicode, and (more importantly, in my view) 3) if used properly, {{ja-r}} indicates which kana are used for which kanji, which is useful information. However, the {{t}} or {{t+}} templates, as you also note, cannot (currently) generate ruby text this way, so the only way to add kana is 1) redundantly, alongside and essentially duplicating the information in the romaji, and 2) without any ability to tie the kana to their specific kanji.
- If a reader of a translation table wishes to see more information about a Japanese term (or indeed any term), they have only to click the link in order to see the full entry, including the kana. Translation tables are intended to be tight, compact, and succinct. Adding kana to translation table listings instead makes the tables bigger, harder to read, and visually messy, and it makes the Japanese listings inconsistent from all the other entries. I view these as negatives, and for me, this makes kana extraneous unnecessary information in this context.
- You state that, "it's all the more important to include kana" -- presumably you mean in the translation table listings themselves? I don't understand your reasons for making this statement. Could you explain in more detail?
- Do you perhaps mean that we should update the module for {{t}} and {{t+}} to use a similar ruby feature as {{ja-r}}? If so, I again have no real objection, provided that the module can be tweaked to account for unwritten kana that don't belong to any kanji -- such as the possessive の (no) or つ (tsu) that appears in the readings of certain compounds. ‑‑ Eiríkr Útlendi │Tala við mig 17:38, 23 August 2019 (UTC)
- As a note, I am concerned about the proliferation of cosmetic-only templates -- those that use short, and often obscure, names, and that either redirect to templates with longer and more obvious names (like {{ja-x}} redirecting to the more-obvious {{ja-usex}}), or that replicate features of other templates that are better documented and better understood (like {{lj}}, which redirects to {{ruby/ja}}, and which in turn mostly just recreates the functionality in {{ja-r}} -- but with no documentation, and a strange syntax that is very hard to read in the wikicode).
- I don't see any compelling use case for {{ja-x}}, so I am (gradually) replacing it with {{ja-usex}}. Once fully orphaned, we can delete the redirection page.
- I intend to eventually do the same for {{lj}} and the related {{wj}} -- replace these with functionally equivalent templates that are better used, better maintained, and better documented. ‑‑ Eiríkr Útlendi │Tala við mig 00:29, 19 March 2022 (UTC)
- Someone (*coughs loudly*) is allergic to typing and thinks that things like {{lj}} and {{wj}} are sensible, responsible abbreviations for a multilingual project where the norm is to prefix templates with their entire language code. —Fish bowl (talk) 00:43, 19 March 2022 (UTC)
two-level hierarchy of alternative spellings
Hi everyone. I have just created {{ja-gv}}, a soft-redirection template designed specifically for kyūjitai forms. Please take a look at 天道蟲, てんとう蟲 and 樣 for examples.
Note that this template is used to soft-redirect kyūjitai forms to shinjitai forms, unlike {{ja-see}}, which is used to soft-redirect within the shinjitai world. For example, in the entry てんとう蟲, the editor only needs to provide the shinjitai form as {{ja-gv|てんとう虫}}. It is up to the template to find the lemma form in the shinjitai world by fixing double redirects. There are sometimes multiple lemma forms, as shown by the example of 樣, and {{ja-gv}} can handle that. Another difference is that {{ja-gv}} automatically copies {{ja-kanjitab}}s from the shinjitai entry, because the shinjitai and the kyūjitai spellings will have the same pattern of readings. {{ja-see}} can't handle that, as different spellings within the shinjitai world (てんとう虫 and 天道虫) have different reading patterns.
What do you think about this two-level hierarchy solution?
(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 09:06, 18 September 2019 (UTC)
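The two-level lookup described in this post, from a kyūjitai form to its shinjitai form and then on to one or more lemmas, can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual template or module code: `resolve_lemmas` and the two mapping arguments are hypothetical names, and the sample data (including the choice of 天道虫 as the lemma) is assumed for the example.

```python
def resolve_lemmas(title, kyujitai_to_shinjitai, shinjitai_redirects):
    """Return the lemma titles reached from `title`.

    kyujitai_to_shinjitai: hypothetical mapping for the first level
        (kyūjitai form -> list of shinjitai forms).
    shinjitai_redirects: hypothetical mapping for the second level
        (shinjitai form -> list of soft-redirect targets).
    A title with no further redirect is treated as a lemma.
    """
    lemmas = []
    seen = set()
    # First level: kyūjitai -> shinjitai (or the title itself if no mapping).
    stack = list(reversed(kyujitai_to_shinjitai.get(title, [title])))
    # Second level: follow soft redirects until lemmas are reached.
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against redirect loops
        seen.add(current)
        targets = shinjitai_redirects.get(current)
        if targets:
            stack.extend(reversed(targets))
        else:
            lemmas.append(current)
    return lemmas


# Sample data, assumed for illustration only.
GV = {"てんとう蟲": ["てんとう虫"], "天道蟲": ["天道虫"]}
SEE = {"てんとう虫": ["天道虫"]}
```

With this data, both kyūjitai spellings resolve to the same single lemma, and a shinjitai form with several soft-redirect targets yields several lemmas in order.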
Shinjitai standard
Which shinjitai standard should we use for lemma entries? Currently we use 躯, but Daijirin uses 軀. --Dine2016 (talk) 04:57, 30 January 2020 (UTC)
- @Dine2016, which edition of Daijirin? My local copy uses 躯, as does my local copy of the KDJ. My dead-tree copy of the SMK5 has 軀 instead. The NHK Hatsuon dictionary only has the kana form むくろ (mukuro), with no kanji listed, and compounds using this kanji are likewise missing.
- Meanwhile, I see at Kotobank that their Daijirin entry lists multiple kanji spellings, with the shinjitai 躯 as the first (and presumably preferred?) spelling.
- By way of comparison, I note too that the JA Wiktionary for 躯 explicitly lists 軀 as kyūjitai, and their むくろ entry lists 躯 as a kanji spelling, but not 軀.
- I note that my SMK5 was published in 2000. Is it possible that the shinjitai character 躯 has become more accepted, or has been officially endorsed, in the past 20 years? ‑‑ Eiríkr Útlendi │Tala við mig 18:06, 30 January 2020 (UTC)
- Sorry, I meant the weblio version of Daijirin. --Dine2016 (talk) 03:33, 31 January 2020 (UTC)
Proposal for a lossless transcription of Old Japanese using hiragana + hentaigana as found in Unicode
Sketch found here --Backinstadiums (talk) 11:44, 22 December 2019 (UTC)
- @Backinstadiums: It's interesting, but defective -- where are the voiced obstruents? The proposal confusingly uses dakuten for と (to) to produce ど (do), but fails to do so with 倍 (pe2), rendering this in romaji as nbe in a way that is definitely not lossless, but rather instead problematically additional.
- I also don't agree with their explicit addition of -n- before voiced obstruents. While this appears to be the general consensus for how these voiced consonants evolved, I don't think this is a correct reconstruction, nor is it clear notation, not least due to the potential for confusion with later moraic ん (n).
- As a minor point, some of their glyph choices strike me as a bit odd -- for ⟨mi2⟩, for instance, it seems very odd for them to choose the quite-complicated U+1B0CA 𛃊 rather than the comparatively simpler U+1B0CF 𛃏. Or for ⟨no1⟩, U+1B09A 𛂚 seems an odd choice given the existence of simpler U+1B09D 𛂝. And while ⟨me2⟩ is lacking a hentaigana, 米 is a slightly more complicated character than 目, and is also (subjectively, marginally) more difficult to write due to the direction of the strokes, and more difficult to type.
- ⇒ In functional terms, if we could adapt the proposal to 1) add dakuten to all voiced obstruents, and 2) get rid of that confusing extra -n- preceding voiced obstruents in romanization, I think we'd have something that we could use. Ideally, we'd also 3) swap out some of the glyphs for simpler alternatives.
- That said, I think this is a non-starter until we have much wider availability of fonts that support these codepoints. If our users only get tofubake (c.f. this StackExchange page), we're not helping anyone. ‑‑ Eiríkr Útlendi │Tala við mig 17:24, 23 December 2019 (UTC)
Diphthongs in current colloquial Japanese
Why isn't any of the following mentioned in the respective entries?
In colloquial Japanese, many diphthongs disappeared. So, words like でかい (DEKAI) and やばい (YABAI) became でけえ (DEKEE) and やべえ (YABEE), and words like わるい (WARUI) and さむい (SAMUI) became わりい (WARII) and さみい (SAMII). With these changes, words like はやい (HAYAI) became (HAYEE), and words like かゆい (KAYUI) became (KAYII)
--Backinstadiums (talk) 10:57, 16 January 2020 (UTC)
- Because that's still considered slang, possibly even dialectal. Even in colloquial speech, it's non-standard and primarily limited to certain demographics and social contexts. I don't know where you got that quote, but it's not quite correct. ‑‑ Eiríkr Útlendi │Tala við mig 17:28, 16 January 2020 (UTC)
- On top of that, I don't think most Japanese people would recognize /yi/ as a sound in their language, nor would they recognize the odd hentaigana in that KAYII image of yours. ‑‑ Eiríkr Útlendi │Tala við mig 17:32, 16 January 2020 (UTC)
- Yi or wu has never existed in the Japanese language. Listing the colloquial conjugated forms is not a bad idea, though. — TAKASUGI Shinji (talk) 02:21, 18 January 2020 (UTC)
- The quote is from this proposal sent to Unicode by a certain "Abraham Gross". —Suzukaze-c◇◇ 08:02, 18 January 2020 (UTC)
- I had a look. Ugh. Poorly written. I dispute his wording that "many diphthongs disappeared". This phenomenon represents a specific kind of social signalling, whereas he presents it as a past historical sound shift that is complete. He also misleadingly states, "Having these characters [YI and YE] available for writing would be invaluable as a way to represent these sounds in Japanese, for transcribing into Japanese, for digitizing old books, and for Japanese scholars." But as noted above, /ji/ is not a sound that Japanese speakers use or can pronounce, so the utility of the proposed hiragana YI is ... questionable at best. His phonological argument for the existence of /wu/ is tenuous at best.
- The evidence of use he presents is all from the 1890s, when various groups were trying to standardize and "fill in" the otherwise empty items in the so-called 五十音 chart. This strikes me as unnecessary completionism driven more by aesthetics than utility. Notably, none of these glyphs proposed in the 1890s caught on -- they're just not that useful when they represent sounds that are either 1) unused in the modern language, such as /je/, or 2) effectively impossible for native speakers to pronounce or distinguish, such as /ji/ and /wu/.
- Meh.
- Anyway, back on the topic of monophthongization, I would support adding these to the WT:AJA page, so long as the social context is clearly explained. ‑‑ Eiríkr Útlendi │Tala við mig 18:51, 21 January 2020 (UTC)
It seems that this contraction only affects -i adjectives and the single word "みたい". Can anyone think of another non-"-i adjective" that contracts this way? It is kind of strange for "みたい" to be the sole exception. -- Huhu9001 (talk) 08:47, 26 January 2020 (UTC)
- @Huhu9001, that's a very good point. I cannot think of many other examples; the only ones that come to mind are words that end in /-ai/ in "standard" polite speech -- pretty much all inflecting as _-i_ adjectives, with the exception of _-na_ adjective みたい (mitai).
- Come to think of it, I have seen rare instances of 世界 (sekai) flattening to /sekeː/.
- I think the key is that the resulting monophthongized pronunciation must still be unambiguous enough to impart the correct meaning. For example, _-na_ adjective 嫌い (kirai, “hated, extremely disliked”) cannot monophthongize without becoming homophonous with 綺麗 (kirei, “pretty, attractive; clean, clean-cut”). The utterance /ano ko ɡa kireː/ can generally only be parsed as あの子が綺麗 (ano ko ga kirei, “that girl is pretty”), rather than as a monophthongized version of あの子が嫌い (ano ko ga kirai, “I hate that girl”). Admittedly, the pitch accents are different, with 綺麗 (/kíꜜrèː/) vs. 嫌い (/kìráí/ → /kìréː/), so theoretically the two would still be distinguishable. However, I cannot find any instances of monophthongized 嫌い (kirai), which I suspect is due to the potential for confusion and the nearly opposite connotations of the two words.
- I also cannot find instances of medial monophthongization.
- However, I grant that my fruitless searches might be due more to my weak Google-fu and unfamiliarity with non-standard Japanese. ‑‑ Eiríkr Útlendi │Tala við mig 18:39, 30 January 2020 (UTC)
- 「『ちげぇ』よ」? @Eirikr, Huhu9001: —Suzukaze-c◇◇ 08:12, 21 February 2020 (UTC)
- @Suzukaze-c: I believe ちげぇ is from ちがう, not ちがい. "ちがいよ!" (??) "ちがうよ!" (✓) -- Huhu9001 (talk) 09:31, 21 February 2020 (UTC)
- Yes, I agree with you. I misread the conversation while needing sleep. —Suzukaze-c◇◇
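For anyone wanting to list these colloquial forms, the pattern discussed in this thread can be written down as a tiny rule table. The Python sketch below is illustrative only: the function name `contract_i_adjective` is hypothetical, it works on romaji strings, and it covers only the two endings given in the thread (-ai and -ui). It has no knowledge of word class, so the caller is assumed to pass only -i adjectives (or みたい); a word like 嫌い (kirai) would be contracted wrongly.

```python
# Word-final vowel sequences and their colloquial contractions, in romaji.
# Only the two patterns mentioned in the thread are listed.
CONTRACTIONS = {"ai": "ee", "ui": "ii"}


def contract_i_adjective(romaji):
    """Return the colloquial contracted form of an -i adjective in romaji.

    Words whose ending is not in CONTRACTIONS are returned unchanged.
    """
    for ending, contracted in CONTRACTIONS.items():
        if romaji.endswith(ending):
            return romaji[: -len(ending)] + contracted
    return romaji
```

Any entry text generated this way would still need the social context (slang, non-standard, limited to certain speakers and settings) spelled out, as noted above.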
Headword format
- Discussion moved from User talk:Suzukaze-c#bot request.
It's hard for me to run a bot since I access Wiktionary via a proxy. Do you have any interest in reforming the Japanese entry layout? As a first step, simply changing the headword format to match other inflecting languages should be enough.
- 使う • (tsukau) tr godan (stem 使い (tsukai), past 使った (tsukatta))
- 食べる • (taberu) tr ichidan (stem 食べ (tabe), past 食べた (tabeta))
- 高い • (takai) -i (adverbial 高く (takaku))
- 有名 • (yūmei) -na (adnominal 有名な (yūmei na), adverbial 有名に (yūmei ni))
The main work involving bots is moving the kyūjitai and historical kana to some other place. Everything else can stay in the headword templates. While this new format doesn't reduce the complexity of source code of entries, it is a great improvement for the reader. --Dine2016 (talk) 10:55, 14 January 2020 (UTC)
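As a rough illustration of how the stem and past forms in such a headword line could be derived, here is a minimal Python sketch. It is not Module:ja-headword or any existing code: the names `GODAN_ENDINGS`, `verb_headword_forms` and `headword_line` are hypothetical, romaji is omitted, and irregular verbs (for example 行く, whose past is 行った, or する and 来る) are deliberately not handled.

```python
# Regular godan endings: dictionary ending -> (stem ending, past ending).
# Irregular cases such as 行く (past 行った) are not covered.
GODAN_ENDINGS = {
    "う": ("い", "った"),
    "く": ("き", "いた"),
    "ぐ": ("ぎ", "いだ"),
    "す": ("し", "した"),
    "つ": ("ち", "った"),
    "ぬ": ("に", "んだ"),
    "ぶ": ("び", "んだ"),
    "む": ("み", "んだ"),
    "る": ("り", "った"),
}


def verb_headword_forms(dictionary_form, verb_class):
    """Return (stem, past) for a regular godan or ichidan verb."""
    base, ending = dictionary_form[:-1], dictionary_form[-1]
    if verb_class == "ichidan":
        if ending != "る":
            raise ValueError("ichidan verbs end in る")
        return base, base + "た"
    if verb_class == "godan":
        if ending not in GODAN_ENDINGS:
            raise ValueError("not a godan ending: " + ending)
        stem_ending, past_ending = GODAN_ENDINGS[ending]
        return base + stem_ending, base + past_ending
    raise ValueError("unknown verb class: " + verb_class)


def headword_line(dictionary_form, verb_class, transitivity):
    """Format a headword line like the ones proposed, without romaji."""
    stem, past = verb_headword_forms(dictionary_form, verb_class)
    return f"{dictionary_form} {transitivity} {verb_class} (stem {stem}, past {past})"
```

For example, `headword_line("使う", "godan", "tr")` gives `使う tr godan (stem 使い, past 使った)`; the real template would add the romanization after each form.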
- I would be glad to help out with AutoWikiBrowser.
- I think it is not a bad idea. I like the format you demonstrate here. —Suzukaze-c◇◇ 06:17, 15 January 2020 (UTC)
- Everyone, what do you think of the headword format above? I think this is what we should have had from the beginning (see the documentation of Module:headword).
- The first step in my plan is to modify Module:ja-headword to move kana and romaji to the left. Using furigana in Japanese headwords is analogous to adding the vowels to Arabic headwords or acutes to Russian headwords. But my main concern is that it may look ugly without manual % hints, e.g. 紫式部. Other edge cases like 出づ and 物の哀れ only affect a small number of entries and can be manually fixed.
- (Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 15:18, 15 January 2020 (UTC)
- I do think that the conjugation tables in current use are unnecessarily intimidating. I think the format shown above is nicely simplified, and the forms included (stem, past tense, adverbial) are probably most useful to include. (For -na adjectives, perhaps even adnominal & adverbial are unnecessary, but I don't really object to including them.) I also like that the conjugation classes are named. Perhaps those could link to something like an appendix for readers who want all the gory details? Cnilep (talk) 01:25, 16 January 2020 (UTC)
- I generally agree. But we need to first determine where the kyūjitai and historical kana are going. -- Huhu9001 (talk) 08:03, 16 January 2020 (UTC)
- General thoughts:
- Agreed that we need to decide where to put existing data that doesn't fit in the new proposed format -- kyūjitai and historical kana. Presumably kyūjitai should go in {{ja-kanjitab}}, but I'm not sure where historical kana would go if not the headword. FWIW, monolingual JA dictionaries include historical kana in their entry headlines.
- Agreed that the proposed subset of verb forms for the headline are likely the most immediately useful: 終止形・連用形・過去形. Notably, the Nippo Jisho / Vocabulario da Lingoa de Iapam presents these same forms on its verb-entry headlines.
- @Cnilep, not sure what you mean by "the conjugation tables in current use are unnecessarily intimidating". I'm certainly open to the idea of reworking them, and I've long thought the current state presents a seemingly arbitrary subset of forms, but I don't think the conjugation tables should be removed altogether.
- I guess the main thing I had in mind is that if one wants, say, the past form of 遊ぶ and that information is only available in the conjugation table, one needs to look through 14 other forms before coming upon 遊んだ. I agree that the tables are useful, and I appreciate the fact that they are collapsed by default. I see proposed changes as a useful addition, not as a replacement. Cnilep (talk) 00:13, 17 January 2020 (UTC)
- Also, if we include adnominal forms for -i adjectives (which we kinda have to, since that's also the terminal / dictionary form), it makes sense to include that for -na adjectives as well. Principle of least surprise, not confusing users, etc.
- @Dine2016, your ruby for 物の哀れ are a bit off -- was that meant to show the current incorrect output of {{ja-r}} when not using % to manually specify kana string breaks?
- I don't agree with hiding the link to Wiktionary:Japanese transliteration in the ・. This is poor usability -- I had no idea this link was there until I looked at the wikisource, and even then, the ・ is so small that it's not easy to click on when using a mobile device, and challenging even using a mouse, especially for any of our mobility-impaired users.
- @Dine2016, what is your use case for including this link? If it's to explain our approach to romanization, then perhaps all of the romaji strings should have this link? That seems a more obvious and hard-to-miss presentation to our readers.
- Separately, it is confusing to reverse which text is italics and which is not. I would prefer for all romaji to be in italics, and descriptive text to be non-italicized. This aligns with the output of various other templates like {{m}}, {{ja-r}}, and {{ja-readings}}, and would be more visually consistent. Example of what this might look like for the entry headlines:
- Presumably we could implement the above using careful tweaks of our CSS.
- I notice that the な (na) and に (ni) for 有名 are linked to the entries. That is good. I suggest that these links be made more specific, to deliver the user directly to the relevant etymology / sense.
- For that matter, I notice now that we don't have any adverbial sense for に (ni). We'll need to add that.
- Again, on the whole, I support this idea of changing the entry headlines for Japanese. The above is intended as feedback to ensure that our results are optimal. ‑‑ Eiríkr Útlendi │Tala við mig 18:43, 16 January 2020 (UTC)
- @Eirikr: The bullet (🙃) and italics are part of the standard Wiktionary Template:head... {{l}} also does not italicize romanizations, and {{ja-readings}} did not use italics until the recent discussion. —Suzukaze-c◇◇ 00:25, 17 January 2020 (UTC)
- @Suzukaze-c: Hmm, thank you. The interpuncts may be part of the default {{head}}, but they are new to the JA entries since (I believe) Dine2016's edits earlier today. I don't think I like that particular part of this layout -- very poor discoverability. I agree with your linked comment, that definitely needs a rethink and redesign. And regarding {{l}}, I've never agreed with the decision to differentiate italicization behavior between {{m}} and {{l}}. Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 01:25, 17 January 2020 (UTC)
- @Eirikr: Not sure what you mean by "if we include adnominal forms for -i adjectives ..." The new format I proposed and implemented shows adnominal forms for -na adjectives (which is dictionary form + na), but not -i adjectives, since they are the same as the dictionary form. As for the off-placed kana, that's exactly the point -- the headword templates must fail rather than choose one candidate when there are several ways to match kanji and kana. --Dine2016 (talk) 08:36, 17 January 2020 (UTC)
- @Dine2016: I think he meant he wants 物の哀れ instead of 物の哀れ. -- Huhu9001 (talk) 11:42, 17 January 2020 (UTC)
- @Dine2016, Huhu9001, re: {{ja-r}}, it should never put ruby on kana (unless maybe the ruby provided is completely different from the kana? That happens sometimes in manga...). And the template doesn't currently put ruby on kana, so that's a non-issue. The problem is simply that the template currently mis-parses certain strings unless the editor manually inserts the % markers to delineate the breaks. ‑‑ Eiríkr Útlendi │Tala við mig 17:50, 17 January 2020 (UTC)
- @Dine2016, re: adnominal forms, yes, we get that by default for -i adjectives since the adnominal and terminal / dictionary forms are identical. I was mostly responding to Cnilep's comment about possibly removing the adnominal from the headline for -na adjectives. ‑‑ Eiríkr Útlendi │Tala við mig 17:50, 17 January 2020 (UTC)
- Is there any good way to make kanji-kana matching more precise? I'm thinking about using a 漢字音訓表. --Dine2016 (talk) 13:40, 17 January 2020 (UTC)
- @Dine2016, so long as the table is fully populated. A lot of lesser-used readings are omitted from many dictionaries, which often focus just on modern usage. We try to cover the whole language, even historically, so our entries are likely to include many more readings. ‑‑ Eiríkr Útlendi │Tala við mig 17:50, 17 January 2020 (UTC)
- Maybe we can make the code automatically insert "%" to the kanji-form using a default rule when it detects there are some "%" in the kana-form but none in the kanji-form. -- Huhu9001 (talk) 14:19, 17 January 2020 (UTC)
- @Huhu9001, I can imagine many failure modes for that. For instance, what if the editor only added a few % markers, just enough to get the desired behavior from the current (or some past) state of {{ja-r}}? What about jukujikun, which sometimes appear in compounds with other kanji with regular on or kun? I'm not sure that automatically inserting % in the right places is a realistic goal. ‑‑ Eiríkr Útlendi │Tala við mig 17:50, 17 January 2020 (UTC)
- Well, perhaps a much simpler idea is to add a new parameter for Japanese headers, with which you can add "%" or spaces directly to the kanji-form: {{ja-noun|もの の あわれ|kanji=物 の 哀れ}} -- Huhu9001 (talk) 18:24, 17 January 2020 (UTC)
- Oh, we've already got this parameter. It is head=. We can just give this function to it. -- Huhu9001 (talk) 18:27, 17 January 2020 (UTC)
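The space-delimited idea above can be sketched roughly as follows. This is only an illustration in Python, not the Lua that the headword modules actually use; the function names (is_kana, pair_segments, render_segment, render_ruby) and the HTML ruby output are hypothetical and do not describe how {{ja-noun}} or {{ja-r}} behave. The sketch pairs space-separated kanji and kana segments, fails on a segment-count mismatch instead of guessing, peels off shared trailing okurigana, and never places ruby over kana.

```python
def is_kana(ch: str) -> bool:
    """True for characters in the hiragana or katakana Unicode blocks."""
    return "\u3040" <= ch <= "\u30ff"


def pair_segments(kanji_form: str, kana_form: str) -> list[tuple[str, str]]:
    """Pair space-delimited segments; fail rather than guess on a mismatch."""
    kanji_parts, kana_parts = kanji_form.split(), kana_form.split()
    if len(kanji_parts) != len(kana_parts):
        raise ValueError("kanji and kana forms have different segment counts")
    return list(zip(kanji_parts, kana_parts))


def render_segment(base: str, reading: str) -> str:
    """Render one segment as HTML ruby (hypothetical output format)."""
    # Peel off trailing kana shared by both strings (okurigana).
    n = 0
    while (n < len(base) and n < len(reading)
           and is_kana(base[-1 - n]) and base[-1 - n] == reading[-1 - n]):
        n += 1
    stem, tail = base[:len(base) - n], base[len(base) - n:]
    stem_reading = reading[:len(reading) - n]
    # Never put ruby on kana: an empty or all-kana stem is left bare.
    if not stem or all(is_kana(c) for c in stem):
        return stem + tail
    return f"<ruby>{stem}<rt>{stem_reading}</rt></ruby>{tail}"


def render_ruby(kanji_form: str, kana_form: str) -> str:
    """Render a whole headword from its space-delimited kanji and kana forms."""
    pairs = pair_segments(kanji_form, kana_form)
    return "".join(render_segment(base, reading) for base, reading in pairs)


print(render_ruby("物 の 哀れ", "もの の あわれ"))
```

Raising an error on a mismatch follows the point made earlier in the thread that the headword templates should fail rather than pick one candidate when kanji and kana can be matched in several ways.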
- @Huhu9001, Dine2016, not sure what's going on, but over on the こし entry, {{ja-noun|コシ}} is generating awfulness -- instead of presenting コシ as an alternative spelling, it's showing this:
- こし or こし • (koshi)
- Definitely not expected or desired output, and highly likely to confuse users. As a general approach, kana should never have kana ruby, unless the ruby string is provided as a manga-style way to provide a completely different gloss for the main term. Adding katakana ruby over hiragana with the same phonetic values is just ... extremely confusing. ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 18 January 2020 (UTC)
- I've made a mess. @Eirikr: How is it now? -- Huhu9001 (talk) 00:39, 18 January 2020 (UTC)
- @Huhu9001, much better, thank you! ‑‑ Eiríkr Útlendi │Tala við mig 01:04, 18 January 2020 (UTC)
- @Huhu9001 instead of having editors manually insert %, what about having the template fetch the kanji-kana matching from {{ja-kanjitab}}s on the same page? --Dine2016 (talk) 03:37, 18 January 2020 (UTC)
- It may be a good idea. -- Huhu9001 (talk) 05:34, 18 January 2020 (UTC)
- Checked the layout for the 帰る entry today out of curiosity and saw further changes.
- I'd like to suggest that the verb conjugation type notation include both the 教育 labels ("ichidan conjugation, godan conjugation") and the common English-language labels ("type II, type I"). We must remember that our readership is 1) users reading in English, where English-language terminology should at least be offered, and 2) probably learners of Japanese, so labels and other meta-information should provide appropriate context for learners. I'm a fan of using both labels if possible, as the "type XX" notation is extremely common in English-language materials for Japanese learners, whereas the XXdan notation is really the only notation used in Japanese-language materials, which Japanese learners will eventually start using (so long as they continue in their studies, of course).
- (Also, should we move this thread to somewhere more specific for this discussion? At the bare minimum, for posterity?)
- Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 17:08, 21 January 2020 (UTC)
- In addition, a link through to w:Japanese verbs or some similar explanatory page would likely be a good idea. ‑‑ Eiríkr Útlendi │Tala við mig 17:11, 21 January 2020 (UTC)
- moved ✓ —Suzukaze-c◇◇ 17:53, 21 January 2020 (UTC)
- @Eirikr: Godan and ichidan are 学校文法/国文法 labels. The 日本語教育文法 equivalents are Group I and Group II. --Dine2016 (talk) 09:39, 22 January 2020 (UTC)
- @Dine2016, no real argument there. My only quibble is that I'm more accustomed to seeing "type I / II" than "group I / II", and the WP page at Japanese verb conjugation similarly uses the "type" terminology.
- Separately, I think I remember you suggested putting shinjitai and kyūjitai information in {{ja-kanjitab}} rather than the POS headline. I believe it would be much more appropriate in {{ja-kanjitab}}, and putting it there would also mean we wouldn't need to duplicate this for each POS headline. ‑‑ Eiríkr Útlendi │Tala við mig 00:44, 23 January 2020 (UTC)
@Everyone: do we need new formats for -no adjectives (and -na/no adjectives) as well as adverbs that are optionally or mandatorily followed by to? --Dine2016 (talk) 03:12, 23 January 2020 (UTC)
- @Dine2016:: broadly, yes. Not sure yet how best to handle -no adjectives, since these are 1) a relatively new class of word, historically speaking, and 2) they appear to be either a shift in usage of nouns, or a shift in particles for -na adjectives. For the と (to) adverbs, we should have some means of clarifying for readers whether the particle is required or optional. ‑‑ Eiríkr Útlendi │Tala við mig 16:52, 23 January 2020 (UTC)
- I do not know enough and decline to give an opinion. —Suzukaze-c◇◇ 23:25, 23 January 2020 (UTC)
Recently I noticed there is a Template:ja-altread. It may be related to this topic. -- Huhu9001 (talk) 09:32, 23 January 2020 (UTC)
- @Huhu9001: In almost all cases, {{ja-altread}} is misused: the separate readings should be split out into separate etymology sections. See examples of this at 他, where the Chinese-derived ta and native-Japanese hoka readings require splitting out; 候#Etymology_4 where again each of the three readings has its own phonological derivation and other specifics; 妹#Etymology_2 and 弟#Etymology_2 where the "alternative" reading is actually dialect, and where Kagoshima is arguably distinct enough to be considered a separate language (we need a bigger discussion of how to handle 方言 before diving in to this degree); 指 where the "alternative" reading appears to be a fusion + vowel shift deriving from お + ゆび and is obsolete, where our current layout using {{ja-altread}} misleadingly suggests that this is a regular everyday reading for this kanji; 札#Etymology_2, where again we misleadingly include two archaic, borderline-obsolete readings and present them as equivalent to modern usage; 私#Etymology_1, where a dialectal form is included under a kanji spelling, but without textual evidence for that kanji spelling; 面#Etymology_2_2, where the reading omo is incorrectly given as simply an "alternative" for omote, despite omote in fact deriving from 面 (omo, “face; front”) + 手 (te, in reference to direction); etc. etc.
- There are occasional instances where this template has been used in ways that seem more correct and justified, such as at 御#Etymology_3 or 梟 (and even here, the 梟 entry needs more explanation, such as the explanatory notes about the readings at 御#Etymology_3). However, in almost all cases, {{ja-altread}} has resulted in incomplete and even sloppy lexicography. I currently count 236 entries linking to this template. I expect that most of these will require reworking. ‑‑ Eiríkr Útlendi │Tala við mig 16:52, 23 January 2020 (UTC)
- I agree with Eirikr. —Suzukaze-c◇◇ 02:57, 24 January 2020 (UTC)
- @Eirikr: Since 御#Etymology_3 is an obsolete word, it appears that romaji would reflect the neoclassical pronunciation. We don't know much about this kind of pronunciation except:
- Historical kana are translated to modern pronunciation according to the rules.
- /Vu/ monophthongization rules can be optionally applied between verb 語幹 and 語尾. For example, 会ふ can be read either アウ or オー, and 変ふ (classical form of 変える) can be read either カウ or コー.
- The volitional "auxiliary verb" む is pronounced ン (not sure if this is optional or mandatory).
- However, it is unclear whether む in other words that developed into ん is pronounced ン. If this is mandatory, then we shouldn't have the ōmu- romaji (which would be a kind of spelling pronunciation, like konnichi ha). Bjarke Frellesvig's A History of the Japanese Language mentioned this "neoclassical" pronunciation in section 1.1.6, where it is called "NJ reading", and gave an example where 神主 (Old Japanese: kamunusi) is read kannusi in "NJ reading". But as that book focuses on the spoken language, it does not study this kind of reading. --Dine2016 (talk) 06:29, 25 January 2020 (UTC)
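As a rough illustration of the optional /Vu/ monophthongization described above, here is a small Python sketch. It only covers the /au/ case, which is the one the examples in this thread attest (アウ or オー, カウ or コー), and only at the end of the reading, where the verb stem meets the ending. The function name and the partial kana table are assumptions made for the example; other vowel combinations are deliberately left out.

```python
# Partial table: a-row katakana mapped to the matching o-row katakana.
# Assumption for illustration only; it is not a full set of the rules.
A_TO_O = dict(zip("アカサタナハマヤラガザダバパ", "オコソトノホモヨロゴゾドボポ"))


def neoclassical_readings(reading: str) -> list[str]:
    """Return the plain reading plus the optional monophthongized variant.

    Only a final a-row kana followed by ウ is handled, i.e. the boundary
    between a verb stem and its ending in forms such as 会ふ and 変ふ.
    """
    if len(reading) >= 2 and reading[-1] == "ウ" and reading[-2] in A_TO_O:
        merged = reading[:-2] + A_TO_O[reading[-2]] + "ー"
        return [reading, merged]
    return [reading]


print(neoclassical_readings("アウ"))  # 会ふ
print(neoclassical_readings("カウ"))  # 変ふ
```

Returning both variants rather than choosing one reflects that the rule is described as optional.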
Where to put historical kana
hiragana (modern) | こい |
hiragana (historical) | こひ |
kanji (shinjitai) | |
kanji (kyūjitai) | |
kun’yomi |
@Suzukaze-c, Eirikr, Huhu9001 It's easy to agree that kyujitai should be moved to {{ja-kanjitab}}. As for where historical kana would go, one option is to move them to the pronunciation section (see below) and another is to move them to a new version of {{ja-kanjitab}} intended as a morpheme template (see right).
(Example adapted from User:荒巻モロゾフ/draft. Modern terms would have both modern and historical kana but only modern romaji.)
I prefer to put it in the pronunciation section because:
- The kana spellings are important information, so they should be displayed in the main "flow" of the page. The morpheme templates on the right of the page are easily ignored because today's computers have wider screens, so they should be reserved for supplementary information (alternative kanji spellings, reading patterns, etc.)
- The kana spellings reflect modern and historical pronunciation, so it is more logical to put them in the pronunciation section. It doesn't hurt to duplicate them in the morpheme templates where they are distributed to the kanji or morphemes, but doing this to modern kana is sufficient.
- Templates on the same page are processed separately and cannot know the arguments supplied to each other. If the kana spellings and pronunciation are placed in different templates, there is no way for them to generate romanizations that depend on both the modern kana spelling and the pronunciation, such as Nihon-shiki and Waapuro Hepburn (Hepburn with long sounds expanded according to kana spelling, common on manga and anime websites).
What do you think? --Dine2016 (talk) 08:47, 25 January 2020 (UTC)
- I would like a new template for historical kana. I think the historical kana, and the historical pronunciations associated with them, are just "supplementary information".
- (Off-topic, I also want |alt= to be separated from t:ja-kanjitab, as they are not necessarily "kanji".)
- Communication between templates is technically possible by making one page transclude itself. However I don't know whether there will be unpredicted consequences once this kind of self-transclusion is put into large-scale practice. -- Huhu9001 (talk) 09:10, 25 January 2020 (UTC)
- Self-transclusion may work if there is only one etymology section. But when there are multiple, you don't know which etymology section contains the "current word" to fetch arguments to other templates. Therefore it is still desirable to merge them. If one template has the reading pattern of the lemma spelling, one has alternative kanji spellings, and one has historical kana, they can't communicate and their relative order in the HTML output is fixed. It's better to merge them into a single morpheme template which will have greater freedom in the presentation of the information.
- As Eirikr said above, we're going to cover all stages from Old Japanese to Modern Japanese in the ==Japanese== section, so it doesn't make sense to separate modern kana/pronunciation from historical kana/pronunciation and treat the latter as supplementary information. They can be presented side by side. --Dine2016 (talk) 10:26, 25 January 2020 (UTC)
- So does that mean all OJP L2 will be eliminated soon? -- Huhu9001 (talk) 14:35, 25 January 2020 (UTC)
- No, how to cover premodern Japanese is not yet agreed upon. That's another topic. (I want to have ==Old Japanese== cover OJP from a different perspective than ==Japanese==.) --Dine2016 (talk) 15:34, 25 January 2020 (UTC)
By the way, I suggest avoiding the usage of class="mw-collapsible" because it does not work on the mobile site. -- Huhu9001 (talk) 08:50, 26 January 2020 (UTC)
Single-kanji + suru verbs
I ran across @Huhu9001's recent change at 会する. The ruby work correctly now, but the romanization links to kai suru, where the practice for other suru verbs has been to link to the two portions separately. While 会す is effectively inseparable, with potential form 会せる, it appears that 会する is parsed by (at least some) modern speakers as separable, with potential form 会できる. Alternatively, if we are to present this as an inseparable verb like 愛する, the romanization should not have any spaces.
Do others have any different views on this kind of verb? ‑‑ Eiríkr Útlendi │Tala við mig 06:47, 26 January 2020 (UTC)
- I have no idea on this. -- Huhu9001 (talk) 08:42, 26 January 2020 (UTC)
- Verbs of this kind have their own declension that’s different from normal kango+suru verbs, so they should be one word, like 愛する. Sartma (talk) 12:35, 20 September 2021 (UTC)
Inflectional suffixes
@Dine2016, re: the recent changes like this one, might I suggest a simple template? This would allow for shorter wikitext and consistent wording, and an easy way to change the wording if such is ever needed. ‑‑ Eiríkr Útlendi │Tala við mig 21:09, 26 January 2020 (UTC)
- @Eirikr: Of course. A template can also easily generate categories as well. If we create a template, I suggest we further divide inflectional suffixes (from a morphological analysis) into inflecting ones and uninflecting ones, because this is another place where 学校文法 and 形態論 disagree: う and た are uninflecting, but they are traditionally classified as 助動詞 because う is historically derived from む, and た has the 活用形 たろ(う) and たら which are distinct suffixes from た from a synchronic analysis. --Dine2016 (talk) 23:53, 26 January 2020 (UTC)
@Eirikr I noticed that you added an anchor to the よう page. This is another proof that MediaWiki is unsuitable for dictionary entries. We should have been using page titles like ja/よう, suff. like the Oxford English Dictionary, which will allow unambiguous identification of lexical items. But now we lump several lexical items (i.e. entries in printed dictionaries) on one page, what to do? My solution is that we begin each Etymology section with a template that identifies the current lexical item like {{ja-spellings}}, which will generate anchors like 蛙#ja-かえる and 蛙#ja-かわず. If there are no alternative spellings, the template can accept an extra identifier to generate anchors like よう#ja-volitional. This will allow one to link to a specific lexical item unambiguously, even if etymology numbers are reordered or if other languages (e.g. Chinese) interfere.
You may note that the standard entry layout (WT:EL) requires headword templates as the backbone of entries. This is because in languages like English, a change in POS usually results in different lexical items (e.g. en/change, v. and en/change, n. are different entries in the OALD). Languages like Japanese are quite different: as long as the kanji and kana are the same, printed dictionaries usually consider them to be the same lexical item, however the POS changes. This is another reason why we should work around the standard entry layout as much as possible. The Chinese editors who implemented Unified Chinese did a good job in shifting the backbone of entries from headword templates to {{zh-forms}} and {{zh-pron}}, and I think the Japanese editors should follow their example. --Dine2016 (talk) 06:10, 28 January 2020 (UTC)
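The anchor scheme proposed above (anchors such as 蛙#ja-かえる and よう#ja-volitional) could be generated along these lines. This is a minimal sketch in Python, not the Lua a real template would need; entry_anchor and entry_link are hypothetical names, and the rule of preferring the kana spelling over an explicit identifier is an assumption drawn from the two examples given.

```python
from typing import Optional


def entry_anchor(lang: str, kana: Optional[str] = None,
                 label: Optional[str] = None) -> str:
    """Build a stable anchor id from the language code plus a key.

    The key is the kana spelling when one is given, otherwise an
    editor-supplied identifier such as "volitional".
    """
    key = kana or label
    if not key:
        raise ValueError("need a kana spelling or an explicit identifier")
    return f"{lang}-{key}"


def entry_link(page: str, lang: str, kana: Optional[str] = None,
               label: Optional[str] = None) -> str:
    """Build a page#anchor target for one lexical item."""
    return f"{page}#{entry_anchor(lang, kana, label)}"


print(entry_link("蛙", "ja", kana="かえる"))
print(entry_link("よう", "ja", label="volitional"))
```

Because the key comes from the spelling or a semantic label rather than from an etymology number, links built this way would survive a reordering of etymology sections.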
- @Dine2016: Yes, I've long considered the current Wiktionary data structure to be ... horrible, from a data management perspective. The MW back-end allows us to do things like have a lemma spelling page, with each language as sub-pages under that, and optionally each etym and/or POS as sub-pages under the language -- with transclusion bundling all that into a consolidated lemma spelling page showing all languages for that grapheme. Users could follow the grapheme, or follow the language-specific sub-page. Frankly, I don't care if kappa#Finnish changes, but I do care if kappa#Japanese changes. Etc.
- However, for various reasons (some of which frankly don't make sense to me), other editors have been vehemently opposed to any such approach. Despite the fact that some of our forum pages effectively already do this -- such as Wiktionary:Beer parlor splitting things out by years and months for at least semi-sane management, and consolidating at Wiktionary:Beer parlor for ease of reading.
- I even knocked up a sample of what this might look like years ago at User:Eirikr/Sandbox3/ni. Now, with Lua, I suspect that something even better might be possible -- on the technical level, anyway.
- Re: your proposal for auto-generating anchors, I'm all for it.
- Incidentally, one reason for why I put the anchor at よう above the etym header is usability -- so that the etym header itself would be visible when a user arrived at that anchor. I've long thought it very confusing when I land on a page and I can't tell what heading I'm looking at.
- Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 18:27, 28 January 2020 (UTC)
- If you agree that the current Wiktionary data structure is horrible, I think we should make the source of entries as logical as possible (that is, favor logical markup over presentational markup), even at the cost of the final rendering. Here the anchor belongs to Etymology 3, so it should be under Etymology 3, not at the end of Etymology 2.
- In fact, I've encountered a similar problem when writing {{ja-see}}: single-kanji entries without Etymology sections may have the following structure:
==Japanese==
===Kanji===
...
{{ja-kanjitab|...}}
===Noun===
...
- In this case the {{ja-kanjitab}} belongs to the word, not the ===Kanji=== section, so the code should not discard it when discarding the ===Kanji=== section. My solution was to write extra code to watch out for this case, but this is nevertheless dirty. --Dine2016 (talk) 05:29, 29 January 2020 (UTC)
- @Dine2016, for single-kanji entries, if there is any POS content at all, there should be an ===Etymology=== header, even if there is no etymology actually provided at the current time. I'm glad you coded a workaround for {{ja-see}}. As you note, that kind of entry structure is defective, and given time, we should eventually fix those. It's much easier to fix them if we can find them, but I'm not sure of an easy way to find such entries; do you have any good ideas?
- Re: wikicode structure, I prioritize usability for readers above usability for editors, hence where I put the anchor <span>. For the example at hand of よう, ideally there would be a way of generating an anchor in the etym header itself that includes editor-specifiable semantic information, not just a mutable index number. Lacking that, we are stuck with hackish workarounds. Oh well. ‑‑ Eiríkr Útlendi │Tala við mig 18:02, 29 January 2020 (UTC)
- @Eirikr: "I prioritize usability for readers above usability for editors": Our standard layout which expresses hierarchy using headers is horrible for readers as well. For example, please take a look at the POS contents of 岸, then take a look at the ja-see-kango of が. Which layout is more "ichimokuryōzen"?
- If we want to serve readers better, we should keep the source of entries reasonably parseable, so that others can develop better layouts for Wiktionary. Do you prefer to use EDICT via its official interface WWWJDIC, or via the third-party Jisho.org? Similarly, Wiktionary readers will certainly use a third-party interface to read Wiktionary data, and we should prepare for that.
- "do you have any good ideas": (1) find a template that is used on all Japanese entries, such as the headword templates, (2) inject code into mod:ja-headword to make the headword templates transclude the entire page and analyze the Japanese section, (3) track or categorize the entries you want to find and (4) leave everything else to the Wikimedia servers. (I haven't tried this idea, though.) --Dine2016 (talk) 03:24, 30 January 2020 (UTC)
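The structural check behind step (3) above could look something like this. It is an offline sketch in Python that works on raw wikitext (for example from a dump), not the in-module approach described in the comment, which would need Lua inside mod:ja-headword. The function names and the short POS_HEADERS set are assumptions for illustration, and the set is far from complete.

```python
import re

# Hypothetical, incomplete list of part-of-speech headers.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Proper noun",
               "Suffix", "Prefix", "Particle"}


def japanese_section(wikitext: str) -> str:
    """Return the text between ==Japanese== and the next level-2 header."""
    start = re.search(r"^==Japanese==\s*$", wikitext, re.M)
    if not start:
        return ""
    rest = wikitext[start.end():]
    nxt = re.search(r"^==[^=].*?==\s*$", rest, re.M)
    return rest[:nxt.start()] if nxt else rest


def lacks_etymology_header(wikitext: str) -> bool:
    """True when the Japanese section has POS content but no Etymology header."""
    section = japanese_section(wikitext)
    headers = re.findall(r"^(=+)\s*(.+?)\s*\1\s*$", section, re.M)
    names = [name for _, name in headers]
    has_pos = any(name in POS_HEADERS for name in names)
    has_etym = any(name.startswith("Etymology") for name in names)
    return has_pos and not has_etym


sample = "==Japanese==\n===Kanji===\n...\n{{ja-kanjitab|...}}\n===Noun===\n...\n"
print(lacks_etymology_header(sample))
```

Pages flagged this way could then be tracked or categorized for manual cleanup, as suggested above.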
- @Dine2016, re: the header-heavy layout, agreed that it's sub-optimal. With a few minor quibbles, I quite prefer some version of the mock-ups you've pulled together in the past. Collapsing nouns and する (suru) verbs to a single header, for instance, is pretty much a no-brainer, but one that contravenes the dicta of WT:ELE. I ran into broader-community resistance to moving away from WT:ELE when we were developing the layout and templates for romanized Japanese entries -- technically speaking, it makes no sense to me to require anything more than the ==Japanese== header and a single template that generates the ===Romanization=== header and the sense line linking to the kana spelling -- and that's what we had for a brief while -- but other editors (notably, those not working with Japanese) pushed quite hard for keeping separate templates for {{ja-romaji}} and {{ja-romanization of}}. I still don't understand the reasoning behind this. Anyway, that experience made me a bit resigned to the current structure, and I've focused my energies more on getting the information in here. It may be time to revisit the layout in a more sustained and systematic fashion, especially now with Lua and all the capabilities we've gained since the last go-round (at least, the last one I participated in directly).
- Re: finding entries based on structure, I kinda figured that might be the way to go. I don't have the Lua chops to pursue that, however. :-/ Maybe some day, but responsibilities IRL leave me little time to pursue that, and I confess that I find the spaghetti-ness of our module infrastructure makes it a bit hard to trace what goes where without putting in more effort than I can comfortably afford. Something for another day, perhaps. ‑‑ Eiríkr Útlendi │Tala við mig 17:52, 30 January 2020 (UTC)
make the pronunciation section required and more prominent
After reformatting the headword lines, the next thing I want to do is give {{ja-pron}} a gray background.
As shown in the previous section, headers do a poor job of showing hierarchy. When browsing a multiple-etymology entry, one can easily find oneself lost in an ocean of headers which are all aligned to the left and don't differ much in font size, and have to look forward for headword lines to determine which word is being described. By contrast, Chinese entries have a more regular rhythm because the pronunciation section is central and has a gray background, so one can easily tell the next word from the previous.
Japanese | Chinese |
---|---|
Etymology 1 Pronunciation Noun Etymology 2 Pronunciation Noun Etymology 3 Noun | Etymology 1 Pronunciation Noun Etymology 2 Pronunciation Noun Etymology 3 Pronunciation Noun |
It is easy to edit {{ja-pron}} to adopt the Chinese format, but I'd like to take the opportunity to propose a more regular entry layout as well. For example, etymology sections that lack a pronunciation section should be supplied one, and Osaka accents and usage notes outside {{ja-pron}} should be either incorporated or have the bullets removed. Most importantly, we can move the historical kana from the headword lines to {{ja-pron}}, whose primary role would be to identify the current word, by modern pronunciation or by historical spelling (which still embodies the modern pronunciation if you follow the rules).
What do you think of this approach?
(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 07:14, 31 January 2020 (UTC)
- @Dine2016: Whatever you're doing, you're doing well. Thanks for the effort and sorry for not always participating. I'd like any alternative kana (including the historical) and transliterations to move to the pronunciation section, which was already discussed, I think. I'd also like to propose discouraging and even banning multi-word rōmaji, delinking any rōmaji that have a space in them wherever they appear. Why do we need entries such as seishin bunretsu byō, and why should the entry 精神分裂病 (seishin bunretsu byō) expose it (have a hyperlink)?--Anatoli T. (обсудить/вклад) 07:24, 31 January 2020 (UTC)
- @Metaknowledge, KevinUp -- Huhu9001 (talk) 08:36, 31 January 2020 (UTC)
- If I can be honest, I'm not a huge fan of the grey box around {{zh-pron}}. It definitely aids navigation, but I have personally concluded that allowing the text to flow freely is superior. (Perhaps we could highlight Etymology headers with CSS for all languages?) —Suzukaze-c◇◇ 09:29, 31 January 2020 (UTC)
- I like these ideas, and I also support Anatoli's suggestions. I deeply appreciate the efforts to make the Japanese sections less of a mess, although if we're going to be completely subsuming Old Japanese and including those pronunciations in the Japanese L2, I think we also need to consider things like giving the (usually multitudinous) attested OJ spellings. —Μετάknowledgediscuss/deeds 17:43, 31 January 2020 (UTC)
- @Dine2016, broad agreement from me. I'm not 100% sure I like the shading, but I'm fully supportive of making the pronunciation more prominent. This is much more of an issue for JA terms than for EN, given the possible wide profusion of unrelated pronunciations for a given headword spelling, such as clearly observable at Japanese 柄 with its 9 etymologies and 8 distinct pronunciations. ‑‑ Eiríkr Útlendi │Tala við mig 22:02, 31 January 2020 (UTC)
- @Atitarev, I agree that spelling information should be handled differently. However, spellings -- kanji and kana both -- are not really "pronunciation" information, so I disagree with putting spellings there. I also don't understand your opposition to romanizations that include whitespace -- if the JA term is clearly a multi-word term, I don't understand why the romanization would not similarly clearly indicate word barriers, in conformance with Latin-alphabet formatting practices across many languages of putting a whitespace between words. Or perhaps I'm misunderstanding, and you're instead just opposed to the existence of multi-word romanized entries? ‑‑ Eiríkr Útlendi │Tala við mig 22:02, 31 January 2020 (UTC)
- @Eirikr: All sorts of transliterations and IPA are nicely implemented in the Chinese model, which I really like.
- Yes, I oppose multi-word romanisation entries, they are no longer needed (not really) as a disambiguation tool (also applies to Mandarin pinyin entries). The disambiguation and a large number of homophones is the main reason why we have romanisation entries in the first place. —Anatoli T. (обсудить/вклад) 23:10, 31 January 2020 (UTC)
- @Atitarev, aha, thank you for clarifying. The MW search features have improved over the years, and romanized pages might not be needed anymore for users without IME or other Japanese text entry methods. So long as users can search for a romanized form and still find the lemma page reasonably easily, I would support the removal of romanized entries. But note the precondition :). ‑‑ Eiríkr Útlendi │Tala við mig 23:24, 31 January 2020 (UTC)
- @Eirikr: "so I disagree with putting spellings there": Do you mean that modern kana and romaji should not be in the pronunciation section? If so, what about something like this? --Dine2016 (talk) 03:06, 1 February 2020 (UTC)
- @Dine2016, Eirikr: Displaying more than one transliteration, respelling or alternative form calls for moving them outside the header. If you're familiar with Chinese pronunciation sections, please also take a look at the romanization in the Korean 심란하다 (simnanhada), Burmese အာမခံသေတ္တာ (ama.hkamsetta) and Thai นิวเคลียส (niu-klíias) entries. Nothing new here.
- @Eirikr: I'm fine with the precondition to consider the entire removal of romanisation entries but, as a first step, it would be good to eliminate normalisation of entries as full-blown romanised phrases, sentences and expressions, which are currently out of proportion in rōmaji and pinyin entries. I mean 中華民族偉大復興/中华民族伟大复兴 (Zhōnghuá mínzú wěidà fùxīng) and 阿耨多羅三藐三菩提 (anokutara sanmyaku sanbodai) are probably idiomatic but what the hell are we doing with entries like [[Zhōnghuá mínzú wěidà fùxīng]] (pinyin) or [[anokutara sanmyaku sanbodai]] (rōmaji)? --Anatoli T. (обсудить/вклад) 04:25, 1 February 2020 (UTC)
- @Eirikr: Or do you mean that modern and historical kana should be in the morpheme templates, but all sorts of romaji (and phonetic kana) can be in the pronunciation section? --Dine2016 (talk) 07:39, 3 February 2020 (UTC)
- @Dine2016: My main point is that spelling and pronunciation are orthogonal. They're related, but separate phenomena, and I think we need to treat them separately. Your mock-up looked quite good to me. I might suggest tweaks, like showing our standard modified Hepburn by default, and only showing the others if the user opts to expand, but on the whole I like it -- it cleanly separates the written forms (kanji, kana, romanizations) from the spoken forms (the pronunciation).
- I hope that helps clarify my position. ‑‑ Eiríkr Útlendi │Tala við mig 20:32, 3 February 2020 (UTC)
@Eirikr: As Atitarev (talk • contribs) pointed out, romanizations are not spellings, and other languages put transcriptions and transliterations in the pronunciation section as well even when they depend on the spelling. Mixing Japanese and Latin script in the morpheme/forms box is bad layout; moving the Latin script to {{ja-pron}} and placing it alongside the IPA is much cleaner. --Dine2016 (talk) 06:13, 5 February 2020 (UTC)
- @Dine2016: There are also some other languages that do not put translit. in the pronunciation section. θεός#Greek; إله#Arabic; देव#Hindi. -- Huhu9001 (talk) 10:22, 5 February 2020 (UTC)
- @Huhu9001: Ah, yes, I should have said "some other languages" instead. On the other hand, Greek, Arabic, and Hindi entries offer only one kind of romanization, so the romanization is simply put in the headword line. When there are multiple kinds of romanizations displayed on the page, is there any place for them if not the pronunciation section? --Dine2016 (talk) 10:27, 5 February 2020 (UTC)
- @Dine2016:
- At its core, Japanese is a bit different from most other languages: a single written form in Japanese may have multiple unrelated spoken forms, each with its own meanings and derivations. The spoken and written forms are largely independent, to a degree we don't see even in English with its otherwise wide potential disparity between the spelling and the pronunciation. And the spoken form of a Japanese term may have multiple unrelated written forms as well, again each with its own meanings and derivations. This allows for a truly broad, flexible, and powerful degree of expression -- and it also presents some (I believe) unique lexicographical challenges. Any given Japanese term as we record it here effectively represents a node in a two-dimensional array or matrix, where one dimension or axis is the written form, and the other is the spoken form. Some terms are simple, with only one spoken and one written form, but others can be quite complex. Which axis we focus on reveals different relations.
- Regardless of the status of romanizations as "spellings", official, or unofficial, romanizations are clearly written forms, not spoken forms. Pronunciation is, by definition, about the spoken language. As such, putting details about written forms in a section that is specifically about spoken forms seems like incorrect organization of data. If we are to separate out other written-form details into its own section -- which I think is a good idea, and which I think is mostly achieved in your mock-up, which collects written forms into a discrete section -- then I think we should put romanizations there as well -- as, indeed, your mock up does. I'm a bit confused now that you appear to be arguing to instead put romanizations in the pronunciation section? ‑‑ Eiríkr Útlendi │Tala við mig 18:49, 5 February 2020 (UTC)
- (Well, hyphenation isn't always related to pronunciation either... —Suzukaze-c◇◇ 20:47, 5 February 2020 (UTC))
- @Suzukaze-c: No argument there. How is this relevant? ‑‑ Eiríkr Útlendi │Tala við mig 21:13, 5 February 2020 (UTC)
@Eirikr: Other languages put spelling information in the pronunciation section as well. Even English has hyphenation info in the pronunciation section. Japanese is not really different from English. The reason it appears to be more complex is because some editors consider a word to be its lemma spelling, instead of the sum of all possible kanji and kana. So they write the entry in a way that depends on the lemma spelling. For example, the lexical item oru "to be, to exist" was originally lemmatized at 居る, so the original editor considered it to be 居る, and added |yomi=k and talked about its iru reading. This clearly went wrong when it was moved to おる. If we instead considered 居る to be a representation of the word { spoken form: oru, written form: [居る, おる], meaning: 'to exist, to be' }, then we wouldn't have had |yomi=k or talked about its iru reading, because these are properties of one of the spellings of the word, not the word itself. In light of this, Japanese 居る is not really different from English maneuver, and we can put spelling information in the pronunciation section. --Dine2016 (talk) 02:59, 6 February 2020 (UTC)
- @Dine2016: There is still a distinction between speech and writing information even if you treat the word as a json. @Eirikr: Perhaps try to use a single template like {{ja-info}} in wikitext, which creates two tables, one for pronunciation, and one for spelling? -- Huhu9001 (talk) 07:41, 6 February 2020 (UTC)
- My point is that there is no harm in putting both spelling and pronunciation info in the pronunciation section. Other languages are doing it as well. English. Chinese. Korean. Burmese. Thai. --Dine2016 (talk) 08:56, 6 February 2020 (UTC)
- Personally I don't like the t:zh-pron style. Everything is jammed in a small box. Homophones should have been important information of Chinese words, but now it is collapsed and buried in it. -- Huhu9001 (talk) 09:38, 6 February 2020 (UTC)
- (just a note: Wyang is responsible for zh, ko, my, th, kh, bo. my used to have extra romanizations in the headword until {{my-IPA}} was written —Suzukaze-c◇◇ 15:59, 6 February 2020 (UTC))
- And back then, Burmese headword lines were absurdly long and hard to parse by eye. I think Wyang was right on this issue in general, because I find those entries much easier to use now. —Μετάknowledgediscuss/deeds 16:17, 6 February 2020 (UTC)
- No value judgements; just a reminder that a moderately common practice was implemented by one person in multiple places. —Suzukaze-c◇◇ 20:35, 6 February 2020 (UTC)
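The record notation used earlier in this thread ({ spoken form, written form, meaning }) can be sketched as a small data structure. This is only an illustration and not an existing Wiktionary module: the class names `Spelling` and `LexicalItem`, their fields, and the per-spelling `yomi` note are all hypothetical. The sketch just shows that a property such as a kun'yomi flag can hang off one spelling, so choosing a different lemma spelling leaves the word itself unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spelling:
    """One written form of a word; spelling-specific notes live here."""
    text: str
    yomi: Optional[str] = None  # hypothetical field, e.g. "k" for kun'yomi


@dataclass
class LexicalItem:
    """A word as the sum of its spoken form, written forms and meaning."""
    spoken: str
    meaning: str
    spellings: List[Spelling] = field(default_factory=list)

    def lemma(self, preferred: str) -> Spelling:
        # Picking a lemma spelling is a presentation choice only;
        # it does not alter the spoken form or the meaning.
        for spelling in self.spellings:
            if spelling.text == preferred:
                return spelling
        raise KeyError(preferred)


oru = LexicalItem(
    spoken="oru",
    meaning="to exist, to be",
    spellings=[Spelling("居る", yomi="k"), Spelling("おる")],
)
```

With this shape, `oru.lemma("おる")` and `oru.lemma("居る")` return different spelling records for the same word, and only the kanji spelling carries the reading note.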
- I withdraw what I said
 | learn; study | school |
---|---|---|
Modern hiragana: がっこう | がく→がっ | こう |
Historical hiragana: がくかう, がつかう | がく→がつ | かう |
Kanji, shinjitai: 学校 | | |
Kanji, kyūjitai: 學校 | | |
Rōmaji, Hepburn romanization: gakkō | gaku > gak | kō |
Rōmaji, Waapuro Hepburn: gakkou | gaku > gak | kou |
Rōmaji, Kunrei-shiki / Nihon-shiki: gakkô | gaku > gak | kô |
I think my Chinese background has made me biased against romaji. It seems that many westerners start learning Japanese with romaji, and mastering the kana script is a great challenge to them (unlike Chinese learners who start out with kana and kanji+furigana directly). For them, romaji is the Japanese script at an early stage, even if it is not an official Japanese script. Maybe we should incorporate the romanizations into the morpheme template, to enable such learners to identify their morphemes without bothering to learn kana.
@Eirikr, Atitarev, Huhu9001, Suzukaze-c (Please correct me if I'm wrong about western learners.) --Dine2016 (talk) 12:15, 8 February 2020 (UTC)
- @Dine2016: What parameters do we need for this table? -- Huhu9001 (talk) 12:35, 8 February 2020 (UTC)
- @Huhu9001: This is just a proof of concept. I don't know how to code it either. --Dine2016 (talk) 13:49, 8 February 2020 (UTC)
- I was worrying about whether this table needs a lot of information to create, rendering the wikitext overly complicated. We have some entries like 立てば芍薬座れば牡丹歩く姿は百合の花. -- Huhu9001 (talk) 14:12, 8 February 2020 (UTC)
- Also, since you are advocating a morpheme template, rather than a kanji one, how will it present words like 行(き)止(ま)り? -- Huhu9001 (talk) 14:12, 8 February 2020 (UTC)
- @Dine2016: I maintain that rōmaji is not the Japanese writing system (it's kanji, hiragana and katakana), even if it's used for transcriptions, also by native speakers. The comparison with Hanyu pinyin is not 100% but very close. --Anatoli T. (обсудить/вклад) 23:30, 9 February 2020 (UTC)
- And yet, it exists. :)
- @Anatoli, if you're advocating for the wholesale removal of romaji from Japanese entries, or even just advocating against the inclusion of alternative romanization schemes (other than the modified Hepburn we've used here for some time), I cannot agree. If instead you are advocating for the removal of romaji-only entries, I can support that, as above. That said, your position here is a bit unclear?
- English-speaking (or at least English-reading) learners of Japanese will almost always be taught romaji before any other writing. Given the wide use of romaji, I think we would do our users a grave disservice to exclude romaji from our entries. I have also seen enough confusion over the years with regard to different romanization systems that I believe it would be useful to include the different spellings from different romanization schemes within a single entry, at the bare minimum to help with discoverability and searching. ‑‑ Eiríkr Útlendi │Tala við mig 16:36, 10 February 2020 (UTC)
- @Eirikr: You misunderstood. I'm not suggesting removing romaji from entries. I'm only suggesting delinking multiword romaji. We shouldn't have entries for whole phrases in romaji; individual components will suffice for disambiguation. The entry お誕生日おめでとうございます has a red link to o-tanjōbi omedetō gozaimasu and we have otanjōbi omedetō gozaimasu, which can be broken up into component romaji for linking. --Anatoli T. (обсудить/вклад) 18:57, 10 February 2020 (UTC)
- @Anatoli, thank you. As clarified, I second this proposal. ‑‑ Eiríkr Útlendi │Tala við mig 19:48, 10 February 2020 (UTC)
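The delinking proposal agreed on above could look roughly like the following. This is a minimal sketch under stated assumptions, not the behaviour of any existing module: the function name `link_romaji` is hypothetical, and it assumes that whitespace is the only word separator and that plain `[[...]]` links are what the templates would emit.

```python
def link_romaji(romaji: str) -> str:
    """Return wikitext for a romanization without linking a whole phrase."""
    words = romaji.split()
    if not words:
        return ""
    if len(words) == 1:
        # Single-word romanization: keep the link to its rōmaji entry.
        return f"[[{words[0]}]]"
    # Multi-word romanization: link each component instead of the phrase.
    return " ".join(f"[[{word}]]" for word in words)
```

For example, `link_romaji("otanjōbi omedetō gozaimasu")` links the three components separately, so no phrase-level rōmaji entry is needed.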
- @Eirikr: Question: Where should Portuguese spellings (e.g. 親切 → xinxet) from the Christian materials be placed? I think it's bad if "xinxet" is placed in the forms/morpheme template while its reconstruction is placed in the pronunciation section. At least a copy of "xinxet" should be in the pronunciation section for reference. --Dine2016 (talk) 10:12, 11 February 2020 (UTC)
- @Dine2016: Interesting question.
- Note that these are not just from religious materials -- see also the Nippo Jisho, particularly the one on Google Books.
- There are important differences between the Portuguese references and modern, not just in orthography. The Portuguese texts record a continued distinction of /ɔː/ (from older /au/) versus /oː/ (from /oo/ or /ou/). The presence of ⟨xe⟩ in the Portuguese also points towards /ɕe/ where we have modern /se/, indicating a greater degree of palatalization in the past. Things like the ⟨xinxet⟩ example you include also point towards a kind of final consonant without a following vowel, something that is absolutely ruled out for everything but /ɴ/ and /s/ in modern "standard" Japanese.
- Of the several challenges presented by these historical texts, one in particular is not knowing very well which dialect of Japanese they recorded. We know that modern dialects can have quite a range of differences in phonology and construction. Did the Portuguese record the "standard" Japanese of the time, as used by the central government? Or was this more reflective of the local dialect of the regions in which the Portuguese were active, mainly in the southwest around Nagasaki? I'm not sure.
- Back to your question of "where", I'd suggest perhaps somewhere close to the historical kana, considering that the Portuguese sources are a kind of historical romaji. It would probably also be a good idea to link through from the middle-Japanese Portuguese romaji through to an appropriate section at WT:AJA, which would explain the romanization scheme and the historical context.
- Since these generally record the Middle Japanese pronunciation of (probably) the Kyūshū region of around 1600, I'm not sure they're immediately germane to the modern language, so they probably shouldn't be shown by default, and much like the historical kana and the alternative non-Hepburn romanizations, they should probably only be shown after the user deliberately expands something. ‑‑ Eiríkr Útlendi │Tala við mig 18:14, 11 February 2020 (UTC)
Proto-Japonic terms derived from Indo-European languages
Why does such a provocative, yet disappointingly empty category such as this exist? カモイ (talk) 07:11, 2 March 2020 (UTC)
- @カモイ: Unknown. One option to learn more would be to check the Category page's History tab to see who created it, and contact them directly. ‑‑ Eiríkr Útlendi │Tala við mig 22:12, 2 March 2020 (UTC)
- I see, thanks, I will do that. カモイ (talk) 23:18, 2 March 2020 (UTC)
- @カモイ, FWIW, I doubt any such terms have been identified. There are Japanese terms that derive from PIE, but only as borrowings, things like 蜜 (mitsu, “honey”) from Chinese from Tocharian, cognate with English mead; or 瓦 (kawara, “roof tile”), from Sanskrit कपाल (kapāla, “skull; any flat bone; cover, covering”), cognate with English head. But again, these are definitely borrowings, as opposed to native terms with roots that stretch clear back to any presumed relationship with PIE. ‑‑ Eiríkr Útlendi │Tala við mig 22:26, 3 March 2020 (UTC)
Kyūjitai normalized to shinjitai in Unicode and {{ja-kanji forms}}
62 kyūjitai are normalized to shinjitai in Unicode. MediaWiki also performs this normalization for kyūjitai literally included in source text, but not for HTML entities:
- A literal 禍 (U+FA52) is automatically normalized to 禍 (U+798D) and stored only in the latter form in the source text of the given page.
- &#xFA52; is displayed as 禍 (U+FA52).
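The difference between a literal kyūjitai character and an HTML entity can be reproduced outside MediaWiki. The sketch below assumes the normalization in question is Unicode NFC, and it uses Python's standard `unicodedata` and `html` modules purely as an illustration; the variable names are made up for this example.

```python
import html
import unicodedata

KYUJITAI = "\uFA52"   # CJK COMPATIBILITY IDEOGRAPH-FA52
SHINJITAI = "\u798D"  # the unified ideograph it canonically maps to

# A literal compatibility ideograph does not survive NFC normalization.
normalized = unicodedata.normalize("NFC", KYUJITAI)

# A character reference is plain ASCII in the source text, so normalization
# leaves it alone; it only becomes U+FA52 when it is decoded for display.
entity = "&#xFA52;"
stored = unicodedata.normalize("NFC", entity)
decoded = html.unescape(stored)

# Decimal and hexadecimal references name the same code point.
same_code_point = html.unescape("&#64082;") == html.unescape("&#xFA52;")
```

This is why the entity form also survives copy-pasting of the rendered page only as long as the destination does not normalize the text again.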
Template:ja-kanji forms says k=y should be used, resulting in using Korean font, but it is not working.
E.g. 禍 page currently contains:
{{ja-kanji forms|k=y}}
is displayed as a table with minimally bigger shinjitai 禍 falsely labelled as kyūjitai.{{ja-kanji|grade=c|rs=示09|kyu=禍}}
is correctly displayed as:
(common “Jōyō” kanji, shinjitai kanji, kyūjitai form 禍)
{{ko-hanja|hangeul=화|eumhun=|rv=hwa|mr=hwa|y=hwa}}
is correctly displayed as:
禍 • (hwa) (hangeul 화, revised hwa, McCune–Reischauer hwa, Yale hwa)
I think that HTML entities would be a better solution than using a Korean font, because the characters remain in their intended graphical form after copy-pasting elsewhere.
{{ja-kanji forms|禍|禍}}
does not work, it generates single&
as kyūjitai.{{ja-kanji forms|禍|禍}}
does not work, it generates shinjitai as kyūjitai.- However both
{{ja-kanji forms|禍|[[禍]]}}
and{{ja-kanji forms|禍|[[禍]]}}
appear to work, they generate correct kyūjitai without link. (Shinjitai in first argument optionally can be enclosed in[[ ]]
with no change of behavior.)
Can the recommended usage of this template be changed to use HTML entities for these kyūjitai? (Personally I prefer hexadecimal notation.) Arfrever (talk) 17:44, 17 June 2020 (UTC)
- @Arfrever: {{ja-kanji forms}} is entirely reliant on your computer having the proper fonts. If you don't have a Korean font, you see Japanese; if you don't have a Japanese font, you see Chinese; if you don't have a Chinese font you see rectangular boxes.
- As for links to ampersand, I speculate that Module:links (the backend code for {{l-self}}) is recognizing #xFA52; as an HTML anchor (similar to foobar#English).
- —Suzukaze-c (talk) 01:33, 31 July 2020 (UTC)
Arabic numeral alternative forms
Is there a standard for whether or how to create an entry for alternative forms of terms using Arabic numerals rather than kanji? For example, an entry for 6ヶ月 that redirects to 六ヶ月? It looks like there are currently separate entries for 6月 and 六月 but I don't think that's the right approach here. Noktulo (talk) 17:59, 5 July 2020 (UTC)
I also thought about maybe using the ja-see template but this note made me think it might not be the right answer either: "Please use this template for alternative spellings in the Japanese script only. Alternative sound forms (e.g. 追っ払う) and alternative spellings in other scripts (e.g. H#Japanese) currently still use the old approach." Noktulo (talk) 18:06, 5 July 2020 (UTC)
- @Noktulo: IMO, redirects from 6月→六月 would be great, and {{ja-see}} is acceptable. —Suzukaze-c (talk) 01:29, 31 July 2020 (UTC)
- @Suzukaze-c: Would it make more sense to direct 六月 to 6月 since 6月 is the more common modern usage? Noktulo (talk) 16:07, 7 August 2020 (UTC)
- @Noktulo: Sure. —Suzukaze-c (talk) 19:51, 7 August 2020 (UTC)
○○感: sum of parts?
Such as 満足感, 疎外感 (red link), 喪失感 (red link). Special:Search/intitle:感 "japanese lemmas" intitle:/..感/. —Suzukaze-c (talk) 01:26, 31 July 2020 (UTC)
bot cleanup work
- Discussion moved from User talk:Suzukaze-c.
{{ja-adj}} currently accepts |infl=i, |infl=い, |decl=i, |decl=い, and the like. Would it be a good idea to stick to a single format (say |infl=i), and eliminate the rest? Similarly, I think {{ja-pron}} needs only one parameter for devoiced positions (preferably by morae).
(That said, the biggest difficulty of parsing Wiktionary data for reuse is distinguishing between different lexical items in a single entry. They are sometimes divided by Etymology headers, sometimes by Pronunciation, sometimes both, so PoS headers may be on L3, L4 or L5. If we used page titles in the format "KANJI/KANA", most entries would contain only one lexical item, and the task would be significantly easier.) --Nyarukoseijin (talk) 10:07, 19 May 2020 (UTC)
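As a rough illustration of the kind of bot pass being proposed, the sketch below rewrites the variant parameters inside {{ja-adj}} calls to a single |infl= form. It is an assumption-laden example and not an existing bot: the function name `normalize_ja_adj`, the `CANONICAL` mapping (including the な/na pair, which is only a guess at what "and the like" covers), and the restriction to calls without nested templates are all hypothetical.

```python
import re

# Hypothetical mapping from accepted variants to one canonical value.
CANONICAL = {"i": "i", "い": "i", "na": "na", "な": "na"}

# Matches |infl=VALUE or |decl=VALUE, capturing VALUE.
PARAM = re.compile(r"\|\s*(?:infl|decl)\s*=\s*([^|}\s]+)")

# Matches {{ja-adj|...}} calls that contain no nested templates.
JA_ADJ_CALL = re.compile(r"\{\{ja-adj\|[^{}]*\}\}")


def normalize_ja_adj(wikitext: str) -> str:
    """Rewrite |decl= and kana values inside {{ja-adj}} calls to |infl=."""

    def fix_call(call: re.Match) -> str:
        return PARAM.sub(
            lambda p: "|infl=" + CANONICAL.get(p.group(1), p.group(1)),
            call.group(0),
        )

    return JA_ADJ_CALL.sub(fix_call, wikitext)
```

Restricting the rewrite to one template and one parameter at a time keeps each bot run easy to review, which fits the "one kind of change at a time" approach suggested in this thread.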
- infl: [4]
- ja-pron: Template talk:ja-pron ctrl-f devm.
. - headers: Wiktionary:Beer parlour/2019/March#Eliminating the difference in formatting between no-etymology, single-etymology and multiple-etymology entries
- ( ;∀;)
- page titles: True, but at the same time I worry that "choosing" a kanji spelling would be troublesome/controversial.
- —Suzukaze-c◇◇ 04:29, 20 May 2020 (UTC)
- infl: also text = replace(text, 'infl=い', 'infl=i')
- ja-pron: convert existing |dev= to |devm=, then if dev ~= "" then error(...) end
- headers: Thanks, I wasn't aware of that discussion. But I prefer something like ==English (1)== (which would render empty Etymology headers unnecessary).
- page titles: choose the first kanji spelling listed in say Daijirin
- Would it be a good idea to start cleanup work now, one kind of change at a time (say |infl=i), instead of waiting until everyone agrees on the final format? --Nyarukoseijin (talk) 11:02, 20 May 2020 (UTC)
- Well, it would be most straightforward to remove all inflectional types from {{ja-adj}} since they're already in the Inflection section. If {{ja-adj}} needs them it can just transclude the whole entry and examine the Inflection section.
- Maybe the most useful thing at the moment is to improve the inflectional templates. It's better to use a unified template (e.g. {{ja-infl|i}} instead of {{ja-i}}) since it allows easy extension (e.g. {{ja-infl|no}}, {{ja-infl|nari|tari}}). The reading should be given in non-truncated form (e.g. むずかしい over むずかし) if given at all. Also it would be better to give the traditional and JFL paradigms separately -- the traditional paradigm can be extended for godan verbs (two kinds of 未然形, two kinds of 連用形, 可能動詞) and show 接続 info, the JFL paradigm should put the ます form first, etc. --Nyarukoseijin (talk) 11:48, 20 May 2020 (UTC)
- Re: I worry that "choosing" a kanji spelling would be troublesome/controversial. -- why choose? The reader would go to 付く or 着く or 就く or 即く or 憑く. The data would live at the [[KANJI/KANA]] page, while the reader-facing pages would present the data.
- I don't think we can expect readers to understand that they need to go to any [[KANJI/KANA]] page.
- (@Nyarukoseijin, correct me if I've misunderstood your idea.)
- Re: adjective inflection templates, I like the idea of unifying into {{ja-infl}}. I confess I don't understand your distinction between "traditional" and "JFL", and I don't even know for sure what JFL means, so I don't quite know what to think of that part.
- Re: entry structure in general, I've never been a fan of Wiktionary's insistence on different structure -- sub-Etym headings starting at L3 for single-etym entries, but at L4 for multi-etym entries. Unnecessarily complicated and confusing, and now biting us in the ass as we try to automate more. While any [[KANJI/KANA]] page would likely only have one etymology, I can think of rare cases where a single such kanji + kana combo might have multiple etymologies. There's also the issue of what happens when such an entry might be transcluded into a bigger one, such as if we try to replicate monolingual electronic dictionary behavior, where the user goes to つく and gets everything that's read as つく. Suggestion: 1) either suck it up and we deal with differing header levels and structures in the Lua code, or 2) ideally, but also less likely to get traction with the greater EN WT editor community, strike up a thread at WT:GP or WT:BP etc. and propose that we always start sub-etym headers at L4, and reduce this unnecessary variation in header levels. I suppose 2.5) we could propose this structure just for JA entries.
- Re: ===Pronunciation 1,2,3...=== as the top-level entry header, that's something I experimented with for kana-spelling entries, where a reader might look something up based on something they heard but don't know how to spell. If the reader knows that it's, say, はし[2] and not はし[0], they want to be able to quickly and easily find the info for はし[2]. Since the old soft-redirect only listed lemmata with no other info (sometimes a short gloss), and since even {{ja-see}} currently shows only the glosses but not pitch, the reader can't see which はし entry is for はし[2] unless they click through to each lemma entry to find the right one. This is bad usability.
- However, with {{ja-see}}, this could be fixed by tweaking what the template shows to include pitch accent info. Also, using ===Pronunciation=== at the top of the entry sounds more and more like it is untenable, and causing more complexity than it's worth.
- I'm okay with this going away (i.e. keeping Pron always under Etym), particularly if {{ja-see}} could be updated.
could be updated. - ‑‑ Eiríkr Útlendi │Tala við mig 18:56, 20 May 2020 (UTC)
- @Eirikr:
- choosing kanji spelling: You got it! The title and even the format of the data-holding page is irrelevant, since Lua is powerful enough to transform it to standard entry layout on reader-facing pages. The Chinese editors implemented the idea long ago (see for example Module:zh/data/dial-syn/我們 as well as 我們, 我等, 吾等, etc.).
- inflection templates: "traditional" = 学校文法, "JFL" = Japanese as a foreign language = 日本語教育文法, so the traditional paradigm is 未然形, 連用形, ... and the JFL paradigm is ます形, て形, .... We have "Stem forms" and "Key constructions" which roughly correspond to the two, but it's better to rearrange the latter in the JFL order. The former can be extended like
未然形 書か(~ない,~れる,~せる,~ず) 書こ(~う)
連用形 書き(~ます,~たい,…) 書い(~て,~た,~たり)
…
可能動詞 書ける(一段) // note that this is not derived from 仮定形!
- entry structure: The so-called "single-etym entries" and "multi-etym entries" are single entries and multiple entries with homonym numbers in professional dictionaries like OED. I don't see why Wiktionary should be different. Hence my proposal of ==English 1==. Japanese can use a similar approach. For example, if we redirect 辨 to 弁/べん/1, the effect is to fetch ==Japanese 1== from 弁/べん. Or we can use special page titles like 弁/べん/1 or 辨/べん for those rare cases.
- pronunciation: Verb ren'yōkei and the derived noun may have different accents, such as 休み[2] (verb) and 休み[3] (noun). But one can always use |accn_note= and do away with ===Pronunciation 1=== etc. --Nyarukoseijin (talk) 05:30, 21 May 2020 (UTC)
- @Suzukaze-c, Nyarukoseijin -- Ah, it occurred to me today one other case where ===Pronunciation=== might want to come before ===Etymology=== -- cases where there is one pronunciation that applies to multiple etymologies. See セント (sento), for instance. The only way to keep ===Etymology=== at the top here is to duplicate the pronunciation information, which seems ... inelegant, maybe even sloppy. ‑‑ Eiríkr Útlendi │Tala við mig 17:37, 27 May 2020 (UTC)
- I'm afraid I can't agree. How would you add an acc_ref that applies to only one of the homonyms (say a Kyoto accent dictionary listing only "cent")? The two pronunciations should be maintained separately even though they are identical at present. --Nyarukoseijin (talk) 18:55, 27 May 2020 (UTC)
- Agreed. —Suzukaze-c◇◇ 19:45, 27 May 2020 (UTC)
- If you all don't mind data duplication, then by all means please tweak the structure at セント accordingly. I'll happily follow suit going forward. ‑‑ Eiríkr Útlendi │Tala við mig 00:19, 28 May 2020 (UTC)
- Regarding [[KANJI/KANA]] page titles, for better support for classical verbs, maybe it would be better to include the stem in addition to (or instead of) kana. (Stem here refers to the real stem, not the 連用形 (called "stem" in the output of Template:ja-verb) or the "stem forms" seen in conjugation tables (Template:ja-ichi etc.).)
- There are pairs of classical verbs which have different stems and identical 終止形. Examples:
- ー
- ー
- ー
- So my suggestion (at least for verbs) would be [[STEM/KANJI]] or something similar.
- (E.g. [[mit-/満つ]], [[ak-/開く]])
- Arfrever (talk) 20:10, 17 July 2020 (UTC)
- @Arfrever: Hmm, a couple thoughts.
- The "stems" as output by {{ja-verb}} are not just the 連用形; instead, they are the 形 of the verb minus suffixes, and correspond to what most educational materials in English describe: the portion of the verb onto which other conjugational endings attach. I would be resistant to changing this terminology in our output, not least as we have used this for years and we don't want to confuse people.
- For what you describe using Latin characters, I'm more used to encountering the term "root". This is the unchanging portion of the verb, which is always found in all conjugated forms (except for the cases of 長音 and 促音, such as for ~て and ~た conjugations, which can be viewed as sound shifts masking the actual verb root).
- Classical verbs 開く (transitive) and 伸ぶ (intransitive) have roots ak- and nob- respectively. The roots ake- and nobi- are only for the modern forms. We can see that simply by the existence of the terminal forms aku and nobu, which are clearly not derived from any such roots ake- or nobi-. See also the conjugation tables at w:ja:下二段活用 and w:ja:上二段活用.
- Considering that the terminal forms for classical 下二段 and 上二段 verbs already differ from the modern terminal forms, these entries would already be separate under the proposed new schema, without using any Latin characters. The KANA/KANJI scheme is also easier to understand, as one is simply the kana rendering (which matches pronunciation), and the other is the "fuller" spelling as expected in formal writing.
- The main problem the new schema is intended to solve is the question of how to locate 和語 entries in such a way that is 1) unambiguous and 2) easy to identify. Under both the new schema above with entries located at KANA/KANJI, and at your proposed addressing scheme of ROOT-/KANJI, certain entries would include both modern and classical terms. Unless we want to split classical out entirely as effectively a separate language, that seems unavoidable. And given the prevalence of classical Japanese even in modern formal writing, I don't think we even want to split it out. Notably, modern Japanese comprehensive dictionaries include both (classical and modern).
- The more I think about it, the more confused I actually become -- what problem are you trying to solve by adding in Latin-alphabet verb roots? ‑‑ Eiríkr Útlendi │Tala við mig 23:12, 17 July 2020 (UTC)
- Regarding stem versus root: I use the terms root and stem in their strict senses. Roots are the etymologically core parts of words without affixes, carrying semantic content. Compound words have multiple roots, but still one stem. w:Word stem#Usage has a good description of stem.
-
- 付き合う has 2 roots: tuk- and ap-, and stem tukiap-
- Old Japanese did not like adjacent vowels, so when a form of a word or a derived word was created and two vowels would be adjacent, vowel elision would occur:
- stem ake- + conclusive suffix -u → conclusive aku
- stem nobi- + conclusive suffix -u → conclusive nobu
- stem nobe- + conclusive suffix -u → conclusive nobu
- In many languages, historical phonetic changes have resulted in the stem not being a strict subset of the lemma.
- Latin noun homō has stem homin-.
- I used Latin alphabet, because roots and stems ending with a consonant cannot be written using kana.
- Arfrever (talk) 01:32, 18 July 2020 (UTC)
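The vowel-elision analysis in the comment above can be sketched in a few lines of Python. This is only an illustration of the rule as stated there (drop the stem-final vowel when a vowel-initial suffix follows); the function name `attach_suffix` and the use of plain ASCII romanization are assumptions made for the example, not anything used on Wiktionary.

```python
# Hypothetical helper illustrating the elision rule described above:
# when a vowel-final stem meets a vowel-initial suffix, the stem's
# final vowel is dropped before the two are joined.
VOWELS = set("aiueo")

def attach_suffix(stem: str, suffix: str) -> str:
    """Join a romanized stem and suffix, eliding a stem-final vowel if needed."""
    stem = stem.rstrip("-")      # allow the "ake-" notation used in the thread
    suffix = suffix.lstrip("-")  # allow the "-u" notation used in the thread
    if stem and suffix and stem[-1] in VOWELS and suffix[0] in VOWELS:
        stem = stem[:-1]         # vowel elision
    return stem + suffix

for stem in ("ake-", "nobi-", "nobe-", "tuk-"):
    print(stem, "+ -u ->", attach_suffix(stem, "-u"))
```

Under this rule the stems nobi- and nobe- both yield nobu, which is the kind of identical 終止形 with different stems that the proposal is concerned with.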
- After more consideration, I think that it would actually be good to store data for classical and modern forms in the same pages, to avoid duplication of information on separate pages. My previous proposition should have been called [[STEM/KANJI_WITH_OKURIGANA]], and the revised version of my proposition is [[STEM/KANJI]] (where KANJI means kanji without okurigana).
- Examples:
- [[tuk-/付]] would contain data for 付く (yodan/godan)
- [[tuke-/付]] would contain data for classical 付く (nidan) and modern 付ける
- [[tukap-/使]] would contain data for classical 使ふ (yodan) and modern 使う
- [[tukape-/使]] would contain data for classical 使ふ (nidan) and modern 使える
- [[tukape-/閊]] would contain data for classical 閊ふ (nidan) and modern 閊える
- [[tukape-/仕]] would contain data for classical 仕ふ (nidan) and modern 仕える
- [[tukiap-/付合]] would contain data for classical 付き合ふ (yodan) (if it existed in classical times) and modern 付き合う
- Arfrever (talk) 07:06, 18 July 2020 (UTC)
- @Arfrever, it is still unclear what this change is intended to achieve. The end result is effectively the same as Nyarukoseijin's KANA/KANJI scheme in terms of what lives at each address: your tuk-/付 would appear to match Nyarukoseijin's つく/付く
, etc. However, your naming proposal is more confusing. ‑‑ Eiríkr Útlendi │Tala við mig 04:19, 19 July 2020 (UTC)- Another problem: Not everyone is a historical linguist or syntactician :P —Suzukaze-c (talk) 04:26, 19 July 2020 (UTC)
Alt forms vs. deriveds
- Discussion moved from User talk:Suzukaze-c.
Re: diff --
So far as I've understood it, alt forms for JA entries have been for alternative written forms that represent the exact same word -- same etym, same pronunciation, same meanings (for at least some of the senses).
Meanwhile, パンツ一丁 (pantsu itchō) and パンイチ (pan'ichi) have the same meaning, but different etymologies and different pronunciations. The latter is also clearly a derivation (shortening) of the former, as you noted. The two cannot be swapped out one-to-one in quite the same way as other alternative forms, such as 竜・龍 (ryū, “dragon”) or 壊す・毀す (kowasu, “to break something”) or 輝く・耀く (kagayaku, “to twinkle, to sparkle”).
The addition of alternative-form functionality to {{ja-kanjitab}} (perhaps soon to be spun off into some other template) seemed to be in part intended to do away with the additional ===Alternative forms=== header. Curious as to @Nyarukoseijin's perspective. ‑‑ Eiríkr Útlendi │Tala við mig 05:23, 10 June 2020 (UTC)
- Nyarukoseijin's
|alt=
is specifically/exclusively for alternative spellings (which would be your 3 (6?) examples), in contrast to alternative forms. —Suzukaze-c (talk) 05:39, 10 June 2020 (UTC)- Yet, an alternative form is another form of the same word. The above are two different words, albeit related. I see also that Wiktionary:Entry_layout#Alternative_forms describes a similar use case. ‑‑ Eiríkr Útlendi │Tala við mig 05:44, 10 June 2020 (UTC)
- Wiktionary:Votes/pl-2010-07/Alternative forms header abolished "Alternative spellings" apparently because people had a hard time distinguishing them in English, but it is easier in Japanese. I believe Nyarukoseijin wants to distinguish the two as well, but maybe I am remembering wrong, and I will let them clarify if they want to. —Suzukaze-c (talk) 05:50, 10 June 2020 (UTC)
- That depends on how you define "alternative forms". When I coded the |alt= parameter of {{ja-kanjitab}}, I was influenced by the modern linguistic camp, which considered spellings as secondary to forms (defined as sound shape + meaning). So 位 and くらい are alternative spellings of a form while 位/くらい and 位/ぐらい are alternative forms of a word. Does Wiktionary define "alternative forms" in some other way? --Nyarukoseijin (talk) 06:23, 10 June 2020 (UTC)
- @Suzukaze-c: Ah, yes, I wanted to establish this two-level hierarchy but the main motivation for introducing {{ja-kanjitab|alt=...}} was to reduce the number of keystrokes for adding alternative spellings. But {{ja-kanjitab|alt=...}} doesn't show romaji, so when the romaji is different, use ===Alternative forms=== to show it. The two-level hierarchy comes naturally as an afterthought. --Nyarukoseijin (talk) 06:56, 10 June 2020 (UTC)
- @Eirikr I think the same principles apply to おっぱい. IMO, Derived terms is too inspecific about the meaning (paipai and paiotsu are synonyms of oppai). Synonyms is somewhat more acceptable. Alternative forms makes a clear declaration that paipai and paiotsu are synonyms of oppai and have a shared origin with oppai. —Suzukaze-c (talk) 05:12, 15 July 2020 (UTC)
- I must disagree in a few different ways.
- paipai and paiotsu are not full synonyms of oppai. A young child using the term paiotsu would be very out of place, while a grown man bragging about his girlfriend's paipai would sound similarly odd.
- Neither are paipai or paiotsu forms of the word oppai: they do not represent different spellings, nor are they inflections, nor are they rendaku, etc.
- Both are derived from oppai, one as reduplication of the second element, the other as a verlan-style reversal of the order of syllables, a not-uncommon kind of word-mangling used to create slang.
- An alternative form must be of the same word. In this case, we have three separate words, albeit related.
- Considering that paipai and paiotsu derive from oppai, listing these under ====Derived terms==== seems appropriate. That does not preclude us from also listing them as synonyms. I'd be supportive of adding them to the {{synonyms}} list, with proper qualifiers (such as clarifying that paiotsu is not childish). ‑‑ Eiríkr Útlendi │Tala við mig 07:24, 15 July 2020 (UTC)
- not full synonyms: True-ish? That is what {{q}} and {{lb}} are for. The basic meaning is the same. The Daijirin (@ weblio) says "〔幼児語〕乳。また、乳房。おっぱい。"
- forms/derivation: True, they are not spellings/inflections/rendaku/etc. This is where my opinion comes in: I think that a mere rearrangement/modification of the phonological form counts as a form. Like いまだ→まだ, やはり→やっぱり, よい→いい, レズビアン→レズ・ビアン.
- —Suzukaze-c (talk) 22:01, 18 July 2020 (UTC)
Description of how endings attach
How should we describe the way endings like り#Etymology 2 attach?
- Attaches to the 已然形 of godan verbs and 未然形 of s-irregular verbs. [Or alternatively, to the 命令形 of both classes of verbs.]
- Attaches in the shape -eri to the stem of consonant-stem verbs and the s- of suru.
- Change the -u of godan verbs to -eri and suru to seri.
- Change the う-sound of godan verbs to the え-sound (and する to せ) and attach り.
The first way assumes the reader is familiar with the traditional six-活用形 system; the second assumes the linguist approach. The third and fourth are free of both defects, but the third is hampered by the irregularity of romaji (tsu - u + e = te) while the fourth requires familiarity with kana and the gojuon table.
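As a rough illustration of the trade-off between descriptions #3 and #4, here is a minimal Python sketch. The function names and the kana table are assumptions made for this example only; neither function checks that its input really is a godan verb, so that is left to the caller.

```python
# Description #4: swap the final う-row kana for its え-row counterpart, add り.
U_ROW_TO_E_ROW = {
    "う": "え", "く": "け", "ぐ": "げ", "す": "せ", "つ": "て",
    "ぬ": "ね", "ぶ": "べ", "む": "め", "る": "れ",
}

def attach_ri_kana(verb: str) -> str:
    """Apply description #4 to a dictionary-form godan verb (or する)."""
    if verb == "する":
        return "せり"
    stem, final = verb[:-1], verb[-1]
    if final not in U_ROW_TO_E_ROW:
        raise ValueError(f"not a う-row ending: {verb!r}")
    return stem + U_ROW_TO_E_ROW[final] + "り"

def attach_ri_romaji_naive(verb: str) -> str:
    """Apply description #3 literally to Hepburn romaji: drop -u, add -eri."""
    if verb == "suru":
        return "seri"
    if not verb.endswith("u"):
        raise ValueError(f"not a -u ending: {verb!r}")
    return verb[:-1] + "eri"

print(attach_ri_kana("書く"), attach_ri_romaji_naive("kaku"))
print(attach_ri_kana("立つ"), attach_ri_romaji_naive("tatsu"))
```

The kana version handles つ → て through the table, while the literal romaji rule turns tatsu into "tatseri" rather than tateri, which is the Hepburn irregularity mentioned above.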
The current policy seems to favor the traditional description of Japanese grammar (#1). But if we follow the traditional grammar blindly, we will not be able, for example, to treat た and たら as distinct endings or to refer to the te form (for that's the 連用形 + 助詞 て, with 音便 changes for godan verbs) in description of auxiliaries like しまう.
What do you think? If we ever reach a consensus, it's important that someone put it in WT:AJA so that future editors can see it and our entries for auxiliaries can have some uniformity.
(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo): --2409:894C:3C30:4E0:4DEC:8FA4:A6AF:B4DC 06:44, 28 April 2021 (UTC)
- If we can do so succinctly, I think it makes the most sense to describe using both the native Japanese approach and the English-speaker approach -- #1 above, and #3. If we decide that's too long, then just #3. A familiarity with kana forms and basic Japanese phonology is assumed as a requirement for people interested enough in Japanese terms to be reading about inflection, so I don't think we need to use #4 in a bid to avoid confusing people about つ (tsu) becoming て (te) and losing that medial /-s-/. ‑‑ Eiríkr Útlendi │Tala við mig 17:39, 28 April 2021 (UTC)
- @Eirikr: Works good. I began う#Usage notes with #3 and mentioned both #1 and #2 briefly.
- I don't suggest using this format for auxiliaries attaching to the te form, because we don't want to repeat the 音便 rules every time (like "change -ku to -ite shimau", etc.) It suffices to say "Attaches to the te form". Problem: equally many endings attach to the 連用形, for which we need a good name in "Attaches to the ...". The traditional label means "continuative stem", linguists call it the "infinitive", and English-language materials call it the "masu stem" or simply the "stem". Which one should we choose?
- One final question: Is it important to give the accent rules of endings? The problem is that Tokyo accent is sometimes unstable. For example, Samuel Martin's grammar says that volitional forms of accented verbs are always accented on the penultimate mora (ta↗be↘ru -> ta↗beyo↘o), but those of unaccented verbs can be either unaccented or accented (ka↗u -> ka↗oo or ka↗o↘o), the former is the traditional version. OJAD lists only the latter, suggesting that the traditional version is no longer current. Don't know which to follow... --2409:894C:3C24:AAB:73A8:E55:B18F:B372 15:24, 30 April 2021 (UTC)
Re-thinking the pronunciation section
(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo): I've noticed that the pronunciation section is extremely redundant, and that the IPA transcription is not precise since it doesn't indicate tonal accent. I do realise that it would be a lot of work to change everything now, but at the moment it's just very confusing and imprecise. Take for example 男の子. The first line gives the accent pattern on the hiragana spelling (besides, why not katakana, as all accent dictionaries do?), followed by the same information in rōmaji, then the name of the accent (nakadaka) and finally the number of the mora where the accent falls. It's the same information repeated 4 times...! After that comes a faulty IPA transcription that doesn't show the tonal accent. It would be like giving the pronunciation of an Ancient Greek word like this:
- ἄνθρωπος [ánthrṑpòs] (proparoxytone - accent on 1st syllable)
- IPA [an.tʰrɔː.pos] (--> without indicating the tonal accent on the first syllable!)
All we would need to know is the syllable that bears the accent, and we could show that easily in the Romanization: otokónoko. That's really all we need, plus a precise IPA transcription, one that includes the accent. The Pronunciation entry should just be something like:
- IPA: [o̞to̞ko̞ꜜno̞ko̞]
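To make the redundancy concrete, the following Python sketch derives both the accent type name and a downstep-marked string from just the mora list and the accent number. The function names are hypothetical, and the classification follows the usual textbook definitions of heiban, atamadaka, nakadaka and odaka rather than any Wiktionary template code.

```python
from typing import Sequence

def accent_type(mora_count: int, downstep: int) -> str:
    """Name the accent pattern given the mora count and the downstep position."""
    if not 0 <= downstep <= mora_count:
        raise ValueError("downstep must lie between 0 and the mora count")
    if downstep == 0:
        return "heiban"      # no downstep
    if downstep == 1:
        return "atamadaka"   # downstep after the first mora
    if downstep == mora_count:
        return "odaka"       # downstep after the last mora
    return "nakadaka"        # downstep somewhere in the middle

def mark_downstep(morae: Sequence[str], downstep: int) -> str:
    """Insert the downstep symbol after the accented mora (none if 0)."""
    if downstep == 0:
        return "".join(morae)
    return "".join(morae[:downstep]) + "ꜜ" + "".join(morae[downstep:])

morae = ["o", "to", "ko", "no", "ko"]   # 男の子, accent number 3
print(accent_type(len(morae), 3), mark_downstep(morae, 3))
```

For 男の子 with accent number 3 this gives nakadaka and otokoꜜnoko, matching the pattern cited above.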
Incidentally, 男の子 is one word and should not be transliterated with spaces (if it were 3 individual words it would be pronounced otoko no ko). I've seen quite a lot of mis-romanizations here, where individual words are transliterated with spaces between components. That's something we should definitely correct.
To be honest, it would be great if all Japanese Romanization had the accent. Accent is phonemic in Japanese, so it should always be indicated in Romanizations. The 男の子 entry should be something like this:
Noun
男の子 (otokónoko)
- a boy
Synonyms
- 少年 (shōnen)
Antonyms
- 女の子 (onnánoko)
Derived terms
- 男の娘 (otokónoko)
Related terms
What do you think? Sartma (talk) 09:48, 20 September 2021 (UTC)
- Japanese has wide accent variation by the areas, so I don't recommend annotating the accent upon Hepburn romanization. Some words have more than two patterns, even in the Tokyo type standard accent.--荒巻モロゾフ (talk) 05:49, 21 September 2021 (UTC)
- Standard Japanese has some accent variation, I wouldn't call it wide. The accentuation we're currently giving is the Standard one, and so is the Japanese given under "Japanese" here on Wiktionary, so I'm not sure I understand why only Romanizations would be a problem. You can have a double Romanization when two different pronunciations are possible, e.g.: 心 (kokóro or kokoró). Those cases would be a minority, the vast majority of entries would only have one Romanization. It would be so much easier for learners of Japanese to remember where the accent falls if it was given in the Romanization. Japanese Romanizations give the false impression that there is no accent or that accent is not important (not enough to be indicated anyway). How can 箸 (hashi)/橋 (hashi)/端 (hashi) be better than or preferable to 箸 (háshi)/橋 (hashí)/端 (hashi)? A good dictionary would indicate the accent in the Romanization, why shouldn't we do that?
- What do you think about simplifying the Pronunciation section by eliminating all redundant and irrelevant information (accent type names, tonal mora number, the transliteration of the Japanese notation) and giving just the IPA transcription? Sartma (talk) 10:26, 21 September 2021 (UTC)
- Actually, Japanese people who speak with an accent identical to Tokyo's are a relative minority. Even the dialects with a Tokyo-type accent are not completely identical to the Tokyo dialect. The reason commercially available dictionaries have not adopted accent marks above the Hepburn romanization is that it is not practical for actual spoken Japanese.--荒巻モロゾフ (talk) 02:33, 22 September 2021 (UTC)
- @荒巻モロゾフ: That would be an argument against indicating accents at all, but it’s clear from how Japanese entries are currently set up that we do want to give the Standard Japanese accent. Unless you mean to say that you’d prefer not to indicate accents at all? But if that’s not the case, I don’t see why not indicating them in the romanisation. As for the reason why most (but mind, not all; the best ones do indicate accents on romanisations) commercial dictionaries don’t indicate the accent on transliterations, unless you were working in their editing team when that kind of decision was taken and know for sure, I don’t see how you can be so certain that it was because of how practical it is or not for spoken Japanese. That can only be your opinion, no matter how much you dress it up as a fact. Sartma (talk) 08:24, 22 September 2021 (UTC)
- What’s your opinion about simplifying the Pronunciation section? Sartma (talk) 08:24, 22 September 2021 (UTC)
- @Sartma, I'm confused about your request, which spans multiple different points. I'll try to summarize what I've understood you're talking about, and provide my thoughts.
- Why not use katakana for pronunciation notation?
- Conversely, why would we? We use hiragana for just about all kana here since as far back as I'm aware, other than terms that are expressly and lexically written in katakana. There was no compelling reason to use katakana earlier, and now we have a lot of content already written in hiragana.
- I can't think of much benefit to changing all of that now, and I struggle to think of any concrete benefit. Making such a change seems like a lot of work for no gain.
- Why give information in so many different formats?
- In part, because it doesn't fit cleanly all in one string. Years ago when we were figuring out the formatting for the pronunciation section, we realized that [strict phonetic notation] includes so many different diacritics to specify vowel values that adding in the tonal markers sometimes produced illegible results.
- In part, because we are catering to different readers. We can only assume familiarity with English (as this is the English Wiktionary). Some readers (and editors) know and want kana, so we include that as well.
- Why not use accent markers in romanization?
- In part, because no source does this, outside of specialized academic contexts. We seek to avoid surprising or confusing our readers.
- In part, because accent markers are incompatible with the modified Hepburn romanization scheme we use here. For instance, 少年 (shōnen) is pitch pattern 0 -- there is a rise halfway through the first syllable shō, which is impossible to indicate without splitting the ⟨ō⟩ notation for the doubled /o/ vowel.
- In part, because diacritics are not used in English, and we cannot expect English readers (again, our main audience) to be able to easily input diacritics.
- In part, because adding diacritics introduces other technical problems -- hàshí and háshì would have to be two different romaji entries, rather than the standardized hashi entry that romaji users would expect.
- In part, as 荒巻モロゾフ notes, doing so would imply that Tokyo accent is the only pitch accent for all of Japanese, which is incorrect.
- In part, again as 荒巻モロゾフ notes, because there is some variation even within Tokyo speech, as evidenced by words like 若布 (wakame) or 居酒屋 (izakaya), where various sources agree that there are two valid pitch accents even in "standard" Japanese. Some terms, like 分かり切る (wakarikiru), have even more than two pitch patterns listed. All of these pitch patterns are valid, so indicating only one or the other would be incorrect. Conversely, indicating all valid pitch patterns every time we romanize is unacceptable.
- Why not romanize without whitespace?
- In part, because romanizations do not benefit from the contrast between kana and kanji, and longer terms quickly become hard to read.
- In part, because we differentiate between "terms" (integral lexical units) and "words" (the pieces from which these "terms" may be composed, which themselves can be used independently). Consider English apple tree or White House or outrigger canoe, each of which is an integral "term" composed of multiple "words".
- In part, because terms like 男の子 are clearly three words in Japanese, 男 + の + 子.
- What’s your opinion about simplifying the Pronunciation section?
- I cannot agree with your proposal: "eliminating all redundant and irrelevant information (accent type names, tonal mora number, the transliteration of the Japanese notation) and giving just the IPA transcription".
- Accent type names are useful, in that other resources use these terms, making this terminology that our readers are likely to encounter or already be familiar with.
- Tonal mora number is useful, in that other resources use this notation as well.
- The transliteration of the Japanese notation (I'm assuming you mean the pitch-marked loose phonemic IPA transcription on the same line as the kana pitch accent string) is also useful, in that this is the English Wiktionary, and we seek to make our entries accessible to an English-reading audience.
- The main changes I'd like to see are only three, at the moment:
- We should stop displaying the yomi type. This describes the reading, not the pronunciation per se. This is already better handled by the {{ja-kanjitab}} template, and is effectively deprecated for the {{ja-pron}} template.
- For the line with the kana + loose IPA showing the pitch, the formatting of the transcription needs cleanup. The template currently outputs using [square brackets], which is incorrect notation -- [square brackets] are intended for strict phonetic transcription, whereas /slashes/ are intended for looser phonemic transcription. The letters used here should also be the IPA letters, such as ⟨ ɕ ⟩ for the initial sound of 少年, which instead is spelled using ⟨ sh ⟩.
- There should be a better visual separation between the hiragana pitch accent string and the phonemic IPA pitch accent string. Perhaps a ・ nakaguro would work here?
- Separately, in your reply to 荒巻モロゾフ ("how can you be so certain"), you appear to be arguing that dictionaries should show a single pitch accent pattern for all romanizations. This is an incorrect approach for the reasons listed above. In addition, I would like to draw your attention to the second point at Wiktionary:What_Wiktionary_is_not -- our aim with Wiktionary is to describe how words are used, not to prescribe how words should be used. Specifying a pitch accent pattern in all of our romanizations is overly specific and incorrectly prescribes pronunciation for that term.
- Overall, you appear to be proposing wide-ranging changes that would require substantial work. However, I do not see any gains commensurate with the work involved, and indeed, some of your proposed changes go against the stated mission of the Wiktionary project. ‑‑ Eiríkr Útlendi │Tala við mig 00:38, 23 September 2021 (UTC)
- @Eirikr Thanks for taking the time and for the well organised reply! My main point was the Pronunciation section, which I find confusing and hyper-redundant. The rest was a mix of observations and wishes. I'll reply to each of your points right below them. Sartma (talk) 11:21, 23 September 2021 (UTC)
- I guess my answer would be the same as the one you gave me down below about indicating the accent on romanizations: because no source does this and because we seek to avoid surprising or confusing our readers. I've never seen a pronunciation dictionary using hiragana (they might exist, but as far as I know none of the major ones do). Since pronunciation and accent are a matter of sound, it's just normal, Japanese-wise, to use katakana, not hiragana. Anyway, this was just an observation, I'm not too bothered by it. I find it stranger that we are using the Japanese notation in a dictionary for English speakers, only to be forced to give its "translation" as a phonemic transcription between square brackets right after it, with the only effect of being redundant and confusing (and weren't we trying not to be confusing?). Sartma (talk) 11:21, 23 September 2021 (UTC)
- I understand that some readers and editors might want kana. I'm arguing that this choice makes everything more confusing. One would expect to find only relevant information given in an efficient and clear way, not a sea of repetitions in which it's difficult to understand what's important and relevant and what's just repeated for the sake of it. And it's not true that it doesn't fit cleanly all in one string. Above I give you an example of one string that would efficiently and cleanly fit all relevant information in one string: IPA: [o̞to̞ko̞ꜜno̞ko̞]. Sartma (talk) 11:21, 23 September 2021 (UTC)
- This is simply not true. Tuttle Pocket Japanese Dictionary is hardly a specialized academic book, neither is the Kenkyusha's New Japanese-English Dictionary. Both incredibly good dictionaries are not afraid of giving the accent in the romanization. Sartma (talk) 08:27, 24 September 2021 (UTC)
- This to me is just another fault of how accent is given here on Wiktionary (using a system aimed at Japanese native speakers, not English learners). Your description of a zero-accent word is wrong. The lowering of the first mora of a word when it doesn't bear the accent is a characteristic of phrasal tonal contour, not of the accent of the word. If you say あの少年 (ano shōnen) there is no rise in the first syllable of shōnen. The rise is a phrasal phenomenon, not a word one. Your misunderstanding of how Japanese accent works to me is just another proof that the way Japanese accent is represented on here is wrong and confusing. Sartma (talk) 11:21, 23 September 2021 (UTC)
- This is a technical issue, so not really an argument against indicating the accent on the romanization. I think Japanese editors had to overcome way more difficult technical issues when creating Japanese pages, I can't believe a simple, standard, accent sign on a simple vowel would be such an insurmountable problem. Sartma (talk) 11:21, 23 September 2021 (UTC)
- The romanization entry, the one user would search, could stay hashi for both (Wiktionary automatically gets rid of diacritics in searches anyway). The distinction between hashí and háshi (no need to indicate unaccented syllables, it's redundant: they are always predictable) would be inside the page, after the relevant POS entry. Sartma (talk) 11:21, 23 September 2021 (UTC)
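As a rough sketch of how accent-marked romanizations could still map onto a single plain romaji title, the Python snippet below strips only acute and grave accents while leaving the Hepburn macron intact. It is an illustration using the standard `unicodedata` module; the function name is hypothetical, and it is not a description of how Wiktionary's search or modules actually normalize titles.

```python
import unicodedata

# Combining acute and grave accents, assumed here to be the pitch markers.
PITCH_MARKS = {"\u0301", "\u0300"}

def strip_pitch_marks(romaji: str) -> str:
    """Remove acute/grave accents but keep other diacritics such as macrons."""
    decomposed = unicodedata.normalize("NFD", romaji)
    kept = "".join(ch for ch in decomposed if ch not in PITCH_MARKS)
    return unicodedata.normalize("NFC", kept)

for form in ("hashí", "háshi", "hashi", "shōnen"):
    print(form, "->", strip_pitch_marks(form))
```

With this kind of normalization, hashí and háshi would both resolve to hashi, while shōnen keeps its long-vowel macron.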
- That is already the current implication. The Pronunciation section gives the Standard (not "Tokyo", just based on Tokyo accent) Japanese accent, so this is not really an issue. Adding the accent to the romanization wouldn't add any implication that's not there already. Sartma (talk) 11:21, 23 September 2021 (UTC)
- Accent/pronunciation variation is not a peculiarity of Japanese. You move 10 km away from a random place in Italy and people already use different vowels; no-one speaks Italian like it's supposed to be in the dictionary (and it's not an exaggeration, literally no Italian natively speaks 100% in line with what's in an Italian dictionary, unless they studied to become actors), but still Italian Wiktionary would give what's considered Standard Italian. As far as I can tell from observation, Japanese Wiktionary does the same. If there is more than one way to pronounce a word, we can give them all at POS level (like: 若布 (wákame or wakáme)), I can't see any issue with that. In usexes we can just use any of those, since they'd all be correct. Sartma (talk) 11:21, 23 September 2021 (UTC)
- I don't think any Japanese individual word is long enough for reading to actually become an issue. Do you have any example to prove this claim? Sartma (talk) 11:21, 23 September 2021 (UTC)
- You're giving examples of compound words. This is not what I was referring to. I'm ok with using spaces in those (even if I'd prefer a hyphen). Sartma (talk) 11:21, 23 September 2021 (UTC)
- This is not true. 男の子 is etymologically formed by 3 Japanese morphemes, but it's one single word. There is a difference in Japanese between あの男の子 (ano otokónoko, “that boy”) and あの男の子 (ano otoko no ko, “the kid of the aforementioned man”) (this second example is quite marked and it would need a very specific context, I'm aware of that, but it's not at all impossible). This minimal pair clearly shows that 男の子 (“boy”) per se is not 3 words, but just 3 morphemes of one single word, and the accent itself would make the difference with 男の子 (“that man's kid”) clear. Sartma (talk) 11:21, 23 September 2021 (UTC)
- Besides, I've just seen the Etymology section of 男の子, and that's wrong, too. 男の (otoko no)=男である (otokó de áru, “male”), not 男が所有する (otokó ga shoyū-suru, “a man's”). The の is not possessive. I'll fix that. Sartma (talk) 11:48, 23 September 2021 (UTC)
- I'm not saying they are not useful. They are as useful as it would be to know the accent of a word is a properispomenon, paroxytone, etc. in Greek, too. That's the word the readers would encounter and are already familiar with. My argument would be that the Pronunciation section is not the right place for that kind of information to be displayed (as it's not in all other languages on Wiktionary). Only Japanese is doing this: why? Sartma (talk) 11:21, 23 September 2021 (UTC)
- Independently of what other resources do, do we agree that it's currently a redundant piece of information, since it's already given in two places before in the same line? Sartma (talk) 11:21, 23 September 2021 (UTC)
- I'd argue that the pitch-marked phonemic IPA is the only thing that should be there. At the moment that is literally just a "translation" of the Japanese notation. There is no need for the Japanese notation. That has been developed and meant for Japanese people, and we agree that Wiktionary is for English speakers. Moreover, it gives a wrong depiction of what the pitch accent is, mixing pitch accent with phrasal tonal contour, one more reason to get rid of it. Sartma (talk) 11:21, 23 September 2021 (UTC)
- I agree with all the above. I'd get rid of the Japanese notation (as I explained above, it's wrong, it's not meant for English speakers and it's redundant: same info given in the following phonemic transcription). Sartma (talk) 11:21, 23 September 2021 (UTC)
- No, I never meant nor implied that we should only prescribe one accent if a word has more standard pronunciations. I would give all of them, as standard Japanese dictionaries that indicate accents do. Sartma (talk) 11:21, 23 September 2021 (UTC)
- I know, I said so myself at the beginning of my post. I do believe, though, that since the Pronunciation section is handled by a template, that at least would be an easier fix. You are probably used to seeing it, and its total redundancy might not confuse you much anymore, but to someone new (even someone like me, who already knows that information. Imagine someone who doesn't!) that little section is just a big confusing mess. If you think that you could sum everything up with just one single clear phonemic transcription (and possibly keep the phonetic one too, after correcting it), to me the gain of that little template fix would be immense. Sartma (talk) 11:21, 23 September 2021 (UTC)
- @Sartma, interleaving replies like this without adding a signature is bad form -- it is hard to follow who wrote what. It is also hard to read the wikicode, which is necessary when editing to write a response.
- That aside, I see that we have some confusion here.
- What you point to in those works is not romaji, but rather a specialized headword, which is annotated with specific additional information. Dictionaries themselves are often "specialized academic contexts". What I point to in my comment above is plain romaji, what our readers may have seen in romanized texts. Texts like this one, or this one, or this one. In the romaji we use in Wiktionary, outside of pronunciation sections, we use modified Hepburn with no pitch accent indicators -- much like the three sample texts linked here.
- I have never seen running romaji text outside of a dictionary that includes pitch accent markers. And, as described below, the pitch accent marked in most dictionaries is specific to the "standard" variety of Japanese, which is inappropriately prescriptive for our mission here at Wiktionary. In addition, many words have multiple valid pitch accents even within the "standard" variety of Japanese.
- I feel quite strongly that pitch accent information belongs only in the pronunciation section.
- If you are proposing to revamp our entire approach to the romanization of Japanese throughout Wiktionary, I cannot disagree strongly enough. If you are ultimately only talking about the transcription scheme for pronunciation sections, I am much less concerned. ‑‑ Eiríkr Útlendi │Tala við mig 00:08, 24 September 2021 (UTC)
- Sorry, I forgot to sign that one reply. It's signed now. Ideally I'd like to see accents everywhere, but I do understand that that would be too much work and that it's not usually done, so I'm not proposing that change; I was just expressing my wishes there. I do believe that the romanization next to the POS should have the accent, though. Accents are phonemic in Japanese and should always be indicated, even in the small minority of cases where more than one accentuation is possible. I'll summarise my reply down below to avoid interleaving. Sartma (talk) 08:27, 24 September 2021 (UTC)
- @Sartma, you misunderstand.
- I did not say that any word is "zero-accent". I said that the word 少年 (shōnen) has "pitch pattern 0". This is the notation used in Daijirin, Shinmeikai, NHK, and various other dictionaries to indicate that a word has no downstep. If a word does have a downstep, the number indicates the mora after which the downstep occurs.
- Japanese pitch accent is indeed phrasal in how it manifests in speech. That said, words do generally have a rise in pitch. Your link to the Kenkyūsha page clearly shows this in their indication of pitch accent, such as na꜒sai-ma˥se. ‑‑ Eiríkr Útlendi │Tala við mig 00:08, 24 September 2021 (UTC)
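For readers unfamiliar with the numeric notation described in this reply, here is a minimal Python sketch of how a downstep number is conventionally expanded into a high/low contour for an isolated word. The function name `pitch_contour` is hypothetical, and the low-then-high start on non-initial-accent words follows the common textbook description; as noted elsewhere in this thread, that initial rise can also be analysed as phrasal rather than lexical.

```python
def pitch_contour(mora_count: int, downstep: int) -> str:
    """Expand a dictionary-style downstep number into an H/L string.

    downstep == 0 means no downstep; otherwise the pitch falls after
    the mora with that (1-based) number. This is the conventional
    textbook contour for a word said in isolation (an assumption).
    """
    if mora_count < 1:
        raise ValueError("a word needs at least one mora")
    if not 0 <= downstep <= mora_count:
        raise ValueError("downstep must be between 0 and the mora count")
    if downstep == 0:
        # No downstep: low first mora, then high to the end.
        return "L" + "H" * (mora_count - 1)
    if downstep == 1:
        # Fall right after the first mora.
        return "H" + "L" * (mora_count - 1)
    # Low first mora, high up to the accented mora, low afterwards.
    return "L" + "H" * (downstep - 1) + "L" * (mora_count - downstep)


# 少年 (shōnen) has four morae and pitch pattern 0.
print(pitch_contour(4, 0))
```

Note that a word with the downstep on its last mora and a word with no downstep come out identical here; they differ only in the pitch of a following particle, which this sketch does not model.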
@Sartma: Your reply is too much text, scattered across too much of the above, ultimately making it poorly organized and hard to read. I suspect that the crux of some of these issues is potential confusion over what is meant by "romanization" -- do you mean everywhere, or just in pronunciation sections? Discussions of "romanizing Japanese" in the past have been about use of romaji whenever a Japanese word is rendered into the Roman alphabet. If you mean only in a more limited context, please clarify.
Past there, and two replies I inlined above before becoming overwhelmed, I have other things I need to get done today, and I must leave this as it is. Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 00:08, 24 September 2021 (UTC)
- @Eirikr: Sorry, I thought it would have been easier to discuss under each of your bullet points. Here's a summary:
- Pronunciation section
- Currently hyper-redundant and confusing. Same info given 3 times (Japanese-style, loose phonemic IPA, accented mora number) + irrelevant Japanese technical name of accent pattern that doesn't belong here (this kind of info is never given in any other language on WT and it doesn't add anything but confusion) + accentless phonetic IPA. In all this redundant information overload one easily misses the only relevant piece of information: where the accent falls. That's the only thing the reader needs to know, all the rest is always predictable.
- My proposal would be to simplify the whole section to a simple phoneMic transcription, something like this:
- To which you can add a super detailed phoneTic transcription with the pitch accent of all moræ etc. between square brackets, if you really feel you need to (I don't think it's needed for a language with such a simple phonetic inventory like Japanese).
- Romanization
- In the POS section, right after the Japanese entry, I'd like to see a transliteration that indicates the accented mora, something like this:
- I would like to see the accent in usex as well, but I understand that it would be too much work, so ok if not.
- General comments
- Using spaces in words like 男の子 (otokónoko, not otoko no ko) and 掌 (ténohira, not te no hira) is just plain wrong. Those are individual words and there is no need to separate their etymological components/morphemes. They are exactly the same as 茸 (kínoko), and you don't write this last one as ki no ko. Sartma (talk) 23:55, 25 September 2021 (UTC)
I've deinterleaved this discussion: diff — Fytcha〈 T | L | C 〉 12:07, 20 July 2022 (UTC)
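As an illustration of the accent-marked romanization proposed in the summary above (hashí vs háshi), here is a rough Python sketch. It is not how any existing Wiktionary module works; the function name `accent_romaji` and the input format (a list of romaji morae plus a downstep number, with 0 meaning no downstep) are assumptions made for this example.

```python
import unicodedata

ACUTE = "\u0301"          # combining acute accent
VOWELS = "aeiouāēīōū"     # vowels that can carry the mark


def accent_romaji(morae: list[str], downstep: int) -> str:
    """Join romaji morae, putting an acute on the mora before the downstep.

    The caller is assumed to have already split the word into morae.
    """
    if not 0 <= downstep <= len(morae):
        raise ValueError("downstep must be between 0 and the mora count")
    out = []
    for number, mora in enumerate(morae, start=1):
        if number == downstep:
            # Find the first vowel of the accented mora.
            pos = next((i for i, ch in enumerate(mora) if ch in VOWELS), None)
            if pos is None:
                raise ValueError(f"mora {mora!r} has no vowel to mark")
            mora = mora[: pos + 1] + ACUTE + mora[pos + 1 :]
        out.append(mora)
    # Compose base letter + combining accent into single characters.
    return unicodedata.normalize("NFC", "".join(out))


print(accent_romaji(["ha", "shi"], 2))
print(accent_romaji(["ha", "shi"], 1))
print(accent_romaji(["o", "to", "ko", "no", "ko"], 3))
```

Words with more than one accepted accentuation would simply be run once per downstep value, which matches the "wákame or wakáme" style of display suggested earlier in the thread.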
thoughts on the formatting of classical terms and old spellings, starting with 思ふ, おもふ, and (theoretically) 取る
- 思ふ
- added a second Verb section for modern 五段活用(歴史的仮名遣い) in addition to the existing 四段活用(古語)
- {{ja-verb|おもふ|tr=intrans|type=yo}} produces the inappropriate romanization of omofu; change to {{ja-verb|おもう|hhira=おもふ|tr=intrans|type=yo}}?
- {{ja-verb|type=godan}} does very badly in generating inflected forms, of course
- should be stem 思ひ (omoi), past 思つた (omotta)
- 四段活用(古語) inflection table probably needs modern romanization (おもひけり omoikeri)?
- おもふ
- 取る
- second Verb section for 四段活用(古語)?
—Suzukaze-c (talk) 03:33, 9 October 2021 (UTC)
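To make the omofu/omou romanization problem concrete, here is a small Python sketch of the usual rule of thumb for reading historical kana spellings: a は-row kana that is not word-initial is read as the corresponding わ-row sound. The function name `modernize_medial_h` is hypothetical, and this is only an approximation. It ignores compounds, particles, words such as はは that keep the medial consonant, and later vowel changes, so a real module would need exception handling.

```python
# Non-initial は-row kana and their modern readings (rule of thumb only).
H_ROW_TO_MODERN = str.maketrans("はひふへほ", "わいうえお")


def modernize_medial_h(kana: str) -> str:
    """Rewrite non-initial は-row kana as わ-row kana.

    The first kana is left untouched; everything after it is mapped.
    """
    if not kana:
        return kana
    return kana[0] + kana[1:].translate(H_ROW_TO_MODERN)


print(modernize_medial_h("おもふ"))
print(modernize_medial_h("おもひ"))
```

Romanizing the output of such a step, rather than the historical spelling itself, is one way the omofu result could be avoided.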
- @Suzukaze-c -- For some additional data on historical conjugations for 思う, see also this page of the Nippo Jisho (first full entry on the top left), showing the following:
Vomoi, ô, ôta
- The dictionary's convention for verbs is to give the 連用形, then the 終止形 ending, then the 過去形 ending. So we have おもひ (omoi), おもふ (omō, no distinctly pronounced final う), and おもふた* (omōta, but I'm unsure of the kana spelling). The latter form is (I think?) still the regular past-tense in Kansai-ben.
- Compare to the entry for Tori (last full entry on the bottom right):
Tori, ru, otta
- This gives us とり (tori), とる (toru), and とった (totta), same as modern Tōkyō-based 標準語. ‑‑ Eiríkr Útlendi │Tala við mig 21:36, 11 October 2021 (UTC)
- @Eirikr That is genuinely interesting information, but I must say that I was referring to pre-reform spellings of modern language (the kind found in early 1900s literature). I've changed the wording to be less stupidly opaque. I do think that we should eventually cover that period in the Conjugation section though. —Suzukaze-c (talk) 06:54, 21 October 2021 (UTC)
- @Suzukaze-c: Ah! Sorry, I picked up on the "Classical" in the heading and went with that.
- If you're thinking just of early modern Japanese, using pre-reform spellings, that's a different kettle of fish. :) I think the current state of the 思ふ entry is pretty good -- we've got the classical, with all of the classical details, and then we've got the modern term with the archaic spelling. I might add some labels to try to make things a bit clearer. Also, as you note, we need to rejigger the romanization module to deal with classical differently -- and for that, we probably need to do some more research to figure out quite when we lose the medial /f/, [ɸ] and instead get either /w/ or no consonant at all. ‑‑ Eiríkr Útlendi │Tala við mig 17:28, 21 October 2021 (UTC)
- food for thought: とうとし#Etymology 1 —Suzukaze-c (talk) 00:08, 5 November 2021 (UTC)
We should fix guidelines of Proto-Japonic
I am very dissatisfied with the current state of the Proto-Japonic entries. If left as they are, they may give readers a mistaken picture of Proto-Japonic. I would list the following problems to fix:
- Nouns have no accent annotations, even though accent is an important feature for classifying words and is what causes the vowel length differences in Northern Ryukyuan.
- Verbs are not written as stems, but are direct copies of the Japanese forms.
- Although every verb and adjective must belong to one of the two tone groups, the group is not indicated.
- The entries conflate the *N that forms each kind of 濁音 (*Nk, *Ns, *Nt, *Np) with the consonant *n found in the ナ行 (na, ni, nu, ne, no).
- <y> is used as the coda of 露出形 (some Old Japanese vowels developed from hiatus in compounds, so distinguishing /i/ and /j/ complicates the developmental model unnecessarily).
- (I personally think that writing /j/ as <y> is not IPA-compliant and not recommended. However, since some researchers do use that notation, it might be allowed for compatibility.)
Guidelines need to be developed to solve those issues. @Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, Mellohi!, Chuterix--荒巻モロゾフ (talk) 11:37, 10 January 2023 (UTC)
RFC discussion: June 2017–April 2024
The following discussion has been moved from Wiktionary:Requests for cleanup (permalink).
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
Japanese. Still mildly out-of-date, and the formatting makes it difficult to understand sometimes. —suzukaze (t・c) 17:04, 14 June 2017 (UTC)
- If anything here still needs to be cleaned up, let someone start a new and more specific RFC. - -sche (discuss) 18:26, 12 April 2024 (UTC)
Ordering {{ja-see}} kun homophone entries?
I was looking for guidance on how kanji entries for kun’yomi homophones in {{ja-see}} should be ordered, and I couldn’t find them. Are there any?
Context: I noticed that すすめ only referred to 勧め and not to 薦め, so I edited to add it. I added 薦め after 勧め, because this pair happens to go in that same order in all four collations I could think of:
- Chronological by edit (obviously);
- Stroke count (勧 has 13 strokes, 薦 16);
- Unicode code point collation (勧 at U+52E7, 薦 at U+85A6); and
- Relative frequency of use of the kanji (as listed in the 2010 survey by the 文化庁 (Agency for Cultural Affairs)).
So in this case, I didn’t think I needed the guidance. But in other cases (such as creating a new kana spelling page with multiple references), if it’s written down somewhere, I couldn’t find it. Is it documented somewhere? If not, should it be? TreyHarris (talk) 19:24, 16 September 2024 (UTC)
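For anyone wanting to compare candidate orderings on other homophone sets, here is a short Python sketch of two of the collations listed above. The function names are hypothetical, and the stroke counts are hard-coded from the figures given in this post rather than looked up from any database.

```python
def order_by_code_point(kanji: list[str]) -> list[str]:
    """Sort single-character kanji by Unicode code point."""
    return sorted(kanji, key=ord)


def order_by_strokes(kanji: list[str], strokes: dict[str, int]) -> list[str]:
    """Sort kanji by stroke count, using code point to break ties."""
    return sorted(kanji, key=lambda ch: (strokes[ch], ord(ch)))


# Stroke counts as stated in the post above (assumed correct).
stroke_counts = {"勧": 13, "薦": 16}
candidates = ["薦", "勧"]

for ch in order_by_code_point(candidates):
    print(ch, f"U+{ord(ch):04X}")
print(order_by_strokes(candidates, stroke_counts))
```

Code point order needs no external data, which may make it the easiest rule to document if no guideline exists yet; frequency-based ordering would need an agreed source list.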