2019-03-27 09:20
That's a surprisingly low figure. If you had asked me, before you posted your stat, what percentage of users had contributed at least one sentence, I would have said 70-80%.

Of course, I understand there are a lot of people who use Tatoeba without contributing sentences, and that's perfectly normal, but why register an account if you're not planning to contribute?

For the sake of favorites? To leave a comment? Bots?
2019-02-04 18:05
> You are in no way obligated to proofread all the Ukrainian sentences

I know that, thank you.
2019-02-04 17:36
> There is, in my opinion, no reason to be upset about massive addition of bad sentences.

Depends on the person, I guess.

I would be rather upset if someone started adding a lot of bad Ukrainian sentences. I feel like the quality of the Ukrainian corpus is my responsibility, and that would definitely hurt, especially if there were so many of them that I couldn't really proofread them all.

2019-01-14 09:58
It's been a while, thanks a lot.
2019-01-11 20:18
> So Horus only notices linked duplicates?

No, Horus is supposed to merge all duplicates within one language, it doesn't matter whether linked or not.
2019-01-10 20:51
Thank you very much! :)
2019-01-06 12:53
Thanks for that, and especially for this bit:

> to my surprise, the downloadable ‘Links’-file contains thousands and thousands of sentence numbers that aren’t in the ‘Sentences’-file

So this actually explains why there could be sentences with no direct translations but with indirect ones. Obviously, this also means there could be a lot of sentences that have some indirect translations which are not direct translations of any of their direct translations - and yeah, sorry for this convoluted/awkward phrase.

The indirect translations are obviously reconstructed from the links database, and if it contains links to non-existent sentences, then that explains it: different sentences could be indirectly linked via deleted sentences.

Before your explanation I hadn't been able to imagine how that might have happened, so I had been thoroughly confused.
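
To make that concrete for myself, here is a minimal sketch of how such cases could be found from the two downloadable files. It is only an illustration under my own assumptions: `find_orphan_indirect` is a hypothetical helper, the sentence IDs are taken to be already loaded into a set, and the links are taken to be (sentence, translation) ID pairs. It is not how Tatoeba itself computes indirect translations.

```python
from collections import defaultdict


def find_orphan_indirect(sentence_ids, links):
    """Find sentences with no existing direct translation but some indirect ones.

    sentence_ids: iterable of IDs present in the 'Sentences' file (assumed input).
    links: iterable of (id_a, id_b) pairs from the 'Links' file (assumed input).
    Returns a dict mapping each such sentence to its set of indirect translations.
    """
    existing = set(sentence_ids)

    # Build an undirected adjacency map, keeping links to missing sentences too.
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = {}
    for s in existing:
        direct = neighbours.get(s, set())
        live_direct = direct & existing  # direct translations that still exist

        # Follow every direct link one more step, even through deleted sentences.
        indirect = set()
        for mid in direct:
            indirect |= neighbours.get(mid, set())
        indirect = (indirect & existing) - live_direct - {s}

        if not live_direct and indirect:
            result[s] = indirect
    return result
```

For example, if sentences 1 and 3 exist, sentence 2 has been deleted, and the links file still contains 1-2 and 2-3, then 1 and 3 each show up with zero direct translations and one indirect translation.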
2019-01-04 15:23 - 2019-01-04 15:27
How is it even possible for a sentence to have 0 direct translations and a few indirect ones?

According to the logs, it was added as a translation for this:

Which doesn't exist for some reason, so that might be part of the explanation, but still a weird bug. Also, why doesn't #6116872 exist? There is nothing in the logs that might indicate it was deleted.
2018-12-31 15:37 - 2019-01-04 15:28
Dear diary,

Just wondering if a native speaker of English could add a few sentences with "l'esprit de l'escalier"?
2018-12-31 14:12
Italian - December - about 50K sentences - that's an incredible boost.
2018-12-23 20:12 - 2018-12-24 10:30
My six-year-old enjoys saying "Merry whatever" this Christmas season. Sounds kind of grumpy, but also cute.

He claims to have picked it up from some cartoon, I haven't seen it.

Anyway, I liked this expression a lot myself :) So, Merry Whatever!

EDIT: A kind person sent me a PM telling me it was probably the Grinch:

Looks like it, yeah.
2018-12-11 09:29
> From my point of view, a good sentence should be indistinguishable whether it is original or a translation.

Of course it is. I don't think we have ever argued about that.

However, even without translation, native speakers quite often produce awkward-sounding sentences that they can either correct themselves or leave as is. Not everybody is able to speak and write clearly and smoothly. So I thought you meant your "Partly unnatural sentences" were in that category.

If they clearly sounded like something a native speaker would never say, then it's just an unnatural sentence, in my opinion.
2018-12-10 09:19 - 2018-12-10 09:20
You've done a great job, and I liked how you created a lot of categories instead of just saying "good" or "not good".

> Sentences with translation errors (mistranslated words, tense errors, pronoun errors etc. ) : 37 (3.7%)

If we started checking this within the task defined by Trang, it would be impossible to complete.

Imagine a native speaker of English trying to determine which sentences are "good", and also checking whether all 100 translations of each sentence into 50 languages are correct.

I think most Turkish sentences are linked to English ones, but what if you come across something linked to, say, Ukrainian or Marathi? Would you check the translation as well?

From my point of view, and according to my low standards, I'd say 94% of Turkish sentences are good sentences. This is the only category of "bad" sentences, IMHO:

> Completely unnatural sentences (literal translations, very strange word choices/orders etc. - sentences that a native speaker would never say) : 61 (6.1%)
2018-12-10 09:13
> If by "still acceptable", you mean to include obviously incorrect language use

I meant "awkward, but something a native speaker might write, and not correct themselves after re-reading it".

It's really difficult for me to come up with English sentences like this because I'm nowhere near native level, but in Ukrainian I've been coming across a lot of examples of such phrases. Basically, "bad style" or something like that.
2018-12-10 09:11 - 2018-12-10 09:13
> That being said, following Tatoeba's rules, why would you classify as "good" any of 2, 3, or 4?

Yes, they need to be corrected, but they are still good according to my low standards. My standard is: if it can pass for a sentence written by a sober native speaker who had re-read it twice - it's good enough.

So I just wanted to know what the standards for "good" are. Is "bad" anything that needs correction? Then any single fault makes a sentence "bad". Which is fine by me; "good" and "bad" are too broad as terms.

Basically, I would like to distinguish between something like "How much time?" (intended meaning - "What time is it?") and a genuine typo or even an error that a lot of native speakers make.

But if the idea is to put both "How much time?" and "I'm taller then him." into the same basket because they both need correction, I'd be fine with that as well.
2018-12-08 21:59 - 2018-12-08 22:00
I'd say that whoever checks the sentences for their languages shouldn't be an active contributor, or at least not one of the top contributors for their language.

For example, for Ukrainian I would be mostly checking my own sentences. Which is not that bad, it's proofreading, but it always makes more sense to have a fresh pair of eyes, and it's more efficient.

Also, I think there could be a lot of categories of "good" and "bad" sentences.

For example, depending on the context, I can classify as good either some subset of the list below, or all of them:

1. Sounds like something a native speaker would say, no spelling mistakes, no punctuation mistakes.

2. Sounds like something a native speaker would say, no spelling mistakes, some problems with punctuation (missing comma, missing full stop at the end, unnecessary comma, etc.).

3. Sounds like something a native speaker would say, contains a minor typo. (Tom has an elephannt.)

4. Sounds like something a native speaker would say, contains a spelling mistake typical of native speakers ("I know more then you.", "I would of done it.", "Your my friend.")

5. Sounds awkward, but still acceptable.

6. Sounds like something a native speaker of a different dialect of my language would say.
2018-12-03 11:46
I did what you suggested - picked a dozen random Ukrainian sentences and checked the links. They all seem to be fine.

I also checked in Excel that there are no pairs of sentences that are both linked and unlinked in the two files, so I guess they should be fine to process.
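
For anyone who would rather script that check than do it in Excel, here is a rough sketch. It assumes each file has already been read into a list of (sentence ID, sentence ID) pairs; `conflicting_pairs` and `normalise` are names I made up for the illustration.

```python
def normalise(pair):
    """Order a pair so that (a, b) and (b, a) compare as the same link."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def conflicting_pairs(to_link, to_unlink):
    """Return the pairs that appear in both the 'link' and the 'unlink' list."""
    linked = {normalise(p) for p in to_link}
    unlinked = {normalise(p) for p in to_unlink}
    return linked & unlinked
```

An empty result means no pair is asked to be both linked and unlinked, regardless of the order in which the two IDs are written.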

2018-11-29 09:09
> Note that some sentences with a lot of translations have most of the translations by the same member, so these numbers do not really indicate the popularity of a sentence.

That's actually an interesting idea for another chart - "Sentences translated by most people".

Also, it would be interesting to see something like "Sentences with most alternative translations into one language". I wonder whether any sentence can beat this one having 211 alternative Finnish
2018-11-27 14:48
> people who have been warned repeatedly not to engage in this behavior.

I only remember they were told to stop discussing the Kabyle flag issue or else they would be suspended.

Unless I missed something and there were other warnings, I don't think suspending them was entirely fair.

Besides, their discussion was sometimes interesting to read. Tatoeba is effectively dead as a community, and that discussion brought some life to it.
2018-11-13 08:56
Thanks CK, I really like all those tools.