Warning: this website is for testing purposes. Everything you submit will be definitely lost.

Wall (5283 threads)

seveleu_dubrovnik
12 days ago - 12 days ago
Hey, I have added an article on Tatoeba to the Belarusian Wikipedia!

https://be.wikipedia.org/wiki/Tatoeba
gillux
12 days ago
We upgraded our search engine to the latest version of Manticore. Manticore is a fork of Sphinx. You shouldn’t notice anything new because the search functionality remains the same. It just improves performance a little bit and paves the way for future improvements.

That said, while we were at it, we added stemming support for four additional languages:

• Danish
• Hungarian
• Romanian
• Norwegian (Bokmål)

Have a look at this page if you wonder what stemming is about: https://en.wiki.tatoeba.org/art...h#more-details
CK
13 days ago
** Audio Milestone **

https://tatoeba.org/eng/audio/index
Sentences with audio (total 555,555)
JeanM
18 days ago
I'd be curious to hear what people think about the following feature suggestion: allow users to release *any* personal contribution under CC0, even if it's e.g. a translation of a sentence that's under CC-BY 2.0 FR.

The original license would still apply, since a translation is a derivative work – and a warning should probably be shown. However, should the source sentence ever be released under CC0, then I believe this would mean that the translation could also be automatically switched to the less restrictive CC0 (which I favour for my own contributions).

(Although of course I am not a lawyer and I could be completely wrong about all of this.)
AmarMecheri
17 days ago
Exactly... and more! I completely agree with you.
What's more, I don't understand why, when I add variants of MY OWN SENTENCES under CC0 1.0 (always in Kabyle, or translated by myself into French and English), I have to click yet again to specify CC0 1.0; otherwise they are displayed as CC-BY 2.0 FR. Yet ALL MY SENTENCES BELONG TO ME.
JeanM
17 days ago
I think it's just a limitation of Tatoeba's interface. If it's a translation of a sentence under CC0, then as far as I understand you can legally always release it under CC0.


https://en.wiki.tatoeba.org/art...contributions#

"While it should logically be possible to use CC0 for the translations or audio of a CC0 sentence, we have not yet implemented this possibility in Tatoeba. We will consider it once we have a larger number of CC0 sentences."
AmarMecheri
15 days ago - 15 days ago
@JeanM
Thank you for responding, but it is much worse than what you described.
In fact, those of us who want our young researcher and student friends to benefit from the CC0 1.0 license are truly DISTRESSED at having to write our Kabyle sentences one by one under CC0 1.0.
If we link them from the start by writing them as translations of a key sentence, they are treated as translations and get labeled CC-BY 2.0 FR. This takes a crazy amount of time for us Kabyle contributors, who wanted to save time. We do this FOR HONOR, NOT TO ANNOY Tatoeba. Please do something, if possible! Have pity on our language, which almost disappeared!
CK
15 days ago - 15 days ago
As someone who uses the Tatoeba Corpus for a publicly usable educational website, I don't really find it a problem to credit tatoeba.org as the source of the data I use. If I limited my use to only sentences released under CC0 (public domain), the data wouldn't be of very much use to me. I think this is also true for others who use the Tatoeba Corpus.

JeanM
15 days ago
Having worked for a few companies (large and small) that do work in machine learning, I can say that the situation for them is often a little different. I am not a lawyer myself, but the following is based on my first-hand experience.

If a researcher/engineer at one of those companies wants to do some work which uses a dataset, the company's legal team must review the dataset's license. Legal teams love public domain data, because it makes their job easy. Even very liberal licenses that impose requirements (such as the Attribution part of CC BY) or which are written in foreign languages with foreign legal systems in mind (such as CC BY 2.0 FR) make the lawyers' work harder, and will take longer to get approval or may just be denied. Especially if the researcher wants to use a bunch of different datasets, each with its own license.

Now for major languages like French or Japanese, I as a researcher can make a business case for getting the lawyers to spend the extra time reviewing 10 licenses in 10 different jurisdictions: the company has lots of customers speaking those languages, and so better support for them will generate extra revenue. If we're talking about a minority language though, which might make a few people happy and help preserve a culture but is unlikely to make the company any money whatsoever, things are different. Chances are that, unless the dataset is in the public domain or under a license that has already been tested in the local legal system, getting approval will be much harder.

In an ideal world companies would be OK with jumping through a few extra hoops to support a minority language, but in reality only a few places will be willing to do that. This is why I feel wide CC0 support is key – not because I want to make corporations' lives easier per se, but for the sake of communities which speak endangered languages and want to ensure their work is as widely accessible as possible.

Having said that, regardless of whether something like this gets implemented in the future or not, I'd like to say a big thank you to all the Tatoeba admins, community, and developers for the great work they've put into this project. It's pretty amazing what you've all achieved.
AmarMecheri
13 days ago - 13 days ago
@JeanM
Not only do I appreciate that you raised the problem and explained all its aspects, but I am also grateful to you for having understood and defended our primary concern: safeguarding our language, which we have had great difficulty resurrecting and preserving against absolutism. Many thanks!
AmarMecheri
15 days ago
@CK
Thank you for your reply.
Even though it does not solve the problem I raised, following the judicious remarks of @JeanM, I understand that it is your duty, as the initiator, to defend the integrity of Tatoeba. I continue to contribute to this site come what may, out of regard for respectable people like you.
gillux
15 days ago
I understand your frustration. To answer the "why" question: the reason is that nobody has worked on improving this yet. Very few people work on Tatoeba, and there are tons and tons of other things to improve and other problems to fix (some of them much more critical, for example ones that make the website inaccessible).

There is a way to reduce the inconvenience until the situation improves. You can create sentences under CC-BY for a while (for example, a day), then change the license of all your CC-BY sentences to CC0 in one go by visiting the page https://tatoeba.org/licensing/switch_my_sentences
PaulP
15 days ago
> then change the license of all your CC-BY sentences to CC0 in one go by visiting the page https://tatoeba.org/licensing/switch_my_sentences

That's funny. For me, this link leads to the home page (https://tatoeba.org)
gillux
15 days ago
Yes. Since this feature is still in development, for now it is only accessible to certain people who wish to publish their sentences under the CC0 license. If you would like access too, ask an administrator.
AmarMecheri
15 days ago
@gillux
Thank you so much for your reply, which is at once very constructive, explicit, and useful. It really touches me!
It is thanks to people like you, who devote themselves without complaint to keeping TATOEBA running smoothly, that I keep contributing despite some difficulties in understanding with certain bigwigs.
I will try to put your suggestions into practice.
Thanks again.
Looking forward to reading you!
TRANG
14 days ago
I would also like to add that, just as it is currently possible to migrate all your original sentences to CC0, it will one day be possible to migrate all your eligible translations to CC0 in a single click.

So unless you are in a hurry, don't bother creating the sentences separately and linking them afterwards. Contribute normally, and simply wait for us to implement the feature.

I created a GitHub issue about this: https://github.com/Tatoeba/tatoeba2/issues/1858

I'm not saying it will be implemented anytime soon because, as gillux pointed out, there is a huge amount of work on Tatoeba and very few resources. But things do end up moving forward; you just have to be patient.
AmarMecheri
13 days ago
@Trang
I am delighted, and I thank you, as well as @gillux, @JeanM, @CK and everyone on the staff who contributes to improving Tatoeba.
I am glad that you both understood my concerns.
I am very patient and I will wait, because I place my full trust in your site, which I have made my own.
Kind regards.
TRANG
17 days ago
> I'd be curious to hear what people think about the following feature suggestion:
> allow users to release *any* personal contribution under CC0, even if it's e.g. a
> translation of a sentence that's under CC-BY 2.0 FR.

I can tell you this is not a legal risk I would take. By doing that, we would be making the statement that a derivative work should be completely independent from the original work in terms of intellectual property. I don't think we stand a chance if we wanted to defend that point of view. At least I wouldn't be able to defend it.
JeanM
17 days ago
Oh I'm not suggesting the translation as a whole be released under CC0 regardless of the underlying sentence's license. I don't think that would stand, as you say. I'm suggesting something that is subtly different.

Example scenario: You write a sentence, and I translate it.

What I'm proposing is to make a distinction between:
(1) your underlying sentence, which is your own (copyright-protected) expressive creation;
(2) and my translation, which is a combination of my own expressive creation and yours.

I am further suggesting that you allow users to license their own expressive creations under CC0, if they so desire.

This might seem silly because how can one possibly take my translation and "separate" my expressive creation from yours? Obviously users of the translation will still need to abide by the license imposed by you. However, I can think of at least two scenarios where allowing a distinction between the two separate expressive creations would be useful:

(1) Imagine you license an original sentence under CC BY, I translate it, but I don't actually care about being credited myself for the translation. I should then be able to state that I do not wish to impose any further restrictions on the translation, other than the ones which already exist on the underlying work. This would mean that users of the translation would only have to abide by the CC BY license of the underlying work. Compare this to the current situation, where translators are essentially forced to apply extra restrictions to the translations they contribute (in the form of an extra CC BY license), on top of the conditions that already exist on the underlying sentence.

So basically, the current situation is:
TRANG's sentence released under CC BY + JeanM's contribution released under CC-BY = JeanM's translation of TRANG's sentence, released under TRANG and JeanM's CC BY licenses *simultaneously*.

And what I think would be quite neat is to make this possible:
TRANG's sentence released under CC BY + JeanM's contribution released under CC0 = JeanM's translation of TRANG's sentence, released under TRANG's CC BY license *only*.

(2) Imagine you contribute loads of original sentences under CC BY, and I translate all of them. At a future point in time, you decide to relicense the original sentences under CC0. I would actually have been fine with releasing the translations under CC0 – but because of the interface, I actually had to apply an *additional* CC BY license to the translations. Then I get run over by a bus / I disappear from the face of the Internet, and so the translations are stuck with the CC BY license. Had I been allowed to state "my own expressive creations are released under CC0" then, once you relicensed your sentences under CC0, the translations would also have been automatically relicensed under the more permissive CC0.

Apologies for the wordiness, and I hope I managed to explain myself more clearly.

My own personal motivation for releasing as much as possible under CC0 is that I am a speaker of an endangered language, and I want to impose as few burdens as possible on potential users of the data I create. I am basically desperate for companies/researchers to use data in my language, and I know that many companies will prefer data under CC0. (Incidentally, that's also the license required for text in Mozilla Common Voice.)
TRANG
17 days ago
Okay, I understand better. In fact you want a way to automatically switch the license of your translations to CC0 whenever possible.

Or you want a license that basically says "My derivative work will be automatically re-licensed to the most permissive license possible if the original work is re-licensed to a more permissive license". I don't think such a license has been created.

But I guess you could publish somewhere that you wish to have all your contributions released under CC0 when possible and you allow Tatoeba to change the license of your translations to CC0 without having to ask you. That might be enough.

Hopefully we will implement the possibility to switch the license of translations in a not too far future, and hopefully you'll still be around by then.
JeanM
17 days ago
> Or you want a license that basically says "My derivative work will be automatically re-licensed to the most permissive license possible if the original work is re-licensed to a more permissive license". I don't think such a license has been created.

Yeah that's pretty much it. I think that can be achieved by simply stating that you release your contribution (and only your contribution – not the derivative work as a whole) under CC0, but I may be wrong as I am not a lawyer.

> But I guess you could publish somewhere that you wish to have all your contributions released under CC0 when possible

I already have a sentence on my profile to that effect, yes. In fact I've noticed a few other users with similar statements.

> Hopefully we will implement the possibility to switch the license of translations in a not too far future, and hopefully you'll still be around by then.

Nice! Until then, I'll make sure to look both ways whenever I cross the street.
AmarMecheri
16 days ago
The question is ill-posed, it seems to me. For example, I write my own Kabyle sentences under CC0 1.0 and translate them myself into French and English. In all three languages, they are mine and they are original. Why am I OBLIGED to specify the CC0 1.0 license each time?
Thanuir
14 days ago
How to change all of my eligible sentences to CC0?

I have the permission to change my sentences to CC0 and have successfully done it to individual sentences. However, it is cumbersome to find them and do the change manually.

The wiki at https://en.wiki.tatoeba.org/art...contributions# claims that there is a way of changing all of my original sentences to CC0. Does it exist and, if so, where?
gillux
14 days ago
The page is: https://tatoeba.org/licensing/switch_my_sentences

I updated the wiki page, thank you.

Beware that this feature is still under development. Feedback is welcome.
Thanuir
14 days ago
Thank you very much!
TRANG
14 days ago
The link is also in the private message I sent you after granting you access to the CC0 feature.
Thanuir
14 days ago
Sorry, I must have missed that.
TRANG
14 days ago
Don't worry, you are probably not the only one. At least I know that for the next users who request CC0 access, I'll need to somehow make this link look more important in my response message :)
Thanuir
14 days ago
Adding the link to the settings or sentences page of those who are in the CC0 club might be a good move.
sharptoothed
17 days ago
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
Guybrush88
16 days ago
thanks :)
sharptoothed
16 days ago
you're welcome :-)
CK
17 days ago - 17 days ago
We now have over 400,000 English sentences with audio.

https://tatoeba.org/eng/audio/index/eng
Sentences in English with audio (total 400,009)
2019-04-01 01:53 UTC

You can browse the latest audio files and see translations with this link.
https://tatoeba.org/sentences_lists/show/4000/und
AmarMecheri
17 days ago
Hi there!
@JeanM
Exactly... and more! I completely agree with you.
What's more, I don't understand why, when I add variants of MY OWN SENTENCES under CC0 1.0 (always in Kabyle, or translated by myself into French and English), I have to click yet again to specify CC0 1.0; otherwise they are displayed as CC-BY 2.0 FR. Yet ALL MY SENTENCES BELONG TO ME.
belkacem77
18 days ago
Welcome to Tamahaqt.

Tamahaqt Tahagart (Tatergit n uheggar) has just been added to Tatoeba. Thanks to the tech team, and especially to @Ricardo and Aissa Mahfoudh, who is working on the Tamahaqt language.

We will do our best to share information with Tamahaqt speakers, to build bridges between the Berber (Amazigh) languages and also with the languages of the whole world.

For information, Tamahaqt is a Berber language. About 200,000 people in the south of Algeria (and also in Mali, Niger, Libya, Mauritania, Burkina Faso, Senegal, and Chad) still speak it.

Again, Welcome to Tamahaqt.
CK
19 days ago
Stats - 2019-03-30 - Native Speaker Sentence Counts

http://tatoeba.byethost3.com/stats-190330.html

Find out members' native languages.