14 November 2013

Paroles

Non-native French speakers like myself have a hard time grappling with the French spelling system. It may seem arbitrary to write seconde and pronounce it segond, or frustrating that there is no rule to define why déçu has an accent on the e, but reçu doesn’t.

"C'est la vie, mon ami," say the French.

Now that I live in French Guiana, I am trying to come up with strategies to make these words easier to learn.

Focus on the most common words

Dictionaries sort words by alphabetical order, which is convenient if you are looking for a definition, but masochistic if you are trying to figure out which words are the important ones.

One way is to sort them by frequency of usage. You take a bunch of French text, count how many times each word appears, and then sort the words by that count. Common words like de, je and le appear at the very top, and words like macarron, compétiteur and antisportif show up at the bottom.
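The counting step can be sketched in a few lines of shell. This is only an illustration: the sentence in `text` is a toy corpus I made up, `word_frequencies` is a hypothetical helper name, and the punctuation handling only covers a handful of ASCII characters.

```shell
# Toy corpus, invented for illustration (not the text used for the real file).
text='Je chante de la musique, je danse, je ris de joie.'

# Hypothetical helper: reads text on stdin, prints "count word" lines,
# most frequent first (ties broken alphabetically).
word_frequencies() {
  awk '
    {
      gsub(/[.,;:!?]/, " ")              # turn basic punctuation into spaces
      for (i = 1; i <= NF; i++)          # count every lowercased word
        count[tolower($i)]++
    }
    END { for (w in count) print count[w], w }
  ' | sort -k1,1nr -k2,2
}

printf '%s\n' "$text" | word_frequencies
```

On the toy sentence, je and de come out on top, just like in the real list.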

You have to do some trickery to combine words that share the same stem (for example: enchanté, enchantée, enchantés and enchantées) so they all add up as the same word, but it’s manageable. Here is my quick-and-dirty attempt to sort French words based on their usage. The file has four columns: the stem, its frequency, the most common word with that stem, and that word without any accents (I use this last column to play a guessing game).
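To give an idea of what that trickery can look like, here is a deliberately naive sketch. It is not the method used to build the file: it only strips a trailing -es, -e or -s, the counts in `counts` are made up, and `group_by_stem` is a hypothetical helper name. It prints the same four columns described above.

```shell
# Made-up "count word" pairs, standing in for the output of a frequency count.
counts='120 enchanté
45 enchantée
30 enchantés
5 enchantées
80 école
20 écoles'

# Hypothetical helper: prints "stem total most-common-form form-without-accents".
group_by_stem() {
  awk '
    {
      stem = $2
      sub(/(es|e|s)$/, "", stem)         # naive stemming: drop -es, -e or -s
      total[stem] += $1
      if ($1 + 0 > best[stem] + 0) {     # remember the most common spelling
        best[stem] = $1
        form[stem] = $2
      }
    }
    END {
      for (s in total) {
        plain = form[s]
        gsub(/é|è|ê/, "e", plain)        # strip the accents on e only
        print s, total[s], form[s], plain
      }
    }
  ' | sort -k2,2nr
}

printf '%s\n' "$counts" | group_by_stem
```

A real stemmer has to deal with verb conjugations and irregular plurals, which this sketch ignores completely.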

Once you have that file, you can start enjoying some seriously fun language-geekery.

Finding patterns amid the French chaos

The White Whale of French accents has to be the é. Every French student knows that the participe passé of -er verbs like chanter ends in é (chanté), but what are the rules governing words like préféré, élève, or fréquence?

Some sites can tell you the multiple spelling rules with their corresponding multiple exceptions, but they are not very helpful because our brains are not wired to deal with detailed rules and exceptions. Besides, native French speakers don’t study these rules. What we excel at is association and intuition. It’s much easier to memorize the lyrics of a song than 100 random words.

Divide and conquer

I have always wanted to know if I should accent words that begin with e. In the past, I grabbed my Petit Robert and tried making a list of words that began with é and another of words that began with e. I then promptly went on to forget everything from both lists.

A better approach is to focus on a few common words each day and reverse engineer the spelling rules. To add another layer of association, you can group them by type of word: words that begin with é, words that contain two és separated by a consonant, words that have an é in the second position. Pick whatever you are having trouble with.
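Each of those groups maps to a small regular expression. Here is a minimal sketch, assuming a plain list with one word per line; the words are borrowed from this post, and `classify` and its three labels are names I made up for the example.

```shell
# Toy word list (words quoted in this post, not the real frequency file).
words='préféré
élève
fréquence
début
école
secret'

# Hypothetical helper: prints "label word" for every group a word belongs to.
classify() {
  awk '
    /^é/                       { print "initial", $1 }  # begins with é
    /^.é/                      { print "second", $1 }   # é in second position
    /é[bcdfghjklmnpqrstvwxz]é/ { print "double", $1 }   # two és around a consonant
  '
}

printf '%s\n' "$words" | classify
```

Words that fit none of the groups (fréquence and secret here) are simply not printed.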

Focus on the exceptions: words that begin with é

For example, I can use AWK to grab all the words that start with é and show their frequency (the filter keeps only words that appear more than 500 times):

awk '$3 ~ "^é" && $2 > 500 {print $2, $3}' fr_frequency_stems.txt

170230 était
28392 écoute
12778 étrange
12660 école
10824 équipe
...
561 élite
543 épingle
538 éclaté
518 épicerie
501 écureuil

In total, there are 123 words, but they are not all equally important: école is 25 times more common than épicerie.

Let’s see what words that start with e look like:

awk '$3 ~ "^e" && $2 > 500 {print $2, $3}' fr_frequency_stems.txt

69775 encore
50094 entendu
42068 enfants
27877 entre
25380 elles
...
528 effraie
519 endommagé
509 edgar
508 estimé
506 endurer

There are around 200 commonly used words that start with e. If we compare both lists, we can see that an initial é is never followed by an x, an s, or a double consonant (ll, rr, ff).

awk '$3 ~ "^é[xs]|^é(ll|rr|ff)" && $2 > 500 {print $2, $3}' fr_frequency_stems.txt | wc -l

0 # extra essai ellipse erreur effacer ...
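Instead of eyeballing both lists, you can tabulate which letter follows the initial é or e. The sketch below is only an illustration: it runs on a dozen words quoted in this post rather than on the real file, it looks at a single following letter (so it cannot tell l from ll), and `next_letter_table` is a hypothetical helper name.

```shell
# Toy sample: words quoted in this post, not the real frequency file.
sample='était
école
équipe
étrange
énorme
élève
encore
entendu
extra
essai
ellipse
erreur
effacer'

# Hypothetical helper: for each letter that follows an initial é or e,
# prints how many words of each kind were seen.
next_letter_table() {
  awk '
    /^é/ { w = $1; sub(/^é/, "", w); acc[substr(w, 1, 1)]++ }
    /^e/ { plain[substr($1, 2, 1)]++ }
    END {
      for (c in acc)   seen[c] = 1
      for (c in plain) seen[c] = 1
      for (c in seen)  print c, "é:" (acc[c] + 0), "e:" (plain[c] + 0)
    }
  ' | sort
}

printf '%s\n' "$sample" | next_letter_table
```

In this tiny sample x and s only ever follow a plain e, while l shows up on both sides (élève, ellipse), which is why the pattern is about the doubled consonant and not the single letter.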

We can also see that an initial e followed by n or m is usually unaccented (108 of the roughly 200 common e words start this way), and that only a handful of words break this rule:

awk '$3 ~ "^e[nm]" && $2 > 500 {print $2, $3}' fr_frequency_stems.txt | wc -l
108 # ex: entendu enfants entre ensemble endroit ...
...

awk '$3 ~ "^é[nm]" && $2 > 500 {print $2, $3}' fr_frequency_stems.txt

3661 énorme
3193 énergie
2474 énerve
2344 émission
2229 émotions

The words that break the n/m rule contain an é that is pronounced by itself (it forms its own syllable), unlike the rest of en/em words. You should focus your memorization efforts on these outcasts and assume that the remaining words follow the rule.

Guess wisely: words with é in the second position

We can use a similar approach to look at words that have é in the second position:

awk '$3 ~ "^.é" && $2 > 500 {print $3}' fr_frequency_stems.txt | cut -c1 | sort | uniq -c | sort -k1,1gr
113 d
64 r
19 16 15 10 9 8 8 7 7 7 6 6 # counts for the remaining letters

Holy moly! This breakdown shows that words that begin with dé or ré make up about 60% of the common words that have é in the second position ((113 + 64) / 300). Let’s focus on those two.

awk '$3 ~ "^de" && $2 > 500 {print $3}' fr_frequency_stems.txt | wc -l
62

awk '$3 ~ "^re" && $2 > 500 {print $3}' fr_frequency_stems.txt | wc -l
152

Well, that’s interesting. There are nearly twice as many dé words as de words, but more than twice as many re words as ré words. This means that if you are not sure how to accent a word, you should guess dé and re.
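To put numbers on that guess, here is a small sketch that turns the counts above into percentages. It assumes that the 113 in the breakdown is the dé count and the 64 is the ré count, which is how I read the figures; `share` is a hypothetical helper name.

```shell
# Hypothetical helper: share PART OTHER prints PART as a percentage of PART + OTHER.
share() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", 100 * a / (a + b) }'
}

# Counts quoted in this post (assumption: 113 = dé words, 64 = ré words).
echo "d-words that take the accent: $(share 113 62)"
echo "r-words that take the accent: $(share 64 152)"
```

With these counts, guessing dé is right about two times out of three, and guessing re is right about 70% of the time.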

Unfortunately, there don’t seem to be any obvious rules that we can follow to determine if these words should have é or e, so we will have to come up with our own associations.

For example, memorizing words that look alike is easier if we focus on their differences rather than if we learn them independently:

début - debout

désert - dessert

démarrer - demander

détruire - destruction

désigner - design

You can also make up memorable stories:

La secrétaire a gardé l'accent de secret

There are hundreds of little tricks but this post has gone on long enough. I hope this approach makes your studying more effective. Let me know how it goes and share your own strategies in the comments.

UPDATE: Regarding a few comments on Reddit: the diacritics are there to change the pronunciation, but non-native speakers don’t always know if a word should be pronounced é or e. I don’t know if it’s because I’m Spanish, but I’m tempted to say sécret (just like sécurité) instead of secret. If I know how secret is written, I can make an effort to pronounce it properly; other times I will use my knowledge of how it’s pronounced to write it properly. It’s a two-pronged approach.