“Oh, I don’t need to read and write. I just want to learn to speak it.”
I hear this from a lot of language learners. I understand the sentiment: speaking to people is so much more fun than, for example, learning to read the newspaper (unless you love news).
But learning to read and write — at least the basics — is incredibly rewarding.
General disclaimer: I’m not a linguist. I’m a linguistic hobbyist. I’ve never studied a language at university, I’ve just studied a bunch of languages. Thus, the way I describe things below — while on the back of decades of experience and a lot of research — is presented without using scientific, technical language. That said, if you’re a linguist and I’ve got something wrong, I definitely want to hear from you and probably talk to you for hours.
Why learn to read and write new scripts?
You can actually get by not learning a script at all. Even Chinese (though you’ll end up thinking in characters anyway).
Six quick reasons why you should learn:
- You’ll learn words more quickly.
- Menus. You can eat food.
- Street signs. You won’t get lost.
- Apps. Some places have their own apps, much better than what Google can provide.
- You can understand notes, text messages and blog comments written to you by your friends in another country.
- It’s really fun to write another language.
OK, I really like the last one. It’s like a secret code. Who doesn’t like codes?
One of the most fascinating things to me is that in the diversity of human languages lies another form of diversity, and that’s the intricate writing systems we have.
Everyone is loosely aware of different alphabets, that some writing systems go right-to-left (or even vertically, traditionally at least) and that some are character-based. But there are other, different writing systems still in common use today.
And that’s the thing I want to emphasize here. It’d be cute and fun to examine ALL the writing systems of the world, historically, like ancient Persian and hieroglyphs, or future writing systems like when we’re entirely using emoji. But we take a practical 80/20 focus and examine only the most common languages spoken today, the 25 most spoken according to Wikipedia — total speakers, not just native.
Take a look at that list for a second. It’s a list of languages, so we dug into the scripts they’re written in.
One interesting thing we noticed is that you can group the scripts into four main groups (plus a couple of outliers):
- Latin: Many languages use Latin script or a close derivative: English, most European languages, and a few others such as Turkish and Vietnamese.
- Indian: There are so many Indian languages in the list! They’re mostly closely related to the written script used for Hindi (Devanagari).
- Chinese characters: There are several Chinese languages. These all have the same formal written form (but different informal forms, like what you’d see in a gossip rag)
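- Arabic: Arabic script covers a couple of major languages other than Arabic itself: Urdu and Farsi (both use the Arabic script with a few extra letters). Hindi and Urdu are very close when spoken, but Hindi is written in Devanagari. Hebrew (not on the list) has its own script, but it works in a similar way: right-to-left and mostly consonants.
- Outliers: Russian (Cyrillic) and Korean (Hangul) each have their own script, but thankfully those aren’t hard.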
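If you like seeing this kind of grouping in code, here is a rough sketch in Python. It leans on an assumption that happens to work for these scripts: the first word of a character's official Unicode name (LATIN, CYRILLIC, ARABIC and so on) is a good enough script label. The function names and sample greetings are ours, made up for illustration.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Rough script label: the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def dominant_script(text):
    """Return the most common script label among the letters in `text`."""
    # isalpha() skips spaces, digits, punctuation and combining marks.
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

# Illustrative greetings, one per group from the list above.
samples = ["hello", "Gỏi cuốn", "नमस्ते", "你好", "سلام", "привет", "안녕"]

for text in samples:
    print(text, "->", dominant_script(text))
```

Chinese characters come out labelled CJK and Korean comes out as HANGUL, because that is how their Unicode names start.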
And let’s not just analyze them for fun. Let’s analyze how hard they’d be.
But first… what’s the goal of this analysis?
We don’t learn languages with the purpose of reading (let alone writing) poetry. We take a functional, 80/20 approach.
Our goal is to learn communication to survive, to understand a culture, and generally to minimize social friction and exist in a native language environment.
That means that we always assess whether it’s necessary to learn a writing system. Take Javanese, for example (not on our list, but a good example). Traditionally, it was written using a different writing system. In modern times, it’s written in Latin script for all common use. We’d make do with Latin script.
In contrast, Chinese is written in Chinese characters. Pronunciation is often taught in schools and textbooks using ‘Pinyin’, a Romanization that’s actually totally standardized, but Pinyin is a learning aid: everyday text is still written in characters.
How do you analyze how hard a language’s script is?
There are four main dimensions to analyze, with some overlap between them (it could be a flowchart, but with four elements it’d be very small). Like most of what we do at Discover Discomfort, we’ll write from the perspective of someone who speaks, reads and writes English fluently, as the language we primarily use to learn things.
You might be an Arabic, Spanish or Chinese (or something else) native speaker, but if you were to go learn a new language, chances are — if you’re an audience member of this blog — that you’re going to go learn in English for the breadth and quality of resources.
OK let’s get to it.
Dimension 1: What’s the writing system used – is it alphabetic, syllabic or character-based?
There are a few ways of classifying writing systems, but I like the one presented here. Looking at the systems used for the most common 25 spoken languages, we can summarize them further into alphabetic, syllabic or character-based writing systems.
Whoa, what are all those?
Most writing systems are alphabets. You can count your lucky stars if the language you want to learn uses one, too.
An alphabetic language is one that uses an alphabet. That is, each character can represent a vowel or a consonant, and they’re combined to form syllables. Typically there are fewer than 30 or so letters in an alphabet, and you can master reading the entire alphabet in a couple of hours.
Examples of alphabet-based writing systems include those used for English, French, Russian and Arabic.
(If you want to get technical, Arabic and Hebrew are actually consonant-based, but still, they have some vowels and write them out, so it’s conceptually very similar to alphabets.)
Syllabic systems: Some writing systems, like Indian ones, are syllabic. Like it sounds, this means each individual character represents one syllable (sometimes a grouping of syllables). The best examples of these are two of the three Japanese writing systems, Hiragana and Katakana.
All common Indian writing systems, like Devanagari, used for Hindi and Marathi, are syllabic; yes, there’s a consonant and a vowel, but they’re always grouped into a syllable. Syllables then make up words.
पिछले 15 वर्षो में अनेक उपलब्धियों के बावजूद कई चुनौतियां शेष हैं। उत्तम और किफायती स्वास्थ्य सेवाओं तक सभी की समान पहुंच नहीं है और संचारी एवं गैर संचारी रोगों का गैर अनुपातिक दबाव बना हुआ है। स्वास्थ्य के लिए बजटीय आबंटन भी कम है। पिछले दशक में सरकार ने स्वास्थ्य पर जीडीपी का लगभग 1% व्यय किया जोकि विश्व के दूसरे देशों के हिसाब से बहुत कम है। खंडित योजनाओं और वर्टिकल डिजीज़ प्रोग्राम्स के कारण इस राशि का भी अच्छी तरह से उपयोग नहीं किया जा सका।
From a Hindi news website. The first sentence transliterates to “pichhale 15 varsho mein anek upalabdhiyon ke baavajood kaee chunautiyaan shesh hain”. Just looking at that, you can find some patterns.
Korean kind of is syllabic too – the vowel and consonant letters are grouped together into syllables and you can’t randomly form any grouping you want. They feel like Chinese characters to the learner, and in fact are written with the same stroke order rules, but are far more structured (and also less rich in meaning – purely phonetic). You learn over time what syllables make sense.
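The structure of Korean syllable blocks is regular enough that you can do arithmetic on it. The sketch below uses the standard Unicode layout for precomposed Hangul syllables: 19 initial consonants, 21 vowels and 28 final slots (27 finals plus "no final"). Those counts are higher than the basic letter count because Unicode counts doubled consonants and compound vowels separately. The function names are ours.

```python
import unicodedata

# Standard Unicode layout of precomposed Hangul syllable blocks.
BASE = 0xAC00          # code point of the first block, 가
MEDIALS = 21           # vowel slots
FINALS = 28            # 27 final consonants + "no final"
TOTAL = 19 * MEDIALS * FINALS  # 11,172 possible blocks

def decompose(syllable):
    """Split one Hangul block into (initial, medial, final) indexes."""
    index = ord(syllable) - BASE
    if not 0 <= index < TOTAL:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, MEDIALS * FINALS)
    medial, final = divmod(rest, FINALS)
    return initial, medial, final

def compose(initial, medial, final=0):
    """Build a Hangul block from its three indexes."""
    return chr(BASE + (initial * MEDIALS + medial) * FINALS + final)

parts = decompose("한")
print(parts)             # index of ㅎ, index of ㅏ, index of final ㄴ
print(compose(*parts))   # round-trips back to the same block

# Unicode normalization (NFD) splits the block into its individual letters.
print([unicodedata.name(ch) for ch in unicodedata.normalize("NFD", "한")])
```

That fixed grid is why you can't form any grouping you want: every block is one of 11,172 combinations, and far fewer show up in everyday Korean.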
Very few common writing systems are character based, but if you’re learning one, then good luck! The only one, in fact, is the Chinese (Hanzi)/Japanese (Kanji) writing system, which has a large degree of overlap (in meaning, though not in pronunciation).
A little more detail here since there’s so much to be said. In Chinese Hanzi, each character is usually pronounced as one syllable (e.g. 是 is pronounced shi4, which is like a British “sure” with a falling tone). However, one syllable can have many (like, dozens!) of corresponding characters. So if someone were to just write shi4, you would have to ask whether they mean 是, 事, 市, 试 or many others.
A character usually corresponds to one meaning (sometimes a broad one). For example, 机 usually means ‘machine’ or ‘device’. Put the character for hand in front of it and it’s a hand device (手机, cellphone). Put the character for image in front of it and it’s an image device (相机, camera).
In Japanese, the character does, similarly, correspond to one meaning. But there can at times be multiple pronunciations of the character: either the Japanese one or the Chinese-derived one.
Dimension 2: If it’s an alphabet, how different is it to the Latin alphabet (the one used in English)?
The easiest languages are those where there are zero differences, like Bahasa Indonesia (Malay/Indonesian language). There are no extra characters, and just a few pronunciation differences (like the ‘c’ is pronounced ch).
Close behind are those that just have a few modifications, like Latin languages (French, Spanish etc.) or other ones (like German, Turkish), which have a few extra characters or accents, like the é and ç in French, or the ñ in Spanish.
Next up is Vietnamese, which now uses Latin characters but a quite different pronunciation system, with its tones and very different sounds. Sure, it might look similar compared to another writing system, but you might feel confused as to how to order spring rolls if you have to say “Gỏi cuốn” out loud.
Then there are languages whose alphabets look completely foreign. Russian’s Cyrillic alphabet and Arabic script (used in Urdu and Farsi as well) are the best examples of these. Korean, too, despite the way the letters are grouped into syllables. It’s not too hard though… just learn the 30-odd characters and how to pronounce them, and work to get faster.
Dimension 3: If it uses some kind of alphabet, how consistently phonetic is it?
Every common language is MORE consistently phonetic than English. I think, anyway.
Think of the poor people (like our parents) who have to learn why in English we pronounce differently the words bought, bough, cough, through, tough and furlough, or why we pronounce the same way the words care, bear, hair, there, their and millionaire. (Here’s a good story of how we got this way.)
So that’s the good news. Every language other than English is largely phonetic when reading, with some curious exceptions that nobody is going to hassle you about, like how to pronounce un oeuf (one egg) in French vs des oeufs (several eggs). Learning to write is a different story, but that’s not our goal, except for casual texting (at the sophisticated level of ‘r u going to the thing tonight’).
The main complication among other major scripts (aside from characters) is the Arabic-script languages, which only write long vowels (a like part, ee like feet), and don’t usually write short vowels (a like pat, e like pet).
Dimension 4: How many letters/symbols/characters do you have to learn?
Less is fewer!
The fewer new letters or symbols you have to learn, the easier a time you’re going to have.
Easiest: Latin scripts. This is regardless of whether they have extra letters/rules. Like Indonesian. A walk in the park! French and Spanish and German aren’t far behind despite the ç, ü and ñ which just add a little spice.
Not too hard: Non-Latin alphabets. In the case of alphabets in the most common languages, it’s never too many. Korean has 24 letters, Arabic has 28 and Russian has 33, and some of the Russian ones look like Latin letters written backwards (but sound totally different). Seems easy to swallow, right? You could knock that out in a couple of hours! Japanese Hiragana and Katakana aren’t far behind, each with 46 characters in common use (and you have to learn both, because they’re used for different things).
Harder: Indian languages. These have more primary characters, plus rules on combining them. Take Devanagari, the script used in Hindi and Marathi. It has 47 primary characters (33 consonants, 14 vowels). But it doesn’t stop there. The vowels have a primary form, as well as a secondary form where they’re attached to a consonant to modify its pronunciation (forming a syllabic character). Add on to that modifiers, which change the pronunciation (e.g. nasalizing the vowel in different ways, adding an aspiration or removing the vowel altogether). Finally, there are ligatures, which mean that you have to learn to recognize when two consonants are fused together, including a few exceptions.
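You can see the primary and secondary forms by taking a Hindi word apart. Here is a small sketch using Python's standard unicodedata module on पिछले, the first word of the Hindi sample above ("pichhale"). We're treating any code point whose Unicode category starts with "M" as a secondary sign attached to a consonant; the describe function is ours.

```python
import unicodedata

def describe(word):
    """List each code point in `word` as a (kind, Unicode name) pair."""
    rows = []
    for ch in word:
        # Categories starting with "M" are marks: vowel signs, nasalization
        # signs and the vowel-killing virama. Everything else here is a letter.
        kind = "mark" if unicodedata.category(ch).startswith("M") else "letter"
        rows.append((kind, unicodedata.name(ch)))
    return rows

# पिछले, spelled out as code points so it survives copy and paste.
word = "\u092a\u093f\u091b\u0932\u0947"

for kind, name in describe(word):
    print(f"{kind:6} {name}")
```

The word turns out to be three consonant letters plus two vowel signs. Note that the "i" sign is stored after its consonant even though it is drawn to the left of it, which is one of those rules you just have to get used to.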
If there’s a character-based writing system like Kanji or Hanzi… well, good luck. Look for articles called “How to learn 2,000 characters in 3 months” and know that that’s the elite goal, not the standard pace.
There is a lot of disagreement over how many characters you ‘have’ to learn. In Chinese, I’d argue that around 300 to 500 characters is the minimum if you’re going to make any effort (and they’re kind of fun to learn anyway). Around 2,500 characters will greatly assist your existence, and 3,500 characters will allow you to lead the full professional life of a foreigner (e.g. you work for IBM but in a 100% Chinese environment).
It’s different in Japan, where Hiragana and Katakana exist alongside Kanji. Unfortunately, they’re all mixed together, so you have to learn all three. Your total goal may be lower though. It’s a common refrain that you have to learn around 2,000 Kanji to fully understand Japanese, but learning the first 500 will get you a long way. I like the advice given on this page when it comes to learning Kanji.
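To put those character counts in perspective, here is a back-of-the-envelope sketch. The only numbers taken from above are the targets (500, 2,000, 2,500 and 3,500 characters) and the "3 months" headline, which we're treating as about 90 days. The pace of 10 new characters a day is purely our assumption for illustration, not a recommendation.

```python
import math

def pace_needed(characters, days):
    """New characters per day needed to hit a target in a given time."""
    return characters / days

def days_needed(characters, per_day):
    """Whole days needed to cover a target at a fixed daily pace."""
    return math.ceil(characters / per_day)

# The "2,000 characters in 3 months" headline, taking 3 months as ~90 days.
print(round(pace_needed(2000, 90), 1), "characters a day")

# Hypothetical steady pace of 10 new characters a day (our assumption).
for target in (500, 2000, 2500, 3500):
    print(target, "characters:", days_needed(target, 10), "days")
```

The elite headline works out to more than 20 new characters every single day, while even the 500-character minimum is well over a month at the assumed pace. This counts first exposure only; reviewing what you've already learned takes extra time.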
How long will it take to learn the writing systems of the most spoken languages?
Let’s put all of the above together and consider how hard all of these scripts are, and why. You might know already!
So let’s consider how long it’d take to get reasonably good at one of these, planning an hour of study a day just on the writing system.
Months: Japanese & Chinese. For Japanese, you need to know three scripts if you’re going to learn any at all. Hiragana and Katakana take some work with 46 characters each, and Kanji is no joke, even if you ‘only’ learn 500. For Chinese, if you’re going to learn Hanzi, you should plan for 500 at a minimum (doable, with flashcards), and then see how far you can go. It’s a huge investment.
A week or two: Indian languages (including Hindi, Marathi, Telugu, Bengali, Punjabi and Tamil). These have lots of base letters (e.g. 47 in Devanagari, used in Hindi and Marathi), rules for combining consonants with vowels and with other consonants, and some exceptions, and it gets eye-wateringly complicated for the novice.
A few days: Any non-Latin alphabet, or extended Latin alphabets: Russian, Korean, Arabic, Hebrew, Vietnamese… these are all at least slightly tricky, but spend a few sessions over a few days memorizing the letter shapes and you’re there. OK, then drill it in over a few more days, as you’ll almost definitely forget it if you don’t return to it.
Maybe 10 minutes: Latin scripts. Latin languages, German, Turkish, Indonesian languages and a few others. You basically already know them! Just learn a few extra characters or pronunciation rules and you’re done.
Get learning! There are a few techniques that are useful to learn to read and write quickly and functionally; we’ll get to those later.