BABEL

Why you probably won’t understand the web of the future

It’s Facebook, Jim, but not as we know it.
Image: Reuters/David Mercado

This item has been corrected.

The giants of the connected world are finally waking up to one of the biggest obstacles in their stated missions of connecting billions more people to the internet: The language barrier.

This week alone, Google announced the “Indian Language Internet Alliance,” which aims to get half a billion Indians online by 2017 by serving them content in local languages, and there are indications Facebook is already defaulting to local languages in India. Facebook’s head of internationalization and localization published a long piece about “The Internet’s Language Barrier” in Innovations, a quarterly journal from MIT; and Mozilla and GSMA, a trade body of mobile operators, published a white paper titled “Unlocking relevant Web content for the next 4 billion people.”

Language barriers in globalization are hardly a new issue. So why the sudden drive for polyglotism? It’s simple: As mobile operators and web giants try to expand their markets by bringing more people online, we have reached a tipping point where the imbalance of content on the internet has become too stark to avoid.

“A lot of the content online is about very few places and those are the places you might imagine: Western Europe, Japan, Korea, North America,” says Mark Graham, an associate professor who looks at information geographies at the Oxford Internet Institute. “And a lot of the contribution to the internet comes from those very same places.”


English and its discontents

The English domination of the web is completely divorced from the language’s presence in the human population. “Just over half (55.8%) of Web content is estimated to be in English despite the fact that less than 5% of the world’s population speak it as a first language, with only 21% estimated to have some level of understanding,” according to GSMA and Mozilla (pdf). “By contrast, some of the world’s most widely spoken languages, such as Arabic or Hindi, account for a relatively small proportion of the Web’s content (0.8% and less than 0.1% respectively).”

Internet access doesn’t fully explain the imbalance. The Middle East scores much lower than you would expect, given how many people in that part of the world are online, says Graham. There are some 7,000 languages spoken around the world today. Yet Facebook is available in just 75, with another 40 in translation, writes Facebook’s Iris Orriss, who runs its localization efforts. Even fewer are supported on mobile devices. Hindi, spoken by more than 250 million people, wasn’t available on Google’s Android operating system until last year, as Quartz has reported.

Today, some 80% of the web remains dominated by just 10 languages. If Facebook, Google, and GSMA are serious about their professed goals to connect vast numbers of people to the internet (everyone in the case of Facebook and Google, another billion by 2020 in the case of GSMA), a good starting point is to give the speakers of the other 6,990 languages something to do when they come online.

Different paths to the same goal

Google, for its part, has tied up with a number of local-language publishers in India and launched a website called Hindiweb.com, a sort of low-rent one-stop shop for all the Hindi content from its partners. That is only the start. Medianama, an Indian tech website focusing on policy, lists a number of other steps that will be necessary before Google can achieve its lofty goal of bringing 500 million more Indians online. These include obvious measures, such as making the web friendlier to speakers of other large languages, especially Telugu and Tamil, and more technical ones such as improving search in Indian languages, creating a local-language Play store, and working with device makers to standardize and pre-install fonts and rendering.

The internet remains dominated by the US.
Image: Mark Graham, Scott A. Hale, Monica Stephens/Oxford Internet Institute

Making the web more usable for non-English speakers doesn’t stop at language. Facebook’s Orriss cites the example of Russia, where some users enter their names in Roman script and others in Cyrillic. This causes a problem, she writes: “You are searching for your friends’ names in Cyrillic, but some of them registered using Roman script. Therefore, when you type a friend’s name into the search field, the software has to search for the name in both scripts using a common conversion algorithm—in essence, it has to understand this cultural norm of your native language.”
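
To make the idea concrete, here is a minimal sketch in Python of what searching names across both scripts might look like. It is not Facebook’s algorithm, which Orriss does not describe in detail. The transliteration table is a simplified assumption rather than any official standard, and the function names to_search_key and search_names are hypothetical. Both the query and each registered name are converted to a common Latin-letter key before they are compared.

```python
# Simplified, illustrative Cyrillic-to-Latin table.
# This mapping is an assumption for the sketch, not an official standard.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def to_search_key(name):
    """Lowercase a name and map Cyrillic letters to Latin ones.

    Characters that are not in the table (Latin letters, spaces,
    punctuation) pass through unchanged.
    """
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in name.lower())


def search_names(query, registered_names):
    """Return the registered names whose key contains the query's key."""
    key = to_search_key(query)
    return [name for name in registered_names if key in to_search_key(name)]


if __name__ == "__main__":
    friends = ["Ivan Petrov", "Иван Сидоров", "Olga Smirnova"]
    # A query typed in Cyrillic finds names registered in either script.
    print(search_names("Иван", friends))
```

A real system would need to handle the several competing romanization conventions for Russian (a name such as Юлия may be registered as Yulia, Iuliia, or Julia), which a single fixed table like this one cannot capture.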

Another example is color. In the West, red is associated with danger or bad news, while in China it means good news. Any company serious about serving a global audience needs to take such subtle cues into account.

To better understand the problem, GSMA and Mozilla are conducting “field tests” in Bangladesh, Kenya, Brazil, and India, with the hope of building “a coalition of mobile operators, device manufacturers, educators, international development donors, and NGOs who are interested in positively shaping the future of the Web.” The idea is to create local alternatives to the Android-Apple duopoly in order to have “a more dispersed digital content ecosystem, in terms of how content is created, distributed and monetized.”

Noble as they sound, all these efforts are, at heart, good business. As growth slows in the West, these companies must necessarily look East. Thriving local-language internets exist in China and in Russian-speaking countries (the latter known as the RuNet), where Western firms have had a hard time breaking in. India and Africa, on the other hand, still present opportunity: They have underdeveloped local-language online ecosystems, and pose no big political or regulatory hurdles for Western firms.

Correction (Nov. 7, 2014): An earlier version of this article misstated the number of languages in which Facebook is available as 70. The correct number is 75. Apologies.