Tech company grades news articles for readers, and experts advise caution as they try it
An Austin-based tech company aspires to restore trust in journalism by using machine learning models to rank news articles. A technology and democracy expert said that while the platform has promise, users need to try it themselves and better understand how it works before allowing it to guide their news consumption habits.
Otherweb was founded by Alex Fink, a technology entrepreneur, with a goal of eliminating junk from the digital news landscape.
“Otherweb is a new information platform that combines news, podcasts and many other sources of information in one place,” Fink said. “And that place doesn’t have paywalls or clickbait or any other form of digital junk.”
The platform, which launched on Aug. 1, uses machine learning models to rank news articles based on several metrics, including informativity, subjectivity, hatefulness, use of propaganda techniques and clickbait headlines.
Fink said the machine learning models require human input to fine-tune their accuracy, rather than being simple lines of code left to make ranking decisions entirely on their own.
“So we generate a dataset of, let’s say, 10,000 articles and headlines,” Fink said. “We have a team of annotators go through that and mark each one, whether they think it’s clickbait or not. If something is inconclusive, we throw it out of the dataset to not confuse the model. And then we train the model to emulate what the humans do.”
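The pipeline Fink describes — human annotation, discarding inconclusive examples, then training a model to emulate the annotators — can be sketched in miniature. The toy bag-of-words classifier below is an illustration of that workflow under invented data and logic, not Otherweb's actual code or models.

```python
# Minimal sketch of the supervised pipeline Fink describes: human-labeled
# headlines, inconclusive ones discarded, then a model trained to emulate
# the annotators. Purely illustrative; not Otherweb's code.
from collections import Counter

# Hypothetical annotated dataset: (headline, label), where annotators
# marked each item "clickbait", "not_clickbait", or "inconclusive".
annotated = [
    ("You won't BELIEVE what happened next", "clickbait"),
    ("Ten secrets doctors don't want you to know", "clickbait"),
    ("City council approves 2025 budget", "not_clickbait"),
    ("Fed raises interest rates by a quarter point", "not_clickbait"),
    ("This one weird trick", "inconclusive"),
]

# Step 1: throw inconclusive examples out so they don't confuse the model.
dataset = [(h, l) for h, l in annotated if l != "inconclusive"]

# Step 2: "train" by counting word frequencies per class.
counts = {"clickbait": Counter(), "not_clickbait": Counter()}
for headline, label in dataset:
    counts[label].update(headline.lower().split())

def predict(headline):
    """Label a headline by which class its words appear in more often."""
    words = headline.lower().split()
    scores = {
        label: sum(c[w] + 1 for w in words)  # add-one smoothing
        for label, c in counts.items()
    }
    return max(scores, key=scores.get)

print(predict("You won't believe this secret trick"))  # leans clickbait
```

A production system would use a trained language model rather than word counts, but the shape — labeled examples in, a learned imitation of the annotators out — is the same.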
Taken together, the results of the models generate a “ValuRank score,” a grade out of 100, which is presented to the user along with indicators for whether there was hateful or offensive language in the article. This information is presented in the style of a nutrition label with a bullet-point summary of the article’s contents and a link to the article.
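One way the per-metric model outputs could be rolled up into a single grade out of 100 plus language flags is sketched below. The metric names, weights, and threshold are assumptions invented for illustration; Otherweb has not published this formula in the article.

```python
# Hypothetical sketch: combine per-metric "junk" probabilities (each in
# [0, 1]) into a grade out of 100, plus flags for the nutrition label.
# Weights and threshold are illustrative assumptions, not Otherweb's.
def valurank(metrics, weights=None):
    """Return (score out of 100, list of flagged metrics); higher = cleaner."""
    weights = weights or {
        "clickbait": 0.3,
        "subjectivity": 0.2,
        "hate": 0.25,
        "propaganda": 0.25,
    }
    # Weighted penalty: the more junk the models detect, the lower the grade.
    penalty = sum(weights[m] * metrics.get(m, 0.0) for m in weights)
    score = round(100 * (1 - penalty))
    # Flag hateful/offensive language separately, as the label does.
    flags = [m for m in ("hate",) if metrics.get(m, 0.0) > 0.5]
    return score, flags

score, flags = valurank(
    {"clickbait": 0.1, "subjectivity": 0.3, "hate": 0.05, "propaganda": 0.2}
)
print(score, flags)
```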
Emma Llansó, director of the Center for Democracy & Technology’s Free Expression Project, said the use of a nutrition label style to present the ratings was an interesting choice by the company.
“It is a format that a lot of people might look at and think, ‘Yes, this carries some authority,’” Llansó said, comparing it to how people might trust a nutritional label on a bag of chips.
Despite the familiar feel of the label and the seemingly positive goal of improving media consumption, she offered some caution.
“I think what’s really interesting to me about a tool like this is it’s all about sort of trying to help people engage in critical thinking about the news sources that they’re reading, which is a very laudable goal,” Llansó said. “But we should also engage in critical thinking about the tool itself and try to understand how is it making these different evaluations?”
Llansó said another concern was whether the platform would be collecting information about users.
“What is it looking at?” she said. “What is it trying to understand about my own behavior and activity on the web?”
Fink said the models used for his platform are publicly available, and he even encouraged would-be competitors to copy them. He added that the site currently tracks no user data. Instead of selling user data to advertisers, Fink said, he plans to sell advertisers the data Otherweb is already collecting on news articles, so they can better understand the media their ads are being placed alongside.
“If advertisers place something on CNN.com, they might want to know that what it will get placed on will pass the filters and appear on high quality platforms like the Otherweb,” Fink said.
Llansó emphasized that while machine learning models can very successfully identify things such as hateful or toxic speech, they are less competent at understanding that speech in context.
For example, if a reporter quoted toxic or hateful speech from a politician or business figure, the machine learning filter could flag the quotation as toxic speech and hurt the article’s rating.
“So I think the real risk with relying too heavily on machine learning tools is they can kind of over-promise, they can sort of declare in black and white on something that looks like a nutrition label,” Llansó said. “And in reality, what these tools are doing is giving an assessment of probability. And that assessment itself can be biased or constrained in different ways based on how the tool is developed.”
Fink acknowledged the difficulty of context for his machine learning models.
“It does happen in our case where an article quotes somebody saying something hateful, and (Otherweb) would actually decide that this makes the article somewhat hateful,” Fink said. “That is a problem we would like to solve at some point, but natural language processing at this point is not as bulletproof as we’d like it to be.”
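The failure mode Llansó and Fink both describe can be shown with a deliberately crude, context-blind scorer. The lexicon and scoring logic below are invented stand-ins for a real classifier (which would output a probability rather than a count); the point is only that a system reading word by word cannot tell quotation from endorsement.

```python
# Toy illustration of the context problem: a context-blind toxicity scorer
# penalizes an article even when the hateful language appears only inside
# a quotation. Lexicon and logic are invented for illustration.
TOXIC_WORDS = {"vermin", "scum"}  # illustrative placeholder lexicon

def toxicity(text):
    """Fraction of words found in the toxic lexicon; ignores context."""
    words = text.lower().replace('"', " ").split()
    return sum(w.strip(".,") in TOXIC_WORDS for w in words) / len(words)

reporting = 'The senator called his opponents "vermin" at the rally.'
# The article merely quotes the slur, but the scorer still penalizes it:
print(toxicity(reporting) > 0)  # True: the quoted word raises the score
```

Real natural language processing models are far more sophisticated than a word list, but as Fink concedes, distinguishing reported speech from authorial speech remains an open problem.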
Fink said there were plans for major expansions of the new platform, including apps for iOS and Android and broader coverage of podcasts, books and Wikipedia pages.
For now, Otherweb can only scrape websites without paywalls, so articles from the New York Times or the Wall Street Journal are unavailable, while news from Reuters or ABC News is available.
Fink said he hoped platforms like Otherweb will change incentives for news dissemination and consumption.
“The reason we’re seeing the information ecosystem as broken as it is is that most content gets monetized by advertising, and most advertising is pay per click or pay per view,” Fink said. “There is no pay per quality or pay per truth or anything like that. And so, over time, the entire ecosystem is essentially drifting toward maximizing clicks and views.”