A.I. in the Workplace

A new AI agent has emerged from the parent company of TikTok to take control of your computer and perform complex workflows.

Much like Anthropic’s Computer Use, ByteDance’s new UI-TARS understands graphical user interfaces (GUIs), applies reasoning and takes autonomous, step-by-step action. 

Trained on roughly 50B tokens and offered in 7B and 72B parameter versions, the PC/macOS agent achieves state-of-the-art (SOTA) performance on more than 10 GUI benchmarks covering performance, perception, grounding and overall agent capabilities, consistently beating out OpenAI’s GPT-4o, Anthropic’s Claude and Google’s Gemini.

“Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention,” researchers from ByteDance and Tsinghua University write in a new research paper.

Source: arXiv

How UI-TARS explains its thinking

UI-TARS works across desktop, mobile and web applications, using multimodal inputs (text, images, interactions) to understand visual environments.

Its UI features two tabs — one to the left showing its step-by-step “thinking,” and a larger one to the right where it pulls up files, websites and apps and automatically takes action. 

For example, in a demo video released today, the model is prompted to “Find round trip flights from SEA to NYC on the 5th and return on the 10th next month and filter by price in ascending order.”

In response, UI-TARS navigates to the website for Delta Airlines, fills in the “from” and “to” fields, clicks on the relevant dates and sorts and filters by price, explaining each step in its thinking box before taking action.

In another scenario, it is instructed to install the autoDocstring extension in VS Code. Here is its step-by-step thinking as it completes that task (a minimal code sketch of the underlying loop follows the list):

  • It first reports that “to proceed with installing the autoDocstring extension in VS Code, I first need to open the VS Code application.” 
  • Recognizing that the VS Code window has not fully loaded, the model notes that “it is necessary to wait for the application to finish initializing before proceeding with the installation of the autoDocstring extension. This ensures that all required functionalities are available for the next steps.” 
  • It then explains that it must access the Extensions view in VS Code, which can be done by clicking on a tab in the left sidebar. 
  • Running into a small glitch, it reasons: “The previous attempt to click on the ‘Extensions’ tab may not have been precise enough, so I will try clicking on it again to ensure the Extensions view opens successfully.”
  • Once in the Extensions view, it types in ‘autoDocstring’ before concluding: “Since the installation process has already been initiated, there is no further action required at this moment. I will wait for the installation to complete and confirm its success once the process is finished.”
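
That trace is a straightforward observe, reason, act loop. As a minimal Python sketch of the pattern (not ByteDance’s actual code), it might look roughly like this, where `propose_next_step` is a hypothetical stand-in for the model call and `take_screenshot` and `execute_action` are assumed environment hooks:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str   # the model's written reasoning, as shown in the trace above
    action: str    # e.g. "click", "type", "wait" or "finish"
    target: str    # what the action operates on (a UI element, a string to type)

@dataclass
class AgentState:
    goal: str
    history: list[Step] = field(default_factory=list)

def propose_next_step(state: AgentState, screenshot: bytes) -> Step:
    """Hypothetical model call: given the goal, the history of prior steps and a
    fresh screenshot, return the next thought and action. Placeholder only."""
    raise NotImplementedError

def run_agent(goal: str, take_screenshot, execute_action, max_steps: int = 20) -> list[Step]:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        step = propose_next_step(state, take_screenshot())
        state.history.append(step)   # the running trace doubles as short-term memory
        if step.action == "finish":
            break
        execute_action(step)         # perform the click / keystroke / wait in the GUI
    return state.history
```

The running history is what lets the agent notice, as in the Extensions-tab retry above, that a previous action did not land and try it again.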

Outperforming its rivals

Across a variety of benchmarks, researchers report that UI-TARS consistently outranked OpenAI’s GPT-4o; Anthropic’s Claude-3.5-Sonnet; Gemini-1.5-Pro and Gemini-2.0; four Qwen models; and numerous academic models.

For instance, on VisualWebBench — which measures a model’s ability to understand and ground web elements, in tasks including webpage quality assurance and optical character recognition — UI-TARS-72B scored 82.8%, outperforming GPT-4o (78.5%) and Claude 3.5 (78.2%).

It also did significantly better on the WebSRC benchmark (understanding of semantic content and layout in web contexts) and on ScreenQA-short (comprehension of complex mobile screen layouts and web structure). UI-TARS-7B achieved a leading score of 93.6% on WebSRC, while UI-TARS-72B achieved 88.6% on ScreenQA-short, outperforming Qwen, Gemini, Claude 3.5 and GPT-4o.

“These results demonstrate the superior perception and comprehension capabilities of UI-TARS in web and mobile environments,” the researchers write. “Such perceptual ability lays the foundation for agent tasks, where accurate environmental understanding is crucial for task execution and decision-making.”

UI-TARS also showed impressive results in ScreenSpot Pro and ScreenSpot v2, which assess a model’s ability to understand and localize elements in GUIs. Further, researchers tested its capabilities in planning multi-step actions and low-level tasks in mobile environments, and benchmarked it on OSWorld (which assesses open-ended computer tasks) and AndroidWorld (which scores autonomous agents on 116 programmatic tasks across 20 mobile apps).

Source: arXiv

Under the hood

To help it take step-by-step actions and recognize what it’s seeing, UI-TARS was trained on a large-scale dataset of screenshots that parsed metadata including element description and type, visual description, bounding boxes (position information), element function and text from various websites, applications and operating systems. This allows the model to provide a comprehensive, detailed description of a screenshot, capturing not only elements but spatial relationships and overall layout. 
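
The paper does not publish its annotation schema, but going only by the fields named above, a single record in such a perception dataset might plausibly be shaped like the sketch below (all field names and values here are illustrative assumptions, not the authors’ format):

```python
from typing import TypedDict

class ElementAnnotation(TypedDict):
    element_type: str        # e.g. "button", "text_field", "tab"
    description: str         # natural-language description of the element
    visual_description: str  # color, icon and styling cues
    bounding_box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    function: str            # what interacting with the element does
    text: str                # visible text, if any

class ScreenshotSample(TypedDict):
    image_path: str
    platform: str            # "web", "desktop" or "mobile"
    elements: list[ElementAnnotation]

# A hypothetical record; the values are made up for illustration.
sample: ScreenshotSample = {
    "image_path": "screens/vscode_home.png",
    "platform": "desktop",
    "elements": [
        {
            "element_type": "tab",
            "description": "Extensions tab in the left sidebar",
            "visual_description": "square icon made of four blocks",
            "bounding_box": (12, 240, 60, 288),
            "function": "opens the Extensions view",
            "text": "",
        }
    ],
}
```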

The model also uses state transition captioning to identify and describe the differences between two consecutive screenshots and determine whether an action — such as a mouse click or keyboard input — has occurred. Meanwhile, set-of-mark (SoM) prompting allows it to overlay distinct marks (letters, numbers) on specific regions of an image. 
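
Set-of-mark prompting is an existing technique rather than something invented for UI-TARS, and the overlay step can be sketched in a few lines. The version below is a minimal illustration using Pillow, assuming the bounding boxes have already been detected:

```python
from PIL import Image, ImageDraw

def overlay_marks(image_path: str, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Draw a numbered mark on each bounding box so the model can refer to a
    region by its mark (e.g. "click 3") instead of raw pixel coordinates."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return img

# Example (paths and boxes are placeholders):
# marked = overlay_marks("screens/vscode_home.png", [(12, 240, 60, 288)])
# marked.save("screens/vscode_home_marked.png")
```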

The model is equipped with both short-term and long-term memory to handle tasks at hand while also retaining historical interactions to improve later decision-making. Researchers trained the model to perform both System 1 (fast, automatic and intuitive) and System 2 (slow and deliberate) reasoning. This allows for multi-step decision-making, “reflection” thinking, milestone recognition and error correction. 
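
The paper describes that memory split conceptually rather than as code. One purely illustrative way to model it is a bounded short-term buffer for the task at hand plus an append-only long-term store of past episodes (the class and method names here are assumptions):

```python
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size: int = 10):
        # Short-term memory: recent observations and actions for the current task.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: summaries of past episodes, retained across tasks.
        self.long_term: list[str] = []

    def remember_step(self, step_summary: str) -> None:
        self.short_term.append(step_summary)

    def archive_episode(self, episode_summary: str) -> None:
        self.long_term.append(episode_summary)
        self.short_term.clear()

    def context(self) -> str:
        """Combine a few past episodes with the current task's steps
        into a single prompt context for the model."""
        return "\n".join(self.long_term[-3:] + list(self.short_term))
```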

Researchers emphasized that it is critical that the model be able to maintain consistent goals and engage in trial and error to hypothesize, test and evaluate potential actions before completing a task. They introduced two types of data to support this: error correction and post-reflection data. For error correction, they identified mistakes and labeled corrective actions; for post-reflection, they simulated recovery steps. 

“This strategy ensures that the agent not only learns to avoid errors but also adapts dynamically when they occur,” the researchers write.
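
The two data types are described only at a high level; a hypothetical sketch of how such training samples might be structured (field names are assumptions, not the paper’s) could be:

```python
from dataclasses import dataclass

@dataclass
class ErrorCorrectionSample:
    """One training pair: the step that went wrong and the labeled fix."""
    state_before: str       # screenshot reference or description at the faulty step
    mistaken_action: str    # what the agent actually did
    corrective_action: str  # annotator-labeled action it should have taken

@dataclass
class PostReflectionSample:
    """A simulated recovery: the error, a reflection on it, and steps back on track."""
    error_description: str
    reflection: str             # e.g. "the click missed the Extensions tab"
    recovery_steps: list[str]   # actions that return the task to a good state
```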

Researchers point out that Claude Computer Use “performs strongly in web-based tasks but significantly struggles with mobile scenarios, indicating that the GUI operation ability of Claude has not been well transferred to the mobile domain.”

By contrast, “UI-TARS exhibits excellent performance in both website and mobile domain.”

Clearly, UI-TARS exhibits impressive capabilities, and it’ll be interesting to see its evolving use cases in the increasingly competitive AI agents space. As the researchers note: “Looking ahead, while native agents represent a significant leap forward, the future lies in the integration of active and lifelong learning, where agents autonomously drive their own learning through continuous, real-world interactions.”

As a national TikTok ban loomed last week, rival social platforms prepared to snap up the app’s 170 million users. Substack added live streaming. Instagram made photos vertical and extended the length of videos. Triller launched a website that uploads TikToks to its platform. 

TikTok made a triumphant return Sunday morning after only a few hours offline. But Meta isn’t ready to give up the fight for its market share. 

Instagram is offering popular TikTok creators a cash bonus of $10,000 to $50,000 to post their videos on its app first, according to The Information. To receive this cash bonus, creators reportedly must agree to an exclusivity period—a set amount of time they’re unable to post their video on other platforms—ranging from one month to several.

A Meta spokesperson confirmed to Inc. over email that the company “recently expanded investment in content deals to support creators,” but declined to share any further details. Meta has also launched a smaller bonus program, the Breakthrough Bonus, that allows creators to earn up to $5,000 over three months by posting original videos on its platforms, the spokesperson said. This program doesn’t require creators to post exclusively on Instagram or Facebook.

TikTok’s rivals have tried this strategy before. In 2021, YouTube Shorts said it’d pay creators up to $10,000 per month for posting original content. That same year, Snapchat Spotlight also announced an initiative to pay creators for completing challenges on its app. And when Instagram itself launched TikTok replica Reels four years ago, it offered to pay some creators up to $35,000 per month. But the program fizzled by 2023.

Creators might hesitate to go all-in with another social platform right now. For many, the TikTok ban served as “a wake-up call” that they should avoid “over-reliance on a single platform,” says Mollie Lobel, affiliate and influencer community manager at content creator network BrandCycle.

Plus, TikTok isn’t gone yet, and many prefer its algorithm to Instagram’s. “As long as TikTok is available, people are going to use it,” Richard Hanna, a marketing professor at Babson College, says.

Billionaire Elon Musk and OpenAI CEO Sam Altman are fighting on X about Stargate, the enormous infrastructure project to build data centers for OpenAI across the U.S.

Stargate, announced Tuesday during a press conference at the White House, would funnel as much as $500 billion from investors including SoftBank and Middle East AI fund MGX into data centers to support OpenAI’s AI workloads. Partners in Stargate have initially pledged $100 billion, some of which is being put toward a data center under construction in Abilene, Texas.

Elon Musk claims that Stargate doesn’t have the money it says it does.

“They don’t actually have the money,” Musk wrote in a series of posts on X on Tuesday. “SoftBank has well under $10 billion secured. I have that on good authority.”

Hours later, in a reply to a post criticizing Altman, Musk said, “Sam is a swindler.”

Musk, of course, is not a neutral party. He has his own AI company, xAI, that competes — and is currently embroiled in a lawsuit — with OpenAI. In the suit, xAI and Musk accuse OpenAI of anticompetitive practices, including discouraging investors in OpenAI from backing AI rivals.

Altman fired back at Musk in an X post Wednesday — and called his bluff.

“Wrong, as you surely know,” Altman said, responding to Musk’s allegation that SoftBank was short of capital. “[Stargate] is great for the country. i realize what is great for the country isn’t always what’s optimal for your companies, but in your new role, i hope you’ll mostly put [US] first.”

In a separate post, Altman said of Musk, “I genuinely respect your accomplishments and think you are the most inspiring entrepreneur of our time.” He later added, “I don’t think [Musk is] a nice person or treating us fairly, but you have to respect the guy, and he pushes all of us to be more ambitious.”

Musk is spearheading the Department of Government Efficiency (DOGE), a U.S. government advisory commission recommending deep cuts to federal agencies. DOGE, first announced last year, was made more official Monday by President Donald Trump’s executive order — but the commission faces a series of legal challenges.

xAI, like OpenAI, is hungry for infrastructure to develop its AI systems. Musk’s company is estimated to have spent $12 billion on its single data center in Memphis and could spend billions more upgrading the facility.

Asked about Musk’s X posts during an interview at the World Economic Forum in Davos, Satya Nadella, CEO of Microsoft, a close OpenAI collaborator and investor, declined to weigh in. “All I know is, I’m good for my $80 billion,” he said, referring to Microsoft’s recent pledge to spend a record amount on AI data centers this year.

This New AI Search Engine Has a Gimmick: Humans Answering Questions

When online search engines first appeared, they seemed miraculous. Now, though? It is a truth near-universally acknowledged that search is in the dumps, corroded by spam and ads.

Big players like Google are insistent that AI is the savior of search, despite many early attempts to integrate AI ending in disaster. There’s a wave of services offering AI-powered answers, including Perplexity and OpenAI’s SearchGPT. Recently, I got an email promoting another new AI search engine—but this one has a notably quirky approach to answering questions. Called Pearl, it’s coming out of beta this week. Like other AI-powered search engines, Pearl initially answers questions using large language models. Then it does something unusual: It offers a human fact-check and the option to connect with experts and chat online or on the phone about the answer.

Reading about its gimmick, I didn’t really understand why it bothered with the AI answers at all. Why not just go straight to the human? I called its CEO, Andy Kurtzig, to find out.

Kurtzig stressed that Pearl is an extension of another search project he’s been working on for decades: a more traditional offering called JustAnswer, which charges a subscription and connects people to subject-matter experts based on their questions. “We started playing with the concept of AI combined with professional services about 11 years ago,” he says. When the generative AI boom took off, he decided to make Pearl a stand-alone product. (The company has had an older chatbot product named Pearl for many years and at one point rebranded JustAnswer as Pearl and then changed it back.)

Pearl’s LLM is built on top of a number of popular foundation models, including ChatGPT, and is customized with JustAnswer’s trove of data, which includes an extensive history of questions posed and answered since it launched in 2003.

In Kurtzig’s view, Pearl lowers the barrier to entry for answers from experts. While JustAnswer costs money, Pearl has a freemium model. Its AI answers are free, as is its first layer of human fact-checking, the TrustScore™, a rating on a scale from 1 to 5 of the quality of an AI answer. When Pearl users want to go a step further and have an expert expand upon an AI answer, they are prompted to sign up for its $28-a-month subscription.

One line in particular had jumped out at me in the initial email I received about Pearl. It had claimed that Pearl would “solve many of the mounting legal challenges AI search engines face.” But … how? Kurtzig noted that most AI search engines could lose the liability shield of Section 230 of the Communications Decency Act for the answers they give, since they are acting more as publishers than as platforms. Because Pearl incorporates human experts into its answer process, Kurtzig believes that Pearl will retain the Section 230 protections that shield traditional search engines.

On top of that, he claims that Pearl is significantly less likely to provide misinformation than many other AI search engines—which he believes are likely to deal with “a tidal wave” of lawsuits based on bad answers they give. “Those other players are building amazing technologies. I call them Ferraris or Lamborghinis,” Kurtzig says. “We’re building a Volvo—safety first.”

This pitch about Pearl’s superiority, of course, made me even more keen to try it. Kurtzig seemed so certain that Pearl would still enjoy Section 230 protections. I asked the AI if it agreed.

Pearl said it likely qualifies as an “interactive computer service” under Section 230, which would mean that it’d be shielded from being treated as a publisher, just as Kurtzig suspected. But, the AI went on, “Pearl’s situation is unique because it generates content using AI.” It didn’t have a definitive answer for me after all.

When I asked to speak to a lawyer directly, it rerouted me to JustAnswer, where it asked me to provide the answer I wanted verified. I said I needed to go back and copy the answer, as it was several paragraphs long, but when I navigated back to the Pearl website, the conversation was gone and it had reset to a fresh chat.

When I tried again, this time opening the Pearl browser on desktop, I received a similarly uncertain answer. I decided to trigger a human fact-check; after several minutes, I received the TrustScore™—a measly 3!

Pearl recommended that I seek out an actual expert opinion, porting me to its subscription page. I’d been given a log-in so I didn’t have to pay while I tested the tool. It then connected me with one of its “legal eagle” experts.

Unfortunately, the lawyer’s answers were no clearer than the AI’s. He noted that there was ongoing legal debate about how Section 230 will apply to AI search engines and other AI tools, but when I asked him to provide specific arguments, he gave a strange answer noting that “most use shell companies or associations to file.”

When I asked for an example of one such shell company—quite confused about what that has to do with a public debate about Section 230—the “legal eagle” asked if I wanted him to put together a package. Even more confused, I said yes. I got a pop-up window indicating that my expert wanted to charge me an additional $165 to dig up the information.

I declined, frustrated.

I then asked Pearl about the history of WIRED. The AI response was serviceable, although basically the same stuff you’ll find on Wikipedia. When I asked for its TrustScore™ I was once again confronted with a 3, suggesting it was not a very good answer. I selected the option to connect with another human expert. This time around, possibly because it was a question about the media and not a straightforward legal or medical topic, it took a while for the expert to appear—well over 20 minutes. When he did, the expert (it was never established what gave him his media bona fides, although his profile indicated he’d been working with JustAnswer since 2010) gave me a remarkably similar answer to the AI. Since I was doing a free test, it didn’t matter, but I would’ve been annoyed if I had actually paid the subscription fee just to get the same mediocre answer from both a human and an AI.

For my last stab at using the service, I went for a straightforward question: how to refinish kitchen floors. This time, things went much more smoothly. The AI returned an adequate answer, akin to a transcript of a very basic YouTube tutorial. When I asked the human expert to assign a TrustScore™, they gave it a 5. It seemed accurate enough, for sure. But—as someone who really does want to DIY refinish my kitchen’s old pine planks—I think when I actually go looking for guidance, I’ll rely on other online communities of human voices, ones that don’t charge $28 a month: YouTube and Reddit.