For the Nieman Lab Predictions for Journalism for 2024, I wrote about how text generation tools dominated last year. However, 2024 will mark a shift towards multimodal AI, opening unprecedented opportunities for publishers.
In part two of my conversation with Aliya Itzkowitz and Sam Gould from Financial Times Strategies, we discuss use cases for multimodal AI and autonomous AI agents for publishers.
Multimodal AI allows models to take in combinations of inputs like text, images, and video and convert them into different types of outputs. This technology holds great potential for newsrooms, given the variety of applications it enables. A newsroom's assets are more than just text - they have video, images, audio. Multimodal AI unlocks new possibilities for internal tools and use cases with all this content.
GPT-Vision is an example of multimodal AI you can try today through ChatGPT. It takes an image input as a prompt and answers questions about the image. One example of a use case involves inputing Figma designs for ChatGPT to assess. Here's an example where ChatGPT evaluates a Figma design for a news app tailored for a millennial audience, offering a thorough critique that spans layout, design elements, color scheme, user flow, and accessibility considerations.
The next frontier is autonomous AI agents
Unlike ChatGPT which operates within a conversational framework requiring ongoing prompting, AI agents are designed to undertake tasks autonomously, working systematically to achieve complex objectives.
These agents are programmed to execute a sequence of actions toward accomplishing intricate goals.
Currently, a major use case for AI agents is helping with coding. Smol Developer is an open-source AI agent that can generate entire codebases from simple prompts acting as a "personal junior developer." I’ve used it and this is a tool that dramatically accelerates the development process, as illustrated in this 6-minute demo video where Smol creates a Chrome extension. (Note: Basic coding knowledge is necessary to use it effectively.)
AI agents are also transforming call center operations. Companies like Voxia are pioneering in this field, developing lifelike AI voice agents for handling various call center tasks. Their demo showcases the potential of AI in customer interaction.
AI agents are still experimental but have immense potential to take AI beyond ChatGPT-style interactions.
My discussion with Aliya and Sam from FT Strategies explored the potential of multimodal AI and AI agents in publishing. This follows the first part of our episode, where they shared insights from the FT Strategies AI Design Sprint.