Onionifying Your News
Ever wanted to create your own parody news generator like The Onion? In this tutorial, we'll build a TypeScript application that scrapes news articles from the web, rewrites them in a more satirical tone using OpenAI's GPT-4 API.
The full code for this project is available here
What We'll Cover
- Scraping a News Article: Use Puppeteer alternatives like
hyperbrowser
to extract content from our news article. - Transforming the Article: Use
GPT 4o-mini
API to rewrite the articles tone. - Command-Line Integration: Allow users to pass URLs directly through the command line using the
commander
library.
Step 1: Setting Up the Project
First, initialize a new Node.js project:
mkdir onionify-news && cd onionify-newsyarn init -yyarn add --dev typescript @types/node @types/marked-terminalyarn add @hyperbrowser/sdk dotenv marked marked-terminal ora zod commander openai
You might see some extra packages (ora, marked, marked-terminal, ora, zod, and commander) in the package.json
. Most of them are purely for making the project look prettier and the project would work absolutely fine without them.
Set up TypeScript by creating a tsconfig.json:
{"compilerOptions": {"target": "ES2022","module": "node16","allowJs": true,"checkJs": false,"outDir": "./build","emitDecoratorMetadata": true,"experimentalDecorators": true,"sourceMap": true,"esModuleInterop": true,"skipLibCheck": true,"moduleResolution": "node16"}}
Step 2: Scraping a News Article
We’ll use hyperbrowser to fetch the HTML of the webpage and cheerio to parse and extract its content.
Here’s the scrapeArticle function:
import { z } from "zod";import hyperbrowser from "@hyperbrowser/sdk";const hb_client = new hyperbrowser.default({apiKey: process.env.HYPERBROWSER_API_KEY as string,});const ArticleSchema = z.object({title: z.string(),body: z.string(),author: z.string().optional(),});type Article = z.infer<typeof ArticleSchema>;async function extractArticleFeaturesFromMarkdown(text: string): Promise<Article> {spinner.text = "Getting article information from markdown";spinner.start();try {const prompt = `From the provided markdown string, extract the features required by the response format. Stick as close as possible to the provided schema.\n${text}`;const response = await openai.chat.completions.create({model: "gpt-4o-mini",messages: [{role: "system",content:"You are a data entry operator whose job is to extract certain features from a peice of text.",},{ role: "user", content: prompt },],response_format: zodResponseFormat(ArticleSchema, "article"),temperature: 0.7,max_tokens: 2000,});const parsedArticleSchema = ArticleSchema.safeParse(JSON.parse(response.choices[0].message.content || ""));if (parsedArticleSchema.success) {spinner.succeed("Got Article information from markdown");spinner.stop();return {title: parsedArticleSchema.data.title,body: parsedArticleSchema.data.body,author: parsedArticleSchema.data.author,};} else {throw new Error(`OpenAI produced response doesn't match expected output schema.\nGot ${response.choices[0].message.content}.\n\nZod Error ${parsedArticleSchema.error}`);}} catch (error) {spinner.fail("Could not get article info from markdown");spinner.stop();console.error("Error generating satirical content:", error);throw new Error("Failed to onionify article.");}}async function scrapeArticle(url: string): Promise<Article> {spinner.text = "Getting markdown features for article";spinner.start();try {const jobInfo = await hb_client.startScrapeJob({url,useProxy: false,solveCaptchas: false,});let checkCount = 0;while (checkCount < MAX_CHECKS) {const scrapeRes = await hb_client.getScrapeJob(jobInfo.jobId);if (scrapeRes.status === "completed") {if (scrapeRes.data) {spinner.succeed("Succeeded in getting markdown from article");spinner.stop();return extractArticleFeaturesFromMarkdown(scrapeRes.data?.markdown);} else {throw new Error("Got undefined when extracing markdown from article. Please check");}} else if (scrapeRes.status === "failed") {throw scrapeRes.error;}await sleep(1000);}throw new Error("Exceeded maximum checks for getting markdown for article.");} catch (err) {spinner.fail("Failed in getting markdown from article");spinner.stop();console.log("Could not get article");console.error(err);throw err;}}
This function returns the article’s title, body, and author (if available). Step 3: Onionify the article using OpenAI
We'll use OpenAIs API to rewrite the article in a more oniony tone. Install the OpenAI Node.js SDK using:
yarn add openai
Here’s the onionifyArticle function:
import { OpenAI } from "openai";const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY, // This is the default and can be omitted});async function onionifyArticle(article: Article): Promise<Article> {spinner.text = "Onionifying article";spinner.start();try {const prompt = `Rewrite the following article as if it were written for a satirical news website like The Onion.Use humor, irony, and exaggeration to transform the content while trying to stick closely to the original intent of the article:Title: ${article.title}Body: ${article.body}Author: ${article.author}Make sure the headline is absurd or humorous, and add funny commentary in the body.`;const response = await openai.chat.completions.create({model: "gpt-4o-mini",messages: [{role: "system",content:"You are a humorous and satirical writer writing for the online newspaper `The Onion`.",},{ role: "user", content: prompt },],temperature: 0.7,max_tokens: 2000,});const satiricalResponse = response.choices[0].message?.content || "";// Split the response into a title and body (adjust parsing as needed)const lines = satiricalResponse.split("\n");const satiricalTitle = lines[0].replace("Title:", "").trim();const satiricalBody = lines.slice(1).join("\n").trim();spinner.succeed("Succesfully Onionified article ");spinner.stop();return {title: satiricalTitle,body: satiricalBody,author: `Parodied version of ${article.author || "Unknown"}`,};} catch (error) {spinner.fail("Could not onionify article");spinner.stop();console.error("Error generating satirical content:", error);throw new Error("Failed to onionify article.");}}
Step 4: Adding Command-Line Integration
We’ll use the commander package to allow users to pass a URL via the CLI. Here’s the entry point for our script:
if (!process.env.OPENAI_API_KEY) {console.error("Missing Open AI API Key. Exiting");process.exit(1);}if (!process.env.HYPERBROWSER_API_KEY) {console.error("Missing HyperBrowser API Key. Exiting");process.exit(1);}const program = new Command();program.version("1.0.0").description("Scrape a news article and onionify it").argument("<url>", "The URL of the news article to scrape").action(async (url: string) => {try {console.log("Scraping the article...");const article = await scrapeArticle(url);console.log("\nOriginal Article:");console.log("Title:", article.title);console.log("Onionifying the article...");const onionifiedArticle = await onionifyArticle(article);console.log("\n--- Onionified Article ---");console.log(marked(onionifiedArticle.title));console.log(marked(onionifiedArticle.body));} catch (error) {// @ts-ignoreconsole.error("Error:", error.message);}});program.parse(process.argv);
Step 5: Running the Script
Compile the script:
yarn build
Run the compiled JavaScript file, passing the article’s URL:
yarn start "https://example.com/news-article"