Onionifying Your News

Ever wanted to create your own parody news generator like The Onion? In this tutorial, we'll build a TypeScript application that scrapes a news article from the web and rewrites it in a satirical tone using OpenAI's GPT-4o-mini API.

The full code for this project is available here.


What We'll Cover

  1. Scraping a News Article: Use a Puppeteer alternative, Hyperbrowser, to extract the content of a news article as markdown.
  2. Transforming the Article: Use the GPT-4o-mini API to rewrite the article's tone.
  3. Command-Line Integration: Allow users to pass URLs directly through the command line using the commander library.

Step 1: Setting Up the Project

First, initialize a new Node.js project:

mkdir onionify-news && cd onionify-news
yarn init -y
yarn add --dev typescript @types/node @types/marked-terminal
yarn add @hyperbrowser/sdk dotenv marked marked-terminal ora zod commander openai

You might see some extra packages (ora, marked, marked-terminal, zod, and commander) in the package.json. ora, marked, and marked-terminal are purely there to make the CLI output look nicer, and the project would work fine without them; zod validates the structured data we get back from OpenAI, and commander handles the command-line arguments.
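
Since the code reads its API keys from the environment via dotenv, also create a .env file in the project root containing the two variables used in the code below (the values here are just placeholders):

HYPERBROWSER_API_KEY=your-hyperbrowser-api-key
OPENAI_API_KEY=your-openai-api-key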

Set up TypeScript by creating a tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "node16",
    "allowJs": true,
    "checkJs": false,
    "outDir": "./build",
    "emitDecoratorMetadata": true,
    "experimentalDecorators": true,
    "sourceMap": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "moduleResolution": "node16"
  }
}

Step 2: Scraping a News Article

We’ll use Hyperbrowser to scrape the page into markdown, and then ask GPT-4o-mini to extract the article's title, body, and author from that markdown.

Here are the scrapeArticle and extractArticleFeaturesFromMarkdown functions:

import ora from "ora";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
import hyperbrowser from "@hyperbrowser/sdk";

// Spinner for CLI progress output; the `openai` client used below is created in Step 3.
const spinner = ora();

// How many times to poll the scrape job before giving up, plus a small sleep helper.
const MAX_CHECKS = 30;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const hb_client = new hyperbrowser.default({
  apiKey: process.env.HYPERBROWSER_API_KEY as string,
});

const ArticleSchema = z.object({
  title: z.string(),
  body: z.string(),
  author: z.string().optional(),
});

type Article = z.infer<typeof ArticleSchema>;

async function extractArticleFeaturesFromMarkdown(
  text: string
): Promise<Article> {
  spinner.text = "Getting article information from markdown";
  spinner.start();
  try {
    const prompt = `From the provided markdown string, extract the features required by the response format. Stick as close as possible to the provided schema.\n${text}`;
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "You are a data entry operator whose job is to extract certain features from a piece of text.",
        },
        { role: "user", content: prompt },
      ],
      response_format: zodResponseFormat(ArticleSchema, "article"),
      temperature: 0.7,
      max_tokens: 2000,
    });
    // Validate the model's JSON output against our schema before trusting it.
    const parsedArticleSchema = ArticleSchema.safeParse(
      JSON.parse(response.choices[0].message.content || "")
    );
    if (parsedArticleSchema.success) {
      spinner.succeed("Got article information from markdown");
      spinner.stop();
      return {
        title: parsedArticleSchema.data.title,
        body: parsedArticleSchema.data.body,
        author: parsedArticleSchema.data.author,
      };
    } else {
      throw new Error(
        `OpenAI-produced response doesn't match expected output schema.\nGot ${response.choices[0].message.content}.\n\nZod Error ${parsedArticleSchema.error}`
      );
    }
  } catch (error) {
    spinner.fail("Could not get article info from markdown");
    spinner.stop();
    console.error("Error extracting article info from markdown:", error);
    throw new Error("Failed to extract article info from markdown.");
  }
}

async function scrapeArticle(url: string): Promise<Article> {
  spinner.text = "Getting markdown features for article";
  spinner.start();
  try {
    // Start a Hyperbrowser scrape job and poll it until it completes or fails.
    const jobInfo = await hb_client.startScrapeJob({
      url,
      useProxy: false,
      solveCaptchas: false,
    });
    let checkCount = 0;
    while (checkCount < MAX_CHECKS) {
      const scrapeRes = await hb_client.getScrapeJob(jobInfo.jobId);
      if (scrapeRes.status === "completed") {
        if (scrapeRes.data) {
          spinner.succeed("Succeeded in getting markdown from article");
          spinner.stop();
          return extractArticleFeaturesFromMarkdown(scrapeRes.data?.markdown);
        } else {
          throw new Error(
            "Got undefined when extracting markdown from article. Please check"
          );
        }
      } else if (scrapeRes.status === "failed") {
        throw scrapeRes.error;
      }
      checkCount++;
      await sleep(1000);
    }
    throw new Error(
      "Exceeded maximum checks for getting markdown for article."
    );
  } catch (err) {
    spinner.fail("Failed in getting markdown from article");
    spinner.stop();
    console.log("Could not get article");
    console.error(err);
    throw err;
  }
}
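
Before wiring this into a CLI in Step 4, you can exercise scrapeArticle on its own. A minimal sketch, assuming both API keys are set and using a placeholder URL:

async function tryScrape() {
  // Hypothetical example URL
  const article = await scrapeArticle("https://example.com/news-article");
  console.log(article.title);
  console.log(article.author ?? "No author found");
}

tryScrape().catch(console.error);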

scrapeArticle returns the article’s title, body, and author (if available).

Step 3: Onionify the Article Using OpenAI

We'll use OpenAI's API to rewrite the article in a more Onion-like tone. The openai Node.js SDK was already added in Step 1, but if you skipped it, install it with:

yarn add openai

Here’s the onionifyArticle function:

import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // This is the default and can be omitted
});

async function onionifyArticle(article: Article): Promise<Article> {
  spinner.text = "Onionifying article";
  spinner.start();
  try {
    const prompt = `
Rewrite the following article as if it were written for a satirical news website like The Onion.
Use humor, irony, and exaggeration to transform the content while trying to stick closely to the original intent of the article:
Title: ${article.title}
Body: ${article.body}
Author: ${article.author}
Make sure the headline is absurd or humorous, and add funny commentary in the body.
`;
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "You are a humorous and satirical writer writing for the online newspaper `The Onion`.",
        },
        { role: "user", content: prompt },
      ],
      temperature: 0.7,
      max_tokens: 2000,
    });
    const satiricalResponse = response.choices[0].message?.content || "";
    // Split the response into a title and body (adjust parsing as needed)
    const lines = satiricalResponse.split("\n");
    const satiricalTitle = lines[0].replace("Title:", "").trim();
    const satiricalBody = lines.slice(1).join("\n").trim();
    spinner.succeed("Successfully Onionified article");
    spinner.stop();
    return {
      title: satiricalTitle,
      body: satiricalBody,
      author: `Parodied version of ${article.author || "Unknown"}`,
    };
  } catch (error) {
    spinner.fail("Could not onionify article");
    spinner.stop();
    console.error("Error generating satirical content:", error);
    throw new Error("Failed to onionify article.");
  }
}

Step 4: Adding Command-Line Integration

We’ll use the commander package to allow users to pass a URL via the CLI. Here’s the entry point for our script:

import "dotenv/config"; // load API keys from .env
import { Command } from "commander";
import { marked } from "marked";
// Optionally wire up marked-terminal here as a renderer for nicer terminal output.

if (!process.env.OPENAI_API_KEY) {
  console.error("Missing Open AI API Key. Exiting");
  process.exit(1);
}
if (!process.env.HYPERBROWSER_API_KEY) {
  console.error("Missing HyperBrowser API Key. Exiting");
  process.exit(1);
}

const program = new Command();
program
  .version("1.0.0")
  .description("Scrape a news article and onionify it")
  .argument("<url>", "The URL of the news article to scrape")
  .action(async (url: string) => {
    try {
      console.log("Scraping the article...");
      const article = await scrapeArticle(url);
      console.log("\nOriginal Article:");
      console.log("Title:", article.title);
      console.log("Onionifying the article...");
      const onionifiedArticle = await onionifyArticle(article);
      console.log("\n--- Onionified Article ---");
      console.log(marked(onionifiedArticle.title));
      console.log(marked(onionifiedArticle.body));
    } catch (error) {
      // @ts-ignore
      console.error("Error:", error.message);
    }
  });
program.parse(process.argv);

Step 5: Running the Script

Compile the script:

yarn build

Run the compiled JavaScript file, passing the article’s URL:

yarn start "https://example.com/news-article"
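
The yarn build and yarn start commands assume package.json scripts along the lines of the following sketch (the entry file name index.ts and the build/ output directory, which matches outDir in tsconfig.json, are assumptions):

{
  "scripts": {
    "build": "tsc",
    "start": "node build/index.js"
  }
}

Yarn forwards any extra arguments to the script, so the URL you pass to yarn start becomes the <url> argument parsed by commander.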

Example Output

[asciicast: terminal recording of a sample run]
