We analyzed 4 million data points to see what makes it to the front page of Reddit. Here’s what we learned.

DataStories, posted on 4th February 2016
Reddit, Content analysis, Social media, Sentiment analysis

Why are people so damn interested in getting to the front page of Reddit?

Look at any viral stories on news outlets like CNN.

Look at any viral videos floating around Facebook.

Look at any viral images going around on BuzzFeed.

Look at any trending story on Twitter.

… the breeding ground for all these viral stories is often Reddit.

“Reddit Scraping” is a common practice by news agencies to figure out which stories will be the most popular on their own websites.

Reddit headline Turns into CNN headline Turns into BuzzFeed headline

“Reddit Scraping” is such a common practice now because several thousand random people already UPVOTED these images proving the story is “viral”.

This is why if something hits the front page of Reddit, you can expect to see it popping up on news sources and social media within hours (if not faster).

For example, let’s take a post from the front page of Reddit, turn it into a “Viral Story”, and re-adapt it into articles for different sites:

As soon as something hits the front page, media outlets from around the world feature these posts. For example, a giraffe story from the Reddit front page made it onto the CNN.com homepage in 2 hours.

Our Reddit Scraping Process to get 4 million+ data points:

We noticed that the results on the first page were changing very quickly, and set out to scrape the rankings every two minutes. We started on December 16, 2015 and continued up until January 8, 2016.

This resulted in the following data:

We scraped the top 100 posts every 2 minutes for 22 days, with 3 metrics per position [score/upvotes, number of comments and rank]: 720 snapshots per day * 22 days * 100 positions = 1,584,000 position records, and 1,584,000 * 3 = 4,752,000 data points. On top of that, we collected 15 metrics for each of the 2,344 unique posts = 35,160 metrics.
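The counting behind these totals can be reproduced in a few lines. This is only a sketch of the arithmetic; the constant names are ours for illustration, and the inputs are the figures quoted in the text.

```python
# Reproduce the data-point count from the figures quoted in the article.
MINUTES_PER_DAY = 24 * 60
SCRAPE_INTERVAL_MIN = 2      # one snapshot every 2 minutes
DAYS = 22
TOP_N = 100                  # positions per snapshot
METRICS_PER_POSITION = 3     # score, number of comments, rank
UNIQUE_POSTS = 2_344
METRICS_PER_POST = 15

snapshots_per_day = MINUTES_PER_DAY // SCRAPE_INTERVAL_MIN
position_records = snapshots_per_day * DAYS * TOP_N
position_metrics = position_records * METRICS_PER_POSITION
post_metrics = UNIQUE_POSTS * METRICS_PER_POST

print(snapshots_per_day)  # 720
print(position_records)   # 1584000
print(position_metrics)   # 4752000
print(post_metrics)       # 35160
```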

Stats for the top 100 posts on the reddit page:

We found 2,344 unique posts appearing on the front page in the three-week period - this is about 106 different posts per day all getting in and out of the front page!

  • Top 100 ranks are collected every 2 minutes - 720 rankings per day.
    ….we later aggregated them to 15-minute intervals - 96 rankings per day.
  • For each post we look at the headline, the content (if it is a text post), the subreddit, the number of subscribers of that subreddit, whether it is an image, video, or podcast, whether it is 18+, the sentiment (emotional polarity) of the headline, and whether it is an internal Reddit self-post or an external post (e.g. a photo hosted on Imgur).
  • For each post at each moment we capture the reddit score, the number of comments, and the rank of the post.
  • Then we deleted the posts which were in the top 100 for less than 2 minutes, and got 8,000+ posts.
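The aggregation of 2-minute readings into 15-minute intervals could look roughly like the sketch below. The row layout, the function names, and the choice to average each metric within a bucket are all assumptions made for illustration; the article does not say how the readings were combined.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def bucket_start(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its bucket (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

def aggregate(snapshots, minutes=15):
    """Group (timestamp, post_id, rank, score, comments) rows per post and bucket.

    Assumption: each metric is averaged within a bucket. This is our choice,
    not something the article specifies.
    """
    groups = defaultdict(list)
    for ts, post_id, rank, score, comments in snapshots:
        groups[(post_id, bucket_start(ts, minutes))].append((rank, score, comments))
    return {
        key: tuple(mean(column) for column in zip(*rows))
        for key, rows in groups.items()
    }

# Hypothetical readings for one post, taken 2 minutes apart (09:00 .. 09:14),
# plus one reading at 09:16 that falls into the next bucket.
start = datetime(2015, 12, 16, 9, 0)
rows = [(start + timedelta(minutes=2 * i), "post_a", 10 - i, 100 + 10 * i, 5 + i)
        for i in range(8)]
rows.append((start + timedelta(minutes=16), "post_a", 2, 200, 20))

result = aggregate(rows)
for (post_id, bucket), (rank, score, comments) in sorted(result.items()):
    print(post_id, bucket.time(), rank, score, comments)
```

Twenty-four hours of 15-minute buckets give the 96 rankings per day mentioned above.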

There is quite a bit of action in Reddit's top 100!

Here is what we THOUGHT we would see:

Before we got Reddit's data, we thought that the number of comments drives the upvotes of Reddit posts and causes them to appear on the front page. And yet - posts with a small number of comments can often get super high scores and appear in the top 25 of all subreddits, and this does not only happen with images!

Reddit is also notorious for letting negative and cynical posts thrive - maybe due to the main demographic. We wanted to check whether this is actually true, or just an impression because the eye simply notices negative statements more readily.

These interesting observations were made (and validated) exclusively based on the data. Here is what we discovered:

FINDING #1: Starting at 9am PST is the fastest time for getting upvotes

If we look at the average evolution of upvotes over a day, we see that the scores on the front page grow significantly starting from 9am PST, reaching a stable peak between 5pm and 9pm PST.
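A daily profile like this can be sketched by grouping score readings by hour of day in PST and averaging them. The input layout and function name are hypothetical, and we assume timestamps are stored in UTC; a fixed UTC-8 offset is used because the scrape ran in winter.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# Fixed offset for Pacific Standard Time (no daylight saving in Dec/Jan).
PST = timezone(timedelta(hours=-8), "PST")

def mean_score_by_hour(readings):
    """readings: iterable of (utc_timestamp, score) -> {hour_in_PST: mean score}."""
    by_hour = defaultdict(list)
    for ts_utc, score in readings:
        by_hour[ts_utc.astimezone(PST).hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(by_hour.items())}

# Hypothetical readings, timestamps in UTC.
readings = [
    (datetime(2015, 12, 16, 17, 0, tzinfo=timezone.utc), 1000),   # 09:00 PST
    (datetime(2015, 12, 16, 17, 30, tzinfo=timezone.utc), 1400),  # 09:30 PST
    (datetime(2015, 12, 17, 1, 0, tzinfo=timezone.utc), 3000),    # 17:00 PST, Dec 16
]
hourly = mean_score_by_hour(readings)
print(hourly)
```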

9am is the fastest time to get upvotes Best time to post

FINDING #2: For text posts, Very Positive or Very Negative posts perform significantly better than Neutral ones.

In general, there is no relationship between the sentiment of the headline (how positive, negative, or neutral it is) and the post's popularity. However, if we look only at Reddit's own textual posts (no images), then polar headlines (positive or negative) perform significantly better than neutral ones.
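One way to run this kind of comparison is to bucket headlines by polarity and average the scores per bucket. The sketch below assumes a polarity value between -1 and 1 and a cut-off of 0.5 for "very" positive or negative; both the scale and the threshold are our assumptions, not the scoring used in the analysis.

```python
from collections import defaultdict
from statistics import mean

def sentiment_bucket(polarity, strong=0.5):
    """Map a polarity in [-1, 1] to a coarse label (threshold is an assumption)."""
    if polarity <= -strong:
        return "very negative"
    if polarity >= strong:
        return "very positive"
    return "neutral"

def mean_score_by_bucket(posts):
    """posts: iterable of (headline_polarity, max_score) for text posts."""
    buckets = defaultdict(list)
    for polarity, score in posts:
        buckets[sentiment_bucket(polarity)].append(score)
    return {label: mean(scores) for label, scores in buckets.items()}

# Hypothetical text posts: (headline polarity, maximal score).
text_posts = [(-0.8, 3000), (0.9, 3200), (0.1, 2000), (-0.2, 2200), (0.7, 2800)]
by_bucket = mean_score_by_bucket(text_posts)
print(by_bucket)
```

A real comparison would also need a significance test on the per-bucket scores, which is not shown here.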

Best chance to get a text post upvoted? Very Positive or Very Negative

FINDING #3: Textual self-posts with positive headlines stay on Reddit’s front page significantly longer

Positive textual self-posts stay longer on the frontpage

FINDING #4: Images get many more upvotes than text posts.

Images are definitely performing much better than textual posts in terms of the maximal scores, while the textual posts get significantly more comments.

Image posts get more upvotes than text posts

FINDING #5: However….text posts get more comments and stay on the front page longer

Even though image posts on average get more upvotes…..text posts tend to get many more comments and stay on the front page longer.

This is probably because a text post can spark more of a conversation than an image post usually can. For example, the image post below is a cute picture of a dog cuddling a baby… which attracted a massive 4,804 upvotes, but only a measly 301 comments.

The text post, however, got only 1,622 upvotes and a massive 3,449 comments… because the post inherently lends itself to lively conversation.

Image vs Text post example

FINDING #6: There are 5 Sub-Reddits that completely dominate the front page of Reddit.

Subreddits r/funny, r/pics, r/gifs, r/TodayILearned, r/gaming dominate Reddit's front page. Posts from these subreddits get bigger scores, higher ranks, and are most frequently present in the top 25 of Reddit.

Five sub-reddits completely dominate the front page

FINDING #7: The average life of a post on Reddit's front page is 4 hours and 15 minutes.

The average life of a post on Reddit's front page is 4 hours and 15 minutes. Some posts disappear after 15 minutes, some live for as long as 18 hours. Interestingly, textual self-posts with a positive headline live on the front page significantly longer than the ones with a neutral or even negative headline. It pays off to have a positive headline, even if your post is negative in content.

Top page Reddit posts stay on the front page an average of 4 hours 15 minutes

FINDING #8: The average scores of Reddit's posts labeled as 18+ are significantly higher at night. Hmm...why would this be? ;-)

Scores of posts labeled as 18+ are higher

FINDING #9: Putting a number in your headline increases chances of being among the top posts.

There are a lot more posts with numbers in the headline among the top posts. These posts also get slightly higher scores (the difference is statistically significant) than posts without any numbers in the headline.
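Detecting a number in a headline is a simple pattern match. The sketch below splits posts into the two groups and compares their mean scores; the data and function name are made up for illustration, only digits are detected (spelled-out numbers like "seven" are not), and the significance test is left out.

```python
import re
from statistics import mean

HAS_NUMBER = re.compile(r"\d")  # any digit anywhere in the headline

def split_by_number(posts):
    """posts: iterable of (headline, max_score) -> (scores_with, scores_without)."""
    with_num, without_num = [], []
    for headline, score in posts:
        (with_num if HAS_NUMBER.search(headline) else without_num).append(score)
    return with_num, without_num

# Hypothetical posts.
posts = [
    ("TIL 7 facts about giraffes", 4200),
    ("My dog met a baby today", 3900),
    ("Top 10 gaming moments of 2015", 4500),
    ("A quiet morning in Belgium", 3700),
]
with_num, without_num = split_by_number(posts)
print(mean(with_num), mean(without_num))
```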

Putting a number in your headline increases chances

FINDING #10: Posts on the front page have on average 5.5 times more comments than the other posts in the top 100.

Not only are posts on the front page ranked much higher, but their median number of comments is actually 8 times higher than that of the other posts in the top 100.
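Ratios like these come from comparing the front-page group (top 25) with the rest of the top 100, once by mean and once by median. A minimal sketch, assuming each post is described by its best rank and its comment count (the input layout and the numbers are hypothetical):

```python
from statistics import mean, median

def comment_ratios(posts, front_page_size=25):
    """posts: iterable of (best_rank, num_comments).

    Returns (ratio of means, ratio of medians) of front-page posts
    versus the remaining posts in the top 100.
    """
    front = [c for rank, c in posts if rank <= front_page_size]
    rest = [c for rank, c in posts if rank > front_page_size]
    return mean(front) / mean(rest), median(front) / median(rest)

# Hypothetical posts: (best rank reached, number of comments).
posts = [(3, 100), (12, 400), (25, 1000), (40, 50), (60, 100), (95, 150)]
mean_ratio, median_ratio = comment_ratios(posts)
print(mean_ratio, median_ratio)
```

The mean and the median can differ a lot when a few posts collect thousands of comments, which is why it helps to report both.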


HOW IMPORTANT ARE SUB-REDDITS?

You can’t just “post something on Reddit.” You must post in a Sub-Reddit.

EXAMPLE: A funny picture of a guy getting hit in the face with a water balloon would go on the sub-reddit r/funny.

ANOTHER EXAMPLE: A gif post about David Beckham kicking a really cool soccer shot would go under r/sports.

Some of the superpopular sub-Reddits are:

  • r/AskReddit with 10,230,000+ subscribers.
  • r/funny with 10,177,000+ subscribers.
  • r/TodayILearned with 10,089,200+ subscribers.

Each of these three examples has more than 10,000,000 subscribers….and estimations of 10x that number in “lurkers” (people on Reddit who read the content but don’t create a Reddit account).

By comparison, the average cable channel in the United States has between 500,000 and 3,000,000 subscribers.

The magic of Reddit is that there are smaller sub-Reddits for very specific interests like:

  • r/photoshopbattles with 5,090,772+ subscribers (a channel for Photoshop Battles).
  • r/TwoXChromosomes with 4,254,940+ subscribers (a channel for women’s issues).
  • r/Europe with 547,500+ subscribers (a channel about…well…Europe).
  • r/Belgium with 24,550+ subscribers (a channel about Belgium).

But what's interesting is that posts from small subreddits still have a chance to appear on Reddit's front page, and frequently do.

This means if you’re trying to get to the Reddit front page, you don’t have to only post in super-popular sub-Reddits.

So we watched the top rated posts on the frontpage of Reddit for 22 days and kept tracking data on multiple things:

  • How their ranks are related (if at all) with their upvotes and comments.
  • How they get there.
  • How long they stay and why?

We mean... who DOESN’T want to know how to get on Reddit's first page?

Which sub-Reddits perform best?

Reddit preselects the top 50 subreddits and pulls their best-performing posts onto its front page. We looked at how frequently the posts from each subreddit get into the top 25 and got interesting stats:

Top Sub-Reddit:

r/funny has the highest fraction of posts - 9.8%!

Second best performing Sub-Reddit (a two-way tie):

r/pics = 7.2% of all posts.

r/gifs = 7.2% of all posts.

Third best performing Sub-Reddit (another two-way tie):

r/todayilearned = 5.6% of all top posts.

r/gaming = 5.6% of all top posts.
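Shares like these are simply counts per subreddit divided by the total number of front-page posts. A small sketch, using invented subreddit labels rather than the real data set:

```python
from collections import Counter

def subreddit_shares(front_page_posts):
    """front_page_posts: iterable of subreddit names, one entry per post.

    Returns [(subreddit, percent_of_posts)] sorted from largest to smallest.
    """
    counts = Counter(front_page_posts)
    total = sum(counts.values())
    return [(sub, 100 * n / total) for sub, n in counts.most_common()]

# Hypothetical sample of 10 front-page posts.
sample = ["funny"] * 4 + ["pics"] * 3 + ["science"] * 2 + ["Belgium"]
shares = subreddit_shares(sample)
for sub, pct in shares:
    print(f"r/{sub} = {pct:.1f}% of all posts")
```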


For the secret Geeks in the audience…

You'll be happy to know the sub-reddit /r/science is the fourth most popular subreddit if we look at the top 100, but in the top 25 (front page posts) it falls all the way down to 14th place among the 48 default subs sending posts to the front page.

Should You Write Negative, Positive, or Neutral Posts?

We thought only negative stuff would work on Reddit (that’s the reputation it gets), but the truth is Reddit is not that bad at all!

The posts are mostly neutral in all Sub-Reddits.

We looked at sentiment distributions of the top 2,344 front page posts per Sub-Reddit.

The most positive Sub-Reddits were:

  • /r/AskReddit,
  • /r/LifeProTips
  • /r/GetMotivated.

The most neutral Sub-Reddits were:

  • /r/WritingPrompts
  • /r/WorldNews

The most negative Sub-Reddits were:

  • /r/AskScience (we found this surprising),
  • /r/ShowerThoughts,
  • /r/MildlyInteresting.


How long does it take to get to the front page of Reddit?

It turns out that, on average, posts live on the front page for 4 hours and 15 minutes.

The time to get to the front page varies among posts:

  • Some posts that appear in the top 100 enter the top 25 immediately, but others take up to 10.5 hours to climb into the top 25.
  • More than half of the posts get into the top 25 in under 2 hours!
Histogram of time to get to the front page

This means that within 2-7 hours, someone can go from totally unknown to a worldwide internet celebrity.
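The time to reach the front page can be measured as the gap between a post's first appearance in the top 100 and its first appearance in the top 25. The sketch below assumes rank observations of the form (timestamp, post id, rank); the layout, names, and data are hypothetical.

```python
from datetime import datetime, timedelta

def time_to_front_page(observations, front_page_size=25):
    """observations: iterable of (timestamp, post_id, rank), in any order.

    Returns {post_id: timedelta} for posts that reached the front page.
    """
    first_seen, first_front = {}, {}
    for ts, post_id, rank in sorted(observations):  # oldest reading first
        first_seen.setdefault(post_id, ts)
        if rank <= front_page_size:
            first_front.setdefault(post_id, ts)
    return {pid: first_front[pid] - first_seen[pid] for pid in first_front}

# Hypothetical observations.
t0 = datetime(2015, 12, 16, 9, 0)
observations = [
    (t0, "a", 10),                         # enters straight into the top 25
    (t0, "b", 80),
    (t0 + timedelta(hours=1), "b", 40),
    (t0 + timedelta(hours=3), "b", 20),    # reaches the top 25 after 3 hours
    (t0, "c", 90),
    (t0 + timedelta(hours=1), "c", 95),    # never reaches the top 25
]
delays = time_to_front_page(observations)
print(delays)
```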

This graph shows how long Front Page posts stay on the front page. For example:

  • 80% of the posts last on the front page for at least 1 hour.
  • 40% of the posts last on the front page for at least 5 hours.
  • 20% of the posts last on the front page for at least 10 hours.
  • 1% of the posts last on the front page for at least 18 hours.
Duration of posts on front page

Almost no posts live on the front page for over 19 hours. This is why Reddit is so addictive: there’s always new content!
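The percentages in a duration curve like this are survival fractions: the share of posts whose front-page lifetime is at least a given number of hours. A minimal sketch with invented lifetimes (not the measured ones):

```python
def fraction_lasting_at_least(lifetimes_hours, threshold_hours):
    """Share of posts whose front-page lifetime is >= threshold_hours."""
    survivors = sum(1 for hours in lifetimes_hours if hours >= threshold_hours)
    return survivors / len(lifetimes_hours)

# Hypothetical front-page lifetimes in hours for 10 posts.
lifetimes = [0.25, 0.5, 2, 3, 4, 4.5, 6, 7, 11, 18]

for threshold in (1, 5, 10):
    share = fraction_lasting_at_least(lifetimes, threshold)
    print(f"{share:.0%} of the posts last at least {threshold} hour(s)")
```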

Time of posts on front page Two hours to get on, four hours to get off

Some other interesting facts about staying on the front page of Reddit:

  • Top page Reddit posts stay on the front page an average of 4 hours 15 minutes.
  • The average lifetime on the front page of an image is 3.5 hours, while the text posts live for 4 hours and 45 minutes on average.
  • Internal self-posts LIVE SIGNIFICANTLY LONGER than external posts.
  • The average lifetime of a Reddit self-post is 5 hours and 15 minutes.
  • The average lifetime of an external post is only 3 hours and 45 minutes.
  • The average lifetime of text posts with a positive headline is significantly longer than the lifetime of posts with a neutral or negative headline.
  • Textual self-posts with positive headlines stay significantly longer on the front page.

Here’s the Mega List of Reddit Data Analysis Findings!

TLDR…

  1. Top page Reddit posts stay on the front page an average of 4 hours 15 minutes.
  2. The average lifetime on the front page of an image is 3.5 hours, while the text posts live for 4 hours and 45 minutes on average.
  3. Internal self-posts LIVE SIGNIFICANTLY LONGER than external posts.
  4. The average lifetime of a Reddit self-post is 5 hours and 15 minutes.
  5. The average lifetime of an external post is only 3 hours and 45 minutes.
  6. The average lifetime of text posts with a positive headline is significantly longer than the lifetime of posts with a neutral or negative headline.
  7. Textual self-posts with positive headlines stay significantly longer on the front page.
  8. Starting at 9am PST is the fastest time for getting upvotes.
  9. For text posts, Very Positive or Very Negative posts perform significantly better than Neutral ones.
  10. Images get many more upvotes than text posts.
  11. However….text posts get more comments and stay on the front page longer.
  12. There are 5 Sub-Reddits that completely dominate the front page of Reddit.

If you want to hear when DataStories puts out a new article, sign up with your email address at the bottom of the page!

P.S. We’d love if you shared this article with colleagues or friends who would find it interesting.
