Number patterns in passwords dataset

Alexander Fishkov, Ph.D. student, Computer Science

Including a number sequence or a year in a login or account name is quite popular, and some websites even suggest it when your desired username is already taken. Following our earlier posts on the 10 million passwords dataset, we now explore the digit sequences that occur in passwords.

Perhaps the most-used numbers in passwords are dates and years, since they are easy to remember. Many users include their whole birthdate, which is obviously not a very secure option. We searched the passwords dataset for such patterns, treating any year from 1900 to 2015 as valid, and plotted the matches as a histogram.

To find these numbers we used patterns including MMDDYYYY, YYYYDDMM and YYYY as a separate number. Even though there may be some false positives (numbers incorrectly treated as years), most of the detected years are consistent with the birth years of the largest demographic group of internet users. The most popular year is 1987, which appears in approximately 30,000 passwords. We can also see local maxima at 2000 and 2010, which stand out from the other recent years.
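The search described above can be sketched roughly as follows. This is only an illustration under our own assumptions, not the exact code behind the histogram: the function names (`extract_years`, `year_histogram`) are hypothetical, only maximal runs of digits are examined, and just the three patterns named above are checked.

```python
import re
from collections import Counter

# Assumption taken from the post: a valid year lies between 1900 and 2015.
MIN_YEAR, MAX_YEAR = 1900, 2015

# Maximal runs of ASCII digits, e.g. "anna1987x12" -> ["1987", "12"].
DIGIT_RUN = re.compile(r"[0-9]+")


def valid_year(text):
    return MIN_YEAR <= int(text) <= MAX_YEAR


def valid_month_day(month, day):
    # Coarse check only: it does not verify that the day exists in that month.
    return 1 <= int(month) <= 12 and 1 <= int(day) <= 31


def extract_years(password):
    """Return the years found in one password (hypothetical helper)."""
    years = []
    for run in DIGIT_RUN.findall(password):
        if len(run) == 4 and valid_year(run):
            # YYYY as a separate number.
            years.append(int(run))
        elif len(run) == 8:
            if valid_month_day(run[0:2], run[2:4]) and valid_year(run[4:8]):
                # MMDDYYYY
                years.append(int(run[4:8]))
            elif valid_year(run[0:4]) and valid_month_day(run[6:8], run[4:6]):
                # YYYYDDMM
                years.append(int(run[0:4]))
    return years


def year_histogram(passwords):
    """Count how often each detected year occurs across all passwords."""
    counts = Counter()
    for password in passwords:
        counts.update(extract_years(password))
    return counts
```

Calling `year_histogram` on a list of passwords returns a `Counter` keyed by year, which can be plotted directly. A four-digit PIN such as 1987 is indistinguishable from a year here, which is one source of the false positives mentioned above.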

Naturally, we then applied the same procedure to the usernames. Since some websites suggest adding the current year or your birth year to your login name when the desired name is already in use, one would expect a much higher detection rate on usernames. It turns out, however, that only 3 percent of usernames contained dates, compared to 10 percent of passwords. Even considering the imperfections of our detection method, the result is still surprising.
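A comparison like the one above comes down to the share of strings in which a date is detected. The snippet below is a simplified sketch, assuming only the standalone YYYY pattern with years from 1900 to 2015; the name `detection_rate` is ours and the sample data is invented for illustration.

```python
import re

# A four-digit year from 1900 to 2015 that is not part of a longer digit run.
YEAR = re.compile(r"(?<![0-9])(?:19[0-9]{2}|20(?:0[0-9]|1[0-5]))(?![0-9])")


def detection_rate(strings):
    """Fraction of strings that contain at least one standalone year."""
    strings = list(strings)
    if not strings:
        return 0.0
    hits = sum(1 for s in strings if YEAR.search(s))
    return hits / len(strings)


# Invented sample data, only to show how the two rates would be compared.
usernames = ["bob", "alice2010", "carol", "dave"]
passwords = ["qwerty1987", "letmein", "anna1990", "hunter2"]
print(detection_rate(usernames), detection_rate(passwords))
```

Running the same function over both columns of the dataset gives two directly comparable percentages.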

As in the previous case, we see an increase around 1987, but the distribution is dominated by post-millennium years. Local maxima at 2000 and 2010 are also present. From these observations, we can conclude that the data supports the common intuition: people usually include “beautiful” years or the year of registration in their usernames, while birthdays are more common in passwords.

We conclude this post with a chart of the top digit sequences in the dataset.
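A ranking of this kind can be produced with a simple frequency count. The sketch below assumes that a "digit sequence" means a maximal run of consecutive digits inside a password; that definition, and the function name `top_digit_sequences`, are our assumptions for illustration.

```python
import re
from collections import Counter

# Maximal runs of ASCII digits inside a string.
DIGIT_RUN = re.compile(r"[0-9]+")


def top_digit_sequences(passwords, n=10):
    """Return the n most frequent digit runs as (sequence, count) pairs."""
    counts = Counter()
    for password in passwords:
        counts.update(DIGIT_RUN.findall(password))
    return counts.most_common(n)
```

Sequences are kept as strings rather than converted to integers, so that leading zeros (for example "007") are preserved.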

About Alexander Fishkov

Alexander is a Ph.D. student in Computer Science. He holds B.S. and M.S. degrees in Applied Math. He has experience working for major industry companies, performing research in the fields of machine learning, data mining and natural language processing. In his free time, Alexander enjoys hiking, Nordic skiing and traveling.

4 thoughts on “Number patterns in passwords dataset”

  1. For clarification: Is the final graph representative of most popular number patterns in passwords or in both passwords and user names?

  2. I stumbled upon a post titled, “Do married and divorced have different occupations?” The very idea of predicting whether a marriage will end in divorce based on a couple's respective occupations is ludicrous. What was intriguing, though, was the graphs of password selection showing popular patterns, numbers and letters. Now that is potentially useful information, just as long as the hackers don't stumble upon the graphs.
