Number patterns in passwords dataset

Alexander Fishkov

Alexander Fishkov, Ph.D. student in Computer Science

Including a number sequence or a year in your login or account name is quite popular; some websites even suggest it when the desired name is already taken. Following our posts on the 10 million passwords dataset, we now explore the different digit sequences that occur in passwords.

Perhaps the most-used numbers in passwords are dates and years, since they are easy to remember. Many users include their whole birthdate, which is obviously not secure. We searched for such patterns in the passwords dataset, treating any year from 1900 to 2015 as valid, and plotted the detected years as a histogram.

To find these numbers we used patterns including MMDDYYYY, YYYYDDMM and YYYY as a separate number. Even though there may be some false positives (numbers incorrectly treated as years), the majority of detected dates match the largest demographic group of internet users. The most popular year is 1987, appearing in approximately 30,000 passwords. We can also see local maxima at 2000 and 2010, which stand out among the later years.
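The post does not show the detection code, so the following Python sketch is only one plausible way to implement the search described above. It assumes that candidates are maximal runs of digits, that a standalone year is a run of exactly four digits, and that eight-digit runs are tried first as MMDDYYYY and then as YYYYDDMM. The function names `extract_year` and `year_histogram` are hypothetical.

```python
import re
from collections import Counter

# Year range taken from the post; everything else here is an assumption.
MIN_YEAR, MAX_YEAR = 1900, 2015

# Maximal runs of digits, so "1987" inside "119870" is not treated as a year.
DIGIT_RUN = re.compile(r"\d+")


def valid_year(text):
    return MIN_YEAR <= int(text) <= MAX_YEAR


def valid_month_day(month, day):
    # Coarse check only: it does not reject impossible dates such as 31 February.
    return 1 <= int(month) <= 12 and 1 <= int(day) <= 31


def extract_year(run):
    """Return the year encoded in a digit run, or None if no pattern matches."""
    if len(run) == 4 and valid_year(run):
        return int(run)  # YYYY as a separate number
    if len(run) == 8:
        # MMDDYYYY
        if valid_month_day(run[0:2], run[2:4]) and valid_year(run[4:8]):
            return int(run[4:8])
        # YYYYDDMM
        if valid_year(run[0:4]) and valid_month_day(run[6:8], run[4:6]):
            return int(run[0:4])
    return None


def year_histogram(strings):
    """Count how often each detected year occurs across the given strings."""
    counts = Counter()
    for s in strings:
        for run in DIGIT_RUN.findall(s):
            year = extract_year(run)
            if year is not None:
                counts[year] += 1
    return counts
```

For example, `year_histogram(["anna1987", "07041990x"])` counts one occurrence each of 1987 and 1990. Requiring maximal digit runs keeps a string such as `119870` from being counted as 1987, at the cost of missing years glued to other digits; a looser match would raise both the detection rate and the number of false positives.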

Naturally, we then applied the same procedure to the usernames. Some websites suggest adding the current year or your birth year to your login name if it is already taken, so one would expect a much higher detection rate on usernames. It turns out, however, that only 3 percent of usernames contained dates, compared to 10 percent of passwords. Even considering the imperfections of the detection method, this is surprising.
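A comparison like this boils down to the share of strings in which the detector fires at least once. The sketch below shows that calculation under simplifying assumptions: the detector is reduced to a standalone four-digit year between 1900 and 2015, and the names `contains_year` and `detection_rate` are hypothetical rather than taken from the original analysis.

```python
import re

# Simplified stand-in detector: a four-digit year from 1900 to 2015 that is
# not preceded or followed by another digit.
STANDALONE_YEAR = re.compile(r"(?<!\d)(19\d\d|200\d|201[0-5])(?!\d)")


def contains_year(s):
    return STANDALONE_YEAR.search(s) is not None


def detection_rate(strings):
    """Fraction of strings in which at least one year is detected."""
    strings = list(strings)
    if not strings:
        return 0.0
    hits = sum(1 for s in strings if contains_year(s))
    return hits / len(strings)
```

Running `detection_rate` once on the list of usernames and once on the list of passwords gives two fractions that can be compared directly, since each string is counted at most once regardless of how many dates it contains.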

As in the previous case, we see an increase around 1987, but the distribution is dominated by post-millennium years. Local maxima at 2000 and 2010 are also present. From these observations, we can conclude that the data supports the common intuition: people usually include “beautiful” years or the year of registration in their usernames, while birthdays are more common in passwords.

We conclude this post with a graph of the top digit sequences in the dataset.
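A ranking of this kind could be produced by counting digit runs across all passwords. The following is a minimal sketch, assuming that a "digit sequence" means a maximal run of at least two digits; the function name `top_digit_sequences` and the `min_length` threshold are assumptions, not details from the original analysis.

```python
import re
from collections import Counter

DIGIT_RUN = re.compile(r"\d+")


def top_digit_sequences(passwords, n=10, min_length=2):
    """Return the n most common maximal digit runs with at least min_length digits."""
    counts = Counter()
    for password in passwords:
        counts.update(
            run for run in DIGIT_RUN.findall(password) if len(run) >= min_length
        )
    return counts.most_common(n)
```

For example, `top_digit_sequences(["abc123", "123456", "qwe123", "x1"], n=1)` returns `[("123", 2)]`, because the single digit in `x1` falls below the length threshold.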

About Alexander Fishkov

Alexander is a Ph.D. student in Computer Science. He holds B.S. and M.S. degrees in Applied Math. He has experience doing research for major industry companies in the fields of machine learning, data mining and natural language processing. In his free time, Alexander enjoys hiking, Nordic skiing and traveling.

4 thoughts on “Number patterns in passwords dataset”

  1. For clarification: Is the final graph representative of most popular number patterns in passwords or in both passwords and user names?

  2. I stumbled upon a post titled, “Do married and divorced have different occupations?” The very idea of predicting whether a marriage will result in divorce based on a couple's respective occupations is ludicrous. What was intriguing, though, was the graphs showing popular patterns, numbers and letters in password selection. Now that is potentially useful information, just as long as the hackers don't stumble upon the graphs.
