Pipal, Password Analyser

On most internal pen-tests I generally manage to get a password dump from the domain controller (DC). To do some basic analysis on these dumps I wrote Counter, and since the original release I've made quite a few modifications to it to generate extra stats that are useful when writing reports for management.

Recently a good friend, n00bz, asked on Twitter if anyone had a tool that he could use to analyse some passwords he had. I pointed him to Counter and said if he had any suggestions for additions to let me know. He did just that and over the last month between us we have come up with a load of new features which we both think will help anyone with a large dump of cracked passwords to analyse. We also got some input from well known password analysts Matt Weir and Martin Bos who I'd like to give a big thanks to.

Before going on, I have to point out that all this tool does is give you the stats and the information to help you analyse the passwords. The real work is done by you in interpreting the results: I give you the numbers, you tell the story.

Seeing as there have been so many changes to the underlying code I also decided to change the name (see why) and do a full new release.

Modular Release

Over the past few months I've been rewriting Pipal to make it modular rather than a huge, monolithic lump. Rather than try to add the extra information here, I've written a short blog post about it.

Pipal Goes Modular

Version 2

There are two big changes in version 2. The first is a massive speed increase. The patch was submitted by Stefan Venken, who said a small mention would be good enough, but I want to give him a big one. Running through the LinkedIn lists would have taken many, many hours on version 1; version 2 went through 3.5 million records in about 15 minutes. Thank you.

The second change is the addition of US area code and zip code lookups. This little feature gives some interesting geographical data when run across password lists originating in the US. The best example I've seen of this is the dump from the Military Singles site, where some passwords were obviously grouped around US military bases. People in the UK don't have the same relationship with phone numbers so I know this won't work here, but if anyone can suggest other areas where it might be useful I'll look at building in some kind of location awareness feature. You could then specify the source of the list and get results customized to the correct area, or just run every area and see if a pattern emerges.

A non-code change for version 2 is the move from hosting the code myself to github. This is my first github hosted project so I may get things wrong; if I do, sorry. A number of people asked how they could submit patches and this seems like the best way to do it, let's hope it works out. See the Download section for more info.

Worked Example

So, what does Pipal do? The easiest way to explain this is to show the output generated by parsing a leaked password list. I've chosen the list of passwords from the phpBB leak which I grabbed from the SkullSecurity site.

The first output is the number of entries in the file parsed and the number of unique entries found. Unfortunately the list I chose has already been run through unique so these two figures match in this example.

Total entries = 184373
Total unique entries = 184373

Next come the top 10 passwords. Because the list I chose has already been passed through a filter to strip any duplicates, each word only appears once. The cap of 10 is configurable with the --top parameter on the command line. I'd suggest playing with this limit as sometimes the next entry is the one that starts to explain things.

Top 10 passwords
123456 = 1 (0.0%)
password = 1 (0.0%)
phpbb = 1 (0.0%)
qwerty = 1 (0.0%)
12345 = 1 (0.0%)
12345678 = 1 (0.0%)
letmein = 1 (0.0%)
111111 = 1 (0.0%)
1234 = 1 (0.0%)
123456789 = 1 (0.0%)
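
As a rough illustration of how figures like these can be produced, here is a minimal Ruby sketch. It is not Pipal's own code: the helper name top_passwords and the sample data are made up for the example, and the tie-breaking rule (alphabetical) is an assumption.

```ruby
# Hypothetical helper, not Pipal's own code: count entries and rank them.
def top_passwords(words, limit = 10)
  counts = Hash.new(0)
  words.each { |word| counts[word] += 1 }
  total = words.length
  # Highest count first; ties broken alphabetically (an assumption).
  ranked = counts.sort_by { |word, count| [-count, word] }.first(limit)
  ranked.map { |word, count| [word, count, (count * 100.0 / total).round(2)] }
end

# Made-up sample data, one password per entry.
sample = %w[123456 password 123456 qwerty password 123456]

puts "Total entries = #{sample.length}"
puts "Total unique entries = #{sample.uniq.length}"
puts "Top 2 passwords"
top_passwords(sample, 2).each do |word, count, pct|
  puts "#{word} = #{count} (#{pct}%)"
end
```

With a list that has already been de-duplicated, every count is 1 and the ranking carries no information, which is what the phpBB output above shows.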

The next list is the top base words. I define a base word as a word with any non-alpha characters stripped from the start and end. This is useful for identifying common words, such as company names or places, which the passwords have been based on. I did consider stripping all non-alpha characters but in one of the lists I tested on I found the base word "un1c0rn". Leaving the non-alpha characters inside the word makes sense; remove them and you get "uncrn", which doesn't really mean anything.

Unsurprisingly, as this list came from phpBB, the top word that passwords are based on is "phpbb". "password" is another obvious base word, but "dragon" is one I wouldn't have expected.

Top 10 base words
phpbb = 332 (0.18%)
password = 89 (0.05%)
dragon = 76 (0.04%)
pass = 70 (0.04%)
mike = 69 (0.04%)
blue = 67 (0.04%)
test = 66 (0.04%)
qwerty = 59 (0.03%)
alex = 58 (0.03%)
alpha = 53 (0.03%)
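
The base word rule described above (strip non-alpha characters from the start and end only) can be sketched as follows. The helper names are hypothetical, and lowercasing the result and skipping empty base words are assumptions made for the example rather than confirmed Pipal behaviour.

```ruby
# Hypothetical helper: trim non-alphabetic characters from both ends only,
# leaving anything in the middle of the word untouched.
def base_word(password)
  trimmed = password.sub(/\A[^a-zA-Z]+/, "").sub(/[^a-zA-Z]+\z/, "")
  trimmed.downcase # assumption: base words are compared case-insensitively
end

def top_base_words(passwords, limit = 10)
  counts = Hash.new(0)
  passwords.each do |password|
    word = base_word(password)
    # Assumption: all-numeric passwords leave nothing behind and are skipped.
    counts[word] += 1 unless word.empty?
  end
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

puts base_word("123un1c0rn99") # inner digits survive, outer ones are removed
top_base_words(%w[phpbb1 PHPBB! 123456 dragon]).each do |word, count|
  puts "#{word} = #{count}"
end
```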

Lengths are next, fairly self explanatory. It is a shame that the people who put the effort in and had greater than 20 character passwords still got theirs leaked.

I hope that the 948 words of three characters or fewer are a mistake made when cracking the list.

Password length (length ordered)
1 = 33 (0.02%)
2 = 138 (0.07%)
3 = 777 (0.42%)
4 = 4597 (2.49%)
5 = 8199 (4.45%)
6 = 42069 (22.82%)
7 = 32731 (17.75%)
8 = 55338 (30.01%)
9 = 19187 (10.41%)
10 = 11897 (6.45%)
11 = 4934 (2.68%)
12 = 2506 (1.36%)
13 = 1019 (0.55%)
14 = 516 (0.28%)
15 = 233 (0.13%)
16 = 126 (0.07%)
17 = 37 (0.02%)
18 = 28 (0.02%)
19 = 10 (0.01%)
20 = 9 (0.0%)
21 = 6 (0.0%)
22 = 3 (0.0%)
23 = 4 (0.0%)
25 = 2 (0.0%)
27 = 3 (0.0%)
28 = 2 (0.0%)
32 = 4 (0.0%)

Password length (count ordered)
8 = 55338 (30.01%)
6 = 42069 (22.82%)
7 = 32731 (17.75%)
9 = 19187 (10.41%)
10 = 11897 (6.45%)
5 = 8199 (4.45%)
11 = 4934 (2.68%)
4 = 4597 (2.49%)
12 = 2506 (1.36%)
13 = 1019 (0.55%)
3 = 777 (0.42%)
14 = 516 (0.28%)
15 = 233 (0.13%)
2 = 138 (0.07%)
16 = 126 (0.07%)
17 = 37 (0.02%)
1 = 33 (0.02%)
18 = 28 (0.02%)
19 = 10 (0.01%)
20 = 9 (0.0%)
21 = 6 (0.0%)
23 = 4 (0.0%)
32 = 4 (0.0%)
22 = 3 (0.0%)
27 = 3 (0.0%)
25 = 2 (0.0%)
28 = 2 (0.0%)
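
The two length listings are the same tally sorted two ways. A minimal sketch, assuming hypothetical helper names and made-up sample data, and assuming ties in the count ordered view are broken by length (which is consistent with the output above):

```ruby
# Hypothetical helper: tally how many passwords have each length.
def length_counts(words)
  counts = Hash.new(0)
  words.each { |word| counts[word.length] += 1 }
  counts
end

def percentage(count, total)
  (count * 100.0 / total).round(2)
end

words = %w[abc abcd abcd abcdefgh abcdefgh abcdefgh]
counts = length_counts(words)

puts "Password length (length ordered)"
counts.sort_by { |length, _count| length }.each do |length, count|
  puts "#{length} = #{count} (#{percentage(count, words.length)}%)"
end

puts "Password length (count ordered)"
# Highest count first; equal counts fall back to the shorter length.
counts.sort_by { |length, count| [-count, length] }.each do |length, count|
  puts "#{length} = #{count} (#{percentage(count, words.length)}%)"
end
```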

Next a nice graph showing the length data, I'm quite proud of getting this in.

        |                                                               
        |                                                               
        |                                                               
      | |                                                               
      | |                                                               
      | |                                                               
      |||                                                               
      |||                                                               
      |||                                                               
      |||                                                               
      ||||                                                              
      ||||                                                              
      |||||                                                             
     ||||||                                                             
    ||||||||                                                            
|||||||||||||||||||||||||||||||||                                       
000000000011111111112222222222333
012345678901234567890123456789012

Some more self explanatory information comes next. About 30% of people chose a 1-6 character password and about 41% chose one that contained only lowercase alpha characters.

One to six characters = 55807 (30.27%)
One to eight characters = 143874 (78.03%)
More than eight characters = 40507 (21.97%)

Only lowercase alpha = 76041 (41.24%)
Only uppercase alpha = 1706 (0.93%)
Only alpha = 77747 (42.17%)
Only numeric = 20728 (11.24%)

First capital last symbol = 225 (0.12%)
First capital last number = 4749 (2.58%)
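
Checks like these map naturally onto anchored regular expressions. The sketch below is an illustration only, not Pipal's code: the PATTERNS table and classify helper are hypothetical, and treating any non-alphanumeric character as a "symbol" is an assumption. In the figures above "Only alpha" equals the lowercase-only and uppercase-only counts added together (76041 + 1706 = 77747), so the sketch derives it the same way rather than matching mixed-case words.

```ruby
# Hypothetical pattern table; \A and \z anchor to the whole password.
PATTERNS = {
  "Only lowercase alpha"      => /\A[a-z]+\z/,
  "Only uppercase alpha"      => /\A[A-Z]+\z/,
  "Only numeric"              => /\A[0-9]+\z/,
  # Assumption: a "symbol" is anything that is not a letter or digit.
  "First capital last symbol" => /\A[A-Z].*[^a-zA-Z0-9]\z/,
  "First capital last number" => /\A[A-Z].*[0-9]\z/
}.freeze

def classify(words)
  counts = Hash.new(0)
  words.each do |word|
    PATTERNS.each { |label, regex| counts[label] += 1 if word =~ regex }
  end
  # Mirrors the listing above, where this figure is the sum of the two.
  counts["Only alpha"] = counts["Only lowercase alpha"] + counts["Only uppercase alpha"]
  counts
end

sample = %w[password LETMEIN 123456 Password1 Secret!]
classify(sample).each do |label, count|
  puts "#{label} = #{count} (#{(count * 100.0 / sample.length).round(2)}%)"
end
```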

The external list is a list of words passed in to Pipal on the command line. I check how many passwords contain each of these words. This is similar to base words but here you tell the app which words to search for.

If you are wondering why "dragon" is only counted 76 times as a base word but shows 185 times here, that is because there are 109 base words which contain "dragon" but aren't just "dragon", for example "phpdragon".

The external list I'm using is the list claiming to be "The 25 Worst Passwords on the Internet". Another suggestion for a list of words to use is the domains from the Alexa top 1000 list, this could be good if you are analysing a list of passwords from an unknown origin or would like to know if a list from one domain is linked to any other domains.

External list (Top 10)
master = 229 (0.12%)
123456 = 208 (0.11%)
dragon = 185 (0.1%)
password = 164 (0.09%)
monkey = 118 (0.06%)
shadow = 105 (0.06%)
qwerty = 95 (0.05%)
1234567 = 72 (0.04%)
12345678 = 47 (0.03%)
letmein = 44 (0.02%)
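
The difference from base words is that this is a plain "does the password contain the word" test, which is why "phpdragon" counts towards "dragon" here. A minimal sketch, assuming a hypothetical external_hits helper and a case-insensitive comparison (the latter is an assumption, not something confirmed by the output):

```ruby
# Hypothetical helper: count how many passwords contain each external word.
def external_hits(passwords, external_words)
  counts = Hash.new(0)
  passwords.each do |password|
    lowered = password.downcase # assumption: matching ignores case
    external_words.each do |word|
      counts[word] += 1 if lowered.include?(word.downcase)
    end
  end
  # Words with no hits never get a key, so they drop out of the listing.
  counts.sort_by { |word, count| [-count, word] }
end

passwords = %w[dragon phpdragon Dragon1 monkey qwerty]
external = %w[dragon monkey master]

external_hits(passwords, external).each do |word, count|
  puts "#{word} = #{count}"
end
```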

We now look at months and days in both full and abbreviated form. While "may" could be a person's name or a normal word, for some reason it is a popular word in the list. "June" and "April" are also popular, but they are also names, which could explain the higher proportion. For days of the week there is a very large preference for "monday" and "friday", guess which days people change their passwords.

Months
january = 8 (0.0%)
february = 3 (0.0%)
march = 23 (0.01%)
april = 48 (0.03%)
may = 171 (0.09%)
june = 56 (0.03%)
july = 27 (0.01%)
august = 22 (0.01%)
september = 3 (0.0%)
october = 15 (0.01%)
november = 7 (0.0%)
december = 6 (0.0%)

Days
monday = 12 (0.01%)
tuesday = 2 (0.0%)
wednesday = 1 (0.0%)
thursday = 3 (0.0%)
friday = 11 (0.01%)
saturday = 1 (0.0%)
sunday = 5 (0.0%)

Months (Abreviated)
jan = 341 (0.18%)
feb = 42 (0.02%)
mar = 1406 (0.76%)
apr = 108 (0.06%)
may = 171 (0.09%)
jun = 190 (0.1%)
jul = 158 (0.09%)
aug = 83 (0.05%)
sept = 17 (0.01%)
oct = 69 (0.04%)
nov = 161 (0.09%)
dec = 120 (0.07%)

Days (Abreviated)
mon = 953 (0.52%)
tues = 3 (0.0%)
wed = 69 (0.04%)
thurs = 6 (0.0%)
fri = 169 (0.09%)
sat = 187 (0.1%)
sun = 299 (0.16%)

Seeing as we've looked at months and days, why not years? It looks like years around the turn of the millennium are popular in this list. I also ran this on the passwords from the myspace leak, which showed years around 1990 were popular; maybe this says something about the age of the average user.

Includes years
1975 = 82 (0.04%)
1976 = 80 (0.04%)
1977 = 96 (0.05%)
1978 = 118 (0.06%)
1979 = 142 (0.08%)
1980 = 130 (0.07%)
1981 = 139 (0.08%)
1982 = 142 (0.08%)
1983 = 168 (0.09%)
1984 = 176 (0.1%)
1985 = 171 (0.09%)
1986 = 152 (0.08%)
1987 = 183 (0.1%)
1988 = 165 (0.09%)
1989 = 139 (0.08%)
1990 = 127 (0.07%)
1991 = 115 (0.06%)
1992 = 82 (0.04%)
1993 = 49 (0.03%)
1994 = 41 (0.02%)
1995 = 25 (0.01%)
1996 = 38 (0.02%)
1997 = 56 (0.03%)
1998 = 49 (0.03%)
1999 = 79 (0.04%)
2000 = 428 (0.23%)
2001 = 236 (0.13%)
2002 = 268 (0.15%)
2003 = 235 (0.13%)
2004 = 180 (0.1%)
2005 = 199 (0.11%)
2006 = 145 (0.08%)
2007 = 91 (0.05%)
2008 = 30 (0.02%)
2009 = 26 (0.01%)
2010 = 57 (0.03%)
2011 = 48 (0.03%)
2012 = 45 (0.02%)
2013 = 27 (0.01%)
2014 = 9 (0.0%)
2015 = 16 (0.01%)
2016 = 12 (0.01%)
2017 = 17 (0.01%)
2018 = 16 (0.01%)
2019 = 26 (0.01%)
2020 = 47 (0.03%)

Years (Top 10)
2000 = 428 (0.23%)
2002 = 268 (0.15%)
2001 = 236 (0.13%)
2003 = 235 (0.13%)
2005 = 199 (0.11%)
1987 = 183 (0.1%)
2004 = 180 (0.1%)
1984 = 176 (0.1%)
1985 = 171 (0.09%)
1983 = 168 (0.09%)
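
A year check of this kind can be sketched as a substring search over a range of years. The helper name is hypothetical; the 1975 to 2020 range is taken from the listing above. Because this sketch uses a plain substring match, a password such as "120001" would also count towards 2000, so the figures are best read as indicative rather than exact. Whether Pipal matches in the same way is not confirmed here.

```ruby
# Hypothetical helper: count passwords that contain each year as a substring.
def year_counts(passwords, years = 1975..2020)
  counts = Hash.new(0)
  passwords.each do |password|
    years.each { |year| counts[year] += 1 if password.include?(year.to_s) }
  end
  counts
end

sample = %w[mike1987 2000 summer2002x alex1987 nodate]
counts = year_counts(sample)

puts "Includes years"
counts.sort_by { |year, _count| year }.each do |year, count|
  puts "#{year} = #{count}"
end

puts "Years (Top 2)"
counts.sort_by { |year, count| [-count, year] }.first(2).each do |year, count|
  puts "#{year} = #{count}"
end
```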

The common assumption is that when people are forced to use passwords with numbers in, their general response is to add a single digit on the end. Looking at this next set of stats, in this list people actually preferred to add two digits onto the end. The assumption that the last digit will be "1" does however hold true.

Single digit on the end = 14447 (7.84%)
Two digits on the end = 18112 (9.82%)
Three digits on the end = 9637 (5.23%)

Last number
0 = 7753 (4.2%)
1 = 13572 (7.36%)
2 = 8735 (4.74%)
3 = 9313 (5.05%)
4 = 6279 (3.41%)
5 = 6408 (3.48%)
6 = 5991 (3.25%)
7 = 6472 (3.51%)
8 = 5726 (3.11%)
9 = 6728 (3.65%)

We now look at what the last digits are. Some of the numbers are expected but others, 21984 for example, aren't. Could this be a US zip code?

Last digit
1 = 13572 (7.36%)
3 = 9313 (5.05%)
2 = 8735 (4.74%)
0 = 7753 (4.2%)
9 = 6728 (3.65%)
7 = 6472 (3.51%)
5 = 6408 (3.48%)
4 = 6279 (3.41%)
6 = 5991 (3.25%)
8 = 5726 (3.11%)

Last 2 digits (Top 10)
23 = 3027 (1.64%)
00 = 2185 (1.19%)
01 = 1992 (1.08%)
12 = 1817 (0.99%)
11 = 1620 (0.88%)
99 = 1341 (0.73%)
21 = 1150 (0.62%)
13 = 1095 (0.59%)
69 = 1052 (0.57%)
88 = 1028 (0.56%)

Last 3 digits (Top 10)
123 = 2164 (1.17%)
000 = 708 (0.38%)
234 = 477 (0.26%)
007 = 449 (0.24%)
001 = 430 (0.23%)
666 = 397 (0.22%)
321 = 286 (0.16%)
101 = 284 (0.15%)
002 = 274 (0.15%)
111 = 261 (0.14%)

Last 4 digits (Top 10)
1234 = 424 (0.23%)
2000 = 377 (0.2%)
2002 = 215 (0.12%)
2003 = 202 (0.11%)
2001 = 181 (0.1%)
2005 = 166 (0.09%)
2004 = 153 (0.08%)
1987 = 141 (0.08%)
1988 = 133 (0.07%)
1985 = 132 (0.07%)

Last 5 digits (Top 10)
12345 = 110 (0.06%)
23456 = 68 (0.04%)
54321 = 25 (0.01%)
11111 = 23 (0.01%)
21984 = 21 (0.01%)
00000 = 18 (0.01%)
11988 = 16 (0.01%)
21985 = 15 (0.01%)
23123 = 14 (0.01%)
11984 = 13 (0.01%)
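
One way to produce the "digits on the end" and "last N digits" figures is to pull out the run of digits at the end of each password and work from that. The sketch below is an illustration under assumptions, not Pipal's code: the helper names are made up, "single digit on the end" is read as exactly one trailing digit, and "last N digits" is read as needing at least N trailing digits.

```ruby
# Hypothetical helper: the run of digits at the very end of the password.
def trailing_digits(password)
  match = password.match(/\d+\z/)
  match ? match[0] : ""
end

# Assumption: "N digits on the end" means the trailing run is exactly N long.
def exactly_n_trailing(passwords, n)
  passwords.count { |password| trailing_digits(password).length == n }
end

# Assumption: a password only counts if it ends in at least n digits.
def last_digit_counts(passwords, n)
  counts = Hash.new(0)
  passwords.each do |password|
    run = trailing_digits(password)
    counts[run[-n, n]] += 1 if run.length >= n
  end
  counts.sort_by { |digits, count| [-count, digits] }
end

sample = %w[abc123 x123 pass1 2000 letmein]
puts "Single digit on the end = #{exactly_n_trailing(sample, 1)}"
puts "Last 3 digits"
last_digit_counts(sample, 3).each { |digits, count| puts "#{digits} = #{count}" }
```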

The last three sections are recommendations from Martin. This is where we start moving from analysis to cracking: character sets, character set ordering and hashcat masks.

Character sets
loweralpha: 76041 (41.24%)
loweralphanum: 65827 (35.7%)
numeric: 20728 (11.24%)
mixedalphanum: 8886 (4.82%)
mixedalpha: 4948 (2.68%)
upperalphanum: 2186 (1.19%)
upperalpha: 1706 (0.93%)
loweralphaspecialnum: 1393 (0.76%)
loweralphaspecial: 1383 (0.75%)
mixedalphaspecialnum: 483 (0.26%)
mixedalphaspecial: 268 (0.15%)
specialnum: 191 (0.1%)
special: 61 (0.03%)
upperalphaspecialnum: 48 (0.03%)
upperalphaspecial: 37 (0.02%)

Character set ordering
allstring: 82695 (44.85%)
stringdigit: 47849 (25.95%)
alldigit: 20728 (11.24%)
othermask: 12040 (6.53%)
stringdigitstring: 11274 (6.11%)
digitstring: 5490 (2.98%)
digitstringdigit: 2180 (1.18%)
stringspecialstring: 837 (0.45%)
stringspecialdigit: 521 (0.28%)
stringspecial: 489 (0.27%)
specialstring: 116 (0.06%)
specialstringspecial: 101 (0.05%)
allspecial: 61 (0.03%)

Hashcat masks (Top 10)
?l?l?l?l?l?l: 18462 (0.0%)
?l?l?l?l?l?l?l?l: 17481 (0.0%)
?l?l?l?l?l?l?l: 13981 (0.0%)
?l?l?l?l?l?l?l?l?l: 8045 (0.0%)
?d?d?d?d?d?d: 7726 (0.0%)
?l?l?l?l?l?l?l?l?l?l: 5253 (0.0%)
?l?l?l?l?l: 5249 (0.0%)
?d?d?d?d?d?d?d?d: 5116 (0.0%)
?l?l?l?l?l?l?d?d: 4956 (0.0%)
?l?l?l?l?l?d?d: 3149 (0.0%)
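
To show how the ordering labels and masks relate to a password, here is a minimal sketch. It is not Pipal's code: the helper names are hypothetical, the list of known orderings is simply copied from the labels in the listing above, and mapping every non-alphanumeric character to ?s is an assumption (hashcat's ?s only covers printable ASCII specials). The 0.0% figures in the masks listing presumably come from the percentage bug that the Download section says was fixed in 1.1.

```ruby
# Hypothetical helpers, not Pipal's own code.

# Map each character to a hashcat built-in charset token.
def hashcat_mask(password)
  password.each_char.map do |char|
    case char
    when /[a-z]/ then "?l"
    when /[A-Z]/ then "?u"
    when /[0-9]/ then "?d"
    else "?s" # assumption: everything else is treated as a special
    end
  end.join
end

# Multi-part labels copied from the "Character set ordering" listing.
KNOWN_ORDERINGS = %w[
  stringdigit stringdigitstring digitstring digitstringdigit
  stringspecialstring stringspecialdigit stringspecial
  specialstring specialstringspecial
].freeze

def charset_ordering(password)
  classes = password.each_char.map do |char|
    case char
    when /[a-zA-Z]/ then "string"
    when /[0-9]/ then "digit"
    else "special"
    end
  end
  # Collapse consecutive characters of the same class into a single run.
  runs = classes.chunk_while { |a, b| a == b }.map(&:first)
  return "all#{runs.first}" if runs.length == 1
  label = runs.join
  KNOWN_ORDERINGS.include?(label) ? label : "othermask"
end

%w[password pass12 123456 p@ss a1!b].each do |password|
  puts "#{password}: #{charset_ordering(password)} #{hashcat_mask(password)}"
end
```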

Install / Usage

The app will only work with Ruby 1.9.x. If you try to run it in any earlier version you will get a warning and the app will close.

Pipal is completely self contained and requires no gems installing so should work on any vanilla Ruby install.

Usage is fairly simple, -? will give you full instructions:

$ ./pipal.rb -?
pipal 1.0 Robin Wood (robin@digi.ninja) (http://digi.ninja)

Usage: pipal [OPTION] ... FILENAME
        --help, -h: show help
        --top, -t X: show the top X results (default 10)
        --output, -o <filename>: output to file
        --external, -e <filename>: external file to compare words against

        FILENAME: The file to count

When you run the app you'll get a nice progress bar which gives you a rough idea of how long the app will take to run. If you want to stop it at any point hitting ctrl-c will stop the parsing and will dump out the stats generated so far.

The progress bar is based on a line count from the file, which it gets using the wc command. If it can't find wc it will make a guess at the number of lines based on the file size and an average line length of 8 bytes, so the progress bar may not be fully accurate but should still give you an idea.
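
The line count fallback described above can be sketched like this. It is only an illustration of the idea, not Pipal's code: the helper names are hypothetical and the use of Open3 from Ruby's standard library is a choice made for the example.

```ruby
require "open3"

# Average line length used for the fallback guess, as described above.
AVERAGE_LINE_BYTES = 8

# Hypothetical helper: rough line count from the file size alone.
def guess_from_size(filename)
  File.size(filename) / AVERAGE_LINE_BYTES
end

# Hypothetical helper: ask wc for the line count, fall back to a guess.
def estimated_line_count(filename)
  stdout, _stderr, status = Open3.capture3("wc", "-l", filename)
  # wc prints "<count> <filename>", possibly with leading spaces.
  return stdout.split.first.to_i if status.success?
  guess_from_size(filename)
rescue Errno::ENOENT
  # Raised when the wc binary itself cannot be found.
  guess_from_size(filename)
end

# Only runs when a filename is supplied on the command line.
puts estimated_line_count(ARGV[0]) if ARGV[0]
```

An 8 byte average is in the right area for password lists, where the length figures earlier show most entries are 6 to 8 characters plus a newline, so the guess will usually be close enough for a progress bar.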

Download

Due to the number of people asking about submitting updates I've moved Pipal hosting to github, you can now get the latest version from its github repository.

If you aren't sure what you are doing with github, just click the ZIP button, roughly middle left of the page, and that will give you a zip file which you can decompress and use as you would the versions below.

Download Pipal 1.1 - Bug fixed, not calculating correct percentage for Hashcat masks - Reported by Moshe Zioni
Download Pipal 1.0

Analysis

This section was supposed to just contain a few sets of sample stats but as more sites are being hacked and passwords released I've decided to run analysis on any lists I can get my hands on and post the results here. The first six in the list are the original sample sets and are based on password lists from the SkullSecurity site, for the rest, I'll give whatever information I can about where the list came from.

phpBB
myspace
Hotmail
RockYou top 1000
The full RockYou list - Took over 24 hours to generate
Faithwriters
Yamaha MotoGP - This one is from here
CSDN - For more information and a download link see The Hacker News Network
T-Mobile - A small leak, barely worth analysing but hey, bad passwords are bad passwords and these are bad passwords! For more information and a download link see T-Mobile staff data and passwords hacked and published
Irish Aid - Another small leak, this time from the Irish Aid website. See the story on Help Net Security
Portal Mercosur - This is from a database dump posted on Pastebin
Foxconn had a big leak, here are three different lists from them: Partners, Users, Vendors. The data can be found as a torrent on The Pirate Bay.
YouPorn - Lots of people have analysed this and posted their analysis online already but including it here so it is with all the rest. Here is a link to one that analysed a lot more passwords than I've got here, it is probably a more useful analysis.
Panda Security - The Panda Security password list as leaked by AntiSec, see Hacker News for more information. Thanks to @lobobastich for the work.
Digital Playground - Another porn site taken down, links to the passwords and to some of the info on the hack. This analysis also contains an example of a new feature being released soon, the last 3 and 5 digits are checked to see if they are valid US area or ZIP codes, if so the area is printed.
Digital Playground Store list - These are for the store (digitalplayground.com/store/login.php), not for the main site. Download from MediaFire and more info from HackBB - View through Tor.
Military Singles - This is a US military dating site, I cracked 92029 of the 118276 leaked, if you've cracked more let me know. Wonder how many of these are reused on other, more important, systems? More info from SC Magazine.
Twitter Leak - This was originally claimed to be 55k of passwords but when I de-duplicated the accounts and tidied it up a bit it came down to 34k, still not bad. There is a brief news write up in SC Magazine and a good analysis from @nilssonanders . You can get the passwords from Pastebin in five batches: 1, 2, 3, 4 and 5.
"100k" Arab and Middle East Facebook - A hacker know as "Hannibal Hacker" claims to have released 100k from a total of 1 million Facebook accounts he has. I got two copies of the file and both matched but only contained 19950 accounts, about 1/5 of what is claimed. Interestingly, from the basewords, top is "sayang" which means love but second is "brokenheart". Here is a bit of a write up on the release by hackerz.do.am.
Navy.mil sample leak - .c0mrade leaked a small sample of passwords recovered from a hack of navy.mil. Only 171 but you can see the military skew in them. This is the pastebin dump.
Quick run on LinkedIn hashes - LinkedIn hashes just got dumped, there is no definitive cracked list yet but here is the analysis of those cracked by M@LIK + Others from InsidePro.
811k of e-harmony passwords - These hashes were cracked by @f8lerror, thanks for the list. Even a quick look at this list reveals a big mistake made by the e-harmony developers, they converted all passwords to uppercase to avoid case sensitivity when their users logged in. Also made cracking a lot easier as the keyspace was massively reduced.
A much larger list (3,473,010 unique words) of LinkedIn passwords - This time a much larger list, a combination of lists from various people including @n00bznet, @christruncer, Michael S and M@LIK.
Panda security got pwned - Panda security had a bunch of account details dumped, this analysis is from a small number of Linux accounts. Pastebin Dump.
450k of Yahoo! Voice passwords - Yahoo! Voice passwords and other bits of server data were leaked. No cracking required, apparently they were stored in clear text. Dump Article.
Billabong Surfwear - Don't know much about this, just got pointed at the dump. Dump.
Kenyan Broadcasting Corporation - The Kenya Broadcasting Corporation - www.kbc.co.ke - had a server hacked by a group called the Rwandan Hackers. The user list in the pastebin dump looks to have been cut short as the users are in alphabetic order but stop at n. Interesting to see the UK football club Arsenal in the top 10 passwords. Dump.
Freshfiction.com - A site for authors, as far as I can tell mainly women. The original leak was on Pastebin but has since been removed. Thanks to @Glesec for the tip.
Maybe aim.com - Not 100% sure where this came from but I'm told it was from aim.com, a social media type site. The original leak was on Pastebin but has since been removed. There were multiple files, I couldn't find part 1 and I don't think part 6 was the last one but the last one I could find. Thanks to @Glesec for the tip.
ISTL - A UK lighting company, I was sent the analysis from an anonymous source so no idea where it came from or if it is public.
Association of Irish Festival Events - I found this dump from the AOIFE while looking for something else on pastebin. Dump.
Portsmouth Harbour Master - Dumped this morning by the NullCrew from the UK Queen's Harbour Master's website. Dump.
Plymouth Harbour Master - These are the users from the Plymouth side of the Harbour Master's site. Not really worth analysing but here for completeness.
An #Opisrael dump from Israel Audio & Music Technology Magazine. The dump contains three sections, the top, titled users, and the bottom, titled addresses, both contain passwords so I've done them both separately. Users, Addresses. Announcement, Dump
XBox Live Partial - These are from a Pastebin dump by Reckz0r, they claim to be a sample from a larger dump but unfortunately the domain hosting the larger dump is now dead. Pastebin of dump - Turns out this is probably a fake leak but using real data from somewhere else, see this report.
CVPS Machine Passwords, CVPS Email Passwords - A good friend got me a partially cracked dump of Chicago VPS data. This has been run through the new, modular, Pipal so check out the username and email address Levenshtein comparisons at the end of the report, really interesting stuff. Info on the breach from the Chicago VPS site.
World Poker - A large dump of passwords from the Amateur World Poker Tour site. I used an external list of about 200 words relating to poker and got a few hits through that, I also noticed that the colour red has quite a following. The password sdf7asdf6asdg8df is the default when resetting passwords, as an individual password it is reasonably complex, when used by a quarter of the site it isn't quite as good.
Harlech College - A small dump from an adult learning college in North Wales.
Tesco - What a surprise, Tesco got popped and data came out. Their bad security was discussed in depth over a year ago by myself and Troy Hunt and it doesn't seem to have got better since then. The dump on Pastebin. BBC article on the breach. How Troy Hunt thinks it happened.
Boxee Forum - The Boxee forum got popped and all user credentials were released. This is an analysis of a subset of the passwords that were taken. The dump that was analysed, thanks to m3g9tr0n. ars technica report of the hack.
Netflix - This is a small set of passwords from Netflix which were probably grabbed quite a while ago but have just been re-released by Derp. The dump that was analysed. cnet write up of the dump.
Manga Traders - A very large dump from the Manga Traders website. As well as doing the normal checks I also grabbed a list of Manga characters from Wikipedia and used that as a list to run comparisons against. After splitting the list down to individual words and filtering it to only six characters and above I got 381 words. I expected a large number of hits so was surprised when I only got two. Either the list isn't a good representation of Manga characters or users don't consider the names good enough for passwords.
Looking at years included in passwords showed a definite spike around the years 1987 to 1996, if these are taken to be years of birth it would make the average user of the site between 18 and 27 years. That seems to fit with what I'd expect of Manga followers.
The newly added email checker showed a large number of people who used their whole email address as their password and a much larger number who used the name part or slight variation. There is often discussion as to whether username disclosure from websites is a bad thing, I think this shows that it is definitely an issue.
A big thanks to freakyclown for providing the list of cracked passwords.
Serra Pre-school - A pretty bad selection of passwords taken from the website for Serra Pre-school. I was passed the cracked list so don't know how it was obtained but looking at the cracked passwords the administrator and developers have a lot of explaining to do.
As with the previous Manga Traders dump, looking at the years shows a definite spike, this time around 2000-2010, which would link up with the ages of children whose parents are likely to be using the site. A look at the base words shows a definite Islamic slant to the passwords. From the username checker, 8% of users use either their username or something very close to it as their password, if you are building a password complexity checker it is definitely worth adding a Levenshtein distance check and disallowing any password close to the username. The colour checker almost gives a full rainbow with just indigo missing, I don't think I've seen a full one yet. A manual look at the passwords used and I'm surprised to find no mention of "monkey", there is a single "dragon" and eight "darkangel"'s.
Thanks again to freakyclown for his awesome cracking skills.
Revista Home Theatre - Don't know anything about this leak, I was just sent a link to the dump which is credited to the siph0n team.
Nothing much special in it, a few common weak passwords and as is becoming an obvious trend, quite a few usernames or partial email addresses as passwords.
This dump contains a section for email matches and username matches, this is because the dump contained both so I looked at both the username part of the email address as well as the username itself.
Baltimore Jewish Life - Another dump from the siph0n team. The site looks like it didn't do any kind of bot checking on its registration form as there are a lot of entries which look automated with obviously bogus email addresses.
There are two Chinese place names and one Chinese name in the top ten base words which again points at bot activity as I can't see there being that many Chinese Jews in Baltimore.
An interesting dump if you want to study bot activity but not if you are looking for real user passwords.
Ozsports.info - And another dump from the siph0n team. This is a group of websites for searching on Australian schools and charities as far as I can tell.
There are a lot (90%) of "password" as passwords, whether these are defaults that have never been used or just lazy users who never changed them I don't know.
Specially for this dump I've created a new checker containing Australian places, this is a quick hack with place names grabbed from a Wikipedia page and I will tidy it up later but it got some matches so shows people pick passwords close to home.
Bit of trivia, this is the first time I've seen the password "noodles" make it into the top 10 list.
PostdocJobs.com - Just to prove choosing good passwords isn't based on high IQ, here is a dump from a website offering job adverts for people with PhDs.
Some obvious base words used as well as some I wouldn't have expected.
Possible iCloud brute force list - There has just been a dump of nude photos of celebrities which it is suspected were taken from iCloud, more speculation suggests that a tool called ibrute may have been used to help get into the accounts. This analysis looks at the default list of passwords that comes with ibrute.
comicbookdb.com - The dump contained around 50k of usernames, email addresses and clear text passwords. The colour checker gives us our first full rainbow of colours and a full house of seasons including both autumn and fall. Batman, Superman and dragon all make an appearance in the top 10 base words along with comic and comicbookdb. A good show from the date checker, all the months except February coming in in full form along with five days of the week and a full house for both months and days in abbreviated form. 123456 is the top overall password with trustno1 coming in at number 8, don't think I've seen that so high before.
Anonymous Indian bank - This analysis is for a dump from an anonymous Indian bank and was done by Freaky Clown. I've no other info on the hack/dump but interesting to note that 123456 is a common password anywhere in the world.
CTF365 - The CTF365 site was hacked by a person or group with the name cyberselfie who dropped this dump on pastebin - False alarm, the list was usernames and email addresses not passwords. Wonder if the passwords will show up eventually.
A dating site.com - A seemingly popular dating/social network site was hacked just before Christmas and the password list dumped on the Siph0n forum. Some really interesting findings here. First, being a site geared towards people who like accents and mainly targeted at US/UK members, it is not surprising that the top 10 passwords and base words contain the following place names:
- london
- america
- newyork
- chelsea
- liverpool
- arsenal
Next, looking at colours, red is in first place with 120 occurrences, double that of blue in second. While some of these could be parts of other words, red and dating go together.
Looking at months, there is a definite spike around May and June and a second in December, looks like people are looking for love for the summer and again around Christmas, maybe for a festive fling. Looking at days, monday is double other days (only 8 occurrences but still there) but its abbreviation mon is way above the other days at 241 instances versus 88 for sun then dropping to 44 for fri. Are people getting into work on Monday after a lonely weekend and using work computers to sign up?
Looking at years, the majority are recent, 2008 onwards, I don't know when the site started but I'd guess it's been going for a while looking at those dates. Looking lower down the list there is a slight spike around the mid 1980's making the average site user either side of 30, again what I would expect for an online dating site.
Finally, looking at usernames and passwords, nearly 500 people used either their username or something within a Levenshtein distance of 3 from it. For people looking to build custom wordlists from username lists, it would definitely pay to use a tool which could do these little modifications as this happens on most lists I see.
I was talking to someone recently about Pipal and was trying to explain how, with just a password list, you could have a pretty good guess at the source, or at least the market sector, and you could also do some good analysis of site user demographics. I think this dump shows that up quite well.
Minecraft - Just got back from ShmooCon to find a nice dump of Minecraft passwords waiting for me, I've not read into the story so don't know any more about it than there are 1800 passwords up for grabs. Looking at them, one thing that stands out is the years, a definite spike from 2000 to 2005 suggesting the average site user is aged between 10 and 15, which would make sense. It also looks like kids don't like using days in their passwords, Tuesday is the only day to get a mention and that is just once.
Lizard Squad - The Lizard Squad got their database dumped and it turns out they stored their passwords in clear text. Of the 13k passwords, we can't know how many are throw away ones so the figures are probably skewed but some of it will be legit.
Neofriends - A dating site with clear text passwords dumped on Siph0n. Nothing really special here just funny that "love123", "mylove" and "fuckyou" are all in the top 10 passwords, shows both ends of the spectrum for site users. And "dragon" is in the top 10 base words again, oddly so is "emmanuel".
TEAM (The Employment Agents Movement) - The dump seem to mainly consist of randomly generated 8 character passwords but there are a few people who have gone in and set their own. A look at the Hashcat masks shows that, despite using what they may have considered to be a strong random password generator, that 10% of the passwords can be cracked with just three masks. Original announcement tweet and a write up on Help Net Security
ihimlen.dk - A dump of the credentials from a Danish comedy site by JM511. Not sure how they worked out the passwords based on the hashes and the symmetry of some of the words looks a little to perfect but including it anyway. Original announcement tweet
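For anyone wondering how a figure like "10% of the passwords with just three masks" in the TEAM entry can be arrived at, here is a rough sketch in Python. It is not Pipal's implementation. The function names are made up for illustration, and mapping every non-ASCII character to a single ?b is a simplification. The ?l, ?u, ?d and ?s tokens are Hashcat's built-in character sets.

```python
from collections import Counter
import string


def hashcat_mask(password: str) -> str:
    """Convert a password into a Hashcat-style mask, one token per character."""
    tokens = []
    for char in password:
        if char in string.ascii_lowercase:
            tokens.append("?l")
        elif char in string.ascii_uppercase:
            tokens.append("?u")
        elif char in string.digits:
            tokens.append("?d")
        elif char in string.punctuation or char == " ":
            tokens.append("?s")
        else:
            # Simplification: anything outside printable ASCII becomes ?b.
            tokens.append("?b")
    return "".join(tokens)


def top_mask_coverage(passwords, top_n=3):
    """Return the top_n most common masks and the share of passwords they match."""
    passwords = list(passwords)
    counts = Counter(hashcat_mask(p) for p in passwords)
    top = counts.most_common(top_n)
    covered = sum(count for _, count in top)
    coverage = covered / len(passwords) if passwords else 0.0
    return top, coverage


if __name__ == "__main__":
    # Hypothetical sample list, not taken from any real dump.
    sample = ["abc1", "xyz2", "Qwerty", "12345"]
    masks, share = top_mask_coverage(sample, top_n=1)
    print(masks, share)
```

The coverage figure is simply the fraction of the list whose mask is one of the most common few, so a supposedly random generator with a predictable layout shows up straight away.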

The following stats have been generated by other people.

Look Back on 2012's Famous Password Hash Leaks - Wordlist, Analysis and New Cracking Techniques - A great write up on all things password related in 2012 by m3g9tr0n, Thireus and CrackTheHash.
Leaked US iOS Device IDs dataset generated by @sauerlo from Lo Sauer's Code Log.
TeamPoison UN email passwords (http://pastebin.com/FEcE9WzJ) with blank pwds (000) - Generated by @Rob_OEM
TeamPoison UN email passwords (http://pastebin.com/FEcE9WzJ) 270 blank pwds removed - Generated by @Rob_OEM
Specialforces.com - Damian cracked these passwords and performed the Pipal analysis. Read his write up here.
"GoD zERo" zine Disney passwords - As cracked and analysed by N00bz.
Stratfor - Cracked by Martin Bos of Question Defence.
Stratfor version 2 - This one was cracked and analysed by Marc Doudiet of Swiss infosec.
Stratfor version 3 - Everyone is cracking Stratfor passwords, this one is a full write up by Electric Alchemy.
Loads of analysis - @arex1337 has done an awesome job of collecting together and analysing a whole host of password lists.

Feedback/Todo

If you have a read through the source for Pipal you'll notice that it isn't very efficient at the moment. The way I built it was to try to keep each chunk of stats together as a distinct group so that if I wanted to add a new, similar, group then it was easy to just copy and paste the group. Now that I've got a working app and I know roughly what I need in the different group types, I've got an idea on how to rewrite the main parser to make it much more efficient and hopefully multi-threaded, which should speed up the processing by a lot for large lists.

I could have made these changes before releasing version 1.0 but I figured before I do I want to get as much feedback as possible from users about the features already implemented and about any new features they would like to see so that I can bundle all these together into version 2. So, please get in touch if there is a set of stats that you'd like to see included.

One other thing I know needs fixing: Pipal doesn't handle certain character encodings very well. If anyone knows how to correctly deal with different encoding types, especially with regards to regular expressions, please let me know.

Where is the name from?

It comes from Pip Al as a way to celebrate my daughter and n00bz's son, Pippa and Alexander. It also turns out to be the name of a type of fig and a village in Nepal.

Credits

The speed increases added in version 2 were submitted by Stefan Venken who said a small mention would be good enough, I want to give him a big mention. Running through the LinkedIn lists would have taken many many hours on version 1, version 2 went through 3.5 million records in about 15 minutes. Thank you.

I didn't realise it when I included them, but the "Hashcat", "Character sets" and "Character set ordering" stats are all based on an original idea by iPhelix in his tool PACK. If you are interested in generating Hashcat masks then his work is well worth a read.
