Andrey Kamenov, Ph.D. Probability and Statistics
Andrey Kamenov is a data scientist working for Advameg Inc. His background includes teaching statistics, stochastic processes and financial mathematics at Moscow State University and working for a hedge fund. His academic interests range from statistical data analysis to optimal stopping theory. Andrey also enjoys his hobbies of photography, reading and powerlifting.
Exploring utility patents is no easy task. The system that is currently in place, Cooperative Patent Classification, has its issues. For example, many patent applications include multiple classifications only partially related to the invention itself. This proves useful if you are searching for something specific — on the other hand, it is not as helpful if you are interested in patent mining or visualization.
Luckily, one can easily accomplish most of these tasks with the help of advanced computational techniques. The most promising approach is the use of Natural Language Processing to classify and visualize patents based on their abstracts.
Let’s take a look at the algorithm known as Latent Dirichlet Allocation (or LDA for short). Its primary goal is to find a set of topics that is best suited for classifying a corpus of documents (patents, in our case).
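To make the idea more concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the implementation used for the patent analysis: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy "abstracts" are all hypothetical. In this sketch the number of topics, `num_topics`, is chosen up front and passed in as a parameter.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                               # all tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic mixtures; each row sums to 1.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # The three most frequent words assigned to each topic.
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -row[i])[:3]]
        for row in topic_word
    ]
    return theta, top_words

# Hypothetical, already-tokenized toy "abstracts" (not real patent data).
toy_docs = [
    ["battery", "cell", "charge", "battery", "electrode"],
    ["electrode", "charge", "cell", "voltage"],
    ["server", "network", "cloud", "storage", "server"],
    ["cloud", "network", "storage", "request"],
]
theta, top_words = lda_gibbs(toy_docs, num_topics=2)
```

Each row of `theta` is a document's mixture over topics, which is the kind of output one could use to group or visualize abstracts. For a real corpus, an established library would be a more sensible choice than this toy sampler.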
Today we’ll take a look at the seasonality of peer-to-peer lending. Thanks to data provided by Prosper, we can perform a quantitative analysis of loan charge-off times and find if there are any seasonal patterns present.
First, let’s take a look at the lifespan distribution of defaulted loans. We see that the default rates increase significantly during the first several months, with the largest number of defaults registered in the eighth and ninth months after origination.
Tesla currently rules the electric vehicle market in the U.S. But despite the manufacturer’s ambitious plans, things are likely to change in the foreseeable future. Nearly every major car manufacturer has announced plans to include more electric vehicles in their lineup; General Motors plans to release 20 all-electric vehicles by 2023, and Ford has 13 EV models in the pipeline for the same year.
But what is the current state of affairs? Has the electric vehicle market really taken off outside of California? And most importantly, is the existing infrastructure ready for mass-market EV cars?
What is cloud computing and why is it important? In recent years there has been a surge in the number of internet-based services that provide shared computer resources, minimizing both upfront and continuing maintenance costs for new projects. Cloud computing is often regarded as most useful for smaller setups, as it allows access to a significant amount of resources regardless of the project size.
Despite technically existing since the 1990s, the technology only really took off in 2009 (Microsoft announced its Azure service in October 2008). After this, it didn’t take long for major U.S. companies to start patenting their developments.
The “cloud” term started gaining popularity in new U.S. patents in 2010. We can already say that it was not a one-time fad — each year since then, the total number of patents including the term has increased.
A study published in 2015 suggests that mass shootings in the U.S. are contagious. Not in the sense of spreading like a disease, however; rather, each shooting slightly increases the chance of another one happening shortly afterward.
The original article used data from the Brady Campaign to Prevent Gun Violence. It appears that we now have much more complete data on shootings, thanks mostly to the Gun Violence Archive project. Additionally, it should be interesting to see if the findings still hold true.
Here’s the chart showing how the average number of mass shootings per day changed in the past two years. The graph below has been smoothed using a rolling average with Gaussian weights and a 60-day window.
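For readers curious about the smoothing step, here is a minimal sketch of a rolling average with Gaussian weights in plain Python. The post only states the window length; the centered window, the standard deviation (`window / 6`) and the edge handling below are assumptions, and the function name `gaussian_rolling_mean` is hypothetical.

```python
import math

def gaussian_rolling_mean(values, window=60, std=None):
    """Centered rolling average with Gaussian weights.

    Near the edges the window is truncated and the weights are renormalized.
    """
    if std is None:
        std = window / 6.0  # assumption: the window spans roughly +/- 3 std
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        # Weight each neighbor by its distance (in days) from day i.
        weights = [math.exp(-0.5 * ((j - i) / std) ** 2) for j in range(lo, hi)]
        total = sum(weights)
        weighted_sum = sum(w * values[j] for w, j in zip(weights, range(lo, hi)))
        smoothed.append(weighted_sum / total)
    return smoothed

# Hypothetical daily counts: a single incident on day 100 of 200.
daily_counts = [0] * 200
daily_counts[100] = 1
smoothed_counts = gaussian_rolling_mean(daily_counts)
```

The single spike gets spread over the neighboring days, with the largest smoothed value still on day 100. A wider window or a larger `std` gives a flatter curve.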
A question in a survey conducted by the Centers for Disease Control and Prevention asked a respondent to assess his or her own health. Just one in every six people considered their own health “excellent”; most stuck to “very good” or even just “good.” But how do we use this data to visualize the relationship between health levels and other factors?
The problem is, the scale itself is not quantitative. It doesn’t explicitly state how much better “good” is than “fair”.
In order to measure the health levels of specific groups in the general population, we’ll use a quite common procedure. It provides values between 0 percent (meaning all respondents have poor health) and 100 percent (where everyone has excellent health). The exact values of everything in between are based on the (empirical) nationwide percentages.
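The post does not spell out the procedure here, so the sketch below shows just one possible reading: ridit-style scores computed from the nationwide shares and rescaled so that "poor" maps to 0 and "excellent" maps to 1. The nationwide shares are hypothetical placeholders (only the roughly one-in-six share of "excellent" echoes the survey result above), and the names `level_scores` and `health_index` are made up for illustration.

```python
LEVELS = ["poor", "fair", "good", "very good", "excellent"]

# Hypothetical nationwide shares, for illustration only (they sum to 1).
NATIONWIDE = {
    "poor": 0.05,
    "fair": 0.13,
    "good": 0.32,
    "very good": 0.33,
    "excellent": 0.17,
}

def level_scores(shares):
    """Ridit-style score per level, rescaled so poor = 0 and excellent = 1."""
    ridits = {}
    below = 0.0
    for level in LEVELS:
        # Share of people in worse categories plus half of this category.
        ridits[level] = below + shares[level] / 2
        below += shares[level]
    low, high = ridits[LEVELS[0]], ridits[LEVELS[-1]]
    return {level: (ridits[level] - low) / (high - low) for level in LEVELS}

def health_index(group_counts, scores):
    """Average score of a group's answers, expressed as a percentage."""
    total = sum(group_counts.values())
    return 100 * sum(scores[level] * n for level, n in group_counts.items()) / total

scores = level_scores(NATIONWIDE)

# Hypothetical answers from one group of 100 respondents.
group = {"poor": 5, "fair": 15, "good": 35, "very good": 30, "excellent": 15}
index = health_index(group, scores)
```

With this scoring, the gap between two adjacent levels depends on how many people nationwide fall into them, which is one way to put numbers on a scale that is not quantitative by itself.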
Let’s see how our health index fares against two of the most obvious factors. The first chart shows how quickly people’s self-perceived health declines with age:
As we saw in an earlier post, one of the industries instrumental to the growth in the professional services sector is research and development. The number of people employed in this sector has grown from 330,000 to almost 480,000. Even more impressive is the fact that it peaked at almost 600,000 in 2006:
The growth in gun-related crime is becoming an issue in California. The state’s firearm homicide rate had already been rising for two years in 2016, and the data for 2017 doesn’t look very bright either.
Last year, 2,100 gun-related incidents with victims occurred in California, excluding accidental shootings. This is the second-highest number of any state, behind only Illinois.
In total, 1,701 people were injured and another 1,113 were killed.
Number of shootings with victims in 2017 in California
Baltimore is a city that is particularly known for its high crime rate — so much so, in fact, that there is an entire Wikipedia article about crime in the city.
Now that 2017 is over, it’s time to look back and see if the situation has improved. At first glance, it appears that it has not. Here’s a map showing every intentional shooting with at least one victim in Baltimore. You can click on any marker to see details of the incident.