Machine Learning Newsletter

Blog in Review 2014

This blog (no matter how much I dislike the term, this website is a blog) has been around for more than 3 years now. Last year, I made a conscious decision to write more regularly, to remove some of the posts that were rants and ideas about everything, and to keep the ones that are mostly about data. I’d like to think of this blog as a data-driven medium where the posts are medium to long in length. I will also write about programming, but mostly it is about data and analysis.

If you asked me which wh- question I like most (you would not ask such a question, but for the sake of argument), I’d immediately answer why. Not only does it give reasons, which I think are the fundamental building block behind the existence of many things, but it also gives the very purpose of things and people. Once you can answer why, everything else (what, how, which) is secondary. As icing on the cake, if you get this wh- question right, you will not have existential problems.

Why?

  1. I want to improve my writing skills. I am not a native speaker, and I sometimes find myself struggling to communicate what I have been thinking. There are many ways to improve one’s communication, and writing is an essential part of the most important communication channel that ever existed: email. Seriously, it is important.
  2. I want to improve my data analysis/programming skills. Data analysis is not what I do day to day in my job, but more often than not it is the starting point for my projects, in order to understand what type of data we have and what type of questions it could answer. If you cannot visualize/explore your data, you get a poor start, and that may restrict the scope of the project. It is one thing to be unable to do what is possible, and a completely different thing not to know what is possible. The latter is worse.
  3. To learn more about the topic and start a conversation. I wrote a blog post, On Machine Learning, about the practical problems that I faced in applying machine learning to different kinds of problems. Some people reached out to me over Twitter and some via email, and I got to learn about many problems of others that I had no idea existed. For most of the topics that I write about, I would say I have some prior work but definitely not expert-level knowledge. This allows other people to yell at me over the matters that they know better than I do (which is always a good thing: you get to write one thing and learn many. The internet is not a place where you can maintain a meaningful conversation anyway).
  4. To document what I have been learning. I started to grow an interest in Bayesian models last year and learned Monte Carlo methods quite well, along with how they can be applied to interesting practical problems. Right now, I am not using MCMC that much since my focus is mostly on information retrieval and search. But it is always nice to relearn a topic from your own writing. Not only does it make the topic very easy to grasp, but other things that I learned at that time also come back by association.
  5. To publish interesting IPython notebooks. I spend quite a lot of time in IPython notebooks, either for fun or profit. The blog becomes a place where I put the IPython notebooks that are worth publishing (thanks to Pelican and the IPython notebook plugin for enabling me to publish those notebooks).

How come?

It has been around 1 year since I started this blog, and I really wanted to see some patterns in the audience: which blog posts matter more for driving traffic, which channels bring it, and more about who is reading. I also wanted to see if I need to optimize for mobile even further (right now, the blog is accessible on mobile and it is good, but not great). I also like to think of this blog’s data as a return to its audience, and the blog has reached a point where it generates its very own data.

What did you use?

Google Analytics data. It is not great, but it is reasonably complete and lets you export reasonably nice CSVs to work with. It also lets you choose different time intervals if you want more or less granular data.
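As a minimal sketch of how such an export can be turned into the monthly numbers used in the graphs, the snippet below sums daily sessions per month using only the standard library. The column names `Date` and `Sessions`, the date format, and the sample rows are all assumptions for illustration; a real export may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for an exported file; the column names
# "Date" and "Sessions" are assumptions, not the real export schema.
SAMPLE_CSV = """Date,Sessions
2014-01-30,12
2014-01-31,8
2014-02-01,15
2014-02-02,5
"""


def sessions_per_month(csv_file):
    """Sum daily session counts into year-month buckets."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        month = row["Date"][:7]  # "2014-01-30" -> "2014-01"
        totals[month] += int(row["Sessions"])
    return dict(totals)


monthly = sessions_per_month(io.StringIO(SAMPLE_CSV))
print(monthly)
```

To use it on a real file, replace the `io.StringIO` wrapper with `open(path, newline="")` and adjust the column names to match the export.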

Organic Search Results

If there is one thing that I like most about the blog, it is that for some Google searches the blog appears in the top 10 and sometimes at the top (try googling ‘alternating least squares’), which is pretty amazing on its own, but also allows me to establish my authority on a certain topic. Second, if you think about most of the mediums that drive traffic (Hacker News, Twitter and Facebook), they are mostly seasonal and ephemeral. They create huge spikes in your traffic, but that traffic does not persist over time. Search results do persist: as long as people click on my links when they type a query and stay for a long time, which means they are happy, the search engine keeps showing that link in order to keep engagement increasing.

![Organic Search Sessions](/images/work/notes/2015/1/25/organic_search_sessions_over_months.png ‘Organic Search Results’)

I started to write regularly at the end of January, so this graph is pretty amazing to me; growth is in the right direction. The absolute number of sessions is low, though.

Type of Devices

I knew that most of the devices are desktop computers, but I did not know that the mobile share is less than 20%. The tablet percentage is not good either.

![Type of Devices](/images/work/notes/2015/1/25/sessions_by_device_category.png ‘Type of Device Types’)

Desktop still dominates the audience.

Session Numbers of Blog Pages

It follows Zipf’s law as not all posts are created equal. In 2015, I want to change this unequal distribution into a more uniform one. Equality for posts!

![Session Numbers](/images/work/notes/2015/1/25/session_number_of_blog_landing_pages.png ‘Session Numbers’)
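One rough way to check the Zipf claim is to fit a straight line to log(sessions) against log(rank): a slope near -1 is what Zipf’s law predicts, while a slope near 0 would be the uniform distribution I am hoping for. The sketch below uses made-up counts built to be exactly proportional to 1/rank, not the blog’s real numbers, and the function name `zipf_slope` is my own.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Made-up session counts per post, exactly proportional to 1/rank.
sessions = [1200 / rank for rank in range(1, 11)]
slope = zipf_slope(sessions)
print(round(slope, 3))
```

A least-squares fit on a log-log plot is only a quick sanity check and not a rigorous test for a power law, but it is enough to compare one year against the next.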

Session Duration of Blog Pages

Some of them are clearly outliers, as one cannot simply spend 40 minutes on a post, but I blame Google Analytics for those types of mistakes. Here is another Zipf’s law graph:

![Session Duration](/images/work/notes/2015/1/25/session_duration_of_blog_landing_pages.png ‘Session Duration’)

Average Session Duration by Referrals

Apparently, LinkedIn users read more than other referrals, or they refer posts that are longer than others. I’d like to think it is the latter, as one of the longer posts was shared on LinkedIn quite heavily.

![Session Duration of Network Referrals](/images/work/notes/2015/1/25/average_session_duration_of_network_referrals.png ‘Session Duration of Network Referrals’)

Hacker News readers seem to just skim the post rather than take the time to read it, which stands out if you consider that my two posts that made it to the front page of HN are quite long.

Average Session over Months

Average session duration should show where a post gets a lot of love; in this case it is still On Machine Learning. Most of my metrics spike around September, and that is mostly due to that post, which got a lot of love from HN, LinkedIn and Twitter.

![Average Session over Months](/images/work/notes/2015/1/25/average_sesssion_per_month.png ‘Average Session over months’)

Blog Users Flow

I am happy with most of the data that Google Analytics provides, but this section is not part of it. It only allows a weird screenshot of the users flow through the website and does not give any raw data.

![Blog Users Flow](/images/work/notes/2015/1/25/blog-users-flow.png ‘Blog User’s Flow’)

There are a lot of drop-offs when users hit the website, and I guess I need to provide better tools to expose other content on the website. It seems Disqus does not do a good job of recommending relevant content at the bottom of every blog post. Also, there is no option to go from one post to another; you can only go back to the home page and then go to another post.

Session Numbers by Channel

If I look at the channels in terms of the traffic they bring, there are not a lot of surprises, except for the confusion that Google Analytics causes. I have no idea what “Other” or “Direct” means in the graph below, but I am including it to stress how important social media is for my blog.

![Session Numbers by Channel](/images/work/notes/2015/1/25/channel_by_session.png ‘Session Numbers by Channel’)

Session Numbers over Month

If I want to see how much traffic HN drives to my website, I just need to visualize the number of sessions per month; the peaks are when a blog post of mine makes it to the front page.

![Session Numbers over Months](/images/work/notes/2015/1/25/n_session_per_month.png ‘Session Numbers over Month’)

Number of Unique Users over Month

Unsurprisingly, the number of unique users also follows the session numbers over the months.

![Number of Unique Users over Months](/images/work/notes/2015/1/25/n_session_per_month.png ‘Number of Unique Users over Month’)

Number of Session by Network Referrals

If I want to learn which social network drives most of the traffic, it is definitely Hacker News (HN). If you do not know about HN, it is a community that curates links and provides a nice way to discuss their content, very similar to Reddit in many aspects, but its audience is mostly technology-savvy people. They do not disclose the number of visitors or other audience metrics, but when your website stays on the first page for 3-4 hours, it guarantees 10K users in a very short period of time. As I mentioned earlier, two of my posts made it to the front page, and that was more than enough to make HN first among the social networks.

![Number of Session by Network Referrals](/images/work/notes/2015/1/25/number_of_sessions_of_network_referrals.png ‘Number of Sessions by Network Referrals’)

Pageviews

What about the pageviews, grouped by different aspects of the audience?

Pageviews of Network Referrals

Pageviews of network referrals closely follow the number of sessions, with a couple of small twists: Twitter users apparently explore more of the website, and HN users only consume what they came for.

![Pageviews by Network Referrals](/images/work/notes/2015/1/25/page_views_of_network_referrals.png ‘Pageviews by Network Referrals’)

Pageviews/Session of Network Referrals

Which social network’s users view the most pages per session?

![Pageviews by Network Referrals](/images/work/notes/2015/1/25/page_views_of_network_referrals.png ‘Pageviews by Network Referrals’)

Apparently, when a LinkedIn user comes, she views another page as well.
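The metric itself is just pageviews divided by sessions for each referral network. Here is a small sketch of that calculation; the numbers are made up for illustration and are not the figures behind the graph, and the function name is my own.

```python
def pageviews_per_session(rows):
    """Map each network to pageviews / sessions, skipping empty networks."""
    ratios = {}
    for network, pageviews, sessions in rows:
        if sessions > 0:  # avoid dividing by zero for networks with no sessions
            ratios[network] = pageviews / sessions
    return ratios


# Made-up (network, pageviews, sessions) rows, not the blog's real data.
rows = [
    ("Hacker News", 1100, 1000),
    ("Twitter", 300, 200),
    ("LinkedIn", 90, 50),
]

ratios = pageviews_per_session(rows)
top_network = max(ratios, key=ratios.get)
print(top_network, ratios[top_network])
```

Dividing by sessions is what lets a small network rank above a large one: a network can bring far fewer visitors and still have the most curious ones.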

Percentage of New Sessions/total Sessions

The percentage of new sessions loosely follows the total number of sessions. When a new peak appears, new sessions peak as well.

![Percentage of New Sessions/Total Sessions](/images/work/notes/2015/1/25/percentage_new_sessions_vs_sessions.png ‘Percentage of New Sessions over Total Sesssions’)

Percentage of Sessions By Browser

Chrome is the most dominant and Firefox is second; no surprise here. However, I was not expecting Safari at its position in the list. IE does not have much weight, which I am glad about, as I do not have an environment to test IE (which is a pretty great excuse).

![Percentage of Sessions by Browser](/images/work/notes/2015/1/25/percentage_of_sessions_by_browser.png ‘Percentage of Sessions by Browser’)

Percentage of Sessions by Device Brand

Apple products take the top 2, and there are a couple of Google Nexuses and Samsungs in the top 20; the remaining ones are your long tail (mostly Samsungs and other Android OEMs).

First 20

![First 20 Mobile Brands](/images/work/notes/2015/1/25/percentage_of_sessions_by_device_brand_first_20.png ‘Top 20 Mobile Brands’)

Remaining 80

![Remaining 80 Mobile Brands](/images/work/notes/2015/1/25/percentage_of_sessions_by_device_brand_from_20_to_100.png ‘Remaining 80 Mobile Brands’)

Samsung seems to be quite dominant, although the sample size is very small. However, the sum of Apple products is more than the sum of Android phones and other phones (there are a few Microsoft phones).

Distribution of Users by Country

English-speaking countries have the head start in this one, and they do not disappoint. Even though the UK and Canada are small countries by population compared to Germany and France, they are leading in terms of users. I’d expect more users from China and India.

![Users by Country](/images/work/notes/2015/1/25/percentage_of_sessions_by_location.png ‘Users by country’)

Distribution of Operating Systems of Users

Windows is the leading OS, which surprised me, as IE gets a very small share among browsers. Apparently, Google succeeded in convincing Windows users that Chrome is better than their favorite browser.

![Users by Country](/images/work/notes/2015/1/25/percentage_of_sessions_by_location.png ‘Users by country’)
