Applied Dimensionality

Learning@Coursera: Introduction To Data Science

Posted at — Aug 25, 2013

Got my first Coursera certificate a while ago, hooray! Got it by mistake (more on that later), but it doesn’t matter, I won’t tell anyone, whoops )

Introduction to Data Science was taught by Bill Howe from the University of Washington and lasted 8 weeks. As usual, I was very active in the first 5 weeks and dropped the ball later, when it came to the peer-assessed exercises. But it looks like they had some issues with the grading algorithm, so it decided that 100% on the first 5 weeks was enough to get a certificate, which is fine by me.

The course itself was quite interesting and I really appreciate the breadth of the coverage they tried to provide.

The course outline can be roughly described as follows:

There were a couple of optional assignments if the main track wasn’t enough:

Overall, as you can see, it’s a pretty packed course. It also has a lot of prerequisites: I don’t think I would have gotten much out of the NoSQL part if I hadn’t been reading about the area for the last 5-6 years, and the same goes for MapReduce, relational algebra/SQL, and ML. There was only a week for each of the topics, so it was more like “here are the main concepts distilled, off you go to the exercises”. That was quite good for me, but I can understand the many complaints in the forums: you definitely had no chance to learn any of this from scratch.

Since none of the topics were new to me, I adopted a rather funny approach to the course: I did the exercises first and listened to the lectures afterwards (finished the last one yesterday). That actually saved me from the massive frustration everybody had with the automated grader for Python submissions. For the Twitter assignments, you had to submit Python code, which was run against a new set of tweets, and your script’s results (number of sad tweets, tweets from Ohio, etc.) were compared to the correct results. As it turns out, nobody expected that there would be so many students (around 40k), so this auto-checker became the bottleneck and people waited hours and hours before getting their results. Everybody was obviously upset, which was especially funny in a course about massively parallel systems and scalability )

I wouldn’t recommend this course to anybody new to any of its areas (Python, SQL, MapReduce, data visualisation), but as a well-structured recap it was really helpful and interesting.

Kudos to prof. Bill and his TAs for pulling this through.
