My article is available in German on ambassadorbase.at. In my job and in the studies I recently finished, I have worked with lots of different data sources, and you will meet all of them throughout your career as a data scientist. Data can be given to you as an SQL dump, XML files and many other … Continue reading Data Analysis with Microsoft Excel: Tables
Category: Data Science
Data Science related stuff like collecting data, statistical methods, machine learning, … Mostly #rstats
Finding data sets Part 2: TV, music, book ratings and sports data
The first part gave a more general overview of where to get data. This part gives you specific data sources for particular interests such as sports, movies and books. Over the next couple of weeks you’ll find these posts on my blog: General data sources; TV, music, book ratings and sports data … Continue reading Finding data sets Part 2: TV, music, book ratings and sports data
Finding data sets Part 1: General data sources
I often encounter interesting algorithms or R packages which I want to test. The nice ones provide data for testing, but often it is only dummy data. To get a good understanding of the method and its limitations, real data might be required. Sometimes I would also like to explore data I have not used … Continue reading Finding data sets Part 1: General data sources
Deriving the Predicted Residual Sum of Squares Statistic
Recently I was looking into measures to evaluate a regularized least squares model. I wanted cross-validation so that I could compare different models. While researching the options, I discovered PRESS (Predicted Residual Sum of Squares Statistic). The main resources I used to study the subject are: [1] Adrien Bartoli: Maximizing the … Continue reading Deriving the Predicted Residual Sum of Squares Statistic
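As a rough illustration of what PRESS measures, here is a minimal Python sketch. It is not the derivation from the post: it assumes plain (unregularized) least squares with a single predictor and an intercept, and the function names `fit_line`, `press_loo` and `press_shortcut` are made up for this example. It computes PRESS twice, once by refitting the model with each point left out, and once from a single fit by dividing each residual by one minus that point's leverage.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def press_loo(xs, ys):
    """PRESS by brute force: refit n times, leaving one point out each time."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


def press_shortcut(xs, ys):
    """PRESS from a single fit, using residual_i / (1 - leverage_i)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        # Leverage of a point in simple linear regression with intercept.
        h = 1.0 / n + (x - mx) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - h)) ** 2
    return total


if __name__ == "__main__":
    # Made-up illustration data, not taken from the post.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [1.1, 1.9, 3.2, 3.8, 5.1]
    print(press_loo(xs, ys), press_shortcut(xs, ys))
```

Both functions should agree up to floating point error, which is why PRESS gives leave-one-out cross-validation for the cost of one fit in this setting.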
Cat tracking data collected with my Tractive GPS Pet Tracker
For about one year I have used the Tractive Pet Tracker to track my family's (and my) cats, my grandmother's cats and our road trip to Sweden (German). Now I want to share that cat tracking data and some additional data with you. In this post I will share the raw data with you, filter … Continue reading Cat tracking data collected with my Tractive GPS Pet Tracker
Testing SegNet on real world road scenes
Recently I stumbled upon SegNet, a tool which uses a convolutional encoder-decoder (a type of neural net) to distinguish between 12 categories (like road, pedestrian, …) that a self-driving car might encounter. I tried this tool on several different pictures of roads and want to share the results with you. The big question is: Would I … Continue reading Testing SegNet on real world road scenes
Adding basemap.at tiles to an R leaflet plot
Recently I wanted to visualise some data on a map of Austria. R Leaflet provides a pretty good-looking map by default (openstreetmap.org), but I wanted to use basemap.at, which is made for Austria and is therefore probably the most accurate map available for the country. Actually it is not very difficult but it was the … Continue reading Adding basemap.at tiles to an R leaflet plot
Use R to connect to Twitter and create a wordcloud of your tweets
Recently I wanted to create a wordcloud of my tweets and do further analysis. In this post I am going to show you how to connect to Twitter in R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account. First steps in R: Install required libraries … Continue reading Use R to connect to Twitter and create a wordcloud of your tweets
Use rvest to scrape NFL weather data
If you are following my progress in the Data Science Learning Club, you might know that I am using NFL data for the tasks. For predicting sports events, I think it is important to have statistics not only about the players, teams and previous games but also about the weather. From when I was a … Continue reading Use rvest to scrape NFL weather data
Learning Club 01: Find and explore a dataset
The first activity of the Data Science Learning Club I am participating in is to find and explore a dataset. In the last post I already described the data I found and will use. You can follow all my learning club related activities here. The tasks of this activity are (quoted from the thread above): … Continue reading Learning Club 01: Find and explore a dataset