Best Free Online Data Sets for Statistical Analysis and Modelling

1. Kaggle Data Sets

Kaggle offers free, high-quality data sets that are well suited to practising machine learning algorithms and modelling. On Kaggle one can also find and contribute to discussions about each data set and the code related to it. To access the data sets, one must register for an account, which is free. Check it out at www.kaggle.com/datasets

Kaggle data set

2. R package “datasets”

Another excellent free source of data sets for training and research is the package “datasets” within the free open-source software R. For those of you new to R: R is statistical software that provides a wide variety of statistical procedures, including statistical testing and modelling, time series analysis, classification and clustering, and data visualisation. Install R together with the RStudio interface and get started!

The “datasets” package ships with R and is normally attached at start-up, so it does not have to be installed separately. To load it explicitly, write the following in an R script and click Run.

R datasets package install

Check out its documentation here for a list of the data sets available in this package.
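
The list of data sets can also be printed from within R itself. The following is a minimal sketch, assuming a standard R installation; the variable name `available` is chosen here purely for illustration.

```r
# Ask R for the index of data sets shipped with the "datasets" package.
# data() returns an object whose "results" element is a character matrix.
available <- data(package = "datasets")$results

# Each row describes one data set: "Item" is its name, "Title" a short description.
head(available[, c("Item", "Title")])
```

Printing `available[, "Item"]` in full shows every name that can be used in the same way as the data set in the example further on.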

For example, view the data set “beaver2” (line 4, with the output shown in the console) and, if necessary, export it from R in .csv format (line 5) so that it can be opened in other software such as Excel or SPSS.
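
The steps described above can be sketched as the short script below. It is an illustration rather than the exact script from the original post: the output file name "beaver2.csv" is an assumption, and the file is written to the current working directory.

```r
# Sketch: view a built-in data set and export it to CSV.
library(datasets)  # ships with base R and is normally attached already

beaver2  # print the full data set in the console
write.csv(beaver2, "beaver2.csv", row.names = FALSE)  # "beaver2.csv" is a hypothetical file name
```

Setting `row.names = FALSE` keeps R's row numbers out of the exported file, which usually makes it tidier to open in a spreadsheet. Use `getwd()` to see the folder in which the file was saved.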