Advanced Statistics (year 2022)
Academic year 2021/2022
- Teachers: Dott. Davide Ascoli, Prof. Matteo Garbarino, Prof. Giampiero Lombardi, Prof. Michele Lonati
- Teaching period: To be defined
Course summary
MARCO PITTARELLO (DISAFA) - Fundamentals of R
R is an open-source platform and language for exploring, visualizing, and analyzing data, with versions for Windows, Mac OS X, and Linux operating systems. For this reason, R is widely used in data science. This module provides an introduction to the R language within the RStudio environment. Specifically, it gives an overview of data structures (vectors, arrays, matrices, lists, and data frames), data input and export, basic data management, exploration and summarization, data visualization, control structures (conditional statements and loops), and user-written functions. The module content is preparatory to the other modules of the Advanced Statistics course.
LUIGI BOLLANI (ESOMAS) - Correspondence analysis
Correspondence analysis (CA) and multiple correspondence analysis (MCA) are based on dimensionality reduction techniques such as principal component analysis and extend their use to qualitative variables. In particular, CA allows the analysis of a contingency table in the case of two qualitative variables. It complements the study of association, carried out for example with the chi-square test, by showing which categories (modalities) of the two variables drive the association. MCA extends this methodology to several qualitative variables. CA and MCA also produce quantitative factorial dimensions of decreasing importance (i.e. explained variance), which makes the later use of clustering techniques easy. The module is practical in intent and uses the R environment for data processing.
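As a minimal illustration of the link between the chi-square test and CA, the sketch below computes the Pearson chi-square statistic of a small contingency table and the total inertia (chi-square divided by the grand total), which is the quantity that the factorial dimensions of CA decompose. The module itself works in R; Python is used here only for illustration, and the table counts and function names are invented for the example.

```python
# Hypothetical 2x3 contingency table (invented counts, for illustration only):
# rows are the categories of one qualitative variable, columns those of another.
table = [
    [20, 30, 50],
    [30, 30, 40],
]


def chi_square(observed):
    """Return the Pearson chi-square statistic of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    return stat


def total_inertia(observed):
    """Return the total inertia: chi-square divided by the grand total."""
    n = sum(sum(row) for row in observed)
    return chi_square(observed) / n


print("chi-square:", chi_square(table))
print("total inertia:", total_inertia(table))
```

A larger inertia indicates a stronger departure from independence, and therefore more structure for the CA dimensions to describe.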
MATTEO GARBARINO (DISAFA) - Multivariate statistical analysis for ecologists
The module “Multivariate Statistical Analysis for Ecologists” introduces the main multivariate statistical tools (grouping, ordination, group testing and modeling) used in ecological studies. Multivariate statistics will be covered from both a theoretical and a practical point of view. The tools include Cluster Analysis, PCA, NMDS, RDA, CCA, the Mantel test, MRPP, and MANOVA. Several software packages will be presented and discussed (PC-ORD, CANOCO, R, PAST), but only the simplest one (PAST) will be used in the lab exercises.
FRANCESCO FERRERO (DISAFA) - Linear and nonlinear mixed models
Experimental agricultural data are usually grouped by locations, years, blocks, main plots, randomization units and individuals. A statistical model aims to describe and predict one or more response variables, simplifying them in order to understand them. Linear statistical models are often an effective way to describe reality: they are simple, easily interpretable and adaptable to many situations. There are cases, however, in which the relationship between two variables cannot be described by a straight line, and the data must be fitted with a nonlinear model. It is also possible that, for some experimental factors, we are not interested in the observed response at each factor level, but mainly in the overall variability that the factor produces among the experimental units. The mixed-effects model has been one of the mainstays of applied statistics in agriculture. Mixed models provide a convenient modeling framework for introducing random effects and accounting for the differences among groups, based on the estimation of variance and covariance components. The aim of the seminar is to analyze linear and nonlinear mixed models, with practical application examples using the R statistical software.
GIAMPIERO LOMBARDI, MICHELE LONATI (DISAFA) - Cluster Analysis
Many biological experiments gather large quantities of data. Among the different techniques used to explore data, cluster analysis is commonly used in many fields. Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other clusters. During the seminar, the theoretical bases of hierarchical clustering will be presented, and the different steps needed to group objects and to check the quality of the results will be analyzed. Students will immediately apply the main clustering techniques using the clustering algorithms available in R.
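The grouping idea described above can be sketched with a very small agglomerative (bottom-up) hierarchical clustering routine. This is an illustrative sketch only: the seminar uses the clustering algorithms available in R, whereas the Python function below, its name, the single-linkage criterion, and the sample points are all assumptions chosen for brevity.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until n_clusters (>= 1) remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical observations: two well-separated pairs of points.
sample_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(single_linkage(sample_points, 2))
```

Recording the merge distance at each step would give the heights of a dendrogram, which is one common way to judge how many clusters to keep.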
DAVIDE ASCOLI (DISAFA) - Time Series Analysis in R
Analyzing and forecasting time-oriented data is increasingly important in many fields of the natural sciences. The seminar will introduce the properties of environmental time series (ts) and the mechanisms by which ts evolve over time, studying short-term dependencies and low-frequency relationships between ts. Students will analyse ts in real time, making intensive use of R to extract time- and frequency-domain properties (e.g. autocorrelation, cross-correlation, stationarity), decompose ts (trend, cycles, white noise), and apply linear dynamic univariate and multivariate models, autoregressive stochastic models (AR, ARIMA), and spectral and wavelet analysis.
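As a small example of one of the time-domain properties mentioned above, the sketch below computes the sample autocorrelation of a series at a given lag. The seminar itself works in R; this Python function, its name, and the toy series are assumptions made purely for illustration.

```python
def autocorrelation(series, lag):
    """Return the sample autocorrelation of a series at a non-negative lag.

    Uses the usual estimator: the lagged sum of cross-products of deviations
    from the mean, divided by the total sum of squared deviations.
    """
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    lagged = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return lagged / total


# Hypothetical series: a short upward trend (invented values).
trend = [1, 2, 3, 4, 5]
for k in range(3):
    print("lag", k, "autocorrelation:", autocorrelation(trend, k))
```

Values close to 1 at short lags indicate strong short-term dependence, while a series that alternates in sign from one step to the next gives a negative value at lag 1.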
Suggested readings and bibliography