Data Sets
Application Task 2002 - support resources
Data set (XLS - 115KB) - Domestic Wine Sales (monthly from September 1981 to Aug 2001)
Data set (XLS - 26KB) - Road Fatalities (monthly from January 1989 to September 2001)
Data set (XLS - 47KB) - Estimated Resident Population (monthly from September 1993 to March 2001)
Further Mathematics - Theme: Well-being indicators
Data sets can reveal trends in society and in the way we live just as readily as they reveal trends in the economy. Well-being indicators are those that can be used to help make decisions about the general well-being of our society.
Two data sets have been supplied that can provide frameworks for well-being analyses. The first is Australian domestic wine sales by winemakers. This data set covers sales categories such as table wine sales, fortified wine sales and carbonated wine sales. It could be used either to investigate a particular series, or to compare two series and interpret the changes in Australian wine tastes over time.
The second data set contains road fatality data for Australia, provided by the Australian Transport Safety Bureau. It is broken into various road user categories. Seasonality, trend, ratios and correlation analyses could all form the core of suggested directions that an investigation could take.
A third data set is also supplied: the estimated resident population of Australia and its states and territories. This data set could be used in conjunction with either of the other sets to provide a frame of reference for analysis. Another alternative is to investigate possible links between wine sales and road fatalities.
The first starting point covers components and key assessment features for an investigation based on data set 1. The second starting point lists a number of possible directions that an investigation based around data set 2 might take.
- Data set 1 - The number of bottles of Australian wine sold in Australia by winemakers (monthly from Sep 1981 to Aug 2001)
- Data set 2 - Road fatalities by road user type for Australia (monthly from Jan 1989 to Sep 2001)
- Data set 3 - Estimated resident population for Australia and the states and territories (quarterly from Sep 1984 to Mar 2001)
Key Knowledge for Outcome 1 relevant to this theme (with corresponding key skills) would include knowledge of:
- the standard statistical terms and techniques used to display, summarise and describe univariate data for both categorical and numerical data;
- the concept of sample and population and the use of random numbers as a means of selecting a simple random sample of data from a population;
- the standard terms and techniques used to display and describe associations in bivariate data for both categorical and numerical data;
- the technique of regression as a means of modelling the relationship between two numerical variables with a straight line;
- the role of residual analysis and the coefficient of determination in making decisions about the appropriateness of a particular regression model;
- the concept of data linearisation through transformation;
- the terms used to describe standard patterns in time series in qualitative terms, the role of smoothing in helping to identify these patterns, and some simple techniques for quantifying these patterns;
- the assumptions and/or limitations that underlie the applications of statistical techniques.
All aspects of Outcome 2 and Outcome 3 are relevant.
Starting Point 1: Investigate and compare table wine sales with fortified wine sales
Investigate how Australian tastes have changed over time by examining table wine sales and fortified wine sales (such as sherry) over time. Compare and interpret your findings.
Component 1: selection of a random sample of the data and construction of time-series plots
Component 2: description of the data in the plot, summary and description of the distribution of the dependent variable
Component 3: deseasonalisation of the data set, use of a trend line for predictive purposes, and analysis of the fit of the model
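One way to carry out the random selection in Component 1 is sketched below in Python. It is an illustration only: the function name `sample_years` and the seed value are assumptions, and the range 1982 to 2000 is simply the set of complete calendar years covered by data set 1. A table of random numbers or a calculator's random number generator would serve equally well.

```python
import random


def sample_years(first_year, last_year, k, seed=None):
    """Return k distinct years chosen at random, in ascending order.

    Hypothetical helper for illustration; a fixed seed makes the
    selection reproducible so that it can be checked later.
    """
    rng = random.Random(seed)
    years = range(first_year, last_year + 1)
    # random.sample draws without replacement, so no year repeats.
    return sorted(rng.sample(years, k))


# Complete calendar years in data set 1 (Sep 1981 to Aug 2001).
chosen = sample_years(1982, 2000, 5, seed=1)
print(chosen)
```

The monthly figures for the chosen years then form the sample used for the time-series plots.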
Key assessment features
The following key knowledge (with corresponding key skills) for Outcome 1, in addition to that listed under the theme, is particularly appropriate for this starting point:
- the standard statistical terms and techniques used to display, summarise and describe univariate data;
- the concept of sample and population and the use of random numbers as a means of selecting a simple random sample of data from a population;
- the technique of regression as a means of modelling the relationship between two numerical variables with a straight line;
- the terms used to describe standard patterns in time series in qualitative terms, the role of smoothing in helping to identify these patterns, and some simple techniques for quantifying these patterns.
Important aspects of mathematics to be considered in assessment of student work are:
- use a range of standard statistical techniques and terms to display, summarise, describe and interpret patterns in data, and outline the assumptions and/or limitations relating to the application of these skills;
- obtain a simple random sample from a given population using a table of random numbers or an alternative random number generator;
- use the technique of linear regression to model a relationship between two numerical variables;
- interpret the parameters in a regression equation in relation to the situation being modelled;
- display a time series (which may have been smoothed where necessary) graphically and use this to identify and describe any possible trend patterns;
- use a range of simple techniques to describe features such as seasonality and trend in time series.
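Smoothing is mentioned above without a particular method being specified. A minimal Python sketch of one simple technique, a centred moving mean over an odd number of points, is given below. The function name `moving_mean` and the numbers are made up for illustration and are not taken from the supplied data sets.

```python
def moving_mean(values, window=3):
    """Centred moving mean; window must be a positive odd number.

    The result is shorter than the input because the first and last
    (window // 2) points have no complete window around them.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Illustrative monthly values only.
raw = [3, 6, 9, 3, 6, 9]
smoothed = moving_mean(raw, window=3)
print(smoothed)
```

Plotting the smoothed values alongside the raw series makes any underlying trend easier to see.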
Starting Point 2: Investigate road fatalities by road user type
Source: Australian Transport Safety Bureau
Some possible directions that an investigation based on data set 2 might take:
1. Seasonality
Does the number of pedestrian (or car, motorcyclist or bicyclist) fatalities seem to have a seasonal component?
- Select three years at random (they need not be consecutive).
- Plot the data for each of the three years (fatalities against month) on the same time series plot (ie with the lines on top of each other).
- Comment on the seasonality.
- Repeat with the fatalities expressed quarterly rather than monthly, comment, and compare with the monthly figures.
- Determine seasonal indices.
- Deseasonalise the data.
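The last two steps can be sketched in Python using one common method: each seasonal index is the average, across the sampled years, of that season's value divided by its year's mean, and a deseasonalised value is the actual value divided by its seasonal index. The function names and the quarterly counts below are hypothetical and are not drawn from data set 2.

```python
def seasonal_indices(years):
    """Average each season's (value / yearly mean) across the years.

    `years` is a list of equal-length lists, one per sampled year,
    e.g. four quarterly totals or twelve monthly totals.
    """
    n_seasons = len(years[0])
    ratios = []
    for year in years:
        yearly_mean = sum(year) / n_seasons
        ratios.append([value / yearly_mean for value in year])
    return [
        sum(r[s] for r in ratios) / len(ratios)
        for s in range(n_seasons)
    ]


def deseasonalise(year, indices):
    """Divide each actual value by the index for its season."""
    return [value / index for value, index in zip(year, indices)]


# Made-up quarterly counts for two years, for illustration only.
sample = [[10, 20, 30, 20], [20, 40, 60, 40]]
indices = seasonal_indices(sample)
print(indices)                             # [0.5, 1.0, 1.5, 1.0]
print(deseasonalise(sample[0], indices))   # [20.0, 20.0, 20.0, 20.0]
```

With this method the indices sum to the number of seasons (4 for quarterly data, 12 for monthly data), which gives a quick check on the arithmetic.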
2. Trend
(Teachers can choose to deseasonalise the data before the analysis if they wish).
To examine trend, any of the following could be investigated:
- Select three consecutive years of monthly data for any of the variables, construct a time series plot and carry out a regression analysis.
- Look at annual totals for any of the variables for the entire period, construct a time series plot and carry out a regression analysis.
- Select one month (eg Jan) and look at this month in every year for the whole time period, construct a time series plot and carry out a regression analysis.
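For any of these options the regression step amounts to fitting a least squares line to the time series, with the time period as the explanatory variable. The Python sketch below shows the calculation. The function name `least_squares_line` and the annual totals are illustrative assumptions, not values from the supplied data.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Time periods numbered 1, 2, 3, ... with made-up annual totals.
periods = [1, 2, 3, 4]
totals = [12, 10, 8, 6]
slope, intercept = least_squares_line(periods, totals)
print(slope, intercept)                # -2.0 14.0

# Prediction for period 5 by extrapolating the trend line.
print(intercept + slope * 5)           # 4.0
```

The slope is interpreted as the average change in the variable per time period. Predictions beyond the range of the data rely on the trend continuing unchanged.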
3. Ratios
- Consider the number of motorcycle fatalities as a proportion of the population, or of the number of motorcycle registrations (data supplied in 2000).
- Look at the trend in this ratio over time.
- Which is the better explanation of how things change? Why?
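A ratio of this kind is often expressed as a rate per 100 000 people so that the numbers are easier to read. The short Python sketch below shows the calculation. The function name `rate_per_100k` and all of the figures are hypothetical, chosen only to show how a count and a rate can tell different stories.

```python
def rate_per_100k(fatalities, population):
    """Fatalities per 100 000 people for each matching time period."""
    return [100_000 * f / p for f, p in zip(fatalities, population)]


# Made-up figures: the count is unchanged while the population doubles.
fatalities = [100, 100]
population = [1_000_000, 2_000_000]
rates = rate_per_100k(fatalities, population)
print(rates)                           # [10.0, 5.0]
```

In this invented example the raw count shows no change, yet the rate has halved, which is the sort of contrast the final question above asks students to explain.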
4. Correlational Analyses
(Use annual or deseasonalised data, unless the seasonal patterns are the same for the two variables you choose)
- Construct a scatterplot for the number of fatalities in two road user categories (eg pedestrians and bicyclists).
- Describe the relationship.
- Calculate the correlation coefficient, the coefficient of determination and the regression equation. Interpret all three.
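The correlation coefficient and the coefficient of determination can be computed as in the Python sketch below. The function name `pearson_r` and the paired values are illustrative assumptions rather than figures from data set 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up paired annual values for two road user categories.
category_a = [1, 2, 3]
category_b = [1, 3, 2]
r = pearson_r(category_a, category_b)
r_squared = r ** 2
print(r, r_squared)                    # 0.5 0.25
```

Here the coefficient of determination of 0.25 would be read as 25% of the variation in one variable being explained by the variation in the other. A strong correlation does not by itself show that one variable causes the other.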
Teachers are also encouraged to find their own data sets and develop their own investigation based around the theme. Some good data sets are available free from the Australian Bureau of Statistics (http://www.abs.gov.au). Teachers may also like to investigate the Australian Statistical Internet Sites page of the National Library of Australia (http://www.nla.gov.au/oz/stats.html).
Another alternative source of data is the Social Science Data Archive which is maintained by the Australian National University (http://ssda.anu.edu.au). This non-profit organisation will make available to schools a CD of suitable data sets in Excel format for a reasonable cost.
