MAST10010: Data Analysis 1

Assignment 2

Due Date: Monday October 4th, 11.59pm.

Your assignment must be submitted to Gradescope by 11.59pm Monday 4th October.

When you submit, you must select the pages for each question part,

or your work may not be marked.

Assignments submitted late will incur a penalty of 1% per hour (or

part thereof).

If you have exceptional circumstances that prevent you from meeting

the deadline, please email [email protected], and I may be able

to grant an extension.

Tutors may not help you directly with assignment questions. They

may, however, provide some appropriate guidance.

Please ask on the discussion board if you need clarification on the

wording of questions.

It is recommended to produce a single Word document which includes

all the relevant graphs, statistics and comments. If you need to include

formulas or calculations, you may include photos of handwritten notes

(or use equation editor, or any other method).

This assignment consists of four (4) questions worth a total

of 40 marks. It contributes 5% towards your final grade.

Instructions

Software:

You must use Minitab to produce any graphs, tables and descriptive statistics.

Graphs:

must include your name/student number, which can be added by right-

clicking the graph and selecting Add → Footnote or Add → Subtitle.

must be relevant. You may look at many graphs, but you should only

include the most relevant graph for each question.

should be clear: ensure that labels and titles are correct and appropriate; you can add gridlines/change symbols/colour as appropriate to make the graph clearer. There are some marks awarded for improving upon the default from Minitab.

Mac Users: you will need to use myUniApps in order to edit the

graphs as required above.

Statistics:

Must be relevant: you will be penalised for including statistics which are

not relevant to the questions asked.

Comments:

must be in the context of the data.

should be supported by relevant statistics where possible.

should be concise and informative. Word limits, where given, must be strictly adhered to (all word limits are maximums; you will be penalised for exceeding them). You may use dot-points.

Question 1: Which wine sample pre-treatment is best for determining the Strontium isotope ratio? [1 + 7 = 8 marks]

This question is based on data from the study by Caterina Durante et

al. (2015) ‘An Analytical Approach to Sr Isotope Ratio Determination in

Lambrusco Wines for Geographical Traceability Purposes’, Food Chemistry,

Vol. 173, 557–563. You can read the abstract here (link also on the LMS):

https://www.sciencedirect.com/science/article/abs/pii/S0308814614016483.

You DO NOT need the information from the research article to

answer the questions; it is provided for interest only.

Accurately determining isotope ratios (in this study, the Strontium isotope ratio ⁸⁷Sr/⁸⁶Sr) can enable detection of counterfeit products. This study considered 18 Lambrusco wine samples (which come from a particular protected region) and two different sample pre-treatments: microwave and low temperature mineralisation. The samples of wine were from a variety of years (2009–2012) and locations, and were analysed by both methods. The question you will investigate here is whether there is a difference between these two pre-treatments.

The data is available as Asst2 2021 wine.csv on the LMS Assignment

2 page.

(a). Explain why these data are paired.

(b). Conduct a hypothesis test to determine whether there was a difference in the two pre-treatments. You should use a significance level of α = 0.01 for this test. Show all of your calculations and steps.

Your answer needs to include:

A clear statement of the hypotheses in terms of the parameter(s)

of interest.

A calculation showing how se(estimator) was determined.

The calculation of the test statistic, and its distribution under

the null hypothesis.

A range for the P-value, based on the Minitab output below.

Your conclusion in the context of the data.
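
The calculation steps listed above can be cross-checked with a short Python sketch. This is an illustration only. The data values are made up (they are not the wine data), the function name paired_t_summary is hypothetical, and Minitab remains the required software for anything you submit. The sketch computes the mean difference, se(estimator), the test statistic and its degrees of freedom; the P-value range would still be read from t tables or Minitab output.

```python
import math
import statistics


def paired_t_summary(first, second):
    """Return (mean difference, standard error, t statistic, degrees of freedom)."""
    # Paired analysis works on the within-pair differences.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)      # sample SD of the differences (n - 1 divisor)
    se = s_d / math.sqrt(n)            # se(estimator) = s_d / sqrt(n)
    t_stat = d_bar / se                # compared with a t distribution on n - 1 df
    return d_bar, se, t_stat, n - 1


# Made-up illustrative values, NOT the assignment data.
method_a = [10.2, 9.8, 10.5, 10.1, 9.9]
method_b = [10.0, 9.9, 10.1, 10.0, 9.7]

d_bar, se, t_stat, df = paired_t_summary(method_a, method_b)
print(f"mean diff = {d_bar:.4f}, se = {se:.4f}, t = {t_stat:.3f}, df = {df}")
```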

Question 2: Does the interface used for training affect ability

to learn?

[1 + 2 + 4 + 3 + 4 + 1 + 1 = 16 marks]

This question is based on simulated data for the study by J. Jung and

Y.J. Ahn (2018) ‘Effects of Interface on Procedural Skill Transfer in Virtual

Training: Lifeboat Launching Operation Study’, Computer Animation &

Virtual Worlds, Vol. 29, e1812. You can obtain this article from the Library

website (online Journal search).

You DO NOT need information from this article to answer the

questions; it is provided for interest only.

The study investigated training to launch a lifeboat: a sequence of more than 30 items which need to be completed in a precise order to safely transfer a lifeboat from its stowed position to the water. We will consider a subset of the original study, where 32 individuals were trained either with a virtual reality training module using a head-mounted display and wearables (which measure gestures), or by the traditional training method of a lecture video and associated material. There were 16 participants in each group, and their scores on a procedural knowledge test after the training were recorded.

The data is available as Asst2 2021 training.csv on the LMS Assignment 2 page.

(a). Explain why a lecture and materials were chosen as the second treatment, rather than doing nothing.

(b). Produce an appropriate graph showing procedural knowledge scores

for both groups.

(c). Comment on the effect of training method on procedural knowledge

score. You should support your comments with relevant statistics, but

do not include Minitab output.

Your comments must be less than 100 words.

(d). Calculate a 95% Confidence Interval for the difference in mean test

scores for the two groups. Show all of your calculations (do not include

Minitab output, but you may use Minitab to obtain summary statistics

and relevant distribution values).
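
One way to organise the calculation in (d) is sketched below in Python. This is a sketch under stated assumptions, not the required method. It assumes a pooled (equal-variance) interval, which may or may not be the approach intended in this subject. The summary statistics are made up (they are not the training data), the function name pooled_ci is hypothetical, and the t critical value is passed in by hand (2.042 is the 0.975 quantile of a t distribution on 30 degrees of freedom).

```python
import math


def pooled_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Pooled two-sample confidence interval for mean1 - mean2.

    Assumes both groups share a common variance (an assumption of this sketch).
    t_crit is the t quantile on n1 + n2 - 2 degrees of freedom, looked up
    separately (for example from tables or Minitab).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# Made-up summary statistics, NOT the assignment data.
lower, upper = pooled_ci(mean1=80.0, sd1=10.0, n1=16,
                         mean2=72.0, sd2=10.0, n2=16,
                         t_crit=2.042)
print(f"95% CI for the difference in means: ({lower:.2f}, {upper:.2f})")
```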

(e). What assumptions have you made in calculating this interval? Were

they satisfied? (You need to provide evidence, in the form of one graph

and a calculation.)

(f). Without doing further calculation, would a test of the hypotheses H0: µ1 − µ2 = 0 and H1: µ1 − µ2 ≠ 0 be significant at the α = 0.05 level? Explain briefly.

(g). Does this study provide evidence of a causal relationship? Why/why

not?

Question 3: Unethical Behaviour at an Accounting Firm

[4 + 2 + 2 + 3 = 11 marks]

This question is inspired by the report “‘Unethical behaviour’: KPMG Australia fined by US watchdog”, Sarah Danckert (2021), The Age (https://www.smh.com.au/business/banking-and-finance/unethical-behaviour-kpmg-australia-fined-by-us-watchdog-20210915-p58rso.html), also linked on the LMS.

You DO NOT need information from this article to answer the

questions; it is provided for context only.

A (fictional) large accounting firm, EPCWhitte, concerned by the penalty issued to KPMG, is investigating the proportion of staff who have similarly cheated on training and qualification tests. A small survey (conducted anonymously by an independent agency) found that 8 out of the 26 people sampled had cheated. You should assume that the rate of cheating reported by the article (12%) is the true rate of cheating at KPMG.

(a). Conduct an approximate Hypothesis Test (using α = 0.05) to determine if there is a difference between employees of EPCWhitte and KPMG. Show all of your calculations and steps.

Your answer needs to do the following (the 5-step process meets these requirements):

State the hypotheses in terms of the parameter(s) of interest.

Calculate sd(estimator).

Calculate the test statistic, and give its distribution under the

null hypothesis.

Give the P-value for the test, using Minitab (you should not use Minitab for other parts of this question).

State your conclusion in the context of the data.
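
The approximate test steps above can be cross-checked outside Minitab. The following is a minimal Python sketch using the counts given in the question (8 of 26, null value 0.12) and the standard-library normal distribution. The variable names are illustrative only, and the P-value you report should still come from Minitab as required.

```python
import math
from statistics import NormalDist

x, n, p0 = 8, 26, 0.12             # values stated in the question

p_hat = x / n                      # sample proportion
sd0 = math.sqrt(p0 * (1 - p0) / n) # sd(estimator) under the null hypothesis
z = (p_hat - p0) / sd0             # approximately N(0, 1) under the null

# Two-sided P-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_hat = {p_hat:.4f}, sd = {sd0:.4f}, z = {z:.3f}, P = {p_value:.4f}")
```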

(b). Use Minitab to perform an exact Hypothesis Test (also using a significance level of α = 0.05) to determine if there is a difference between employees of KPMG and EPCWhitte. You only need to provide the Minitab output and a conclusion (in context).
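
For intuition about what an exact test computes, the binomial probabilities can be summed directly. The sketch below is one possible convention, not a description of Minitab's implementation: it assumes the two-sided P-value is the total probability of all outcomes that are no more likely than the observed count, and Minitab's own rule for two-sided exact P-values may differ. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(x, n, p0):
    """Sum the probabilities of all outcomes no more likely than x under p0.

    This 'small probabilities' rule is an assumption of the sketch;
    other two-sided conventions exist.
    """
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    cutoff = pmf[x] * (1 + 1e-9)   # tolerance guards against floating-point ties
    return sum(pk for pk in pmf if pk <= cutoff)


p_exact = exact_two_sided_p(8, 26, 0.12)   # counts stated in the question
print(f"exact two-sided P-value (this convention) = {p_exact:.4f}")
```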

(c). Explain why there is such a large difference between the P-values for the tests you performed in (a) and (b). You may include an additional calculation or a graph.

(d). Based on the evidence available, it is believed that the proportion of EPCWhitte employees who have cheated on training and qualification tests is no more than 40%. Researchers would like to estimate the proportion using a 90% confidence interval based on a normal approximation, with a maximum margin of error of 0.02. What sample size would be required to achieve this? Show your calculations as well as your answer.
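
The sample size requirement in (d) follows from setting the margin of error z × sqrt(p(1 − p)/n) to be at most the target and solving for n. A minimal Python sketch of that arithmetic is below, assuming the planning value p = 0.40 (the largest proportion considered plausible in the question). The function name required_n is hypothetical, and your submitted answer should still show the calculation by hand.

```python
import math
from statistics import NormalDist


def required_n(p_star, margin, confidence):
    """Smallest n with z * sqrt(p_star * (1 - p_star) / n) <= margin."""
    # Two-sided interval: use the upper (1 - alpha/2) standard normal quantile.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_star * (1 - p_star) / margin**2)


n_needed = required_n(p_star=0.40, margin=0.02, confidence=0.90)
print(n_needed)
```

The result is rounded up because rounding down would give a margin of error slightly larger than the target.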

Question 4: Interpreting Research [3 + 2 = 5 marks]

This question requires you to interpret the following small section of the

article: Nitschke, J. et al. (2021) ‘Resilience during uncertainty? Greater

social connectedness during COVID-19 lockdown is associated with reduced

distress and fatigue’, British Journal of Health Psychology, 26, 553–569.

You DO NOT need information from this article to answer the

questions; it is provided for context only.

As expected, due to lockdown, the mean frequency (1 = ‘not at all’; 7 = ‘more than 10 times a day’) was significantly lower for in-person interactions (2.51; SD ± 1.26) compared with online interactions (3.18; SD ± 1.07; t(901) = 13.401, p < .001).

(a). Clearly state the null and alternative hypotheses being tested in the

excerpt above.

(b). Explain what “SD ± 1.07” is likely to mean, and suggest a reason why

it should be included.

Relevance & Formatting [-5 marks]

Up to 5 marks will be deducted if you include anything (including, but not limited to, graphs and Minitab output) that is not specifically requested in any question. Marks will also be deducted if you do not select the correct pages for each question when you submit your assignment.
