Matlab ANOVA

2024-07-12

To stabilize a production process and achieve high quality and high yield, it is necessary to analyze the factors that affect product quality and identify those with a significant impact. Besides studying the underlying mechanism, this usually requires running many experiments, analyzing and comparing the results, and looking for patterns. The statistical method that analyzes test results to assess how strongly each factor influences them is called Analysis of Variance, abbreviated ANOVA.

The test results are called indicators, the conditions that are examined and controlled in the test are called factors, and the states a factor can take are called levels. Depending on the number of factors, ANOVA is divided into one-way ANOVA and two-way ANOVA.

Table of contents

1. One-way ANOVA

1.1 Mathematical model

1.2 Statistical analysis

1.3 Analysis of variance table

1.4 Matlab Implementation

(1) Balanced data

(2) Unbalanced data

1.5 Multiple Comparisons

2. Two-way ANOVA

2.1 Mathematical Model

2.2 Two-way ANOVA without interaction

2.3 Two-way ANOVA with interaction effects

2.4 Matlab Implementation


1. One-way ANOVA

Consider the influence of a single factor A on the indicator of interest. A takes several levels, and several tests are conducted at each level. During the tests, all factors affecting the indicator other than A are held fixed, so that only random variation remains. The task is to infer from the test results whether factor A has a significant effect on the indicator, that is, whether the indicator differs significantly when A takes different levels. The indicator at a given level of A is regarded as a random variable, so judging whether the indicator differs significantly across levels of A is equivalent to testing whether the means of several populations are equal.

1.1 Mathematical model
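
The formulas for this section are missing from this copy. As a sketch only, the standard one-way model is given below. The symbols r (number of levels), n_i (sample size at level i), \mu_i, \alpha_i and \sigma^2 are assumed notation, chosen to agree with the r, n_i and H0 used later in the text.

```latex
% Standard one-way ANOVA model (assumed notation, not recovered from the original)
x_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0,\sigma^2)\ \text{independent},
\qquad i = 1,\dots,r,\quad j = 1,\dots,n_i

n = \sum_{i=1}^{r} n_i, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{r} n_i \mu_i, \qquad
\alpha_i = \mu_i - \mu, \qquad
\sum_{i=1}^{r} n_i \alpha_i = 0

% Null hypothesis: factor A has no effect
H_0:\ \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0
```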

1.2 Statistical analysis

From the additivity of the chi-square distribution we get:
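
The resulting formulas are missing from this copy. A standard version of the argument is sketched below, assuming r levels, n_i observations x_{ij} at level i, n observations in total, and independent normal errors with common variance \sigma^2. The names S_T, S_A and S_E for the sums of squares are assumed notation.

```latex
% Decomposition of the total sum of squares
\bar{x}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{r}\sum_{j=1}^{n_i} x_{ij}

S_T = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij}-\bar{x})^2, \qquad
S_A = \sum_{i=1}^{r} n_i(\bar{x}_{i\cdot}-\bar{x})^2, \qquad
S_E = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_{i\cdot})^2, \qquad
S_T = S_A + S_E

% Within each level the scaled sum of squares is chi-square with n_i - 1 d.f.;
% adding the r independent terms gives
\frac{S_E}{\sigma^2} \sim \chi^2(n-r)

% Under H_0, S_A is independent of S_E and
\frac{S_A}{\sigma^2} \sim \chi^2(r-1), \qquad
F = \frac{S_A/(r-1)}{S_E/(n-r)} \sim F(r-1,\,n-r)
```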

1.3 Analysis of variance table
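
The table itself is missing from this copy. The usual layout of a one-way ANOVA table is sketched below. S_A, S_E and S_T are assumed names for the between-level, within-level (error) and total sums of squares, r is the number of levels and n the total number of observations.

```latex
\begin{array}{l|c|c|c|c|c}
\text{Source} & \text{Sum of squares} & \text{d.f.} & \text{Mean square} & F & p\text{-value} \\ \hline
\text{Factor } A & S_A & r-1 & \bar{S}_A = S_A/(r-1) & F = \bar{S}_A/\bar{S}_E & p \\
\text{Error}     & S_E & n-r & \bar{S}_E = S_E/(n-r) &  &  \\
\text{Total}     & S_T & n-1 &  &  &
\end{array}
```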

The significance levels generally used in ANOVA are as follows. If H0 is rejected at α = 0.01, the influence of factor A (or the difference between the levels of A) is said to be highly significant. If H0 is not rejected at α = 0.01 but is rejected at α = 0.05, the influence of factor A is said to be significant. If H0 is not rejected at α = 0.05, factor A is said to have no significant effect.

1.4 Matlab Implementation

The command for one-way ANOVA in the Matlab Statistics Toolbox is anova1. If the number of observations in each group is equal, the data are called balanced; if the numbers are unequal, the data are called unbalanced.

(1) Balanced data

The usage for balanced data is: p=anova1(x)

The return value p is a probability; H0 is accepted when p > α. x is an m × r data matrix in which each column holds the data for one level (so the sample size at each level is ni = m). The command also displays an ANOVA table and a box plot.

Example:

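    % Each column holds the observations for one worker (one level of the factor)
    x = [256 254 250 248 236
         242 330 277 280 252
         280 290 230 305 220
         298 295 302 289 252];
    % One-way ANOVA on balanced data; also displays the ANOVA table and box plot
    p = anova1(x)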

We get p = 0.1109 > α = 0.05, so H0 is not rejected: there is no significant difference in the productivity of the 5 workers.

The variance table corresponds to columns 1 to 4 of the one-way ANOVA table above. F = 2.262 is the 1− p quantile of the F(4,15) distribution. It can be verified that fcdf(2.262,4,15)=0.8891=1-p.
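
The relationship between F and p can be checked by hand. The following is a minimal sketch that recomputes the F statistic for the worker data from the between-column and within-column sums of squares and then converts it to a p-value with fcdf. The variable names (SA, SE, colMeans, grandMean) are illustrative and not part of anova1.

```matlab
% Worker productivity data from the example above: one column per worker
x = [256 254 250 248 236
     242 330 277 280 252
     280 290 230 305 220
     298 295 302 289 252];

[m, r] = size(x);            % m observations per level, r levels
n = m * r;                   % total number of observations

colMeans  = mean(x, 1);      % mean of each level (worker)
grandMean = mean(x(:));      % overall mean

% Between-level and within-level (error) sums of squares
SA = m * sum((colMeans - grandMean).^2);
SE = sum(sum((x - repmat(colMeans, m, 1)).^2));

% F statistic with (r-1, n-r) degrees of freedom and its upper-tail probability
F = (SA / (r - 1)) / (SE / (n - r));
p = 1 - fcdf(F, r - 1, n - r);

fprintf('F = %.4f, p = %.4f\n', F, p);
```

The printed values should agree with the F = 2.262 and p = 0.1109 reported by anova1, up to rounding.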

The Box plot reflects the characteristics of the productivity data of 5 workers.

(2) Unbalanced data

The usage for handling unbalanced data is: p=anova1(x,group)

x is a vector, with data arranged in sequence from the 1st group to the rth group; group is a vector of the same length as x, marking the group of data in x.

Example:

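    clc, clear;
    % 4-by-4 matrix holding the 16 observations (read column by column below)
    x = [1620 1580 1460 1500
         1670 1600 1540 1550
         1700 1640 1620 1610
         1750 1720 1680 1800];
    % Linear (column-major) indexing rearranges the data group by group
    x = [x(1:4), x(16), x(5:8), x(9:11), x(12:15)];
    % Group labels: 5, 4, 3 and 4 observations for groups 1 to 4
    g = [ones(1,5), 2*ones(1,4), 3*ones(1,3), 4*ones(1,4)];
    p = anova1(x, g)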

The result is p = 0.0331 < 0.05, so the lifespans of bulbs made by the different processes differ significantly.

 

1.5 Multiple Comparisons

In the problem of bulb life, in order to determine which processes have significant differences in bulb life, we first calculate the mean of each group of data:
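
The table of means is missing from this copy. As a sketch, the group means can be recomputed from the data and group labels of the unbalanced example with accumarray; the variable name groupMeans is illustrative.

```matlab
% Bulb-life data already arranged group by group, as in the unbalanced example
x = [1620 1670 1700 1750 1800, ...   % group 1
     1580 1600 1640 1720, ...        % group 2
     1460 1540 1620, ...             % group 3
     1680 1500 1550 1610];           % group 4
g = [ones(1,5), 2*ones(1,4), 3*ones(1,3), 4*ones(1,4)];

% Mean of the observations sharing each group label
groupMeans = accumarray(g(:), x(:), [], @mean);

disp(groupMeans.')   % 1708  1635  1540  1585
```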

Although A1 has the largest mean, multiple comparisons are still needed to determine whether it is significantly different from the other types. Generally, multiple comparisons require pairwise comparisons of all r populations to analyze the differences between them. The number of comparisons can be reduced according to the specific situation of the problem.

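    clc, clear;
    % Same bulb-life data and group labels as in the unbalanced example
    x = [1620 1580 1460 1500
         1670 1600 1540 1550
         1700 1640 1620 1610
         1750 1720 1680 1800];
    x = [x(1:4), x(16), x(5:8), x(9:11), x(12:15)];
    g = [ones(1,5), 2*ones(1,4), 3*ones(1,3), 4*ones(1,4)];
    % st is the statistics structure that multcompare needs
    [p, t, st] = anova1(x, g)
    % c: pairwise comparisons, m: group means and standard errors, nms: group names
    [c, m, h, nms] = multcompare(st);
    % Show each group name next to its mean and standard error
    [nms num2cell(m)]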
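
To read off which pairs differ, one option is to inspect the confidence intervals in the matrix returned by multcompare. The sketch below assumes the documented layout of that matrix (columns 1 and 2 are the group indices, columns 3 and 5 the lower and upper bounds of the interval for the difference of means) and uses the default comparison procedure. The name sig is illustrative.

```matlab
% Bulb-life data arranged group by group, with group labels
x = [1620 1670 1700 1750 1800, 1580 1600 1640 1720, ...
     1460 1540 1620, 1680 1500 1550 1610];
g = [ones(1,5), 2*ones(1,4), 3*ones(1,3), 4*ones(1,4)];

% 'off' suppresses the ANOVA table and box plot
[~, ~, st] = anova1(x, g, 'off');

% Pairwise comparisons without opening the interactive figure
c = multcompare(st, 'Display', 'off');

% A pair differs significantly when its confidence interval excludes zero
sig = c(c(:,3) > 0 | c(:,5) < 0, 1:2);
disp(sig)
```

Each row of sig lists the indices of two groups whose means differ at the default 0.05 level.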

2. Two-way ANOVA

If we want to consider the impact of two factors A and B on the indicators, we can divide A and B into several levels, conduct several experiments on each level combination, and perform variance analysis on the obtained data to test whether the two factors have significant effects on the indicators separately, or further test whether the two factors have significant interactive effects on the indicators.

2.1 Mathematical Model
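
The model formulas are missing from this copy. A standard formulation is sketched below, with r levels of A, s levels of B and t replicates per level combination. Here t is the repetition count used later in this section; the remaining symbols are assumed notation.

```latex
% Standard two-way ANOVA model with interaction (assumed notation)
x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0,\sigma^2)\ \text{independent},

i = 1,\dots,r,\quad j = 1,\dots,s,\quad k = 1,\dots,t

\sum_{i=1}^{r}\alpha_i = 0, \qquad \sum_{j=1}^{s}\beta_j = 0, \qquad
\sum_{i=1}^{r}\gamma_{ij} = \sum_{j=1}^{s}\gamma_{ij} = 0

% Null hypotheses for factor A, factor B and the interaction
H_{01}:\ \alpha_i = 0\ \ (\forall i), \qquad
H_{02}:\ \beta_j = 0\ \ (\forall j), \qquad
H_{03}:\ \gamma_{ij} = 0\ \ (\forall i,j)
```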

2.2 Two-way ANOVA without interaction

If it can be determined in advance, from experience or prior analysis, that there is no interaction between the two factors, the experiment for each level combination does not need to be repeated. One can then take t = 1, which greatly simplifies the procedure.

 

Two-way ANOVA table without interaction effect:
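
The table is missing from this copy. The usual layout for r levels of A, s levels of B and one observation x_{ij} per combination is sketched below; the names S_A, S_B, S_E and S_T for the sums of squares are assumed notation.

```latex
% Sums of squares for t = 1 (no interaction term)
S_A = s\sum_{i=1}^{r}(\bar{x}_{i\cdot}-\bar{x})^2, \qquad
S_B = r\sum_{j=1}^{s}(\bar{x}_{\cdot j}-\bar{x})^2, \qquad
S_E = \sum_{i=1}^{r}\sum_{j=1}^{s}(x_{ij}-\bar{x}_{i\cdot}-\bar{x}_{\cdot j}+\bar{x})^2, \qquad
S_T = S_A + S_B + S_E

\begin{array}{l|c|c|c|c}
\text{Source} & \text{Sum of squares} & \text{d.f.} & \text{Mean square} & F \\ \hline
\text{Factor } A & S_A & r-1 & \bar{S}_A = S_A/(r-1) & F_A = \bar{S}_A/\bar{S}_E \\
\text{Factor } B & S_B & s-1 & \bar{S}_B = S_B/(s-1) & F_B = \bar{S}_B/\bar{S}_E \\
\text{Error}     & S_E & (r-1)(s-1) & \bar{S}_E = S_E/[(r-1)(s-1)] & \\
\text{Total}     & S_T & rs-1 & &
\end{array}
```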

2.3 Two-way ANOVA with interaction effects

Two-way ANOVA table with interaction effects:
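
The table is missing from this copy. The usual layout for r levels of A, s levels of B and t replicates x_{ijk} per combination is sketched below; the names S_A, S_B, S_{AB}, S_E and S_T for the sums of squares are assumed notation.

```latex
% Sums of squares with t replicates per cell
S_A = st\sum_{i}(\bar{x}_{i\cdot\cdot}-\bar{x})^2, \qquad
S_B = rt\sum_{j}(\bar{x}_{\cdot j\cdot}-\bar{x})^2, \qquad
S_{AB} = t\sum_{i}\sum_{j}(\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x})^2, \qquad
S_E = \sum_{i}\sum_{j}\sum_{k}(x_{ijk}-\bar{x}_{ij\cdot})^2

\begin{array}{l|c|c|c|c}
\text{Source} & \text{Sum of squares} & \text{d.f.} & \text{Mean square} & F \\ \hline
\text{Factor } A & S_A & r-1 & \bar{S}_A = S_A/(r-1) & F_A = \bar{S}_A/\bar{S}_E \\
\text{Factor } B & S_B & s-1 & \bar{S}_B = S_B/(s-1) & F_B = \bar{S}_B/\bar{S}_E \\
\text{Interaction } A\times B & S_{AB} & (r-1)(s-1) & \bar{S}_{AB} = S_{AB}/[(r-1)(s-1)] & F_{AB} = \bar{S}_{AB}/\bar{S}_E \\
\text{Error} & S_E & rs(t-1) & \bar{S}_E = S_E/[rs(t-1)] & \\
\text{Total} & S_T & rst-1 & &
\end{array}
```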

2.4 Matlab Implementation

The Statistics Toolbox function anova2 performs two-way ANOVA.

The command is: p=anova2(x,reps)

Different columns of x correspond to the levels of one factor, and different rows correspond to the levels of the other factor. If each row-column pair (a "cell") has more than one observation, the argument reps gives the number of repeated observations t per cell, and the replicates of a cell occupy consecutive rows of x.

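    % One observation per cell; rows and columns are the levels of the two factors
    x = [58.2 56.2 65.3
         49.1 54.1 51.6
         60.1 70.9 39.2
         75.8 58.2 48.7];
    % Without reps, anova2 assumes one observation per cell (no interaction term)
    [p, t, st] = anova2(x)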

We obtain p = 0.4491 0.7387. Both values exceed 0.05, so neither the differences between fuels nor the differences between thrusters have a significant effect on the rocket range.
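
These p-values can be reproduced by hand from the sums of squares of the no-interaction model. The following is a minimal sketch under that model; the variable names are illustrative, and the order of the result follows anova2, which reports the column factor first and the row factor second.

```matlab
% Rocket-range data from the example above, one observation per cell
x = [58.2 56.2 65.3
     49.1 54.1 51.6
     60.1 70.9 39.2
     75.8 58.2 48.7];

[r, s] = size(x);            % r row levels, s column levels
gm       = mean(x(:));       % grand mean
rowMeans = mean(x, 2);
colMeans = mean(x, 1);

% Sums of squares for rows, columns, total and error
SSrows = s * sum((rowMeans - gm).^2);
SScols = r * sum((colMeans - gm).^2);
SST    = sum((x(:) - gm).^2);
SSE    = SST - SSrows - SScols;

% F statistics against the error mean square with (r-1)(s-1) degrees of freedom
dfE   = (r - 1) * (s - 1);
Fcols = (SScols / (s - 1)) / (SSE / dfE);
Frows = (SSrows / (r - 1)) / (SSE / dfE);

% Upper-tail probabilities: column factor first, then row factor
p = [1 - fcdf(Fcols, s - 1, dfE), 1 - fcdf(Frows, r - 1, dfE)];
fprintf('p = %.4f %.4f\n', p);
```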

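    clc, clear
    % Each cell holds two replicate observations, written as "first,second"
    x0 = [58.2,52.6 56.2,41.2 65.3,60.8
          49.1,42.8 54.1,50.5 51.6,48.4
          60.1,58.3 70.9,73.2 39.2,40.7
          75.8,71.5 58.2,51.0 48.7,41.4];
    x1 = x0(:,1:2:5);   % first replicate of each cell
    x2 = x0(:,2:2:6);   % second replicate of each cell
    % anova2 expects the replicates of one row level in consecutive rows
    x = zeros(8,3);
    x(1:2:end,:) = x1;
    x(2:2:end,:) = x2;
    % reps = 2: two observations per cell, so the interaction is tested as well
    [p, t, st] = anova2(x, 2)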

We obtained p=0.0035 0.026 0.0001, which are all less than 0.05, so the hypothesis of equal means can be rejected. That is, it is believed that there are significant differences in the range under different fuels (factor A) and different thrusters (factor B), and the interaction is also significant.