In 1967 P.D. Welch wrote a paper titled "The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging Over Short, Modified Periodograms" that appeared in IEEE Trans. Audio Electroacoustics, Vol. AU-15 (June 1967), pp. 70-73.
The Welch method splits a data record into shorter, possibly overlapping segments, applies a window to each segment, and computes the periodogram (a raw estimate of the power spectrum) of each one. The periodograms are then averaged, frequency by frequency, across the segments. The result is a power spectrum estimate with much lower variance (less noise) than a single periodogram of the whole record, at the cost of coarser frequency resolution, because each segment is shorter than the full record. In this sense the Welch method smooths the spectral estimate; it does not low-pass filter the data itself.
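A minimal sketch of the procedure, assuming Hann windows, 50% overlap, per-segment mean removal, and density scaling (choices similar to the defaults of `scipy.signal.welch`, not necessarily those in Welch's paper). The names `welch_psd`, `hann`, and `dft` are hypothetical; a direct O(n²) DFT stands in for the FFT only to keep the example free of third-party dependencies, and `numpy.fft.rfft` would be the practical choice.

```python
import cmath
import math
import random


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft(x):
    """One-sided DFT by direct summation, O(n^2); numpy.fft.rfft is faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def welch_psd(x, fs, nperseg=64, noverlap=32):
    """Welch PSD estimate: the average of windowed periodograms of
    overlapping segments. Returns (freqs, psd), one-sided, in units^2/Hz."""
    if len(x) < nperseg:
        raise ValueError("signal shorter than one segment")
    w = hann(nperseg)
    scale = fs * sum(v * v for v in w)  # density scaling for the window's power
    starts = range(0, len(x) - nperseg + 1, nperseg - noverlap)
    nfreq = nperseg // 2 + 1
    psd = [0.0] * nfreq
    for s in starts:
        seg = x[s:s + nperseg]
        m = sum(seg) / nperseg
        # Remove the segment mean, apply the window, take the periodogram.
        coeffs = dft([(v - m) * wk for v, wk in zip(seg, w)])
        for k, c in enumerate(coeffs):
            psd[k] += abs(c) ** 2 / scale
    # Average the periodograms across segments, frequency by frequency.
    psd = [p / len(starts) for p in psd]
    # Fold in negative frequencies: double every bin except DC and
    # Nyquist (a Nyquist bin exists only when nperseg is even).
    last = nfreq - 1 if nperseg % 2 == 0 else nfreq
    for k in range(1, last):
        psd[k] *= 2
    freqs = [k * fs / nperseg for k in range(nfreq)]
    return freqs, psd


# Usage: a 125 Hz sinusoid in seeded Gaussian noise (illustrative data).
rng = random.Random(0)
fs = 1000.0
x = [math.sin(2 * math.pi * 125.0 * t / fs) + rng.gauss(0, 0.5) for t in range(1024)]
freqs, psd = welch_psd(x, fs)
peak = max(range(len(psd)), key=psd.__getitem__)
print(freqs[peak])  # 125.0, i.e. bin 8 of the 15.625 Hz grid
```

With 64-sample segments at `fs = 1000`, the bin spacing is 1000/64 = 15.625 Hz, so the 125 Hz tone falls exactly on bin 8. Shorter segments would give more periodograms to average (lower variance) but wider bins, which is the resolution trade-off described above.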
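The two-sample test known as Welch's test comes from a different Welch, B. L. Welch, not the P. D. Welch of the spectral method above. Welch is the one who proposed using the estimated standard error given on p.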
Welch's test approximates the (null) sampling distribution of its test statistic with a t distribution whose degrees of freedom are estimated from the data, and some call it the unequal-variance two-sample t test; I suppose these things have created a bit of confusion.
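A minimal sketch of the statistic and its approximate degrees of freedom, using only the standard library. It assumes the usual unpooled standard error sqrt(s1^2/n1 + s2^2/n2) and the Welch–Satterthwaite formula for the degrees of freedom; the function name `welch_t` is hypothetical. Turning `t` into a p-value needs the CDF of a t distribution with non-integer degrees of freedom, which the standard library lacks; `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test including the p-value.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # s1^2 / n1, using the (n - 1) sample variance
    v2 = variance(y) / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)  # unpooled estimated standard error
    t = (mean(x) - mean(y)) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(round(t, 3), round(df, 2))  # -2.376 6.97
```

The estimated degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2; here 6.97 lies between 4 and 9.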
Welch's test is a test about the distribution means, whereas the Wilcoxon–Mann–Whitney (W-M-W) test is a test for the general two-sample problem (testing equal distributions against the general alternative) that can sometimes be used as a test about distribution means, for example when the two distributions are assumed to differ only by a shift in location.
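A small illustration of the difference, with made-up data and a hypothetical helper `mann_whitney_u`. The W-M-W statistic U, divided by n1·n2, estimates P(X > Y) + P(X = Y)/2, and that quantity need not move with the means. The two samples below have identical means, so Welch's t statistic for them is exactly 0, yet only 5 of the 25 cross-sample pairs have the x value larger.

```python
from statistics import mean


def mann_whitney_u(x, y):
    """U statistic for x: pairs (a, b) with a > b count 1, ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


x = [0, 0, 0, 0, 10]  # illustrative data
y = [1, 2, 2, 2, 3]
print(mean(x), mean(y))                          # 2 2: identical sample means
print(mann_whitney_u(x, y) / (len(x) * len(y)))  # 0.2: far from 1/2
```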