Inspired by the robustness of human speech processing, we aim to overcome the limitations of most modern automatic speech recognition methods by applying general knowledge of human speech processing. As a first step, we present a vowel detection method that uses features of sound believed to be important for human auditory processing, such as fundamental frequency range, possible formant positions, and minimal vowel duration. To achieve this, we take the following steps: we identify high-energy cochleogram regions of suitable shape and sufficient size, extract possible harmonic complexes, complement them with less reliable signal components, determine local formant positions by interpolating between peaks in the harmonic complex, and finally keep only formants of sufficient duration. We demonstrate the effectiveness of our formant detection method by applying it to the Hillenbrand dataset both in clean conditions and in noisy and reverberant conditions. In all three conditions, the extracted formant positions agree well with Hillenbrand's findings. We thereby show that, contrary to many modern automatic speech recognition methods, our results are robust to considerable levels of noise and reverberation.
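The final step of the pipeline above, retaining only formants of sufficient duration, can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the track representation, the 5 ms frame hop, and the 50 ms minimum duration are assumptions chosen for the example.

```python
# Sketch of duration-based formant filtering (assumed parameters, not the
# paper's actual values): each candidate track is a list of
# (frame_index, frequency_hz) points produced by the earlier stages.

FRAME_MS = 5          # assumed hop between analysis frames, in milliseconds
MIN_DURATION_MS = 50  # assumed minimal formant/vowel duration

def keep_long_tracks(tracks, frame_ms=FRAME_MS, min_ms=MIN_DURATION_MS):
    """Keep only tracks whose total duration (frames * hop) meets the minimum."""
    return [t for t in tracks if len(t) * frame_ms >= min_ms]

candidate_tracks = [
    [(i, 500.0 + 2 * i) for i in range(12)],  # 12 frames * 5 ms = 60 ms, kept
    [(i, 1500.0) for i in range(4)],          # 4 frames * 5 ms = 20 ms, discarded
]
kept = keep_long_tracks(candidate_tracks)  # only the 60 ms track survives
```

Short spurious peaks are thus discarded, while tracks spanning at least a plausible vowel duration are retained as formants.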