While that is true, you could have accomplished the same thing through a special syntax that marked a function as "pure", establishing the constraint that impure functions cannot be called by pure ones, and then allowing I/O only through a set of impure base functions.
As Erik Meijer said, there are many ways to be dirty, but only one way to be pure.
Haskell allows you to annotate classes of side effects such as "changes state" or "might throw exception", not necessarily the full IO monad, so annotating as pure doesn't make sense.
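As a rough sketch of what annotating a class of effect in the type can look like, here is a small example using only the base library. The names safeDiv and sumST are made up for illustration, and Either and ST are just two possible choices, not the only ones: Either String marks "might fail", and ST marks "changes state" while keeping the mutation local.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- "Might fail": the failure case is visible in the result type.
-- (safeDiv is a hypothetical name used for illustration.)
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

-- "Changes state": the mutation happens inside ST and cannot leak
-- out of runST, so sumST itself has an ordinary pure type.
sumST :: [Int] -> Int
sumST xs = runST (do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref)

main :: IO ()
main = do
  print (safeDiv 10 2)
  print (safeDiv 1 0)
  print (sumST [1, 2, 3])
```

Neither function mentions IO, yet each type tells the caller which kind of effect to expect.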
While (map) is pure, (map print) is not. Thus map is a higher-order function that can create both pure and impure functions depending upon the purity of its arguments. How then would your syntactic scheme allow for higher-order functions?
Actually, in Haskell (map print) is pure; it just doesn't do what you expect it to:
>>> map print ["hello","world","!"]
<interactive>:2:1:
No instance for (Show (IO ())) arising from a use of `print'
Possible fix: add an instance declaration for (Show (IO ()))
In a stmt of an interactive GHCi command: print it
What's going on here? Let's look at the type of (map print) to find out:
>>> :t (map print)
(map print) :: Show a => [a] -> [IO ()]
(map print) is a function which takes a list of values of type a -- such that a can be shown as a String; hence the Show a => constraint -- and returns a list of values of type IO () (pronounced IO unit). These are monadic values representing computations that perform the actual IO. Hence, (map print) is a pure function which carries no side effects.
So, what the heck do we do with this strange list of IO ()s? Well, one answer is to pass them to sequence_:
>>> sequence_ (map print ["hello","world","!"])
"hello"
"world"
"!"
Ahhh, so now we get to the impure function: sequence_! Actually, sequence_ is a pure function as well. Its type is:
>>> :t sequence_
sequence_ :: Monad m => [m a] -> m ()
sequence_ merely takes a list of monadic values and combines them into a single monadic value, discarding any of the elements' return values and returning () instead.
So if everything is a pure function, how do we actually perform the side effects? The simplest way to think of it is that our whole program is a bunch of pure functions which construct a single value representing all of the side effects that will take place over the lifetime of the program. This single value is called main:
main :: IO ()
main = sequence_ (map print ["hello","world","!"])
With this idea in mind, we can think of Haskell's runtime as taking this single value main and performing the side effects specified throughout our program.
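To make the "actions are just values" idea concrete, here is a minimal sketch. The names actions and actionCount are made up for illustration. The list of actions is counted and reversed like any other list, and nothing is printed by the elements until they are sequenced into main.

```haskell
-- A pure value: a list of three IO actions. Building it performs no output.
actions :: [IO ()]
actions = map print ["hello", "world", "!"]

-- Still pure: we only inspect the list, we do not run its elements.
actionCount :: Int
actionCount = length actions

main :: IO ()
main = do
  print actionCount            -- prints the number of actions in the list
  sequence_ (reverse actions)  -- only here are the three prints performed,
                               -- in reversed order
```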
Actually, every function in Haskell is pure. It's just that some of those pure functions produce values of type (IO a) representing IO actions and, if you sequence those actions into the main action (or a subthread's action), those actions will be performed by the runtime.
So when people say that some functions in Haskell are "impure" they mean that they produce IO actions that, if sequenced, will depend upon or cause IO effects. Thus, both
map print ["hello","world","!"] :: [IO ()]
and
mapM_ print ["hello","world","!"] :: IO ()
are equally pure or impure: They both produce actions that have side effects if sequenced. It's just that the first must be sequenced differently than the second since it produces a list of actions and not a singleton action. Since singletons can be sequenced with (>>) and (>>=) you can insert them directly into do notation, which makes many people believe that they are somehow different in terms of purity. (But they are not.)
I wasn't contradicting you but trying to reinforce the point that there's an equivalence between "impure" functions and pure functions that produce actions (having impure effects if sequenced). In particular, I wanted to highlight that (mapM_ print) is not somehow more impure than (map print). Many people seem to believe that it is.
(mapM_ print) produces a pure function that produces a single IO action, and (map print) produces a pure function that produces a list of IO actions. Neither has any effect unless called in a context that sequences the resultant actions into the main IO action that the runtime interprets (or a thread's action).
And my point is that the action that mapM_ ultimately produces is not actually executed unless you sequence it into the main action (or a thread's action). Since mapM_'s eventual action is a singleton, you can do this sequencing with any combinator taking a singleton action, for example (>>) or (>>=), but it must be sequenced nonetheless, the same as for the list of actions that (map print) produces, if you want those actions to be executed.
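A small sketch of that point, with made-up names (greetings, listOfActions, singleAction): both forms are plain values, both need to be sequenced to have any effect, and an action that is bound but never sequenced is never performed.

```haskell
greetings :: [String]
greetings = ["hello", "world", "!"]

-- A list of actions, as produced by map print.
listOfActions :: [IO ()]
listOfActions = map print greetings

-- A single action, as produced by mapM_ print.
singleAction :: IO ()
singleAction = mapM_ print greetings

main :: IO ()
main = do
  -- Bound but never sequenced into main, so it is never performed.
  let _unused = mapM_ print ["never", "printed"]
  singleAction             -- a single action can go straight into do notation
  sequence_ listOfActions  -- a list must first be collapsed into one action
```

Running this prints the three greetings twice and never prints "never" or "printed".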
My point was not to argue that something like that would be a better solution, but since you're asking: Having a special syntax would make the learning curve a little shallower for newcomers. And it would simplify certain constructs -- instead of having to lift IO values or using mapM_ or whatever, you could actually deal with the results from impure functions directly, no unwrapping or rewrapping needed.
While using the type system to implement an effects system is theoretically elegant, I think it's a beautiful hack that has made the language fussier and more abstruse in practice.
That would certainly be true if purity enforced by monadic I/O were the end of the story, but it isn't. While new users create a lot of hot air about monads and I/O, intermediate-experienced Haskell users just use them for various different purposes and get on with life.
At the end of the day most of us have differing opinions on what constitutes simplicity and elegance. It's certainly true that a "pure" annotation like you're proposing is a much smaller change to introduce in an imperative setting. I recollect D or Rust or something is doing this. But in the functional programming context the monadic solution is more general, and a two-function type class with 3 (IIRC) algebraic laws is not considered an overbearing amount of complexity, though there are of course interesting alternatives with their own merits.
Sure. IO is part of a larger system of monads and functional programming, but it's not a prerequisite for impurity to exist -- it is more like a happy confluence of various strands of functional theory. After all, Haskell had I/O before the IO monad existed (although it was apparently not a happy solution).
Personally, I find monadic I/O theoretically elegant, but it comes at the cost of clumsiness when applied to real-world programs. To me, Haskell's "do" blocks feel like an implicit admission of this clumsiness; they are a crutch to work around the fact that having to constantly wrap and unwrap data is something of a chore.
I guess I just don't see them that way. Most of the time my code is in pure land, and I don't avoid "do" when the result is more readable. I think the ugliest thing in Haskell is probably monad transformer stacks, but that's mainly because I think they're overused by folks who create more abstraction than they need as a matter of habit.
That said, the kinds of things I do on the side with Haskell tend to have a small, well-defined I/O surface, so it could be that I'm spared the worst of it by my interests. I suppose if that weren't the case I'd probably favor OCaml more than I do.
Rust used to have a "pure" annotation, but it's gone, partially because it was a pain to have to write the annotation everywhere, partially because it's not needed for memory safety anymore, partially because nobody can agree on what "pure" means.
I wouldn't have anticipated that, but now that you say it I can see why that would lead to a debate. How would you deal with mutable data structures, for instance? What if a function accesses the environment, but in a fashion you could somehow guarantee was safe? In Haskell the programmer can circumvent the system with unsafePerformIO if they know something is safe but can't convince the type checker of it, but it almost seems like you'd need a "pure-but-not-really" annotation to do this kind of thing in an imperative language that actually enforced purity.
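For readers who haven't seen it, a minimal sketch of that escape hatch follows. The function sumViaIORef is a made-up example, not something from a library. It uses a mutable cell internally but exposes a pure type, and the programmer, not the compiler, vouches that this is safe.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical example: a sum computed with a local mutable cell.
-- The IORef never escapes, so the result depends only on the argument,
-- but the type checker cannot verify that; we are asserting it ourselves.
sumViaIORef :: [Int] -> Int
sumViaIORef xs = unsafePerformIO $ do
  ref <- newIORef 0
  mapM_ (\x -> modifyIORef' ref (+ x)) xs
  readIORef ref
{-# NOINLINE sumViaIORef #-}

main :: IO ()
main = print (sumViaIORef [1, 2, 3, 4])
```

For this particular pattern, runST from Control.Monad.ST gives the same result with a compiler-checked guarantee, so unsafePerformIO is best kept for cases the type system really cannot express.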
C++ walked in these very same footsteps. First by not having const, then by having it, then by allowing exceptions to constness, then by introducing const_cast and finally by allowing temporarily mutable const objects.
C++ const is defective because it's a shallow const: a const object's pointer members can still be used to modify the objects they point to.
The D language "fixes" this by making const transitive (and also adding an immutable annotation, which means the object is truly read-only, as in "read-only memory").
I like the uniqueness typing approach in Clean very much.
A function can destructively update a variable if the variable is declared unique (there can't be other references, so destructively modifying the value can't cause observable side effects).
It's easy to understand. It opens more avenues for fast code, and it keeps the purity.