Talk:Email system/Draft

From Citizendium
Revision as of 17:13, 19 August 2009 by David MacQuigg (Adding a machine-level explanation)
This article has a Citable Version.
 Definition: General overview of how the Internet electronic mail system works.
Checklist and Archives
 Workgroup category Computers [Categories OK]
 Talk Archive none  English language variant American English

This article is intended to be the most basic article on how the Internet email system works (as opposed to its history, applications of email, etc.). Our target audience includes non-technical professionals, such as lawyers and administrators, who make policy involving email systems. We will defer details such as message formats and transfer protocols to subtopics. Topics relating to email security are also proper subtopics, because it is much easier to discuss email security once you understand how the system works. Email abuse (spam, phishing, etc.) is a related topic rather than a subtopic, because it neither expands on nor depends on this article.

Here is our current thinking on how this hierarchy of topics should be developed:

Email System
 Parents
   Computers > Networks > Applications > Email
                        > Internet > Email
 Subtopics
   Email Processes and Protocols
     SMTP           (RFC-5321)
     POP            (RFC-1939)
     IMAP           (RFC-3501)
     Submission     (RFC-4409)  port 587 
   Message Formats  (RFC-5322)
     Multipurpose Internet Mail Extensions (RFC-2045..2049)
     Message Headers (RFC-5322)
   Authentication Methods
     SPF
     SenderID
     DKIM
     CSV
     
 Other
   TCP
   DNS
   PGP
   Kerberos
   History
   Abuse
   Email User Programs (Webmail)

Progressing the article

First, I'm sorry; I think I missed the retitling.

From a procedural standpoint, I'm going to make suggestions on the talk page rather than directly edit them into the article. By doing so, I will be able, eventually, to Approve it on my own. If I made substantial changes, we'd need several Computers editors to approve.

Let's begin with the "lede". Right now, it's a bit too narrative and outside the CZ convention for opening paragraphs. An opening sentence, unless it just won't work grammatically, should restate the title in bold and explain briefly what the term means. Material about the purpose and context follows, but isn't the role of the first sentence.

I'd avoid just saying "see textbooks" in the introduction. Very short definitions in the opening paragraphs can be appropriate, and then use wikilinks, Related Pages, Bibliography, and External Links.

Since I know you are using "Actor" as a term of art, a brief explanation would help.

Next, start a subhead for "Architecture".

You have a lot of terms with external footnote definitions, such as Transmitter, Relays, MDA (not defined except in the graphic; do think of the reader using text-to-speech), etc. External references, and even footnote definitions, are often our last preference.

You can define some as subtopics in the article. For example, I'd define Transmitter and Relay under subheadings, perhaps as second-level subheads, and internally wikilink using a structure such as [[#Relays|Relays]]. Certainly, that subhead can be brief and then, preferably, link to at least a stub article, where the external references can be heavier. I personally dislike having much beyond citations and abbreviation expansions in footnotes; if an explanation, as for Relay, is important, it should be in the main article.

"Let's follow a message from start to finish." should form yet another top-level section, with appropriate subheads for readability.

Consider a section introducing administration, error handling, and defense, again that primarily links to other articles.

Rev. 2

I've posted the second revision of this article at User:David_MacQuigg/Sandbox/Email_System following the suggestions above. Feedback from the editor on the second rev tells me maybe we want to add a few footnotes back in. I'll wait for more on that. --David MacQuigg 01:04, 20 May 2009 (UTC)

Rev. 3

Resynchronizing...should I be looking at the page here or the sandbox?

If the page here, several first comments with a first cup of coffee.

If our goal is to be the best self-contained reference, I wouldn't refer to hard copy texts in the lede, even as introductions. Instead, I'd even redlink to tutorial subpages, or at least link to online presentations.

Good elementary discussions of these topics can also be found in most texts on computer networks.[1]

<ref name=PnD>{{citation | author = L. Peterson, B. Davie | title = Computer Networks: A Systems Approach | edition = 4th | year = 2007 | contribution = Sect. 9.1.1 Electronic Mail}}</ref> And yes, Bruce Davie is good people.

Next, we do have the usual CZ lede conventions, such as bolding the article title, or as close as grammatically possible, in the first sentence. As for the title, maybe it's me, but I still wince at "email" rather than "electronic mail" as a formal title.

For the layman, however, I do think we need a very basic definition of the problem: passing "envelopes" around a "postal system". I've added a few words. Howard C. Berkowitz 12:59, 6 July 2009 (UTC)


Email Message Transfer

This is a table of definitions that can be worked into a subtopic article that delves into more technical details later on... -- Eric Gearhart

Term Definition
MTA Mail Transfer Agent; the software on the server side for moving email messages around and forwarding them to other email server hosts
MDA Mail Delivery Agent; the server that accepts mail for a user from a remote MTA and holds it until the user's mail client (their MUA) downloads the message
MUA Mail User Agent; a fancy name for an email client such as Mozilla Thunderbird or Microsoft Outlook. Nowadays a MUA can actually reside in a web browser or in a mobile phone as well
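SMTP Simple Mail Transfer Protocol; the protocol used to transfer mail from one mail system to another. Uses TCP port 25 for relay between servers and port 587 for message submission from clients (RFC 4409); either connection can be upgraded to encryption with STARTTLS.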
POP Post Office Protocol; a protocol where a client connects, downloads mail from the server and then, typically, deletes that mail from the server. Mail that is downloaded then "sticks" on the computer the user retrieves their mail from. Contrast with IMAP.
IMAP Internet Message Access Protocol; IMAP differs from POP in that messages are left on the server; this allows a user to "float" between different clients at different locations but still have access to all their mail
Mutter...I have other names for Microsoft Outlook...Howard C. Berkowitz 20:10, 6 July 2009 (UTC)
I've added the table, and expanded it in the article Email processes and protocols. --David MacQuigg 18:15, 16 August 2009 (UTC)
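The POP/IMAP contrast in the table can be illustrated with a toy in-memory model. This is only a sketch of the retrieval semantics described above, not of either wire protocol; the `Mailbox` class and the `pop_retrieve` / `imap_list` functions are hypothetical names. Real POP3 clients (RFC 1939) can be configured to leave copies on the server, and IMAP (RFC 3501) also supports deleting messages.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Hypothetical server-side mailbox holding message texts."""
    messages: list[str] = field(default_factory=list)


def pop_retrieve(box: Mailbox, leave_on_server: bool = False) -> list[str]:
    """POP-style access: the client downloads every message.

    By default the server copies are then deleted, so the mail
    "sticks" to whichever client fetched it.
    """
    downloaded = list(box.messages)
    if not leave_on_server:
        box.messages.clear()
    return downloaded


def imap_list(box: Mailbox) -> list[str]:
    """IMAP-style access: the client reads the messages, which stay on the server."""
    return list(box.messages)


# Usage: a laptop and a phone reading the same account.
box = Mailbox(["msg-1", "msg-2"])
phone_view = imap_list(box)       # the phone sees both messages
laptop_view = imap_list(box)      # so does the laptop
laptop_copy = pop_retrieve(box)   # a POP client downloads the mail...
phone_after = imap_list(box)      # ...and the phone now sees nothing
```

The point for non-experts is the last line: with POP, whichever device fetches first ends up holding the only copy.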

Rev. 03dmq

I've got a new rev at User:David_MacQuigg/Sandbox/Email_System_03. This is basically the same as Rev.2, but I've completely re-written the introductory paragraphs. The new introduction puts major emphasis on the distinction between machine-level and administrative-level entities. The previous text was clear to non-expert readers (most students), but readers with some experience in email systems were getting confused. I think the problem might be that the experts were skimming the introduction, and reading the article thinking we were still talking about machine-level entities. This new introduction should get through to all but the most narrowly-focused experts. There will always be a few who say things like an agent can't be a person. It's a little like arguing over the meaning of the word 'hacker'.

I thought about Eric's suggestion of adding a sidebar with definitions, but I think the best place for that level of detail will be in the subtopic on Email Message Transfer. In this top article, I would like to let words like actor and agent (now lower case) have their plain-English meanings (individuals or organizations), and introduce the special meaning (computer processes) in the more technically detailed subtopic. CS students already have plenty of experience using words like actor to mean an object in a computer program.

Non-technical readers will probably never get to the subtopics. I think it is OK that they won't learn acronyms like MTA. Think of an administrator in a government agency, or a staffer on some congressional subcommittee. If they understand this top article, we will have accomplished a lot.

Other issues from the comments above:

- Definition of roles such as Transmitter, Receiver, etc.
  I have added a subtopic, Email Agents, defining these roles
  more precisely in terms of their responsibilities, e.g.
    Transmitter
    - Spam Prevention
      - rate limits, content analysis, alerts
      - respond to spam reports
      - maintain reputation
    - Authentication
      - RFC compliance
      - IP authorization (SPF, SID, CSV, ...)
      - signatures & key management (DKIM ...)
      - Return Address validation code
 See http://open-mail.org/MHSmodels.html for more examples.
 
- Administration, error handling, and defense.
  See Related Articles: Message Transfer and Email Abuse.  I'm not sure
  what you have in mind for an introductory section in the main article.
  Unless we can really add value, I would keep the main article short.

- Email vs Electronic Mail.
  My preference for the title is Email, but either way is OK with me.
  I recall reading a discussion on Wikipedia, in which the decision was
  email, over e-mail and electronic mail.  If we are writing for the next
  generation, email is the word they grew up with.
 
- Postal System Analogies
  Most of these are too superficial to add any value, and may actually add
  confusion.  The intermediate post office is more like a router than an
  SMTP relay.  SMTP relays are like workstations within a post office that
  perform special functions, such as reading zip codes.  The one analogy I
  have found useful is between the MAIL FROM address in an email session,
  and the Return Address on a postal envelope.  This will fit nicely in the
  subtopics on Message Transfer and Simple Mail Transfer Protocol.
 

--David MacQuigg 07:29, 8 July 2009 (UTC)
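The MAIL FROM / Return Address analogy can be made concrete with a short Python sketch. It only builds the client-side command lines of a minimal SMTP transaction (RFC 5321) and opens no network connection; server replies, EHLO extensions, STARTTLS and error handling are omitted, and all addresses and host names are hypothetical. It shows that the envelope sender given in MAIL FROM can differ from the From: header the recipient sees. Non-delivery reports go to the MAIL FROM address, much like a postal Return Address.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """The 'letter': headers the recipient sees, plus the body."""
    msg = EmailMessage()
    msg["From"] = "Alice <alice@example.com>"   # author shown to the reader
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Hello"
    msg.set_content("Hi Bob.")
    return msg


def smtp_commands(envelope_from: str, envelope_to: list[str],
                  msg: EmailMessage) -> list[str]:
    """Client side of a minimal SMTP transaction, as plain text lines.

    The 'envelope' travels in MAIL FROM / RCPT TO; the message itself,
    including its own From: header, travels after DATA.
    """
    lines = ["EHLO client.example.com", f"MAIL FROM:<{envelope_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_to]
    lines.append("DATA")
    for line in msg.as_string().splitlines():
        # Dot-stuffing: a data line starting with "." gets an extra "."
        # so it cannot be mistaken for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")   # end-of-data marker
    lines.append("QUIT")
    return lines


# A mailing list might relay Alice's letter with its own bounce address.
msg = build_message()
cmds = smtp_commands("bounces@lists.example.net", ["bob@example.org"], msg)
```

Python's standard `smtplib.SMTP.sendmail(from_addr, to_addrs, msg)` keeps the same separation: the envelope addresses are arguments, independent of the headers inside `msg`.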

Someone really must write MTA, as defined by the Kingston Trio. The song actually is a good description of a loop. Howard C. Berkowitz 19:07, 12 August 2009 (UTC)

Adding a machine-level explanation

After struggling with how to be more clear on the different model levels (administrative vs machine-level), I decided to add a whole new subtopic Email Processes and Protocols, which basically repeats the simple explanation in the main article, but this time using a machine-level model, pretty much the same as the model in RFC-5598. It also includes an expanded version of Eric's table of acronyms. The earlier machine-level subtopic Email Message Transfer now has some overlap with the new subtopic, but it is still worth keeping, since it provides a detailed example of an SMTP session. Maybe we should rename this subtopic as SMTP Session, and link to it from other articles, like the main SMTP article.

--David MacQuigg 18:59, 12 August 2009 (UTC)

Let me review it a little later. Lower-case editorial observation: unless an article title is a proper name, such as Simple Mail Transfer Protocol, it should not have other than initial caps (i.e., Email processes and protocols).
Understood. The article itself follows the correct style. I was using caps to "delimit" the title in the discussion on this Talk page. I'll use double quotes in the future ( "Email processes and protocols" ). --David MacQuigg 20:00, 12 August 2009 (UTC)
Now, as a lower layers and operating system sort of person, are we truly talking about a machine level, or a host level? It would seem to me that everything you are describing as "machine" could be a virtualized host process, moving among multicore processors, and even among clustered or distributed machines, invisible to the mail process itself. Howard C. Berkowitz 19:06, 12 August 2009 (UTC)
I make a distinction between machines, programs, and processes, a distinction that unfortunately is blurred in terms like MTA. When I don't clarify, I'm usually talking about a process. Let me know if there are any places in the article needing clarification (e.g. A key responsibility of the MSA is to ... ). --David MacQuigg 20:00, 12 August 2009 (UTC)
I'd emphasize process over machine. What if your mail server is physically four boxes, each with a quad-core processor? Howard C. Berkowitz 21:05, 12 August 2009 (UTC)
I agree. Processes are the elements, the smallest unit, in the "machine-level" model. The diagram in Email processes and protocols shows these processes grouped by machine. This makes the figure a little more concrete, and easy for students to understand. Other than that, we don't need to discuss machines. The only change in the figure would be to separate the Relay and Delivery processes on the Mailstore machine, and give each a separate connection to the Mailstore disks. --David MacQuigg 14:38, 17 August 2009 (UTC)
Using "network layer" in the second paragraph is also confusing, especially when you later say routers are out of scope. What you are describing, I think, is a message transfer sublayer of the application layer. Howard C. Berkowitz 03:01, 17 August 2009 (UTC)
We could also call it an "overlay network", and be more consistent with textbook models (e.g. Section 9.4 Overlay Networks, Peterson & Davie 4th ed.). "Sublayer" seems to imply something entirely enclosed within a layer. The nodes and links in this "Administrative Layer" are organizations and their relationships (e.g. the Recipient's network might include his Receiver, one or more Forwarders, and his Delivery Agent). Each of these administrative nodes might include any number of SMTP Relays and other processes on the Application Layer. The nodes and links on the Application Layer are the mail-handling processes and the TCP connections between them. The nodes and links on the layer below that are the routers and physical links between them.
The essential thing we have done here is define an Administrative Layer, and invoke the principle of Separation of Concerns to simplify the discussion of each layer. Perhaps I should put the link to the Separation of Concerns article right here where we first talk about network layers, instead of in the "processes and protocols" subtopic. The explanation above could then go in the subtopic where being concise is not so critical. --David MacQuigg 14:38, 17 August 2009 (UTC)
Much as Bruce Davie is a nice guy, I'd much prefer to use RFC rather than textbook definitions. BGP and AS definitions extensively discuss administrative relationships, but not as layers. For that matter, such a distinction is fundamental to provider-provisioned and customer-provisioned VPNs, with the PE and CE model. These are treated not as layers, but as interdomain border problems. Howard C. Berkowitz 14:47, 17 August 2009 (UTC)
Neither Peterson & Davie, nor the next most popular text, Tanenbaum, use RFC terminology in their discussions of email systems. The problem is that the terminology of the RFCs has evolved from the acronyms and mysterious jargon of experts, without any tutorial purpose. Terms like MSA and MDA, for example, can have many definitions (machine, program, process, or even "service"), and they depend on context to resolve this ambiguity. The RFC terms may be adequate for email experts, but in my opinion are inappropriate for students who may spend as little as one day studying these systems. What I have done (in Email processes and protocols) is expand on Eric's suggestion. Provide a table of the most common RFC terms. Most students don't need these, and the few who might become email system administrators will need to read the RFCs anyway.
As for the separation of administrative concerns into a separate administrative layer, I think this is entirely consistent with the examples of overlay networks in Peterson & Davie. The essential thing is that the nodes and links in the overlay network can treat the underlying network as transparent to their own arrangements, be it an encrypted VPN link or an agreement to whitelist all mail from a particular domain. Overlay networks are "ad hoc". They don't need to have the same status as the Application Layer. They can even be completely private arrangements, running on top of the public Internet.
I think the key questions we need to resolve are 1) Who are we writing these articles for?, and 2) For this intended audience, how well do these articles describe real systems, enhance our understanding of key current issues like security, and facilitate better decisions regarding email systems. The audience I have in mind for this top article is administrative folks, like maybe a staffer on a congressional committee. For the subtopics, we can assume a little more technical sophistication, maybe students of computer networks. The typical treatment of email systems at this level is very superficial. These articles intend to give students an in-depth understanding, without the burden of reading the RFCs.
Maybe what we really need here is two alternative presentations. I have tried, but been unable, to write an article that is both clear to students, and follows the modeling and terminology in RFC-5598. Maybe someone else can give it a try. --David MacQuigg 17:26, 17 August 2009 (UTC)
I've added a Figure 2 to Email processes and protocols using the RFC-5598 terminology. Is this helpful? Should we do away with Figure 1 entirely, and change the text to use these RFC terms? --David MacQuigg 02:25, 19 August 2009 (UTC)
Let me answer indirectly; the problem is one that I understand to be CZ policy. While an author certainly can add a synthesizing explanation to an authoritative term, I don't think we can be either inventing terms, or, in the case of network architecture, using textbook terms rather than RFC or ISO. I've seen any number of huge errors in textbooks.
Davie is an ATM and MPLS expert; I don't know Peterson's background. If this were a textbook by someone like Dave Crocker, I'd be more comfortable, but I haven't read these textbooks and can't speak to their correctness. I have read RFCs (admittedly not the latest email ones, but I can do that). Now, "domain" and "subdomain", as well as "overlay network", are reasonably well accepted RFC terms in quite a number of areas, such as VPNs and BGP.
OK. Time is short. I'll do my best to follow your instructions and minimize discussion. I'll use RFC terminology in the subtopics, but I would like to stick with plain English in the top article. I'm still assuming that in the top article, we are addressing non-technical folks, and in the subtopics, university students familiar with computer networks. --David MacQuigg 19:43, 19 August 2009 (UTC)
Get away from layering completely. The IETF deprecates it and most textbooks get the OSI nuances wrong -- trust me, I spent over six years of my life in OSI committees. You might want to look at my user page for some flaming on layering. I have found OSI architectural errors in Tanenbaum. Howard C. Berkowitz 03:24, 19 August 2009 (UTC)
I can see there are a lot of land mines in the field of network layering, so how about we think not of layers, but "perspectives". We are looking at email systems from two different perspectives, and applying the principle of Separation of Concerns to decide how to do the split. Take a look at the intro, which has been re-written in these more general terms, and let me know if I should continue along this line with the rest of the article. --David MacQuigg 19:43, 19 August 2009 (UTC)
Again more from a policy standpoint, we really should (preferably) have our own Separation of Concerns article, or at least import and reformat it. An external link to Wikipedia won't fly.
I'm still not getting a clear sense of direction here. Before I do a lot of work importing and reformatting Wikipedia articles, etc., I need to know if the new intro is OK. I haven't made any fundamental changes, just avoided talking about network layers. Separation of Concerns is a principle broad enough that we should be able to apply it without hitting any land mines. --David MacQuigg 23:12, 19 August 2009 (UTC)
I am not a fan of inventing terms for non-technical folk, and I think you are talking about inventing rather than things commonly accepted. Things that are used in introductory CS textbooks, by definition, aren't going to non-technical folk. Frankly, I'd rather see terminology taken from an Outlook or Thunderbird manual than a CS textbook. "Domain", to me, is a fairly well accepted and straightforward term, with "borders" and "gateways" between them. Howard C. Berkowitz 20:45, 19 August 2009 (UTC)
I don't think I am "inventing" any new terms, but please be more specific if you see a problem. The biggest problem that I have encountered in discussions with specialists is the notion that once a word like agent is used in an acronym like MTA, you can no longer use its plain-English meaning. I disagree with that, particularly if we are addressing non-specialists. "Actor" is another word commonly used in object-oriented programming, but our students have no difficulty with the dual meaning.
Using "domain" instead of "agent" has the problem of confusion with domain names, which often have no correlation with the administrative entities in Figure 1. I just can't think of a better word than agent in an article for non-specialists. If each agent would use one and only one domain name, we wouldn't have this confusion. But that isn't the way real agents do business, and this confusion actually is a problem that raises its ugly head for non-experts looking at email headers. ( This email is from Yahoo. Why in hell does it say Akamai!!?)
Please imagine yourself as a student who wants to learn how email systems work. Is this top article clear in what it covers? What topics would you like to see added or removed, assuming we have only one hour for a lecture, and 3 hours for study of this topic? --David MacQuigg 23:12, 19 August 2009 (UTC)
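The "why does it say Akamai?" puzzle comes from trace headers. Each SMTP server that handles a message adds a Received: line at the top (RFC 5321 requires this trace information), so reading those lines bottom-up reconstructs the path. The host names in that path belong to whoever operates each relay, not necessarily to the author's domain. The sketch below uses a hypothetical sample message and a simplified regular expression; real Received: formats vary widely, and this is not a robust parser.

```python
import re
from email import message_from_string

# Hypothetical headers: Alice's provider hands the message to a relay run
# by a third party (example-cdn.net), which then delivers it to Bob's server.
SAMPLE = """Received: from relay2.example-cdn.net (relay2.example-cdn.net [192.0.2.20])
\tby mx.example.org; Wed, 19 Aug 2009 23:12:00 +0000
Received: from smtp.example.com (smtp.example.com [192.0.2.10])
\tby relay2.example-cdn.net; Wed, 19 Aug 2009 23:11:58 +0000
From: alice@example.com
To: bob@example.org
Subject: Hello

Hi Bob.
"""


def hop_path(raw: str) -> list[tuple[str, str]]:
    """Return (from_host, by_host) pairs in the order the hops happened."""
    msg = message_from_string(raw)
    hops = []
    # The newest Received: line is on top, so reverse to get chronological order.
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())   # unfold continuation lines
        m = re.search(r"from\s+(\S+).*?\bby\s+(\S+?);", flat)
        if m:
            hops.append((m.group(1), m.group(2)))
    return hops


path = hop_path(SAMPLE)
```

For the sample, `path` lists the hop from `smtp.example.com` to `relay2.example-cdn.net` first, then the final hop to `mx.example.org`. That third-party relay name is what a non-expert reading the headers might find confusing.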
  1. See Bibliography [PnD07] and [Stevens04].