RDFTM comments: Uschold from Steve Pepper on 2005-03-18 (public-swbp-wg@w3.org from March 2005)

From: Steve Pepper <pepper@ontopia.net>
Date: Fri, 18 Mar 2005 15:26:23 +0100
To: "SWBPD list" <public-swbp-wg@w3.org>
Message-ID: <FOEHKIENIPCJNPNFKGJNOEFKGGAB.pepper@ontopia.net>
This posting contains the editors' resolution of comments on the RDFTM
Survey posted by Mike Uschold:
http://lists.w3.org/Archives/Public/public-swbp-wg/2005Mar/0008.html

When discussing specific issues, please use the issue ID as part of
the Subject: line.


* RDFTM-MU001: Accepted with comments
|
| After reading it through, I realize there are some things that could be
| improved, in terms of the document structure, though most of the content
| is fine.  When publishing a survey, there is always a question of
| whether to arrange the content according to specific issues (as you do
| in section 4) or by published approaches or system (as you do in section
| 2).  You have done both, which has pros and cons. The major con of the
| current draft (IMHO)  is that the good analysis is so far removed from
| the initial requirements. Also, the major existing systems are not
| directly assessed against the major requirements (what you are calling
| 'issues') until the end of the document.

We agree that "doing both" is unsatisfactory. The solution is to merge
section 2.2 with section 4.3.


* RDFTM-MU002: Rejected
|
| After listing the issues in section 2, I thought they would form an
| excellent structuring device for the rest of the paper. The issues are
| in many ways, requirements, and thus can be regarded as highly specific
| criteria for assessing the suitability of each approach. You do this in
| section 4, but it would be better to be weaved through the discussion of
| the approaches. It would be good to have a big summary table with all
| the issues and approaches and a tick or some indicator of how well a
| given approach works for a given issue/requirement.

This idea is good in theory but would not work in practice. Most of
the proposals are somewhat immature and incomplete. A big summary
table would only tell us this and it would divert attention to the
details when it is the big picture that is actually most important in
this survey. A summary table would be an excellent idea for the next
deliverable, however.


* RDFTM-MU003: Accepted with comments
|
| Of course, it is much easier for me to say this than it is for the
| authors to do a major re-org of the document.  Given that the purpose of
| this document is ultimately to lay the groundwork for coming up with a
| proposal for RDF/TM interoperability, it is not actually necessary to
| have specific detailed sections on each of the existing approaches.  It
| would suffice to have brief descriptions of each.  The details for each
| approach could be introduced in the analysis section which considers
| each major requirement, one by one (which is currently done in section
| 4).  Then the main document would consist of:
|         1.      Introduction, (as now).
|         2.      Requirements, Issues and Evaluation Criteria (keep as
| is, but add something about what end users require from translated
| representations (e.g. querying?, data translation? semantic
| integration?)
|         3.      Existing Translation Approaches (much shorter than
| current section)
|         4.      Analysis: an elaboration of section 4 which pulls in
| much of the details currently in section 3, but is always focused on
| particular requirements, issues and/or criteria.
|         5.      Conclusion - needs sprucing up; it is too brief and says
| little. It might be good to summarize next steps here.

Some of these ideas have been incorporated, but without doing a major
re-org.


* RDFTM-MU004: Accepted
|
| The broad criteria of naturalness and completeness are also important
| and worth keeping.


* RDFTM-MU005: Accepted with comments
|
| It would be good to state the assumptions/expectations of the reader.
| What are they assumed to know? Not that many people will be very
| familiar with BOTH TMs and RDF. So pointers to simple tutorial material
| on each would be appropriate. In addition, it might be helpful to have a
| 1-2 page introduction to each in an appendix.

Assumptions are now clearly stated and pointers to tutorial material
provided for those who are not familiar with both paradigms.


* RDFTM-MU006: Accepted with comments
|
| Grammatical quibble: "Topic Maps is a model" has apparent number
| disagreement. Can fix by saying something like: "Topic Maps provide a
| model"  or "at the heart of TMs is a model". This issue arises in
| several places.

Topic Maps *is* a standard; it *is* a paradigm for (whatever). On the
other hand, topic maps *are* artifacts. There is a distinction.
Whenever we capitalize both words we use the singular; otherwise we
use the plural. (Takes a little getting used to, but it does become
natural after a while :-)

But you are right that it is imprecise to say that "Topic Maps is a
model".


* RDFTM-MU007: Accepted with comments
|
| 2.1 Translation Features
|
| Completeness:  By the definition given, a complete translation need not
| be reversible. One can translate from  source to target and lose no
| information one way, but lose information the other way. Perhaps you
| mean to say that complete means there is no loss either way?  That is
| only possible when the two formats are capable of representing the exact
| same set of concepts, i.e. have same expressive power. Yet RDF and TM
| both have some important differences. This means it is inherently
| impossible to have complete translations. There WILL be some loss in
| translation (or at least it seems that way to me).
|
| Fidelity: there are two different notions here, both are important, and
| you are only addressing one.
|         1.      Naturalness: which seems to be inversely related to the
| need for workarounds.
|         2.      Accuracy: this is the most common meaning of 'fidelity'
| in my experience
|
| Accuracy is an important criterion referring to the correctness of the
| translation - you seem to be ignoring this.
|
| These criteria are not the same. You can have a perfectly accurate
| translation that is not very natural. If you really just mean
| naturalness, then perhaps use that word, not 'fidelity' which suggests
| accuracy.

These definitions will be tightened up and we will use the term
"naturalness" instead of fidelity.


* RDFTM-MU008: Accepted with comments
|
| More importantly, what is the practical import of 'unnatural'
| translations. If they are intended for human consumption, then they will
| be much harder to read. If not, then what does it matter? If the
| information is correctly translated, and queries are correctly answered,
| who cares if the translation is 'unnatural'?  Might it have impact on
| query response times? What other consequences of being unnatural are
| there? By itself, it may not pose any real problems.

We agree and this was the very reason for avoiding the term
"naturalness" in the first place. However, as you point out,
"fidelity" is ambiguous. The new definitions will attempt to explain
the consequences of unnatural translations.


* RDFTM-MU009: Accepted with comments
|
| 2.2 Major Issues
|
| It seemed surprising to talk about issues first. I would have expected
| you to present first the requirements. Some may be easy to meet. The
| ones that are hard to meet are the 'issues'. Perhaps by 'issues' you
| really do mean requirements.  If so, perhaps that could be made
| explicit?
|
| To some extent, this is just a minor terminology point, you are calling
| 'requirements' 'issues'. But there is more. The requirements fall into 3
| categories.
| 1. general: naturalness, completeness
| 2. specific language translation capabilities
| 3. end user requirements: what will the translators be used for?
| querying? data translation? semantic integration?  You talk about this
| much later in the document, it needs to be brought forward into this
| section.

You are right that we should not be talking about issues first.
Section 2.2 is being merged with 4.3 to avoid this problem. The more
specific requirements that you suggest will be part of the next
deliverable but are considered to be too detailed for the requirements
(!) of this one.


* RDFTM-MU010: Accepted with comments
|
| 2.2.1 TM issues
| In what sense is identity a TM issue? It seems to be an issue for
| translating from TM to RDF. One can argue that this is an 'issue' for
| RDF because RDF cannot do these things. It is also an 'issue' for people
| interested in using RDF when they need to translate from TMs.
|
| Indeed, most of the "TM Issues" you talk about are things that RDF cannot
| do, and vice versa. You might change the section headings to reflect
| this, since 'issue' is a bit ambiguous. e.g. what you call "TM/RDF
| Issues" might be called: "Issues in translating from TM/RDF to RDF/TM"
| Better still, call them requirements?

It is true that identity is a more general issue. As part of the
rewrite, issues will not be made specific to one paradigm or the
other, which should solve this problem.


* RDFTM-MU011: Accepted with comments
|
| I had difficulty understanding the essential nature of and difference
| between 'modeling the model' and 'mapping the model'.  Is the
| distinction one between syntax and semantics? You don't quite say that,
| but you do say the one is essentially a semantic approach.
|
| I would like to see this distinction explained better. It is used as a
| basis of comparison for all approaches, so it is important.  For me, the
| terms were unhelpful, I could not adequately relate the meaning to the
| words in the term.  It would be fine to think of other terms that worked
| better  -- no need to be tied to terms from old papers, if they are not
| helpful.  On the other hand, if this is just me, and if most readers are
| likely to find the terms helpful in understanding their meaning, then
| they are fine.
|
| Later on, you do use different terms.  I think the document would read
| better if you got the terminology straight in the beginning and used it
| consistently throughout.

The new version will move more quickly to the "semantic mapping" vs.
"object mapping" terminology to avoid this problem. Please review the
next version and let us know if the distinction is still difficult to
understand.


* RDFTM-MU012: Accepted with comments
|
| 3.3 The O...y Proposal
|
| Excellent observation about the impact of syntactic presentation,
| comparing to "3rd RDF basic abbreviated form"
|
| I'm amazed that you say the translations are more or less complete. How
| can that be, when there are so many things that RDF has that TMs lack
| and vice versa (from section 2.2)? How are containers handled? If you
| throw them away when translating to TM, you can't get them back, so this
| is incomplete. Ditto for the many other 'issues' in section 2.2.

It is in the nature of object mappings such as that proposed by
Ogievetsky that they can fairly easily be quite complete because they
hang on to everything as objects. Each paradigm has enough machinery
to do this. Although specific semantic constructs (like containers)
may not be the same across paradigms, they are represented using lower
level building blocks (in this case, triples that exhibit a certain
pattern) and can thus be captured at the object level.


* RDFTM-MU013: Accepted
|
| I expected that in the completeness assessment for each approach, there
| would be much discussion of these issues; indeed that would be a major
| part of the discussion on the adequacy of the different approaches.
| Indeed, one good way to structure the whole document would be to start
| with the requirements for a good translation, note which are easy, and
| which are challenging, then for each challenging one, note some possible
| ways to approach them. You kind of do this in section 4. I think it
| would be better to move much of the content of section 4 to the front of
| the document.
|
| At some point, it will be necessary to analyze why there is such an
| explosion of new statements (1 to 26 after a round-trip translation). Is
| there reason to hope that new approaches could do better, and still
| achieve semantically accurate translation?
|
| It would be helpful to give an explanation with an example illustrating
| why the extra statements are added.


* RDFTM-MU014: Accepted with comments
|
| RDF2TM mapping. The extra information seems a bit over the top, one has
| to do a lot of manual work annotating the RDF specifically for the
| purpose of mapping it to TM. This is unsatisfactory. In other words, the
| translation is human-assisted which may not be practical in many cases.
| This point is hinted at, but not stressed enough, IMHO.

Good point. This is an issue that was addressed at the Boston F2F and
the consensus seemed to be that while a solution might *allow* extra
information, it should not *require* it.


* RDFTM-MU015: Accepted with comments
|
| 4. Analysis
|
| Overall, good section, good analysis.
|
| My only problem is that this stuff seems like it should have come much
| sooner, as per my prior comment.

Or at least be in one place, rather than both sooner and later? We
have opted for collecting this after the descriptions of the existing
proposals, as noted above. As we see it, those descriptions should
inform the analysis and clarify the issues, rather than the issues
being used to inform the descriptions.


* RDFTM-MU016: Accepted with comments
|
| What is the import of the fact that none of the approaches discuss how
| to represent RDF containers and collections, language tags, XML and
| typed literals?  Does this mean they are really hard? Does it matter?

It probably means that they simply weren't seen as being the most
important thing to tackle first (and it could also reflect the fact
that most of the work was done by people more familiar with Topic Maps
than RDF). We don't believe that these issues are hard; we do believe
that they matter.


--
Steve Pepper <pepper@ontopia.net>
Chief Strategy Officer, Ontopia
Convenor, ISO/IEC JTC 1/SC 34/WG 3
Editor, XTM (XML Topic Maps 1.0)
Received on Friday, 18 March 2005 14:27:00 UTC