Authors: Aaron Gustafson
This document is intended as a starting point for engaging the community and standards bodies in developing collaborative solutions fit for standardization. As the solutions to problems described in this document progress along the standards-track, we will retain this document as an archive and use this section to keep the community up-to-date with the most current standards venue and content location of future work and discussions.
URLs have been a fantastic way to enable people to reference content on the web. With the addition of anchor points within Web documents (via name and id), authors were empowered to link to specific sections of a document or other documents across the web. Unfortunately, however, that utility requires action on the part of every web page author to include those anchor points. Numerous systems have cropped up to enable automated id-ing of headings (e.g., Kramdown’s auto_id_prefix), but they are seldom used. They are also limited to headings only and provide no direct access to flow content.
Enabling users to link directly to arbitrary text content would overcome these current limitations and greatly increase the richness of a link in the context of social sharing, reporting, and legal/scientific/scholarly writing.
In terms of W3C recommendations, we have a handful of extant URL fragment types:
- HTML: anchor (a) elements with a name attribute (obsolete in HTML5, but still supported for backwards compatibility) and a unique id attribute value on flow content. Both can be referenced as part of a URL via the fragment identifier.
- Media Fragments: name-value pairs such as t for time, xywh for spatial data, and track to select a specific audio or video track. (An id component has also been proposed, but is still being discussed due to implementation challenges.) Example: example.com/media#xywh=160,120,320,24&t=10,20&track=audio&track=video
- SVG: a fragment can reference the id of a child element in order to display only that element. SVG also supports an svgView(…) functional keyword that enables a link to alter the SVG’s display using one or more semicolon-separated named functions: viewBox(...), preserveAspectRatio(...), transform(...), zoomAndPan(...), and viewTarget(...). These functions correspond to parameter values for attributes on the root svg element. Example: MyDrawing.svg#svgView(viewBox(0,200,1000,1000))

This is not a new challenge, and solutions have been proposed and implemented in other media. It’s worth considering their approaches and their relative applicability in this situation:
- EPUB Canonical Fragment Identifiers (e.g., epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[yyy])): this scheme is heavily dependent on document structure. Document structure within an ePub is unlikely to change often, making this a far more robust strategy than it would be on the web (where redesigns and dynamic content are common).
Taking a page from the direction Media Fragments have gone, we’re recommending a name-value component in the fragment. This would enable User Agents to disambiguate anchor references from arbitrary text searches. It should avoid collisions with existing id values in almost all cases.
https://domain.tld/page.html#search=arbitrary%20text%20search
Note: The named keyword could be anything not currently reserved as part of another specification. We recommend sticking to something that is human-readable and brief, or an abbreviation of a human-readable (and familiar) term. Other options include "s" (a common shorthand for "search"), "query", and "q". The question mark (?) could also be used: it is allowed in URLs and should not collide with arguments passed in the GET query string, because it would appear after the hash (#) that begins the fragment.
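To illustrate how a name-value component lets a User Agent tell an anchor reference from a text search, here is a minimal sketch in Python. The keyword "search" follows the example URL above but is not final, and the function name parse_fragment and its return labels are hypothetical, chosen only for this illustration.

```python
from urllib.parse import unquote

# Assumption: "search" is the reserved keyword; the proposal leaves the name open.
SEARCH_KEY = "search"


def parse_fragment(url: str) -> tuple[str, str]:
    """Classify a URL's fragment as ("none", ""), ("anchor", id), or ("search", text)."""
    # Everything after the first "#" is the fragment.
    _, sep, fragment = url.partition("#")
    if not sep or not fragment:
        return ("none", "")
    prefix = SEARCH_KEY + "="
    if fragment.startswith(prefix):
        # Name-value form: percent-decode the value and treat it as search text.
        return ("search", unquote(fragment[len(prefix):]))
    # Anything else keeps its existing meaning as an anchor reference.
    return ("anchor", unquote(fragment))


print(parse_fragment("https://domain.tld/page.html#search=arbitrary%20text%20search"))
print(parse_fragment("https://domain.tld/page.html#intro"))
```

A collision would only occur if a page already used an id that literally begins with "search=", which is the rare case the proposal accepts.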
In order to enable linking to arbitrary text, some degree of text search is required. While a phrase or sentence contained within a single element would present few challenges from a "find and highlight" scenario, the following situations should be considered:
<a href="#search=arbitrary%20text">arbitrary text</a>)
Every User Agent supports some form of in-page search. These tools are familiar to users and have already addressed many of the challenges presented above. It makes sense for User Agents to tie this sort of feature directly in with their existing search implementation. Therefore, we are recommending that the existence of an arbitrary text search in a URL trigger a UA’s "find in page" function after the page is fully rendered. The arbitrary text fragment should be supplied as the search value.
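As a rough model of why matching needs to work across element boundaries, the sketch below treats a rendered page as a list of text-node strings in document order and searches their concatenation. This is only an assumption about how a "find in page" feature might behave, not a description of any User Agent's implementation. The name find_matches is hypothetical, matching is case-insensitive, and whitespace normalization is deliberately left out.

```python
import re
from bisect import bisect_right


def find_matches(text_nodes: list[str], query: str) -> list[tuple[int, int]]:
    """Return (node_index, offset_in_node) for the start of each match."""
    if not query:
        return []
    # Record where each text node begins in the concatenated text.
    starts = []
    position = 0
    for node in text_nodes:
        starts.append(position)
        position += len(node)
    haystack = "".join(text_nodes)
    matches = []
    # re.IGNORECASE keeps offsets aligned with the original text.
    for found in re.finditer(re.escape(query), haystack, re.IGNORECASE):
        # Map the global offset back to the node that contains it.
        node_index = bisect_right(starts, found.start()) - 1
        matches.append((node_index, found.start() - starts[node_index]))
    return matches


# The phrase starts inside a link's text node and continues into the next node.
nodes = ["Linking to ", "arbitrary", " text is useful. Arbitrary text!"]
print(find_matches(nodes, "arbitrary text"))  # [(1, 0), (2, 17)]
```

The first match begins in one node and ends in the next, which a per-element search would miss.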
When it comes to creating links adhering to this API, authors can, of course, create them by hand, but it is much more likely that users will generate these links within their User Agent of choice. As such, User Agents will need to provide one or more mechanisms to generate these URLs when a user has selected text in a page. For example:
CSS currently supports the :target pseudo-class selector, which may seem applicable in this scenario as well, but isn’t actually appropriate. Given that an arbitrary text string may show up more than one time and the user may shift focus between those instances, we are proposing a new CSS pseudo-class selector: :result.
/* applies to all arbitrary text search results */
:result {
  background: #cec;
}
/* applies to the currently focused arbitrary text search result */
:result:focus {
  background: #ffa;
}
# (e.g., a hashtag)?
%20 for space)?
Exploration within the IndieWeb community (which refers to these as "fragmentions") and elsewhere has demonstrated what is possible with client-side implementations of arbitrary text fragments using JavaScript. Some argue that this sort of functionality should exist at the discretion of each site owner, but a strong counterargument is that universal applicability of this functionality would benefit all users and seems in alignment with the Priority of Constituencies.
It’s worth pointing out that there are pros and cons to both approaches:
| | Pros | Cons |
|---|---|---|
| Document | Author controls the experience | Author must define the experience for users to benefit |
| Client | | Authors may not be able to control the experience |
Kevin Marks and others within the IndieWeb community have explored a variety of approaches for denoting arbitrary text searches in a URL.
#arbitrary%20text%20search
The IndieWeb is in favor of re-purposing the existing fragment identifier (a single hash character) with general agreement that spaces (which will need to be encoded) will provide enough differentiation from existing fragment identifiers. To use this approach, any raw hash (#) or whitespace would need to be encoded. It’s worth noting that this approach could lead to some potential confusion for the User Agent when dealing with a link that references both an existing anchor and arbitrary text.
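The ambiguity of the single-hash form can be made concrete with a small sketch. It assumes one possible resolution order, which is not specified in this document: prefer an element whose id matches the decoded fragment, and otherwise fall back to a text search. The function name resolve_single_hash and its return labels are hypothetical.

```python
from urllib.parse import unquote


def resolve_single_hash(fragment: str, element_ids: set[str]) -> tuple[str, str]:
    """Resolve a bare fragment as ("anchor", id) or ("search", text)."""
    decoded = unquote(fragment)
    # Assumed rule: an existing id always wins over a text search.
    if decoded in element_ids:
        return ("anchor", decoded)
    # No matching id: treat the fragment as arbitrary text to find in the page.
    return ("search", decoded)


page_ids = {"intro", "summary"}
print(resolve_single_hash("arbitrary%20text%20search", page_ids))
print(resolve_single_hash("intro", page_ids))
```

An id value cannot contain whitespace, so a multi-word phrase never matches an id. A single word such as "intro" still resolves to the anchor even when the link's author meant the word itself, which is the confusion described above.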
##arbitrary%20text%20search
An earlier proposal from the IndieWeb community involved using a double hash prefix. The URL spec doesn’t allow for hash characters in a fragment, so these links will fail strict validation. HTML now also allows for the hash character to be used in an id attribute, leading to some potential confusion for the User Agent when dealing with a link that references both an existing anchor and arbitrary text.
There has been some discussion over whether white space should be represented as an encoded space (%20) or a plus (+) in the arbitrary fragment. The latter would require additional encoding of any literal plus characters in the arbitrary text.
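The trade-off between the two encodings can be sketched with Python's standard urllib.parse functions. The helper build_search_url and its plus_for_space flag are hypothetical, and the "search" keyword is assumed from the earlier example URL. The sketch builds a link from selected text, replacing any fragment already present on the page URL.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def build_search_url(page_url: str, selected_text: str, plus_for_space: bool = False) -> str:
    """Build a text-search link for the selected text (assumed "search" keyword)."""
    # Drop any existing fragment from the page URL.
    base = page_url.partition("#")[0]
    if plus_for_space:
        # Spaces become "+", so literal plus characters must become %2B.
        encoded = quote_plus(selected_text)
    else:
        # Spaces become %20; safe="" also encodes "/" and other reserved characters.
        encoded = quote(selected_text, safe="")
    return f"{base}#search={encoded}"


print(build_search_url("https://domain.tld/page.html#intro", "arbitrary text search"))
print(build_search_url("https://domain.tld/page.html", "1 + 1", plus_for_space=True))

# The decoder has to know which convention was used:
print(unquote_plus("1+%2B+1"))  # "1 + 1"
print(unquote("1+%2B+1"))       # "1+++1"
```

With the %20 convention, a plain percent-decode recovers the text. With the plus convention, the same decode silently turns spaces into plus signs unless the User Agent applies the plus-aware rule.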