MSEdgeExplainers

Arbitrary Text Fragments

Authors: Aaron Gustafson

Status of this Document

This document is intended as a starting point for engaging the community and standards bodies in developing collaborative solutions fit for standardization. As the solutions to problems described in this document progress along the standards-track, we will retain this document as an archive and use this section to keep the community up-to-date with the most current standards venue and content location of future work and discussions.

Introduction

URLs have been a fantastic way to enable people to reference content on the web. With the addition of anchor points within Web documents (via name and id), authors were empowered to link to specific sections of a document or other documents across the web. Unfortunately, that utility requires every web page author to include those anchor points. Numerous systems have cropped up to automatically assign ids to headings (e.g., Kramdown’s auto_id_prefix), but they are seldom used. They are also limited to headings and provide no direct access to flow content.

Enabling users to link directly to arbitrary text content would overcome these current limitations and greatly increase the richness of a link in the context of social sharing, reporting, and legal/scientific/scholarly writing.

Current State of Fragments on the Web

In terms of W3C recommendations, we have a handful of extant URL fragment types:

Other Approaches

This is not a new challenge and solutions have been proposed and implemented in other media. It’s worth considering their approaches and their relative applicability in this situation:

API Recommendation

Taking a page from the direction Media Fragments have gone, we’re recommending a name-value component in the fragment. This would enable User Agents to disambiguate anchor references from arbitrary text searches. It should avoid collisions with existing id values in almost all cases.

https://domain.tld/page.html#search=arbitrary%20text%20search

Note: The named keyword could be anything not currently reserved as part of another specification. We recommend sticking to something that is human readable and brief, or an abbreviation of a human readable (and familiar) term. Other options include "s" (a common shorthand for "search"), "query", and "q". The question mark (?) could also be used: it is allowed in URLs and should not collide with arguments passed in the GET string because it would appear after the fragment-signifying hash (#).
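To illustrate how a name-value fragment lets a User Agent tell an anchor reference from an arbitrary text search, here is a minimal sketch. It assumes the "search" keyword from the example URL above; the FragmentTarget type and the parseFragment and decodeOrRaw names are hypothetical and not part of the proposal.

```typescript
// Hypothetical result type; these names are illustrative only.
type FragmentTarget =
  | { kind: "none" }
  | { kind: "anchor"; id: string }
  | { kind: "search"; text: string };

// Assumed keyword, taken from the example URL in this section.
const SEARCH_PREFIX = "search=";

// Percent-decode a value, falling back to the raw string if it is malformed.
function decodeOrRaw(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch (_err) {
    return value;
  }
}

function parseFragment(url: string): FragmentTarget {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1 || hashIndex === url.length - 1) {
    return { kind: "none" };
  }
  const fragment = url.slice(hashIndex + 1);
  if (fragment.indexOf(SEARCH_PREFIX) === 0) {
    // Name-value form: everything after "search=" is the encoded text.
    return { kind: "search", text: decodeOrRaw(fragment.slice(SEARCH_PREFIX.length)) };
  }
  // Anything else is treated as a conventional anchor reference.
  return { kind: "anchor", id: decodeOrRaw(fragment) };
}
```

Under this sketch, a fragment such as #intro is still handled as an anchor, and only fragments that begin with the keyword are routed to text search. An id that itself begins with "search=" would collide, which is the rare case the proposal accepts.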

Challenges

To enable linking to arbitrary text, some degree of text search is required. While a phrase or sentence contained within a single element would present few challenges in a "find and highlight" scenario, the following situations should be considered:

Consuming Arbitrary Text Fragments

Every User Agent supports some form of in-page search. These tools are familiar to users and have already addressed many of the challenges presented above. It makes sense for User Agents to tie this sort of feature directly in with their existing search implementation. Therefore, we are recommending that the existence of an arbitrary text search in a URL trigger a UA’s "find in page" function after the page is fully rendered. The arbitrary text fragment should be supplied as the search value.
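A User Agent would reuse its own "find in page" machinery here, so the following is only a rough sketch of the underlying idea: take the decoded fragment text as the search value and locate every occurrence in the rendered page text. The findOccurrences name is hypothetical, the page is modelled as a plain string, and matching is exact (real in-page search typically adds case folding and similar refinements).

```typescript
// Hypothetical helper: returns the start offset of every non-overlapping
// exact match of `query` within `pageText`.
function findOccurrences(pageText: string, query: string): number[] {
  const offsets: number[] = [];
  if (query.length === 0) {
    // An empty search value matches nothing.
    return offsets;
  }
  let from = pageText.indexOf(query);
  while (from !== -1) {
    offsets.push(from);
    // Resume after the current match so results never overlap.
    from = pageText.indexOf(query, from + query.length);
  }
  return offsets;
}
```

Because a string may match more than once, the result is a list rather than a single position. The User Agent could then move focus between the matches as it does for a manual search.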

Generating Arbitrary Text Fragments

When it comes to creating links adhering to this API, authors can, of course, create them by hand, but it is much more likely that users will generate these links within their User Agent of choice. As such, User Agents will need to provide one or more mechanisms to generate these URLs when a user has selected text in a page. For example:
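Whatever mechanism a User Agent exposes, the URL it produces could be assembled along the following lines. This is a sketch under stated assumptions: the buildSearchUrl name is hypothetical, the "search" keyword comes from the API Recommendation above, and collapsing runs of whitespace in the selection is an illustrative choice rather than something this document specifies.

```typescript
// Hypothetical helper: builds a link to `selectedText` on the page at `pageUrl`.
function buildSearchUrl(pageUrl: string, selectedText: string): string {
  // Drop any existing fragment so the new one replaces it.
  const hashIndex = pageUrl.indexOf("#");
  const base = hashIndex === -1 ? pageUrl : pageUrl.slice(0, hashIndex);

  // Assumption: collapse line breaks and repeated spaces from the selection.
  const text = selectedText.replace(/\s+/g, " ").trim();
  if (text.length === 0) {
    // Nothing selected: return the page URL without a search fragment.
    return base;
  }
  // encodeURIComponent percent-encodes spaces as %20 and "#" as %23.
  return base + "#search=" + encodeURIComponent(text);
}
```

For instance, selecting "arbitrary text search" on https://domain.tld/page.html#intro would replace the #intro fragment with the search fragment shown in the API Recommendation.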

Displaying Arbitrary Text Fragments

CSS currently supports the :target pseudo-class selector, which may seem applicable in this scenario as well, but isn’t actually appropriate. Given that an arbitrary text string may show up more than once and the user may shift focus between those instances, we are proposing a new CSS pseudo-class selector: :result.

/* applies to all arbitrary text search results */
:result {
  background: #cec;
}

/* applies to the currently focused arbitrary text search result */
:result:focus {
  background: #ffa;
}

Open Questions/Challenges


Appendix A: Document or Client?

Exploration within the IndieWeb community (which refers to these as "fragmentions") and elsewhere has demonstrated what is possible with client-side implementations of arbitrary text fragments using JavaScript. Some argue that this sort of functionality should exist at the discretion of each site owner, but a strong counterargument is that universal applicability of this functionality would benefit all users and seems in alignment with the Priority of Constituencies.

It’s worth pointing out that there are pros and cons to both approaches:

Document
  Pros:
  • Author controls the experience
  Cons:
  • Author must define the experience for users to benefit

Client
  Pros:
  • Users enjoy a shared experience across the Web
  • Every site gets the upgraded experience
  • Browsers already offer a robust in-page text search
  Cons:
  • Authors may not be able to control the experience

Appendix B: Other API Explorations

Kevin Marks and others within the IndieWeb community have explored a variety of approaches for denoting arbitrary text searches in a URL.

#arbitrary%20text%20search

The IndieWeb is in favor of re-purposing the existing fragment identifier (a single hash character) with general agreement that spaces (which will need to be encoded) will provide enough differentiation from existing fragment identifiers. To use this approach, any raw hash (#) or whitespace would need to be encoded. It’s worth noting that this approach could lead to some potential confusion for the User Agent when dealing with a link that references both an existing anchor and arbitrary text.
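The differentiation by whitespace can be sketched as a simple heuristic. HTML id values may not contain whitespace, so a fragment that decodes to text containing whitespace cannot name a valid anchor. The looksLikeTextSearch name is hypothetical, and this is not presented as the IndieWeb community’s actual implementation.

```typescript
// Hypothetical heuristic for the single-hash approach: a fragment whose
// decoded form contains whitespace is treated as arbitrary text.
function looksLikeTextSearch(fragment: string): boolean {
  let decoded: string;
  try {
    decoded = decodeURIComponent(fragment);
  } catch (_err) {
    // Malformed percent-encoding: fall back to treating it as an anchor.
    return false;
  }
  return /\s/.test(decoded);
}
```

A single-word fragment such as #search contains no whitespace, so this heuristic alone cannot say whether it names an anchor or a word to find. That is one form of the User Agent confusion noted above.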

##arbitrary%20text%20search

An earlier proposal from the IndieWeb community involved using a double hash prefix. The URL spec doesn’t allow for hash characters in a fragment, so these links will fail strict validation. HTML now also allows for the hash character to be used in an id attribute, leading to some potential confusion for the User Agent when dealing with a link that references both an existing anchor and arbitrary text.

Additional considerations

There has been some discussion over whether whitespace should be represented as an encoded space (%20) or a plus (+) in the arbitrary fragment. The latter would require additional encoding of any literal plus characters in the arbitrary text.
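The trade-off between the two encodings can be seen in a short sketch. The function names are hypothetical. It relies on the standard behaviour of encodeURIComponent, which encodes a space as %20 and a literal plus as %2B.

```typescript
// Option 1: spaces become %20 (a literal "+" becomes %2B as a side effect).
function encodeWithPercent(text: string): string {
  return encodeURIComponent(text);
}

// Option 2: spaces become "+". Because encodeURIComponent has already
// turned every literal "+" into %2B, any "+" left in the output is a space.
function encodeWithPlus(text: string): string {
  return encodeURIComponent(text).replace(/%20/g, "+");
}

// Decoding option 2: turn "+" back into spaces before percent-decoding,
// so that %2B is restored to a literal "+" afterwards.
function decodeWithPlus(encoded: string): string {
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}
```

With the text "1 + 1", the first option yields 1%20%2B%201 and the second yields 1+%2B+1. The plus form reads more easily, but the decoder must know that a raw plus means a space. A hand-written link that leaves a literal plus unencoded would be misread.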

