By Joy Chao in Insight — Jun 21, 2024

Capture Your Memories in the Disappearing Internet

Markers, Cameras, and iMac - The Design Game of Webpage Annotation by Cubox.

One spring day in 2024, we finally decided to heed the strong calls from our user community and introduce the web page annotation feature to the Cubox browser extension.

The Cubox apps already offer annotation functions such as multi-color highlighting, annotation management, review, and floating cards on web pages. It seemed all we needed to do was transfer this mature interaction to our extension.

But is that really all that annotation is capable of?

Annotations reflect the relationship between people's thoughts and the text they read. Medieval handwritten manuscripts already carried marginalia. After the spread of printing, people began to mark directly on the pages of books. Pencils and ink pens easily damaged or stained the pages, and it was not until the early 1960s that highlighters were invented. These left bright colors underneath or even on top of the text without seeping through to the back of the paper. Since then, highlighters have been the ideal tool for book annotation and are still in use today. Digital media tools gradually adopted similar annotation modes: underlining and note-taking. The highlighter interaction carries over from paper to screen intuitively enough to be widely applicable, leaving little to complain about.

However, looking around, how many people have stopped reading paper books? And in the more distant future, will those who have never annotated a paper book still relate to highlighters, or even know what they are?

These thoughts ignited our curiosity: Is it possible to think outside the box and design a new annotation system specifically for digital media?

Let's Forget About the Marker

Interactive elements can affect the reading experience. For some readers, the process of highlighting with a marker pen represents a path of contemplation and review. However, highlighting on a digital interface is somewhat different from paper: we don't need to pass over each character. Simply dragging from the starting point to the endpoint selects all the text in between.

However, this subtle difference does not change one fact: highlighting requires precise aiming and leaves no room to relax. It's like the moment before releasing an arrow in archery, when one must hold one's breath and concentrate to ensure a precise hit. Between dense lines of text, using an ultra-fine line to navigate and pinpoint is a tense and serious task.

To redesign the highlighting feature, let's be bold and forget about the marker pen for a moment.

If we consider saving or highlighting as "making a mark" on information, we find many models worth learning from in the real world: a bookmark in a book marks the location of a whole page; taking a picture or scanning a page marks the entire page in another medium. These methods don't require character-level operations, so they are more efficient than using a marker pen.

Even this book reading light caught my eye. Initially, I thought it was a smart scanner that only needed to be placed at the corresponding position to work, which made me wonder if it was possible to make it smaller in size, to hold it and mark the text at the current location with just a button press, and have the mark saved to the cloud simultaneously. (Even though it turned out to be just a light, I still believe there are many possibilities with this glass panel—it acts as a bridge between digital and paper.)

In summary, why isn't there a "marker pen" that can automatically locate the sentence we want to highlight, requiring only a single click? Our goal then becomes more specific: find an object easier to operate than characters, to achieve one-time marking without the need to go over each word.

Naturally, we expanded our scope to paragraphs and noticed the inherent logic in the HTML structure of web pages: elements of the same kind within one region often express a single viewpoint or theme. Using them as the marking object would save a lot of trouble. In fact, many writing applications are adopting a block mode, simplifying operations by using paragraphs as objects. Even instant messaging apps play down line breaks and prioritize sending with the Return key, essentially binding the content’s paragraph structure to its meaning. Each paragraph, each message, becomes an independent, atomized piece of data, making all related interactions much simpler.
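To make the paragraph-as-object idea concrete, here is a minimal sketch of resolving whatever the cursor is over to its enclosing block. The `SimpleNode` type, the `BLOCK_TAGS` list, and `nearestBlock` are hypothetical names chosen for illustration, not Cubox's actual implementation; in a real extension the same walk could probably be done with the DOM's `Element.closest`.

```typescript
// Hypothetical stand-in for a DOM node: only the tag name and parent link matter here.
interface SimpleNode {
  tag: string;
  parent: SimpleNode | null;
}

// Assumption: these tags are treated as "one unit of meaning". The real list may differ.
const BLOCK_TAGS = new Set(["p", "li", "blockquote", "pre", "h1", "h2", "h3", "h4", "h5", "h6"]);

// Walk upward from the hovered node until a block-level tag is found.
function nearestBlock(node: SimpleNode | null): SimpleNode | null {
  let current = node;
  while (current !== null) {
    if (BLOCK_TAGS.has(current.tag.toLowerCase())) {
      return current;
    }
    current = current.parent;
  }
  return null; // nothing markable under the cursor
}

// Usage: hovering a link inside a paragraph resolves to the whole paragraph.
const article: SimpleNode = { tag: "article", parent: null };
const paragraph: SimpleNode = { tag: "p", parent: article };
const link: SimpleNode = { tag: "a", parent: paragraph };
console.log(nearestBlock(link) === paragraph);
```

The point of the sketch is that the user never aims at characters: any position inside the paragraph maps to the same target.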

After a long detour, it turns out that the answer isn't as "fresh" as we imagined, lacking the excitement of discovering a new continent from the unknown. But we are also nearing our destination.

Capture Like Taking Photos

What could be more intuitive and faster than raising a camera to capture the content within a book? Tear out the page and stuff it into a folder? Yes, that’s exactly what we aim to do. Directly send paragraph contents to Cubox without worrying about the original content being “torn” or destroyed, meaning disappearing from the internet.

The iPhone camera’s text capture feature is a great demonstration. With smart prompts and the aid of a finger’s selection, a user who presses the button can feel the entire process of content being “torn out”. Here, the boundary between the real and the virtual blurs slightly.

For reading, the mouse cursor acts as the camera lens, pointing at the information needed. It's an element worth making good use of, but it also requires careful restraint, because the mouse cursor carries many other functions when browsing web pages, and people do not want to be constantly reminded to "mark some content!" when reading. Beyond the cursor, we don't want any other interference from the Cubox interface: everything must stay hidden, working subtly and silently.

Ultimately, we made significant simplifications based on the concept of system screenshot shortcuts. In the browser, we only need one shortcut key to open the "lens": hold the shortcut key Option (or Alt) → open the camera lens → find the target content → click the shutter to capture.
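The flow above can be read as a small state machine. The following is only a sketch under stated assumptions: the names `Lens`, `LensEvent`, and `step`, and the string block ids, are invented for illustration and do not describe how the Cubox extension is actually built.

```typescript
// Hypothetical model of the "lens": idle while reading, open while Option/Alt is held.
type LensState = "idle" | "lensOpen";

type LensEvent =
  | { kind: "modifierDown" }             // Option/Alt pressed
  | { kind: "modifierUp" }               // Option/Alt released
  | { kind: "hover"; blockId: string }   // cursor moved over a paragraph-level block
  | { kind: "click" };                   // the "shutter"

interface Lens {
  state: LensState;
  target: string | null; // block currently under the lens
  captured: string[];    // ids of blocks captured so far
}

// Pure transition function: the same events always produce the same result.
function step(lens: Lens, ev: LensEvent): Lens {
  switch (ev.kind) {
    case "modifierDown":
      return { ...lens, state: "lensOpen" };
    case "modifierUp":
      return { ...lens, state: "idle", target: null };
    case "hover":
      // Outside capture mode, hovering must not disturb normal reading.
      return lens.state === "lensOpen" ? { ...lens, target: ev.blockId } : lens;
    case "click":
      if (lens.state === "lensOpen" && lens.target !== null) {
        return { ...lens, captured: [...lens.captured, lens.target] };
      }
      return lens;
  }
}

// Usage: hover and click are ignored until the modifier key is held.
const initial: Lens = { state: "idle", target: null, captured: [] };
const events: LensEvent[] = [
  { kind: "hover", blockId: "p-1" },
  { kind: "click" },
  { kind: "modifierDown" },
  { kind: "hover", blockId: "p-3" },
  { kind: "click" },
  { kind: "modifierUp" },
];
const result = events.reduce(step, initial);
console.log(result.captured); // ["p-3"]
```

Keeping the idle state completely inert is what lets the cursor keep its ordinary browsing role until the reader explicitly asks for the lens.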

Scan Like Panorama Mode

A paragraph may be a single topic, but a topic also might be spread across three, ten, or even more paragraphs. If a reader had to perform ten such capture actions as described earlier, the task of shooting would become tedious and painful, not to mention the captured content would still need to be manually merged and organized later.

When a camera’s viewfinder can’t encompass the necessary people or scenery, we use panorama mode to perform a scan-like shoot. The capture of annotations also needs a panorama mode.

Our first solution was to introduce a ‘recording’ mode, which allows for the scanning of multiple pieces of content. For example, once ‘capture’ mode is activated, the mouse stays in capture mode, allowing a scan from point A to B before confirming the exit from capture mode. The issue is, we would need to design another button or shortcut for this, adding another layer to the user’s learning curve, which is less than ideal.

How can we expand on the existing foundation directly? We pondered how the phone’s camera button works: click to take a photo, long press to record a video. The distinction is subtle, and the user doesn’t need to decide anything in advance. With the same shortcut key and mouse operation, we can maintain continuous recording. No additional design is necessary, simply: hold down the mouse → glide over additional paragraphs → all passed paragraphs are selected and treated as one group of content.

We merge the selected elements into one when the mouse is released, to signal to users that, yes, this is the entire area you want. This is scan mode, not high-speed burst shooting.

This design also addresses a long-standing issue: when should multiple annotations merge, and when should they not? With this design, even if the content is adjacent, users can choose between two annotation methods: annotate as separate entries or as one. As long as the intention differs, the interaction will too, resulting in different saved data, eliminating the need for additional action buttons or prompt pop-ups.
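One way to picture the difference between separate captures and a merged scan is to treat each press-to-release gesture as one annotation. This is a hedged sketch: `Annotation`, `mergeGesture`, and `annotate` are hypothetical names, and paragraphs are identified by their index in document order purely for illustration.

```typescript
// Hypothetical saved record: which paragraphs it covers and their combined text.
interface Annotation {
  blockIndices: number[];
  text: string;
}

// One gesture = every paragraph index the cursor passed between mouse-down and mouse-up.
// Duplicates (gliding back and forth) are dropped and document order is restored.
function mergeGesture(passed: number[], paragraphs: string[]): Annotation {
  const unique = Array.from(new Set(passed)).sort((a, b) => a - b);
  return {
    blockIndices: unique,
    text: unique.map((i) => paragraphs[i]).join("\n\n"),
  };
}

// Each gesture yields exactly one annotation; empty gestures are ignored.
function annotate(gestures: number[][], paragraphs: string[]): Annotation[] {
  return gestures
    .filter((g) => g.length > 0)
    .map((g) => mergeGesture(g, paragraphs));
}

const paragraphs = ["First point.", "Second point.", "Third point."];

// Two separate clicks on adjacent paragraphs: two entries.
const separate = annotate([[0], [1]], paragraphs);
console.log(separate.length); // 2

// One hold-and-glide over the same paragraphs: a single merged entry.
const merged = annotate([[0, 1, 1]], paragraphs);
console.log(merged.length); // 1
```

Because the grouping follows the gesture rather than adjacency, the same two paragraphs can end up as one record or two, depending only on what the reader did.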

Although the capture interaction cannot be operated with one hand, one-handed operation is uncommon when studying and reading at a desktop. We have also retained the traditional highlighting functionality as a supplement.

Properly Save Content

Once the capture is completed, the subsequent actions include highlighting, note-taking, clipping, and the AI that will be added later. How can we accommodate all?

We want both the mouse and keyboard to continue to facilitate subsequent actions with the following solution: a menu appears near the captured content, supporting direct operations with both mouse and keyboard. To ensure the most basic operations are always within reach, we placed highlighting and clipping at the very top, with support for direct action via cursor or keyboard enter key.

In practice, annotations and clippings serve similar purposes. If a user is accustomed to using annotations, their frequency of using clippings will significantly decrease, as Cubox tries to save the complete page information regardless of the method. So the question arises: which action should be placed above, which below, and should we provide an option to let users configure this menu themselves? Considering that these two functions are to some extent mutually exclusive, we’ve only added one small feature: remembering the last used action. For the same user, simply repeating the “capture, then Enter” action puts their most frequently used function at their disposal.
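The "remember the last used action" behavior can be sketched as follows. The action names, the default order, and the choice to move the remembered action to the top of the menu are assumptions made for illustration; the article only states that the last used action is remembered and that Enter triggers it.

```typescript
// Hypothetical action names; the real menu has more entries (notes, links, and so on).
type Action = "highlight" | "clip";

// Assumed default order and default Enter action before anything has been used.
const DEFAULT_ORDER: readonly Action[] = ["highlight", "clip"];
const DEFAULT_ACTION: Action = "highlight";

// The remembered action is shown first; the rest keep their default order.
function menuOrder(lastUsed: Action | null): Action[] {
  if (lastUsed === null) {
    return [...DEFAULT_ORDER];
  }
  return [lastUsed, ...DEFAULT_ORDER.filter((a) => a !== lastUsed)];
}

// Pressing Enter right after a capture runs whatever is at the top of the menu.
function actionOnEnter(lastUsed: Action | null): Action {
  return lastUsed ?? DEFAULT_ACTION;
}

// Usage: a user who clipped last time gets "clip" again with capture + Enter.
let lastUsed: Action | null = null;
console.log(actionOnEnter(lastUsed)); // "highlight"
lastUsed = "clip";
console.log(menuOrder(lastUsed));     // ["clip", "highlight"]
console.log(actionOnEnter(lastUsed)); // "clip"
```

A single remembered value avoids a settings screen while still adapting the menu to each user's habit.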

At this point, the entire process from capture to completion is accomplished: Capture - Select Action - Execute Action.

While progressively adding features, we have not increased the complexity of interactions or the learning curve, keeping everything natural and intuitive.

Making Visuals and Interactions Delicious

The previous version of the extension’s interface used a dark theme, which we had hoped would stand out against most pages, but we later realized that with so many web pages featuring dark mode, this was hard to truly achieve. In the new version, we’ve rethought the design and introduced a new visual concept of “multi-layer transparency.” By layering two levels of frosted glass effects, we’ve achieved a balance of isolation and harmony with surrounding content, as well as a sense of spatial dimension.

Have you noticed our new floating icon? There’s a little Easter egg here – try hovering your mouse over it and see what happens!

Perhaps the classic iMac G4 has long inspired us in the river of time. Its surrounding transparent material is so vibrant, separating and yet corresponding with the world around it, that it’s simply irresistible. (Experience the charm of a design from 22 years ago under a 4K lens: Apple iMac G4: Retro Review)

Beyond visuals, sound is another way to make experiences delightful. I have a fondness for mechanical texture and the reliability it conveys, like the sound of a camera shutter. Since phone cameras emulate such mechanical sounds, why can’t the Cubox extension? Therefore, for the first time, we’ve experimented with sound effects in Cubox, with toggle options available (and there’s still much optimization to do). With a combination of elastic animations, visuals, and sound effects, we’re trying to give interactions a real “touch” feeling.

Timeless Classics, Everlasting Memories

We almost missed the most crucial feature: “One-click Bookmarking”, which remains well known among our veteran users. From the perspective of capture, one-click bookmarking can also be seen as a “capture” of the complete page. Thus, we implemented the same shortcut logic for it: hold the option key to enter capture mode, and if you don’t select any content in this mode and directly press ‘s’ (Save), it will bookmark (capture) the entire page. In short, the shortcut for one-click bookmarking is option+s, which is easy to remember as a continuation of the capture shortcut.
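The shortcut logic described here can be sketched as a small decision function. `KeyState`, `ShortcutResult`, and `resolveShortcut` are hypothetical names, and staying in capture mode when content has already been selected is an assumption, since the article does not say what option+s does in that case. The sketch checks the physical key code `"KeyS"` (as reported by the browser's `KeyboardEvent.code`) rather than the typed character, because holding Option on macOS can change the character a key produces.

```typescript
// Hypothetical result of interpreting a key press.
type ShortcutResult = "bookmarkPage" | "captureMode" | "none";

// Minimal snapshot of the keyboard and capture state at the moment a key goes down.
interface KeyState {
  altKey: boolean;            // Option (macOS) or Alt is held
  code: string;               // physical key, e.g. "KeyS"
  hasSelectedContent: boolean; // something was already captured in this session
}

function resolveShortcut(k: KeyState): ShortcutResult {
  if (!k.altKey) {
    return "none"; // ordinary typing and browsing are left untouched
  }
  if (k.code === "KeyS" && !k.hasSelectedContent) {
    return "bookmarkPage"; // option+s with nothing selected saves the whole page
  }
  return "captureMode"; // assumption: otherwise the lens simply stays open
}

// Usage
console.log(resolveShortcut({ altKey: true, code: "KeyS", hasSelectedContent: false })); // "bookmarkPage"
console.log(resolveShortcut({ altKey: true, code: "AltLeft", hasSelectedContent: false })); // "captureMode"
console.log(resolveShortcut({ altKey: false, code: "KeyS", hasSelectedContent: false })); // "none"
```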

However, that’s not enough. In many scenarios, we only want to use the mouse and avoid touching the keyboard. Because it must accommodate multiple functions, the new version of the extension can no longer carry the one-click bookmark function on the app icon itself. Hence, we moved it to a new Cubo floating button: clicking it bookmarks the page in one click, and hovering over it reveals the operation menu.

Yet the floating button wasn’t created solely for one-click bookmarking. Besides bookmarking, it has other significant purposes: it increases the visibility of operation guidance, enables the Cubox extension to blend seamlessly into the webpage itself for more convenient interactions, prompts you swiftly to view AI interpretations that have been automatically generated, and allows you to position the function in areas other than a crowded toolbar (you can press and move it up or down).

The second classic feature we must mention is the highlighter interaction discussed earlier. In fact, after thoroughly trying the internal test version, we not only retained the highlighter interaction but also made all the functions mentioned above available through it. In scenarios where there’s a need to mark and bookmark beyond just paragraphs, it’s still indispensable. The interaction method is similar to previous versions: simply select text with the cursor to use it.

Here we provide a quick summary of Cubox’s content operation features.

After selecting content with “Smart Capture” or “Cursor Selection”, you can:

Highlight one or more sections and write notes as inspiration strikes
Clip this content into Cubox, where all clippings from the page are automatically combined into one page
Save one or more links from the selected content into Cubox

Following this:

The relevant webpage will automatically begin article parsing and AI Insight in the Cubox backend (this feature needs to be enabled in the mobile apps)
Whenever any content is highlighted, Cubox will automatically save the entire page and parse the article to restore your highlights in real-time
With both highlighting and clipping, Cubox will automatically attempt to save a full-page snapshot as an important backup for future reference
Data from highlighting or clipping syncs automatically across Cubox’s different platforms, allowing you to continue reading, reviewing, and exporting at any time

A key term here is “automatic”. You don’t have to do anything, and you won’t receive any prompts or disturbances. Once an operation is completed, all irrelevant elements disappear from your interface.

Cubox quietly helps every knowledge worker resist losing the most important parts of memory and thought as the internet disappears. You don’t need to keep saving content, switching between windows to copy and paste, or struggling to recall where that sentence you once read is located. All you might need to do is “mark it” when inspiration strikes, and leave the rest to Cubox.

Final Thoughts

The fun in the design process does not ensure the excellence of the design itself. Ultimately, it needs to withstand the test in the hands of every user.

Currently, the new version of the Cubox extension is online in the Chrome, Safari, and Firefox application stores, with an update for Edge coming soon. Meanwhile, we are planning improvements including but not limited to:

There is a shortcut key conflict with system text editing, which we are optimizing, with an update coming soon;
Collected cards cannot be edited directly. We will optimize this and provide a prominent entry point for editing, archiving, deletion… update coming soon;
The one-click collection function of the hover button is too concealed, so we will add guiding instructions, with an update coming soon;
Ongoing optimization of annotation creation, restoration success rate, and stability;
Improvement in the display effect of highlights on pages with different colors, especially dark web pages;
Optimization of sound effects logic to reduce disturbance or offer options;
AI Insight will support real-time, streamlined output to allow viewing page summaries and key questions with a single click while browsing;
Direct positioning and viewing of welcome phantom highlights generated by AI Insight in the original webpage;
Interaction in the browser extension is different from that in the main App, and we need more time to consider how to enhance cross-platform consistency.

Besides the new capture interaction, the original collection menu and the new hover button have also changed in many ways, each with its own underlying reasoning. Please do experience them and provide feedback.

The real design process isn’t as orderly as described here. At times it resembles a game where inspiration doesn’t travel from A to B but often blooms like fireworks in the mind when a goal is proposed, drawing on everything we have seen, touched, and experienced in the past. The process of design is capturing these sparks. This writing is like explaining each particle in the fireworks and why it takes its path; any implied significance is doomed to be futile. “Expressing design with language is another kind of design,” akin to the significance of “review” with respect to “annotation”. The writing process is our journey of “redesigning from a different angle”.

During this process, we also imagined more intuitive and efficient interactions, like using gestures or eye movements on devices like Vision Pro to capture. We certainly look forward to the arrival of more efficient days, yet often question ourselves: for learners, is it truly better to have higher information-processing efficiency? Are we pursuing faster task completion, or enhancing our minds and bodies to engage the world better? If the answer is always “faster”, then why do we need annotations? Why not just ask AI when needed? While many AI applications strive for this ideal, we are skeptical.

At least from Cubox’s standpoint, in the ancient realm of reading, one should treat learning as a process and the process as the goal itself. It’s not about eagerly seeking new technologies, looking for a “second brain” to replace ours. What difference does it make when the “first brain” is idle every day? What we need is to read, review, think, create, recognize, and understand the world. Reading is a bridge to the pinnacle experience of our physical beings, irreplaceable, especially after the rise of AI. We must remember this on the path to efficiency.

Lastly, thank you to the Cubox design team and all developers for their hard work and exceptional contributions, and to all Cubox users for their valuable feedback and consistent support over the years. May the new Cubox extension help you capture more daily inspirations, enabling you to enjoy thinking while reading the world.