Concepts
Basics
API Keys
To access the Parse.ly API, you need an API Key, which you can request by contacting us. An API Key is a unique string that your application uses to identify itself to Parse.ly’s servers.
Authentication
In keeping with the emerging OAuth web standard, all requests to the Parse.ly API must be authenticated with OAuth. Because the Parse.ly API does not require any user to authorize access to personal resources, we use a form of OAuth called 2-legged OAuth, also known as “Signed Fetch”. More information can be found in our OAuth Background document.
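This page doesn’t spell out the signing details. As a minimal sketch, assuming Parse.ly follows standard OAuth 1.0 (RFC 5849) with the HMAC-SHA1 signature method, and assuming your API Key serves as the OAuth consumer key alongside a separately issued consumer secret, a 2-legged request is signed with the consumer secret alone, because there is no user token. The function name, URL, and parameter values are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import parse_qsl, quote, urlsplit, urlunsplit


def pct(value: str) -> str:
    """Percent-encode per RFC 5849 section 3.6: only unreserved characters stay literal."""
    return quote(value, safe="-._~")


def sign_two_legged(method, url, consumer_key, consumer_secret,
                    params=None, nonce=None, timestamp=None):
    """Return OAuth 1.0 protocol parameters, including oauth_signature.

    Two-legged ("signed fetch") variant: no oauth_token and an empty token secret.
    Assumption: the Parse.ly API Key acts as consumer_key, with a separate consumer_secret.
    """
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time()) if timestamp is None else timestamp),
        "oauth_version": "1.0",
    }
    parts = urlsplit(url)
    # Base string URI: lowercase scheme and host, no query string or fragment.
    # (RFC 5849 also drops default ports; omitted here for brevity.)
    base_uri = urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, "", ""))
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs += list((params or {}).items()) + list(oauth.items())
    # Encode every name and value first, then sort by name, then by value.
    encoded = sorted((pct(k), pct(v)) for k, v in pairs)
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join([method.upper(), pct(base_uri), pct(normalized)])
    # Signing key is "consumer_secret&token_secret"; the token secret is empty here.
    key = pct(consumer_secret) + "&"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode()
    return oauth
```

The returned parameters would normally travel in an `Authorization: OAuth ...` header or as query parameters; which of these Parse.ly accepts is not stated here.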
Implementation
OAuth exists primarily to provide an open standard for delegating API access across services. For example, if a user wants Facebook.com to download information about their favorite music from Last.fm, Facebook.com requests authorization tokens from Last.fm via the OAuth protocol. The user then authorizes Facebook’s access to Last.fm on an interactive web page. This spares the user from having to share their Last.fm login credentials with Facebook.com.
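Because this flow involves three parties (the user, the service requesting access, and the service holding the data), it is known as 3-legged OAuth, and is often lovingly referred to as “the OAuth dance”. The 2-legged version used by Parse.ly involves only your application and Parse.ly, so it skips the interactive authorization step. The differences are well described in the presentation Wherefore Art Thou, OAuth?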
Also, OAuth should not be confused with OpenID, a complementary but very different standard. Whereas OpenID gives users a “single sign-on” mechanism across numerous web-based services, OAuth is a protocol for authorizing access to data associated with user accounts.
HTTP, REST and JSON
The Parse.ly API is exposed as a set of HTTP resources, following the RESTful pattern of API design (see REST). Each resource supports various operations, which are mapped to URLs and HTTP methods. All request and response bodies use JavaScript Object Notation, aka JSON (see JSON). If you prefer not to call the HTTP API directly, we also provide language bindings, described later in this documentation.
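To illustrate how a resource operation maps to a URL, an HTTP method, and a JSON body, here is a minimal sketch using only the Python standard library. The base URL and the route table are hypothetical placeholders; the real Parse.ly routes are not listed in this section.

```python
import json
from urllib.request import Request

# Hypothetical base URL and routes, for illustration only.
BASE_URL = "https://api.example.invalid/v1"

ROUTES = {
    ("UserProfile", "create"): ("POST", "/users"),
    ("UserProfile", "read"): ("GET", "/users/{id}"),
    ("UserProfile", "update"): ("PUT", "/users/{id}"),
    ("UserProfile", "delete"): ("DELETE", "/users/{id}"),
}


def build_request(resource, operation, body=None, **path_args):
    """Map a (resource, operation) pair to an HTTP method and URL, with an optional JSON body."""
    method, path = ROUTES[(resource, operation)]
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path.format(**path_args), data=data, method=method, headers=headers)
```

Sending the request would then be a matter of `urllib.request.urlopen(req)` followed by `json.loads(...)` on the response body, once OAuth signing is added.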
Note
For simplicity, Parse.ly provides only JSON as an output format for its API. If other formats, such as YAML, XML, Protocol Buffers, or Thrift, would make more sense for your application, let us know.
Account Management
The purpose of integrating the Parse.ly API into your site is to offer content recommendations to your users, based on their interactions with your content. Parse.ly’s servers store and analyze a cached copy of content from your site. They also store information about your site visitors that allows the system to personalize that content to your users’ tastes and interests.
Therefore, we offer abstractions for managing both the sources of content that Parse.ly will cache and analyze, and the profiles of your visitors and users that power our recommendations.
User Profiles
A UserProfile resource is associated with a single unique user. UserProfiles are generally used in one or both of the following patterns:
- Identifiable Profile: If your site already features a user account system, you can integrate UserProfile with identifiable user accounts in your site. For example, if you have a user named “John Smith” with the user ID “john.smith”, you can link that account with a corresponding Parse.ly UserProfile, and ensure his data follows him whenever he is logged in to your site.
- Anonymous Profile: If you want to offer Parse.ly recommendations to anonymous visitors, you need to utilize our JavaScript Tracker, which identifies anonymous visitors in a way similar to systems like Google Analytics. Our tracker will automatically create UserProfile resources that correspond to your unique visitors, and use metrics from their browsing experience to power content recommendations, without requiring the user to log in at all.
Note: though the JavaScript Tracker is required for Anonymous Profiles, we also encourage you to use it with Identifiable Profiles, since it enhances the data Parse.ly collects about your users and allows the system to make better recommendations. Only advanced API users tend to forgo the tracker altogether, for example when a UserProfile can be populated with reading history or behavioral data from an existing source.
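In practice the JavaScript Tracker identifies anonymous visitors in the browser. The Python sketch below only illustrates the choice between the two patterns, assuming a random first-party cookie ID for anonymous visitors (the approach Google Analytics-style trackers take). The cookie name and the "user:"/"anon:" ID prefixes are hypothetical.

```python
import uuid
from typing import Mapping, Optional, Tuple

# Hypothetical cookie name; the real tracker's cookie is not documented here.
ANON_COOKIE = "visitor_id"


def resolve_profile_id(logged_in_user: Optional[str],
                       cookies: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (profile_id, new_cookie_value_or_None).

    Identifiable Profile: reuse the site's own user ID.
    Anonymous Profile: reuse a random first-party cookie ID, or mint a new one.
    """
    if logged_in_user:
        return f"user:{logged_in_user}", None
    existing = cookies.get(ANON_COOKIE)
    if existing:
        return f"anon:{existing}", None
    fresh = uuid.uuid4().hex  # 32 random hex characters
    return f"anon:{fresh}", fresh
```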
UserProfile resources can also include personal information used to enhance the profile or link it with other information sources. We refer to this as Profile Metadata. However, the main purpose of a UserProfile is to:
- Capture any explicit interests (e.g., topics, people, events) the user cares about
- Capture data about the user’s interactions with content on your site
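The JSON representation of a UserProfile isn’t specified in this section. The dataclasses below are a hypothetical shape that mirrors its three parts (Profile Metadata, explicit interests, and interaction data) and serializes them to JSON; every field name is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Interaction:
    item_link: str
    action: str                 # hypothetical values, e.g. "view" or "click"
    seconds_spent: float = 0.0


@dataclass
class UserProfile:
    profile_id: str
    interests: List[str] = field(default_factory=list)          # explicit interests
    interactions: List[Interaction] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)      # Profile Metadata


profile = UserProfile("john.smith", interests=["python", "career advice"])
profile.interactions.append(Interaction("https://example.com/a", "view", 42.0))
# asdict() recursively converts nested dataclasses, so the whole profile is JSON-ready.
payload = json.dumps(asdict(profile))
```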
As a Parse.ly API user, you own this data. However, you derive value from it primarily through our Query API, described later in this document, which lets you recommend articles to specific users based on their reading behavior and tastes.
Interests
Interest resources specify topics that resonate with a user. These can be programmatically generated to populate the UserProfile with information about a user’s interests. Users with largely overlapping Interest sets are given similar recommendations.
Interest resources also include a rank, which lets you specify how important each interest is to a user. Ranks influence the ordering of results in the Query API.
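The sketch below is not Parse.ly’s algorithm. It uses Jaccard similarity as a simple stand-in for “largely overlapping Interest sets”, and orders items by the highest-ranked interest they match. It assumes that a higher rank means a more important interest, since the direction of the rank scale isn’t stated here; the `topics` field on items is also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interest:
    topic: str
    rank: int  # assumption: higher rank = more important to the user


def interest_overlap(a: Iterable[Interest], b: Iterable[Interest]) -> float:
    """Jaccard similarity of two users' interest topics (0.0 = disjoint, 1.0 = identical)."""
    ta, tb = {i.topic for i in a}, {i.topic for i in b}
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def order_by_interest(items: List[Dict], interests: Iterable[Interest]) -> List[Dict]:
    """Put items matching higher-ranked interests first (stable for ties)."""
    rank = {i.topic: i.rank for i in interests}
    return sorted(items,
                  key=lambda item: max((rank.get(t, 0) for t in item["topics"]), default=0),
                  reverse=True)
```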
Data Sources
In order for Parse.ly to recommend content, it needs a DataSource resource that tells it where to find that content. As a user of our API, you can configure FeedURL resources as part of your data sources, and Parse.ly will automatically monitor each feed (expected in RSS or Atom format). Updates are delivered to Parse.ly in near-real-time.
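Parse.ly monitors your feeds itself, but to see what it can extract from a FeedURL, here is a minimal standard-library sketch that pulls title, link, and summary (the same fields later exposed on Item resources) out of RSS 2.0 or Atom XML. It is an illustration, not Parse.ly’s ingestion code, and it takes only the first link of each Atom entry.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract title/link/summary from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)  # for untrusted feeds, consider a hardened XML parser
    items = []
    if root.tag == "rss":
        for it in root.iter("item"):
            items.append({
                "title": it.findtext("title", ""),
                "link": it.findtext("link", ""),
                "summary": it.findtext("description", ""),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")  # Atom links carry the URL in href
            items.append({
                "title": entry.findtext(ATOM + "title", ""),
                "link": link.get("href", "") if link is not None else "",
                "summary": entry.findtext(ATOM + "summary", ""),
            })
    return items
```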
Implementation
How does Parse.ly deliver content from your RSS/Atom feeds in near-real-time? Parse.ly embraces an emerging standard known as the PubSubHubbub protocol. We integrate with our PubSubHubbub provider (currently Superfeedr) via the XMPP / XEP-0060 standard. In fact, one of our engineers has open-sourced our Superfeedr XMPP wrapper for Python; check out the sfpy project.
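Parse.ly receives updates over XMPP, but PubSubHubbub is perhaps easiest to understand through its HTTP webhook form. The sketch below shows the subscriber side of that HTTP flavor as described in the PubSubHubbub 0.3 spec (not the XMPP integration used by Parse.ly): answering the hub’s intent-verification GET by echoing `hub.challenge`, and checking the `X-Hub-Signature` HMAC-SHA1 on pushed content. Function names are illustrative.

```python
import hashlib
import hmac
from typing import Mapping, Optional, Set, Tuple


def verify_intent(query: Mapping[str, str], wanted_topics: Set[str]) -> Tuple[int, str]:
    """Answer a hub's verification GET (subscriber side).

    Echo hub.challenge only for topics we actually asked about; otherwise refuse with 404.
    """
    if query.get("hub.mode") in ("subscribe", "unsubscribe") and query.get("hub.topic") in wanted_topics:
        return 200, query.get("hub.challenge", "")
    return 404, ""


def valid_signature(body: bytes, header: Optional[str], secret: str) -> bool:
    """Check X-Hub-Signature ("sha1=<hex HMAC-SHA1 of body>") on a content notification."""
    expected = "sha1=" + hmac.new(secret.encode(), body, hashlib.sha1).hexdigest()
    return hmac.compare_digest(expected, header or "")  # constant-time comparison
```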
You can also organize your Data Sources by assigning tags to them, which can be used in the Query API for filtering data returned by Parse.ly.
Channels
Parse.ly’s API offers a few DataSource instances configured out of the box; these go by the name Channel. Two Channel resources are available by default:
- web_wide_news: access to over 4000 mainstream news sources
- web_wide_blogs: access to over 120K blog sources
If your primary purpose in using our API is to increase page views of your own content, our pre-configured Channel resources may not be of much use to you. However, if you are building a system that pulls together content from around the web to recommend to your users (for example, a career advice page that recommends articles from around the web based on each user’s career interests), these channels are essential, and they save you the work of setting up your own RSS/Atom feeds to monitor for high-quality content.
Note
We have been experimenting with offering some different channels to our API users aside from news and blog content. Here are some ideas we have tossed around.
- status_updates: pull status updates from online social networks
- business_reviews: reviews of businesses (e.g. restaurant, shop), written by users and editors on top online review sites
- videos: videos and captions/summaries taken from top user-uploaded video sites
- images: images and captions/summaries taken from online image providers
We have also started organizing feeds from our web_wide_news and web_wide_blogs channels into categories. Here are some of the categories we’ve thought about:
- cat_technology: technology sources
- cat_business: business and financial news sources
- cat_politics: political coverage and commentary
- cat_entertainment: arts and entertainment
- cat_celebrity: celebrity news and gossip
Feel free to contact us if any of these channels (or others that you think of) would be of particular use to you.
Queries
Once you have your resources set up for DataSource, UserProfile, and Interest, you are ready to get to the core value of our API: retrieving content recommendations via queries.
Queries are executed against a single UserProfile, and return personalized content recommendations based on that profile’s interests and sources. Our recommendations are powered by a proprietary combination of naive Bayesian inference and collaborative filtering (see Algorithms and Technical Background). Results are scored based on how closely the article matches that user profile’s interests and reading behavior. The higher the score, the more likely that content will resonate with that user, and thus the more likely that user is to click on that article and enjoy reading it.
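Parse.ly’s scoring is proprietary and also uses collaborative filtering, which is not shown here. To make the naive Bayesian part concrete, here is a textbook two-class multinomial naive Bayes scorer with add-one smoothing, trained on articles one user engaged with versus skipped. The class, the tokenizer, and the liked/skipped setup are illustrative assumptions, not Parse.ly’s model.

```python
import math
from collections import Counter
from typing import Iterable, List

PUNCT = ".,!?;:\"'()"


def tokenize(text: str) -> List[str]:
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesScorer:
    """Textbook two-class multinomial naive Bayes with add-one (Laplace) smoothing.

    Classes: articles the user engaged with ("liked") vs. ignored ("skipped").
    score() returns the log-odds log P(liked | text) - log P(skipped | text).
    """

    def __init__(self, liked: Iterable[str], skipped: Iterable[str]):
        self.counts = {"liked": Counter(), "skipped": Counter()}
        self.docs = {"liked": 0, "skipped": 0}
        for label, texts in (("liked", liked), ("skipped", skipped)):
            for text in texts:
                self.counts[label].update(tokenize(text))
                self.docs[label] += 1
        self.vocab = set(self.counts["liked"]) | set(self.counts["skipped"])

    def _log_joint(self, label: str, tokens: List[str]) -> float:
        total_docs = self.docs["liked"] + self.docs["skipped"]
        log_prior = math.log((self.docs[label] + 1) / (total_docs + 2))
        denom = sum(self.counts[label].values()) + len(self.vocab)
        # Words never seen in training carry no evidence either way, so skip them.
        return log_prior + sum(
            math.log((self.counts[label][t] + 1) / denom) for t in tokens if t in self.vocab
        )

    def score(self, text: str) -> float:
        tokens = tokenize(text)
        return self._log_joint("liked", tokens) - self._log_joint("skipped", tokens)
```

A positive score means the text looks more like what the user engaged with than what they skipped, in the spirit of “the higher the score, the more likely that content will resonate”.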
The result of a Query API call is a paginated list of Item resources, where each Item corresponds roughly to an article, with fields common to RSS/Atom feeds, such as title, summary, and link. An Item also includes fields that expose Parse.ly’s analysis results, such as score_explanation.
You are free to display the results from our API any way you like, subject to our Terms of Use. We also provide a Whitelabel JavaScript Widget which can be used for standard integration use cases.
Our standard result set is sorted by score and recency; however, other sort orders and search methods are available. For example, we support full-text search and dated queries.
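Ranking happens on Parse.ly’s servers. As a client-side sketch of the semantics described above, the function below filters by a DataSource tag, sorts by score with recency as the tiebreaker (an assumption about how “score and recency” combine), and returns one page. The field names `score`, `published`, and `tags` and the response shape are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def query_items(items: List[Dict], tag: Optional[str] = None,
                page: int = 1, per_page: int = 10) -> Dict:
    """Filter by tag, order by score then recency, and return one page of results."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    selected = [it for it in items if tag is None or tag in it.get("tags", ())]
    # Highest score first; among equal scores, the most recently published first.
    selected.sort(key=lambda it: (it["score"], it["published"]), reverse=True)
    start = (page - 1) * per_page
    return {"page": page, "total": len(selected), "items": selected[start:start + per_page]}


SAMPLE = [
    {"link": "https://example.com/a", "score": 0.9, "published": datetime(2011, 1, 1), "tags": ["tech"]},
    {"link": "https://example.com/b", "score": 0.9, "published": datetime(2011, 2, 1), "tags": ["tech"]},
    {"link": "https://example.com/c", "score": 0.5, "published": datetime(2011, 3, 1), "tags": ["business"]},
]
```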
Reading Behavior
As users interact with content on your site, they are giving you valuable information about what articles, topics, and areas of interest resonate with them. Parse.ly allows you to tap into this valuable user data to connect your users with content they’ll love.
However, for our system to work properly, we need to be able to track what a user does with your content and analyze that content for clues about the user’s interests. We offer two ways to inform our system about your users’ reading behavior: implicit actions and explicit actions.
- Implicit: once you install the JavaScript Tracker, our system monitors which parts of your site the user visits, which links they click, and how long they spend on each piece of content. We analyze this content to build up a statistical text profile for each user (see our algorithms), and use these metrics and data to model the user’s interests based on their implicit behavior on your site. Without users even realizing they are using a content recommendation system, their past behavior on your site creates a virtuous circle of relevant recommendations and positive browsing experiences. Our recommendations also downplay content that would not resonate with a given user, avoiding boring and unengaging browsing experiences.
- Explicit: if you use our Whitelabel JavaScript Widget in its advanced form, or if you want to build a specialized reading interface powered by Parse.ly recommendations, you will prefer our explicit reading behavior model. In this model, you actually make API calls to Parse.ly to tell it what a user has done with a piece of content on your site. The actions currently available are: Starred, Shared, Marked Read, Marked Unread, Archived, and Deleted. These actions act as a form of user training, but are meant to be seamlessly integrated into your site.
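For the explicit model, each call reports one action on one Item. The sketch below encodes the six actions listed above as an enum and builds a JSON body for such a call; the wire values, field names, and payload shape are hypothetical, since the request format isn’t specified in this section.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    """The explicit actions listed above; the string values are hypothetical."""
    STARRED = "starred"
    SHARED = "shared"
    MARKED_READ = "marked_read"
    MARKED_UNREAD = "marked_unread"
    ARCHIVED = "archived"
    DELETED = "deleted"


def action_payload(profile_id: str, item_link: str, action: Action,
                   when: Optional[datetime] = None) -> str:
    """Build a JSON body reporting one explicit action on one Item (hypothetical shape)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "user": profile_id,
        "item": item_link,
        "action": action.value,
        "timestamp": when.isoformat(),
    })
```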
Our users often mix the implicit and explicit models for best results, but good results can often be achieved with the implicit model alone. We are constantly improving each to make it work even better.
Real-Time Updates
Parse.ly’s backend infrastructure is powered by message queues and “real-time” protocols like XMPP and PubSubHubbub. We therefore process data in “near-real-time” as that data flows into our system. An obvious use case for Parse.ly’s API is in real-time applications, where updates are delivered to your client as they are made available rather than you having to poll our service for new information.
However, at this time, our API does not expose a real-time interface. Every query to the Parse.ly API gives back a JSON document which represents a “point-in-time” snapshot of the latest data we have for that user.
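Until a real-time interface exists, a client that wants fresh recommendations has to poll. A minimal sketch, assuming each Item can be identified by its `link`: compare successive point-in-time snapshots and surface only items not seen before. The `fetch` callable stands in for a Query API call and is hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List, Set


def new_items(snapshot: Iterable[Dict], seen: Set[str]) -> List[Dict]:
    """Return items whose link hasn't been seen yet, recording them in `seen`."""
    fresh = []
    for item in snapshot:
        if item["link"] not in seen:
            seen.add(item["link"])
            fresh.append(item)
    return fresh


def poll(fetch: Callable[[], List[Dict]], interval_s: float, rounds: int) -> List[List[Dict]]:
    """Call `fetch` a fixed number of times, keeping only new items from each snapshot."""
    seen: Set[str] = set()
    batches = []
    for i in range(rounds):
        batches.append(new_items(fetch(), seen))
        if i < rounds - 1:
            time.sleep(interval_s)  # wait between point-in-time snapshots
    return batches
```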
We are exploring different avenues for implementing a real-time interface, for example by offering an XMPP API ourselves or by running a PubSubHubbub hub. If the real-time use case is particularly important to you and you would like to sponsor our development of a real-time content recommendations API, please contact us!