Package 'newsanchor'

Title: Client for the News API
Description: Interface to gather news from the 'News API', based on a multilevel query <https://newsapi.org/>. A personal API key is required.
Authors: Frie Preu [aut, pro], Yannik Buhl [aut, cre], Lars Schulze [aut], Jan Dix [aut, pro]
Maintainer: Yannik Buhl <[email protected]>
License: MIT + file LICENSE
Version: 0.1.1
Built: 2025-02-06 04:42:20 UTC
Source: https://github.com/correlaid/newsanchor

Help Index


Builds query URL for newsapi.org.

Description

build_newsanchor_url adds a list of query arguments to a given News API endpoint.

Usage

build_newsanchor_url(url, query_args)

Arguments

url

News API endpoint.

query_args

named list of parameters that are needed to query the endpoint. Check the News API documentation to see which endpoint requires which parameters.

Value

httr URL.
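
As a rough illustration of what appending query arguments to an endpoint involves, the following base R sketch builds the query string by hand. It is not the package's implementation (which returns an httr URL); the helper name sketch_build_url and the endpoint string used here are assumptions made for this example.

```r
# Hypothetical stand-in for illustration only, not the package's code.
sketch_build_url <- function(url, query_args) {
  # Drop arguments that were left as NULL.
  query_args <- Filter(Negate(is.null), query_args)
  if (length(query_args) == 0) {
    return(url)
  }
  # Percent-encode each value so spaces and reserved characters are safe.
  encoded <- vapply(
    query_args,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  # Join as name=value pairs separated by "&".
  paste0(url, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

# The endpoint below is assumed for the example.
sketch_build_url(
  "https://newsapi.org/v2/everything",
  list(q = "new york", language = "en", sources = NULL)
)
```

NULL entries are dropped before encoding, which mirrors how the exported functions default most arguments to NULL.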


Concatenate character vector to comma-separated string.

Description

collapse_to_comma_separated is a helper function that concatenates a character vector to a comma-separated string. If the input vector has only one element, the element will be returned unchanged.

Usage

collapse_to_comma_separated(v)

Arguments

v

character vector.

Value

string with elements of v separated by comma.
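
The behaviour described above can be approximated in base R with paste(). This is a minimal sketch under the assumption that elements are joined with a bare comma and no space; sketch_collapse is a hypothetical name, not the package function.

```r
# Hypothetical helper for illustration; the separator "," is an assumption.
sketch_collapse <- function(v) {
  stopifnot(is.character(v))
  # A single element has nothing to join, so it comes back unchanged.
  paste(v, collapse = ",")
}

sketch_collapse(c("usa-today", "spiegel-online"))
sketch_collapse("bbc-news")
```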


Extracts data frame with News API articles from response object.

Description

extract_newsanchor_articles extracts a data frame containing the News API articles that matched a request to the News API everything or headlines endpoint.

Usage

extract_newsanchor_articles(metadata, content_parsed)

Arguments

metadata

data frame containing meta data related to the request, see extract_newsanchor_metadata.

content_parsed

parsed content of a response to a News API query.

Value

data frame containing articles.
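
To make the idea of extracting a data frame from parsed content more concrete, here is a small sketch that works on a hand-written mock. The shape of the mock (an articles element holding one named list per article, with title, url and publishedAt fields) is an assumption for this example, and the code is not the package's implementation.

```r
# Mock of a parsed response; field names are assumed for illustration.
content_parsed <- list(
  status = "ok",
  totalResults = 2L,
  articles = list(
    list(title = "First headline",
         url = "https://example.com/1",
         publishedAt = "2019-01-02T12:00:00Z"),
    list(title = "Second headline",
         url = "https://example.com/2",
         publishedAt = "2019-01-03T08:30:00Z")
  )
)

# Turn each article (a named list) into a one-row data frame,
# then stack the rows into a single data frame.
rows <- lapply(content_parsed$articles, function(a) {
  data.frame(
    title = a$title,
    url = a$url,
    published_at = a$publishedAt,
    stringsAsFactors = FALSE
  )
})
articles_df <- do.call(rbind, rows)
articles_df
```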


Extracts metadata.

Description

extract_newsanchor_metadata extracts meta data from the response object and the parsed content.

Usage

extract_newsanchor_metadata(
  response,
  content_parsed,
  page = NULL,
  page_size = NULL
)

Arguments

response

httr response object

content_parsed

parsed content of a response to a News API query.

page

Specifies the page number of your results that was returned. Defaults to NULL.

page_size

The number of articles per page that were returned. Defaults to NULL.

Value

data frame containing meta data related to the query.


Extracts data frame with News API sources from response object.

Description

extract_newsanchor_sources extracts a data frame containing the News API sources that matched a request to the News API sources endpoint.

Usage

extract_newsanchor_sources(metadata, content_parsed)

Arguments

metadata

data frame containing meta data related to the request, see extract_newsanchor_metadata.

content_parsed

parsed content of a response to a News API query.

Value

data frame containing sources.


Get resources of newsapi.org

Description

get_everything returns articles from large and small news sources and blogs. This includes news as well as other regular articles. You can search across multiple sources, in different languages, or with your own keywords. Articles can be sorted by publication date (publishedAt), relevancy, or popularity. To automatically download all results, use get_everything_all().

Please check that the api_key is available. You can provide an explicit definition of the key or use set_api_key().

Valid languages for language are provided in the dataset terms_language.

Usage

get_everything(
  query = NULL,
  query_in_title = NULL,
  sources = NULL,
  domains = NULL,
  exclude_domains = NULL,
  from = NULL,
  to = NULL,
  language = NULL,
  sort_by = "publishedAt",
  page = 1,
  page_size = 100,
  api_key = Sys.getenv("NEWS_API_KEY")
)

Arguments

query

Character string that contains the search term for the API's database. The API supports advanced search parameters, see 'Details'. Either query or query_in_title must be specified.

query_in_title

Character string that works like query but searches _within the headline only_. The API supports advanced search parameters, see 'Details'. Either query or query_in_title must be specified.

sources

Character vector with IDs of the news outlets you want to focus on (e.g., c("usa-today", "spiegel-online")).

domains

Character vector with domains that you want to restrict your search to (e.g. c("bbc.com", "nytimes.com")).

exclude_domains

Similar usage as with 'domains'. Will exclude these domains from your search.

from

Character string with start date of your search. Needs to conform to one of the following lubridate order strings: "ymdHMs, ymdHMsz, ymd". See help for lubridate::parse_date_time. If from is not specified, NewsAPI defaults to the oldest available date (depends on your paid/unpaid plan from newsapi.org).

to

Character string that marks the end date of your search. Needs to conform to one of the following lubridate order strings: "ymdHMs, ymdHMsz, ymd". See help for lubridate::parse_date_time. If to is not specified, NewsAPI defaults to the most recent article available.

language

Specifies the language of the articles of your search. Must be in ISO shortcut format (e.g., "de", "en"). See list of all languages using newsanchor::terms_language. Default is all languages.

sort_by

Character string that specifies the sorting variable of your article results. Accepts three options: "publishedAt", "relevancy", "popularity". Default is "publishedAt".

page

Specifies the page number of your results that is returned. Must be numeric. Default is first page. If you want to get all results at once, use get_everything_all from 'newsanchor'.

page_size

The number of articles per page that are returned. Maximum is 100 (also default).

api_key

Character string with the API key you get from newsapi.org. A key is required. By default, it is read from the environment variable NEWS_API_KEY (see set_api_key()).

Details

Advanced search (see also www.newsapi.org): Surround entire phrases with quotes (") for exact matches. Prepend words or phrases that must appear with the "+" symbol (e.g., +bitcoin). Prepend words that must not appear with the "-" symbol (e.g., -bitcoin). You can also use the AND, OR, and NOT keywords, optionally grouped with parentheses (e.g., 'crypto AND (ethereum OR litecoin) NOT bitcoin').
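
The advanced search rules above can be assembled into a query string programmatically. The helper below is a hypothetical sketch (sketch_query is not part of the package), and joining the parts with single spaces is an assumption about how the pieces are best combined.

```r
# Hypothetical helper for illustration; not exported by newsanchor.
sketch_query <- function(phrase = NULL, must = character(), must_not = character()) {
  parts <- c(
    # Exact phrase: wrap in double quotes.
    if (!is.null(phrase)) paste0('"', phrase, '"'),
    # Terms that must appear: prefix with "+".
    if (length(must) > 0) paste0("+", must),
    # Terms that must not appear: prefix with "-".
    if (length(must_not) > 0) paste0("-", must_not)
  )
  paste(parts, collapse = " ")
}

q <- sketch_query(phrase = "electric cars", must = "tesla", must_not = "bitcoin")
q
```

The resulting string could then be passed as the query argument, for example get_everything(query = q).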

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
df <- get_everything(query = "stuttgart", language = "de")
df <- get_everything(query = "mannheim", from = "2019-01-02 12:00:00")

## End(Not run)

Returns all articles from newsapi.org in one data frame

Description

get_everything_all runs the same search as get_everything across articles from large and small news sources and blogs, including breaking news as well as other regular articles, and automatically downloads all results for one search. You can search across multiple sources, in different languages, or with your own keywords. Articles can be sorted by publication date (publishedAt), relevancy, or popularity.

Please check that the api_key is available. You can provide an explicit definition of the api_key or use set_api_key().

Valid languages for language are provided in the dataset terms_language.

For valid search terms, see data(searchterms).

Usage

get_everything_all(
  query = NULL,
  query_in_title = NULL,
  sources = NULL,
  domains = NULL,
  exclude_domains = NULL,
  from = NULL,
  to = NULL,
  language = NULL,
  sort_by = "publishedAt",
  api_key = Sys.getenv("NEWS_API_KEY")
)

Arguments

query

Character string that contains the search term for the API's database. The API supports advanced search parameters, see the 'Details' of get_everything.

query_in_title

Character string that works like query but searches _within the headline only_. The API supports advanced search parameters, see the 'Details' of get_everything.

sources

Character string with IDs (comma separated) of the news outlets you want to focus on (e.g., "usa-today, spiegel-online").

domains

Character string (comma separated) with domains that you want to restrict your search to (e.g., "bbc.com, nytimes.com").

exclude_domains

Similar usage as with 'domains'. Will exclude these domains from your search.

from

Marks the start date of your search. Must be in ISO 8601 format (e.g., "2018-09-08" or "2018-09-08T12:51:42"). Default is the oldest available date (depends on your paid/unpaid plan from newsapi.org).

to

Marks the end date of your search. Works similarly to 'from'. Default is the latest article available.

language

Specifies the language of the articles of your search. Must be in ISO shortcut format (e.g., "de", "en"). See list of all languages on https://newsapi.org/docs/endpoints/everything. Default is all languages.

sort_by

Character string that specifies the sorting of your article results. Accepts three options: "publishedAt", "relevancy", "popularity". Default is "publishedAt".

api_key

Character string with the API key you get from newsapi.org. A key is required. By default, it is read from the environment variable NEWS_API_KEY (see set_api_key).

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
df <- get_everything_all(query = "mannheim")
df <- get_everything_all(query = "stuttgart", language = "en")

## End(Not run)
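
Downloading all results amounts to requesting page after page until the total is covered. The sketch below illustrates that arithmetic with a fake fetcher that never touches the network; both function names are hypothetical, and whether get_everything_all proceeds exactly this way is an assumption.

```r
# Number of pages needed to cover all results at a given page size.
pages_needed <- function(total_results, page_size = 100) {
  ceiling(total_results / page_size)
}

# Hypothetical fetcher standing in for one paged API call.
# It only fabricates row ids so the loop can be run offline.
fake_fetch_page <- function(page, page_size, total_results) {
  first <- (page - 1) * page_size + 1
  last <- min(page * page_size, total_results)
  data.frame(id = seq(first, last), page = page)
}

total <- 250
pages <- lapply(
  seq_len(pages_needed(total)),
  fake_fetch_page,
  page_size = 100,
  total_results = total
)

# Stack the per-page data frames into one.
all_rows <- do.call(rbind, pages)
nrow(all_rows)
```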

Returns selected headlines from newsapi.org

Description

get_headlines returns live top and breaking headlines for a country, specific category in a country, single source, or multiple sources. You can also search with keywords. Articles are sorted by the earliest date published first. To automatically download all results, use get_headlines_all().

Please check that the api_key is available. You can provide an explicit definition of the key or use set_api_key().

Valid search terms are provided in the data sets terms_category, terms_country, and terms_sources.

Usage

get_headlines(
  query = NULL,
  category = NULL,
  country = NULL,
  sources = NULL,
  page = 1,
  page_size = 100,
  api_key = Sys.getenv("NEWS_API_KEY")
)

Arguments

query

Character string that contains the search term.

category

Character string with the category you want headlines from.

country

Character string with the country you want headlines from.

sources

Character vector with IDs of the news outlets you want to focus on (e.g., c("usa-today", "spiegel-online")).

page

Specifies the page number of your results that is returned. Must be numeric. Default is first page. If you want to get all results at once, use get_headlines_all from 'newsanchor'.

page_size

The number of articles per page that are returned. Maximum is 100 (also default).

api_key

Character string with the API key you get from newsapi.org. A key is required. By default, it is read from the environment variable NEWS_API_KEY (see set_api_key).

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
df <- get_headlines(sources = "bbc-news")
df <- get_headlines(query = "sports", page = 2)
df <- get_headlines(category = "business")

## End(Not run)

Returns all headlines from newsapi.org

Description

get_headlines_all returns live top and breaking headlines for a country, a specific category in a country, a single source, or multiple sources, and automatically downloads all results. You can also search with keywords. Articles are sorted by the earliest date published first.

Please check that the api_key is available. You can provide an explicit definition of the api_key or use set_api_key().

Valid search terms are provided in terms_category, terms_country, and terms_sources.

Usage

get_headlines_all(
  query = NULL,
  category = NULL,
  country = NULL,
  sources = NULL,
  api_key = Sys.getenv("NEWS_API_KEY")
)

Arguments

query

Character string that contains the search term.

category

Character string with the category you want headlines from.

country

Character string with the country you want headlines from.

sources

Character string with IDs (comma separated) of the news outlets you want to focus on (e.g., "usa-today, spiegel-online").

api_key

Character string with the API key you get from newsapi.org. A key is required. By default, it is read from the environment variable NEWS_API_KEY (see set_api_key).

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
df <- get_headlines_all(query = "sports")
df <- get_headlines_all(category = "health")

## End(Not run)

Returns selected sources from newsapi.org

Description

get_sources returns the news sources currently available on newsapi.org. The sources can be filtered by category, language, or country. If these arguments are left empty, the query returns all available sources.

Usage

get_sources(
  category = NULL,
  language = NULL,
  country = NULL,
  api_key = Sys.getenv("NEWS_API_KEY")
)

Arguments

category

Category you want to get sources for as a string. Default: NULL.

language

The language you want to get sources for as a string. Default: NULL.

country

The country you want to get sources for as a string (e.g. "us"). Default: NULL.

api_key

String with the API key you get from newsapi.org. A key is required. By default, it is read from the environment variable NEWS_API_KEY (see set_api_key).

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
# api_key must be passed by name; the first positional argument is category.
get_sources(api_key = Sys.getenv("NEWS_API_KEY"))
get_sources(category = "technology")
get_sources(language = "en")

## End(Not run)

Makes a GET request to News API.

Description

make_newsanchor_get_request makes a GET request to News API.

Usage

make_newsanchor_get_request(url, api_key)

Arguments

url

News API url with query parameters and scheme specified. See build_newsanchor_url.

api_key

News API key.

Value

httr response object.


Parses content returned by query to the News API.

Description

parse_newsanchor_content parses the content sent back by the News API to an R list.

Usage

parse_newsanchor_content(response)

Arguments

response

httr response object

Value

R list.


Sample Response Object

Description

A sample response object generated using 'get_everything'.

Usage

sample_response

Format

An object of class list of length 2.

Details

This response object was created mainly for demonstration purposes. The data set is used in the "Scrape New York Times Online Articles" vignette. The object was created using the query shown under Examples.

Value

List with two dataframes:
1) Data frame with results_df
2) Data frame with meta_data

Examples

## Not run: 
response <- get_everything(query   = "Trump",
                           sources = "the-new-york-times",
                           from    = "2018-12-03",
                           to      = "2018-12-09") 

## End(Not run)
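
Since the returned object is a list of two data frames, the parts can be pulled out by name. The sketch below uses a hand-built mock instead of sample_response so that it runs without the package; the element names results_df and meta_data follow the Value section, while the columns are invented for illustration.

```r
# Mock mirroring the documented "list of length 2"; columns are assumed.
mock_response <- list(
  results_df = data.frame(
    title = c("Article A", "Article B"),
    stringsAsFactors = FALSE
  ),
  meta_data = data.frame(total_results = 2L)
)

# Pull out the two parts by name.
articles <- mock_response$results_df
meta <- mock_response$meta_data

length(mock_response)
nrow(articles)
```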

Add API key to the .Renviron

Description

Function to set your API key in the R environment when you start using the newsanchor package. Attention: you should execute this function only once.

Usage

set_api_key(path = stop("Please specify a path."))

Arguments

path

character. Path where the environment is stored. As the usage shows, there is no usable default: the function stops with "Please specify a path." unless a path is given.

Value

None.

Author(s)

Jan Dix <[email protected]>

Examples

## Not run: 
set_api_key(tempdir()) # you will be prompted to enter your API key.

## End(Not run)
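
For readers who want to see the underlying mechanism, the following base R sketch writes a key to an environment file in a temporary directory and loads it into the session. It is not what set_api_key() does internally (that is not documented here); the placeholder key and the file location are assumptions for the example.

```r
# Write a placeholder key to an environment file in a temporary directory.
renviron <- file.path(tempdir(), ".Renviron")
writeLines("NEWS_API_KEY=placeholder-key-123", renviron)

# Load the file; note this overrides NEWS_API_KEY for the current session.
readRenviron(renviron)

# The exported functions read this variable as their api_key default.
key <- Sys.getenv("NEWS_API_KEY")
nzchar(key)
```

An empty string from Sys.getenv() means the variable is not set, so checking nzchar() before a request is a cheap safeguard.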

Checks validity of a category.

Description

stop_if_invalid_category checks whether a given category is valid for News API and stops with an error if this is not the case.

Usage

stop_if_invalid_category(category)

Arguments

category

category to check as a string.


Checks validity of a country

Description

stop_if_invalid_country checks whether a given country is valid for News API and stops with an error if this is not the case.

Usage

stop_if_invalid_country(country)

Arguments

country

country to check as a string.


Checks validity of a language

Description

stop_if_invalid_language checks whether a given language is valid for News API and stops with an error if this is not the case.

Usage

stop_if_invalid_language(language)

Arguments

language

language to check as a string.


Checks validity of a source

Description

stop_if_invalid_source checks whether a given source is valid for News API and stops with an error if this is not the case.

Usage

stop_if_invalid_source(source)

Arguments

source

source to check as a string.
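
The four stop_if_invalid_* checks share one pattern: compare a value against a set of allowed terms and stop with an error otherwise. A minimal sketch of that pattern follows. The helper name and the hard-coded category vector are assumptions for illustration; the authoritative values live in the terms_category data set.

```r
# Assumed category values for illustration; see terms_category for the real list.
valid_categories <- c("business", "entertainment", "general", "health",
                      "science", "sports", "technology")

# Hypothetical generic validator, not the package's implementation.
sketch_stop_if_invalid <- function(value, valid, what) {
  stopifnot(is.character(value), length(value) == 1)
  if (!value %in% valid) {
    stop(sprintf("'%s' is not a valid %s.", value, what), call. = FALSE)
  }
  invisible(TRUE)
}

sketch_stop_if_invalid("sports", valid_categories, "category")

# An invalid value raises an error, which can be caught with tryCatch().
result <- tryCatch(
  sketch_stop_if_invalid("weather", valid_categories, "category"),
  error = function(e) conditionMessage(e)
)
result
```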


Terms Category

Description

This dataframe provides possible categories (e.g., sports) you want to get headlines for. This dataframe is relevant in conjunction with get_headlines.

Usage

terms_category

Format

An object of class data.frame with 7 rows and 1 columns.


Terms Country

Description

This dataframe provides possible countries you want to get news from. This dataframe is relevant in conjunction with get_headlines.

Usage

terms_country

Format

An object of class data.frame with 54 rows and 1 columns.


Terms Language

Description

This dataframe provides possible languages you want to get news for. This dataframe is relevant in conjunction with get_everything.

Usage

terms_language

Format

An object of class data.frame with 14 rows and 1 columns.


Terms Sources

Description

This dataframe provides possible news sources or blogs you want to get news from. This dataframe is relevant in conjunction with get_everything.

Usage

terms_sources

Format

An object of class data.frame with 138 rows and 1 columns.