Notebooks

Perform Sentiment Analysis

Overview

This guide shows how to perform sentiment analysis with the Athena SDK in Athena Notebooks and export the results as a CSV file. Athena Notebooks are Jupyter-based, so you can combine live code, equations, and visualizations in an interactive workflow.

Setting Up the Environment

SDK Installation

To use the Athena SDK and the dotenv library in your sentiment analysis project, start by installing both packages. This setup is essential for interacting with Athena Intelligence and managing your environment variables.

First, open your Athena Notebook. From the left-side file explorer, select or create a new notebook file. In the main area, click on a code cell and enter the following commands to install the Athena SDK and dotenv library:

plaintext
!pip install athena-intelligence
!pip install python-dotenv

After running these commands, the necessary packages will be installed. You will see output messages indicating the installation status.

For more information about the capabilities and parameters of the Athena SDK, visit the Athena website and navigate to the Developers section. Here, you can explore detailed documentation, guides, and sample notebooks to enhance your understanding of the SDK and its applications.

Install Athena SDK and dotenv

Start by installing the Athena SDK and the dotenv library. You can find more information about the SDK on the Athena website under the Developers section.

Initializing SDK

Next, you need to import the necessary components from these libraries and load your environment variables. Here's how you can accomplish this:

python
from athena_sdk import AthenaClient
from dotenv import load_dotenv
import os

# Load environment variables from a .env file
load_dotenv()

# Initialize the Athena client with your API key
api_key = os.getenv('ATHENA_API_KEY')
client = AthenaClient(api_key)

In the code snippet above, make sure your .env file contains your API key in the following format:

plaintext
ATHENA_API_KEY=your_api_key_here

The load_dotenv() function loads the environment variables from the .env file, and the AthenaClient is then initialized with that API key.

Once you have initialized the Athena client, you can start using the SDK's functionality to interact with the Athena Intelligence platform.

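If the client later fails to authenticate, the usual culprit is an API key that never loaded. Here is a quick, minimal check you can run in a cell (the error message is just illustrative):

python
import os
from dotenv import load_dotenv

load_dotenv()

# Fail early if the key was not picked up from the .env file
if not os.getenv('ATHENA_API_KEY'):
    raise RuntimeError('ATHENA_API_KEY not found; check that your .env file sits next to the notebook')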
Performing Sentiment Analysis

Data Collection

To search for news articles related to companies using the Athena SDK, you'll start by setting up a search query and applying filters to get recent articles. Here’s a step-by-step guide.

Once your environment is ready, you can create a search query to find news articles about a specific company. For instance, if you want to search for news about Barclays, your query could be 'Barclays News.'

To focus on the most recent news, you can apply a filter to only return articles published within the last few days. This can be done using the TBS (Time-Based Search) parameter. Set the filter to exclude articles older than three days to ensure you get the latest updates.

Here is an example code snippet to search for news articles about Barclays from the last three days:

python
from athena_sdk import AthenaClient
from dotenv import load_dotenv
import os

# Load environment variables and initialize the Athena client
load_dotenv()
client = AthenaClient(api_key=os.getenv('ATHENA_API_KEY'))

# Define the search query and filters
company_name = 'Barclays'
query = f'{company_name} News'
date_filter = 'last 3 days'  # TBS parameter

# Execute the search and print the resulting article URLs
search_results = client.search_news(query, date_filter=date_filter, max_results=5)
for result in search_results:
    print(result['url'])

In this example, the search_news function is used to perform the search, with the query parameter set to 'Barclays News' and the date_filter set to 'last 3 days' to get recent articles. The max_results parameter limits the number of results to five URLs.

The search_results will contain the list of URLs for the recent news articles about Barclays. You can then iterate over these URLs to analyze their content further or perform sentiment analysis as needed.

This approach can be easily adapted to search for news articles about different companies by modifying the company_name variable and the search query accordingly.
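For example, if you are tracking several companies at once, you can wrap the same call in a loop and keep the results keyed by company name. This minimal sketch reuses the client and the search_news call shown above:

python
# Collect recent article URLs for several companies
companies = ['Barclays', 'HSBC', 'Lloyds']
date_filter = 'last 3 days'

urls_by_company = {}
for company_name in companies:
    query = f'{company_name} News'
    search_results = client.search_news(query, date_filter=date_filter, max_results=5)
    urls_by_company[company_name] = [result['url'] for result in search_results]

print(urls_by_company)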

Use Date Filters in Your Search Queries

When conducting sentiment analysis, make sure to include date filters in your search queries. For instance, to find recent articles about a company, you can specify a date range like 'company news within the last three days.' This helps capture the most relevant and timely content for your analysis.

Scraping Content

To scrape content from the URLs returned by your search, first gather the list of URLs you want to analyze. These URLs should point to the news articles or web pages containing the content you need for sentiment analysis.

Once you have the URLs, iterate over the list and fetch the content of each one. The example below does this with the requests library from inside a Jupyter Notebook in Athena Intelligence:

python
from athena_sdk import AthenaClient
from dotenv import load_dotenv
import os
import requests

# Load environment variables and initialize the Athena client
# (the client is used later for sentiment analysis)
load_dotenv()
client = AthenaClient(api_key=os.getenv('ATHENA_API_KEY'))

# List of URLs to scrape
urls = [
    'http://example.com/article1',
    'http://example.com/article2'
]

# Function to scrape content from a URL
def scrape_url(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for HTTP errors
        return response.text
    except requests.exceptions.RequestException as e:
        print(f'Error scraping {url}: {e}')
        return None

# Iterate over URLs and fetch content
for url in urls:
    content = scrape_url(url)
    if content:
        # Process content as needed
        print(f'Successfully scraped content from {url}')

In this code snippet, the scrape_url function uses the requests library to fetch the content of each URL. It also includes error handling to manage any issues that may arise during the request, such as network errors or invalid URLs.

When handling problematic URLs, it's essential to implement error handling to skip over these URLs and continue with the next one. This prevents the process from stopping due to a single problematic URL. The requests.exceptions.RequestException in the code above catches most common issues, such as connection errors, timeouts, and invalid responses.
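If some pages respond slowly or intermittently, you can go a step further and add a request timeout plus a small retry loop. This is a minimal sketch; the timeout and retry counts are arbitrary choices, not requirements of the SDK:

python
import time
import requests

# Fetch a URL with a timeout and a few retries before giving up
def scrape_url_with_retries(url, retries=3, timeout=10):
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
            return response.text
        except requests.exceptions.RequestException as e:
            print(f'Attempt {attempt} failed for {url}: {e}')
            time.sleep(2)  # brief pause before the next attempt
    return None  # the caller can skip URLs that never succeed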

This approach ensures that your scraping process is robust and can handle various issues without interruption. After scraping the content, you can proceed to analyze it using methods such as sentiment analysis.

Sentiment Analysis

Using GPT-4, you can analyze the sentiment of the content by providing a specific prompt. The prompt should be structured as follows:

  • "For the below content, assess whether it's positive, negative, or neutral."

  • "Provide the output as a numeric sentiment score (-1 for negative, 0 for neutral, and 1 for positive)."

  • "Include a summary explaining why the content received that score."

The structured output from GPT-4 will include both a sentiment score and a sentiment summary. This structured format ensures that the results can easily be parsed and incorporated into a CSV file or other formats for reporting and further analysis.
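The exact call depends on how your Athena workspace exposes GPT-4. The sketch below assumes a hypothetical client.chat(...) method that takes a model name and a prompt and returns text, so treat that method name and its parameters as placeholders rather than the SDK's documented API; the prompt construction and response parsing are the parts to reuse:

python
prompt_template = (
    "For the below content, assess whether it's positive, negative, or neutral.\n"
    "Provide the output as a numeric sentiment score (-1 for negative, 0 for neutral, 1 for positive) on the first line,\n"
    "followed by a one-sentence summary explaining the score.\n\n"
    "Content:\n{content}"
)

def analyze_sentiment(client, content):
    # client.chat is a placeholder -- substitute whatever GPT-4 call your Athena setup provides
    response_text = client.chat(model='gpt-4', prompt=prompt_template.format(content=content))
    lines = response_text.strip().splitlines()
    score = int(lines[0].strip())           # expected: -1, 0, or 1
    summary = ' '.join(lines[1:]).strip()   # remaining lines explain the score
    return score, summary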

Use a Clear Prompt

When using GPT-4 for sentiment analysis, structure your prompt clearly. For example: 'For the below content, assess whether it's positive, negative, or neutral (-1, 0, 1). Provide a score and a summary explaining the score.'

Exporting Results

To format and export the sentiment analysis results into a CSV file, you will first need to set up the CSV structure. This structure will include the headers: Company, Sentiment Score, Sentiment Summary, and URL.

After initializing the Athena SDK and loading the necessary environment variables, you can create the CSV file using the pandas library in Python. Start by initializing a DataFrame with the columns mentioned earlier. Once the sentiment analysis results are ready, append each result to this DataFrame.

Here is an example of how to set up and export the CSV file:

  1. Create an empty DataFrame with the required columns:

python
import pandas as pd

results = pd.DataFrame(columns=['Company', 'Sentiment Score', 'Sentiment Summary', 'URL'])

  2. For each company's sentiment analysis result, add a new row with the company's name, sentiment score, sentiment summary, and the corresponding URL. Note that DataFrame.append was removed in pandas 2.0, so pd.concat is used instead:

python
new_row = {
    'Company': 'Example Company',
    'Sentiment Score': 1,
    'Sentiment Summary': 'Positive news event.',
    'URL': 'http://example.com/article'
}
results = pd.concat([results, pd.DataFrame([new_row])], ignore_index=True)

  3. Once all results are added, export the DataFrame to a CSV file:

python
results.to_csv('sentiment_score.csv', index=False)

This will generate a CSV file named sentiment_score.csv with one row per analyzed article, containing the company name, sentiment score, sentiment summary, and the URL of the analyzed news article.
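Putting the steps together, the whole pipeline can be driven from a single loop. This sketch reuses the search_news call and the scrape_url function from the earlier snippets, along with the hypothetical analyze_sentiment helper sketched in the sentiment analysis section, so rename those pieces to match your own notebook:

python
companies = ['Barclays', 'HSBC']
rows = []

for company_name in companies:
    search_results = client.search_news(f'{company_name} News',
                                        date_filter='last 3 days', max_results=5)
    for result in search_results:
        content = scrape_url(result['url'])
        if content is None:
            continue  # skip URLs that could not be scraped
        score, summary = analyze_sentiment(client, content)
        rows.append({
            'Company': company_name,
            'Sentiment Score': score,
            'Sentiment Summary': summary,
            'URL': result['url'],
        })

results = pd.DataFrame(rows, columns=['Company', 'Sentiment Score', 'Sentiment Summary', 'URL'])
results.to_csv('sentiment_score.csv', index=False)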

Frequently Asked Questions

  • What should I do if something goes wrong? If you encounter errors, use logging to identify the issue. Review the error messages, check that all required packages are installed, and validate your API key and environment variables.

  • Can I analyze data other than news articles? Yes, you can analyze other data with Athena Intelligence by modifying the search queries and data inputs to match your needs.

  • How do I get an API key? Sign up on the Athena Intelligence website. After registration, you can request an API key from the Developers section.

  • What is structured output? Structured output is organized results, often in a specified format like JSON or CSV, making the data easy to read and process programmatically.