JPMorgan Chase Financial Fraud Detection

GitHub repo: https://github.com/SYEDIBRAHIMKHALIL/jpmc-financial-fraud-project

JPMorgan Chase Financial Fraud Detection: A Comprehensive Guide

Overview

This post walks through a project that analyzes a large financial transaction dataset to detect potential fraud, completed as part of the JPMorgan Chase job simulation. The project uses a subset of the PaySim dataset, a synthetic mobile money dataset originally created for fraud detection research.

Dataset Reference

The dataset used in this project originates from the following research:

  • E. A. Lopez-Rojas, A. Elmir, and S. Axelsson: “PaySim: A financial mobile money simulator for fraud detection”. The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus, 2016.

The goal of this project is to analyze the dataset, identify fraudulent transactions, and visualize various aspects of the transaction data to better understand patterns and anomalies.

Project Setup

1. Clone the Repository

To start working on this project, first, clone the repository to your local machine:

git clone https://github.com/syedibrahimkhalil/jpmc-financial-fraud-project.git
cd jpmc-financial-fraud-project

2. Set Up a Virtual Environment

Creating a virtual environment helps in managing dependencies and ensures that the project runs in a clean Python environment.

python -m venv venv

Activate the virtual environment:

  • On Windows: venv\Scripts\activate
  • On macOS/Linux: source venv/bin/activate

3. Install Required Libraries

Next, install the required Python libraries using pip:

pip install pandas matplotlib

Usage Guide

This section describes how to use the scripts provided in the repository to analyze the financial transaction data.

Running the Main Script

The main script, task1.py, performs various data analysis tasks and generates visualizations. Run the script using the following command:

python task1.py

Functionality Overview

Here’s a breakdown of the functions included in the script:

  • exercise_0(file): Reads the dataset CSV file and prints the first few rows.
  • exercise_1(df): Returns the list of column names in the DataFrame.
  • exercise_2(df, k): Returns the first k rows from the DataFrame.
  • exercise_3(df, k): Returns a random sample of k rows from the DataFrame.
  • exercise_4(df): Returns a list of unique transaction types in the DataFrame.
  • exercise_5(df): Returns the top 10 transaction destinations by frequency.
  • exercise_6(df): Filters and returns rows where fraud was detected.
  • exercise_7(df): Returns a DataFrame that counts the number of distinct destinations each source has interacted with, sorted in descending order.
  • visual_1(df): Generates bar charts for transaction types and their breakdown by fraud status.
  • visual_2(df): Produces a scatter plot for Cash Out transactions, plotting the origin account balance delta against the destination account balance delta.
  • exercise_custom(df): Allows custom analysis (e.g., transactions by hour).
  • visual_custom(df): Creates a visualization based on the custom analysis.
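
Several of the exercises described above reduce to a line or two of pandas. The following is a minimal sketch, not the code in task1.py: the helper names are hypothetical, the column names (type, nameOrig, nameDest, isFraud) are assumed from the public PaySim schema, and the toy rows are made up for illustration. It also assumes that "fraud was detected" means isFraud == 1; the repository may filter on a different column.

```python
import pandas as pd

# Toy rows for illustration only; column names assume the public PaySim schema.
toy_df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER"],
    "nameOrig": ["C1", "C2", "C3", "C1", "C2"],
    "nameDest": ["M1", "C9", "C9", "M2", "C9"],
    "isFraud": [0, 1, 1, 0, 0],
})


def transaction_types(df):
    """Unique transaction types, in order of first appearance."""
    return df["type"].unique().tolist()


def top_destinations(df, n=10):
    """The n most frequent destination accounts with their counts."""
    return df["nameDest"].value_counts().head(n)


def fraud_rows(df):
    """Rows marked as fraud (assumes the isFraud column is the marker)."""
    return df[df["isFraud"] == 1]


def distinct_destinations_per_source(df):
    """Count distinct destinations per source account, largest first."""
    return (
        df.groupby("nameOrig")["nameDest"]
        .nunique()
        .sort_values(ascending=False)
        .reset_index(name="distinct_destinations")
    )


print(transaction_types(toy_df))
print(top_destinations(toy_df))
print(fraud_rows(toy_df))
print(distinct_destinations_per_source(toy_df))
```

Using nunique rather than count matters for the last helper: a source that sends to the same destination many times should still count that destination once.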

Code Examples

Here are some code snippets that demonstrate how to use the provided functions.

Example 1: Retrieving Column Names

This example shows how to read the dataset and retrieve the column names.

import pandas as pd

from task1 import exercise_1  # assumes task1.py is in the working directory

df = pd.read_csv('transactions.csv')
column_names = exercise_1(df)
print(column_names)
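
If you want to try the loading and inspection steps without downloading the dataset, the sketch below does the same kind of work on a tiny in-memory CSV. It is an illustration under stated assumptions rather than the repository's code: the helper names are hypothetical, the column names follow the public PaySim schema, and the values are invented.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the CSV file; values are made up.
SAMPLE_CSV = """step,type,amount,nameOrig,nameDest,isFraud
1,PAYMENT,100.0,C1,M1,0
1,TRANSFER,250.0,C2,C9,1
2,CASH_OUT,250.0,C3,C9,1
3,PAYMENT,40.0,C1,M2,0
"""


def load_transactions(source):
    """Read a CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def column_names(df):
    """Return the column names as a plain Python list."""
    return list(df.columns)


def first_rows(df, k):
    """Return the first k rows."""
    return df.head(k)


def random_rows(df, k, seed=None):
    """Return k rows sampled without replacement; a seed makes it repeatable."""
    return df.sample(n=k, random_state=seed)


sample_df = load_transactions(io.StringIO(SAMPLE_CSV))
print(column_names(sample_df))
print(first_rows(sample_df, 2))
print(random_rows(sample_df, 2, seed=0))
```

Passing a seed is optional, but it keeps the random sample stable between runs, which helps when comparing results.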

Example 2: Visualizing Transaction Types

This example demonstrates how to generate bar charts for transaction types and their breakdown by fraud status.

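import matplotlib.pyplot as plt
import pandas as pd

from task1 import visual_1  # assumes task1.py is in the working directory

df = pd.read_csv('transactions.csv')
print(visual_1(df))  # prints whatever visual_1 returns after drawing the charts
plt.show()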
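
For readers curious how such charts can be built, here is one possible sketch of the two bar charts: overall counts per transaction type, and the same counts split by fraud status. The function name plot_type_counts is hypothetical, the type and isFraud columns are assumed from the public PaySim schema, and the toy rows are invented; the actual visual_1 in the repository may be written differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only.
toy_df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER"],
    "isFraud": [0, 1, 1, 0, 0],
})


def plot_type_counts(df):
    """Draw type counts and type-by-fraud counts side by side."""
    fig, (ax_all, ax_split) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: number of transactions per type.
    df["type"].value_counts().plot(kind="bar", ax=ax_all)
    ax_all.set_title("Transaction types")
    ax_all.set_xlabel("Type")
    ax_all.set_ylabel("Count")

    # Right panel: one bar per (type, isFraud) pair; missing pairs become 0.
    split = df.groupby(["type", "isFraud"]).size().unstack(fill_value=0)
    split.plot(kind="bar", ax=ax_split)
    ax_split.set_title("Transaction types split by fraud")
    ax_split.set_xlabel("Type")
    ax_split.set_ylabel("Count")

    fig.tight_layout()
    return fig, split


fig, split = plot_type_counts(toy_df)
print(split)
```

Returning the figure lets the caller decide whether to display it with plt.show() or write it to disk with fig.savefig(...).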

Screenshots of Visualizations

Here are some visualizations generated from the dataset analysis:

1. Transaction Types Bar Chart & Transaction Types Split by Fraud Bar Chart

2. Cash Out Transactions Scatter Plot

3. Distribution of Transaction Amounts
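
As a rough guide to how the second and third charts could be reproduced, the sketch below computes the balance deltas for Cash Out transactions and draws a scatter plot next to a histogram of transaction amounts. It is not taken from the repository: the function names are hypothetical, the balance column names (oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest) are assumed from the public PaySim schema, and the toy rows are invented.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only.
toy_df = pd.DataFrame({
    "type": ["CASH_OUT", "PAYMENT", "CASH_OUT"],
    "amount": [100.0, 50.0, 30.0],
    "oldbalanceOrg": [500.0, 200.0, 30.0],
    "newbalanceOrig": [400.0, 150.0, 0.0],
    "oldbalanceDest": [0.0, 0.0, 1000.0],
    "newbalanceDest": [100.0, 0.0, 1030.0],
})


def cash_out_deltas(df):
    """Balance change (new minus old) at origin and destination for CASH_OUT rows."""
    cash_out = df[df["type"] == "CASH_OUT"]
    return pd.DataFrame({
        "origin_delta": cash_out["newbalanceOrig"] - cash_out["oldbalanceOrg"],
        "dest_delta": cash_out["newbalanceDest"] - cash_out["oldbalanceDest"],
    })


def plot_cash_out_and_amounts(df, bins=20):
    """Scatter of cash-out balance deltas beside a histogram of amounts."""
    deltas = cash_out_deltas(df)
    fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

    ax_scatter.scatter(deltas["origin_delta"], deltas["dest_delta"])
    ax_scatter.set_title("Cash Out: origin vs. destination balance delta")
    ax_scatter.set_xlabel("Origin balance delta")
    ax_scatter.set_ylabel("Destination balance delta")

    ax_hist.hist(df["amount"], bins=bins)
    ax_hist.set_title("Distribution of transaction amounts")
    ax_hist.set_xlabel("Amount")
    ax_hist.set_ylabel("Number of transactions")

    fig.tight_layout()
    return fig


deltas = cash_out_deltas(toy_df)
fig = plot_cash_out_and_amounts(toy_df)
print(deltas)
```

In a consistent cash-out, the origin delta is negative and the destination delta is positive by the same amount, so points far from that diagonal are worth a closer look.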

Conclusion

This project provides a foundation for analyzing financial transaction data with the goal of detecting fraud. By following the steps outlined in this post, you can set up your environment, explore the dataset, and spot transaction patterns that may indicate fraudulent activity.

Contributing

Contributions to this project are welcome! If you find any issues or have ideas for improvements, feel free to open an issue or submit a pull request.

Contact

If you have any questions or feedback, please reach out:

