GitHub Repo: https://github.com/SYEDIBRAHIMKHALIL/jpmc-financial-fraud-project
JPMorgan Chase Financial Fraud Detection: A Comprehensive Guide
Overview
This post walks through a project that analyzes a large financial transaction dataset to detect potential fraud, completed as part of the JPMorgan Chase job simulation. The project uses a subset of the PaySim dataset, which was generated by a mobile money simulator originally built for fraud detection research.
Dataset Reference
The dataset used in this project originates from the following research:
- E. A. Lopez-Rojas, A. Elmir, and S. Axelsson: “PaySim: A financial mobile money simulator for fraud detection”. The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus, 2016.
The goal of this project is to analyze the dataset, identify fraudulent transactions, and visualize various aspects of the transaction data to better understand patterns and anomalies.
Project Setup
1. Clone the Repository
To start working on this project, first, clone the repository to your local machine:
git clone https://github.com/syedibrahimkhalil/jpmc-financial-fraud-project.git
cd jpmc-financial-fraud-project
2. Set Up a Virtual Environment
Creating a virtual environment helps in managing dependencies and ensures that the project runs in a clean Python environment.
python -m venv venv
Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
3. Install Required Libraries
Next, install the required Python libraries using pip:
pip install pandas matplotlib
Usage Guide
This section describes how to use the scripts provided in the repository to analyze the financial transaction data.
Running the Main Script
The main script, task1.py, performs various data analysis tasks and generates visualizations. Run the script using the following command:
python task1.py
Functionality Overview
Here’s a breakdown of the functions included in the script:
- exercise_0(file): Reads the dataset CSV file and prints the first few rows.
- exercise_1(df): Returns the list of column names in the DataFrame.
- exercise_2(df, k): Returns the first k rows from the DataFrame.
- exercise_3(df, k): Returns a random sample of k rows from the DataFrame.
- exercise_4(df): Returns a list of unique transaction types in the DataFrame.
- exercise_5(df): Returns the top 10 transaction destinations by frequency.
- exercise_6(df): Filters and returns rows where fraud was detected.
- exercise_7(df): Returns a DataFrame that counts the number of distinct destinations each source has interacted with, sorted in descending order.
- visual_1(df): Generates bar charts for transaction types and their breakdown by fraud status.
- visual_2(df): Produces a scatter plot for Cash Out transactions, showing the relationship between origin and destination account balance delta.
- exercise_custom(df): Allows custom analysis (e.g., transactions by hour).
- visual_custom(df): Creates a visualization based on the custom analysis.
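To make a few of these descriptions concrete, here is a minimal sketch of how some of the helpers could be written with pandas. The function names and bodies below are hypothetical and are not taken from task1.py. The column names (type, nameOrig, nameDest, isFraud) are assumed to follow the public PaySim schema, and the small DataFrame is made-up data for illustration only.

```python
import pandas as pd

# Made-up toy rows using PaySim-style column names (an assumption about the CSV).
df = pd.DataFrame({
    "step": [1, 1, 2, 2, 3],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER", "CASH_OUT"],
    "nameOrig": ["C1", "C2", "C2", "C3", "C1"],
    "nameDest": ["M1", "C9", "C9", "C8", "C9"],
    "isFraud": [0, 1, 1, 0, 0],
})


def unique_types(df):
    """Hypothetical take on exercise_4: unique transaction types, in order of appearance."""
    return df["type"].unique().tolist()


def top_destinations(df, n=10):
    """Hypothetical take on exercise_5: the n most frequent destination accounts."""
    return df["nameDest"].value_counts().head(n)


def fraud_rows(df):
    """Hypothetical take on exercise_6: rows labelled as fraud (isFraud == 1)."""
    return df[df["isFraud"] == 1]


def distinct_destinations_per_source(df):
    """Hypothetical take on exercise_7: distinct destinations per source, descending."""
    return (
        df.groupby("nameOrig")["nameDest"]
        .nunique()
        .sort_values(ascending=False)
        .reset_index(name="distinct_destinations")
    )


print(unique_types(df))
print(top_destinations(df))
print(fraud_rows(df))
print(distinct_destinations_per_source(df))
```

On the real dataset the same calls would run against the DataFrame loaded from the CSV, provided its columns match the names assumed here.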
Code Examples
Here are some code snippets that demonstrate how to use the provided functions.
Example 1: Retrieving Column Names
This example shows how to read the dataset and retrieve the column names.
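import pandas as pd
from task1 import exercise_1  # assumes task1.py is in the working directory

# Load the dataset; adjust the path to wherever the CSV is stored
df = pd.read_csv('transactions.csv')
column_names = exercise_1(df)
print(column_names)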
Example 2: Visualizing Transaction Types
This example demonstrates how to generate bar charts for transaction types and their breakdown by fraud status.
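import matplotlib.pyplot as plt
from task1 import visual_1  # assumes task1.py is in the working directory

# Reuses df from Example 1; print() shows whatever visual_1 returns
print(visual_1(df))
plt.show()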
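For readers curious about what a function like this might do internally, the following is a rough sketch of one way to build the two bar charts. It is not the repository's implementation: plot_types_and_fraud is a hypothetical name, the type and isFraud columns are assumed from the PaySim schema, and the rows are made-up sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows with PaySim-style column names (an assumption).
df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER", "CASH_OUT"],
    "isFraud": [0, 1, 1, 0, 0],
})


def plot_types_and_fraud(df):
    """Hypothetical stand-in for visual_1: two bar charts side by side."""
    # Number of transactions per type.
    type_counts = df["type"].value_counts()
    # Rows: transaction type, columns: isFraud value, cells: transaction count.
    by_fraud = df.groupby(["type", "isFraud"]).size().unstack(fill_value=0)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    type_counts.plot(kind="bar", ax=axes[0], title="Transaction types")
    by_fraud.plot(kind="bar", ax=axes[1], title="Transaction types split by fraud")
    axes[0].set_ylabel("Count")
    fig.tight_layout()
    return fig, type_counts, by_fraud


fig, type_counts, by_fraud = plot_types_and_fraud(df)
print(by_fraud)
```

To view the figure, call fig.savefig with a file name of your choice, or remove the matplotlib.use("Agg") line and call plt.show() in an interactive session.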
Screenshots of Visualizations
Here are some visualizations generated from the dataset analysis:
1. Transaction Types Bar Chart & Transaction Types Split by Fraud Bar Chart
2. Cash Out Transactions Scatter Plot
3. Distribution of Transaction Amounts
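The data behind the Cash Out scatter plot can be derived with two column subtractions. The sketch below shows one possible way to compute the origin and destination balance deltas. It assumes the PaySim column names oldbalanceOrg, newbalanceOrig, oldbalanceDest, and newbalanceDest and the type value CASH_OUT. The function cash_out_deltas is hypothetical, and the numbers are made up.

```python
import pandas as pd

# Made-up rows with PaySim-style balance columns (an assumption about the CSV).
df = pd.DataFrame({
    "type": ["CASH_OUT", "PAYMENT", "CASH_OUT"],
    "oldbalanceOrg": [500.0, 300.0, 1000.0],
    "newbalanceOrig": [200.0, 250.0, 0.0],
    "oldbalanceDest": [0.0, 0.0, 100.0],
    "newbalanceDest": [300.0, 0.0, 1100.0],
})


def cash_out_deltas(df):
    """Hypothetical helper: balance change at origin and destination for Cash Out rows."""
    cash_out = df[df["type"] == "CASH_OUT"]
    return pd.DataFrame({
        "origin_delta": cash_out["newbalanceOrig"] - cash_out["oldbalanceOrg"],
        "dest_delta": cash_out["newbalanceDest"] - cash_out["oldbalanceDest"],
    })


deltas = cash_out_deltas(df)
print(deltas)
```

Passing the result to deltas.plot.scatter(x="origin_delta", y="dest_delta") would then draw a scatter plot of one delta against the other.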
Conclusion
This project provides a solid foundation for analyzing financial transaction data with the goal of fraud detection. By following the steps outlined in this blog, you can set up your environment, explore the dataset, and gain insights into transaction behaviors that could indicate fraudulent activities.
Contributing
Contributions to this project are welcome! If you find any issues or have ideas for improvements, feel free to open an issue or submit a pull request.
Contact
If you have any questions or feedback, please reach out:
- Syed Ibrahim Khalil: syedibrahimkhalil@protonmail.com
- Website: www.syedibrahimkhalil.com
- GitHub: www.github.com/SYEDIBRAHIMKHALIL