data-analysis

Differences

This shows you the differences between two versions of the page.

data-analysis [2023/04/15 11:55] – [Matplotlib] dblumedata-analysis [2024/01/15 09:06] – [GnuPlot] dblume
Line 25: Line 25:
 Here's an example command given the following two files, data.csv and gnuplot_instructions.gpi Here's an example command given the following two files, data.csv and gnuplot_instructions.gpi
  
-  gnuplot -e "f='data.csv'; t='Rating'" gnuplot_instructions.gpi+  gnuplot -e "f='data.csv'" gnuplot_instructions.gpi
  
 <file csv data.csv> <file csv data.csv>
 +date,col1,col2
 1992-01-01,5,14 1992-01-01,5,14
 1992-02-01,4,15 1992-02-01,4,15
Line 58: Line 59:
 set xdata time set xdata time
 set xlabel 'Date' set xlabel 'Date'
-set ylabel 'Value'+set xtics "1992-01-01", 2629746  # start, increment in seconds 
 +#set ylabel 'Value'
  
 # #
Line 70: Line 72:
 # #
 set datafile sep ',' set datafile sep ','
 +set key autotitle columnhead  # use the first line as title
 +firstrow = system('head -1 '.f. ' | tr "_," " "')
 +set xlabel word(firstrow, 1)
 +set ylabel word(firstrow, 2)
  
 # #
Line 77: Line 83:
 # #
 #plot f using 1:4 with lines, f using 1:3 with linespoints #plot f using 1:4 with lines, f using 1:3 with linespoints
-#plot f using 1:2 with lines title t, f using 1:3 with linespoints title 'Legend 2' +#plot f using 1:2 with lines, f using 1:3 with linespoints title 'Legend 2' 
-plot f using 1:2 with linespoints title t+plot f using 1:2 with linespoints 
 +</file> 
 + 
 +If you're making a "histogram" (actually a box chart with histogram style on X,Y points)... 
 +<file bash gnuplot_instructions.gpi> 
 +# Mostly the same as above, until... 
 + 
 +# Set the histogram and fill styles 
 +set style histogram clustered gap 1 
 +set style fill solid border -1
  
 +# Finally, plot with boxes
 +plot f using 1:2 with boxes
 </file> </file>
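
The header handling and the ''xtics'' increment in the instructions above can be sketched in Python. This is a minimal sketch under two assumptions that this page does not state: that 2629746 is the mean Gregorian month in seconds (365.2425 days divided by 12), and that ''tr'' pads its second set the way GNU tr does, so both ''_'' and '','' become spaces. The function names are hypothetical and exist only for illustration.

```python
# Sketch only: mirrors what the .gpi file's system() call and xtics step compute.
SECONDS_PER_DAY = 86_400
GREGORIAN_YEAR_DAYS = 365.2425  # assumption: 2629746 comes from the mean Gregorian year


def average_month_seconds() -> int:
    """Return the mean length of a Gregorian month in whole seconds."""
    return round(GREGORIAN_YEAR_DAYS * SECONDS_PER_DAY / 12)


def header_words(first_line: str) -> list[str]:
    """Emulate `head -1 file | tr "_," " "` followed by gnuplot's word()."""
    # Assumption: GNU tr behaviour, where '_' and ',' both map to a space.
    return first_line.strip().translate(str.maketrans("_,", "  ")).split()


if __name__ == "__main__":
    print(average_month_seconds())
    words = header_words("date,col1,col2\n")
    # word(firstrow, 1) and word(firstrow, 2) in gnuplot are 1-based
    print(words[0], words[1])
```

With the ''data.csv'' header shown above, the first word becomes the X label and the second the Y label, which is why the hard-coded ''set ylabel'' line could be commented out.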
  
Line 91: Line 108:
         * NumPy: Fundamental, the other projects rely on it.         * NumPy: Fundamental, the other projects rely on it.
         * Matplotlib: Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.         * Matplotlib: Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
 +    * Plotly: Generates interactive JavaScript plots
  
 Here's [[https://github.com/Manishms18/Air-Passengers-Time-Series-Analysis/blob/master/Air_Passenger_with_explanations.ipynb|an example analysis of AirPassengers over time]]. Here's [[https://github.com/Manishms18/Air-Passengers-Time-Series-Analysis/blob/master/Air_Passenger_with_explanations.ipynb|an example analysis of AirPassengers over time]].
Line 111: Line 129:
  
 **Get this**. NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation. **Get this**. NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation.
 +
 +==== Plotly ====
 +
 +Undecided whether to use this; need to experiment. See [[https://towardsdatascience.com/matplotlib-vs-plotly-lets-decide-once-and-for-all-dc3eca9aa011|Matplotlib vs. Plotly: Let’s Decide Once and for All]]. plotly.py is an interactive, open-source graphing library for Python, built on top of the JavaScript library plotly.js. It is a high-level, declarative charting library with over 30 chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, and financial charts.
  
 ==== Matplotlib ==== ==== Matplotlib ====
Line 130: Line 152:
 | # | Set data as integer format | | # | Set data as integer format |
 | . | Create scatterplot | | . | Create scatterplot |
 +
 +{{:general:visidata_airpassengers.png?400|}}
  
 **Pros**: Super fast and easy. **Pros**: Super fast and easy.
Line 146: Line 170:
 set xdata time set xdata time
 set format x '%Y' set format x '%Y'
-set xlabel 'Year' +set key autotitle columnhead  # use the first line for titles in legend 
-set ylabel 'Passengers'+set xlabel 'Year'             # except for X, where we show Year not Month
 set datafile sep ',' set datafile sep ','
  
 # You can use: lines, points, linespoints # You can use: lines, points, linespoints
-plot 'AirPassengers.csv' using 1:2 with lines title 'Airline Passengers'+plot 'AirPassengers.csv' using 1:2 with lines
 </file> </file>
  
Line 158: Line 182:
 {{:general:airpassengers.png?400|}} {{:general:airpassengers.png?400|}}
  
-**Pros**: Fast and easy. Render to text or png pretty easily. +When you change ''term'' to dumb, then depending on your terminal size you get output like: 
-**Cons**: Not that pretty. GPI file takes some tweaking.+ 
 +<code> 
 +700 +-------------------------------------------------------------------------------+ 
 +    |      +          +      +          +      +          +      +          | 
 +    |                                                           #Passengers ******* | 
 +    |                                                                            *  | 
 +600 |-+                                                                          *+-| 
 +    |                                                                      *    * * | 
 +    |                                                                     **    * * | 
 +500 |-+                                                                 **    * *-| 
 +    |                                                              **    * *    * * | 
 +    |                                                        *     **    * *    * | 
 +    |                                                       * *    **    *  ***    *| 
 +400 |-+                                               **    * *    **     ***   +*| 
 +    |                                          *     * *    * *  **  * **         | 
 +    |                                          **    * *  *** *****  ***            | 
 +    |                                          **   **  ***    * *    *             | 
 +300 |-+                                  *     * ****   ***                       +-| 
 +    |                             **    * *  **  ***    *                           | 
 +    |                      **   ** *  *** ****    *                                 | 
 +200 |-+              *     **  *    ***    *                                      +-| 
 +    |         **  * * * ***  **     * *                                             | 
 +    |  **   ** * ***  **                                                            | 
 +    |*********  **      +      +          +      +          +      +          | 
 +100 +-------------------------------------------------------------------------------+ 
 +  1949   1950  1951   1952   1953  1954   1955   1956  1957   1958   1959  1960   1961 
 +                                          Year 
 +</code> 
 + 
 + 
 + 
 +**Pros**: Fast and easy. Renders to text or PNG pretty easily. Sometimes gives better text renderings than VisiData.
 +**Cons**: Not that pretty without customization. The GPI file takes some tweaking.
  
 ===== MatPlotLib ===== ===== MatPlotLib =====
  
-stuff+<code python> 
 +import pandas as pd 
 +import matplotlib.pyplot as plt 
 + 
 +# Load the CSV and index it by the parsed Month column 
 +data = pd.read_csv('AirPassengers.csv') 
 +data['Month'] = pd.to_datetime(data['Month']) 
 +data = data.set_index(['Month']) 
 + 
 +# Plot passengers over time 
 +plt.figure(figsize=(10,5)) 
 +plt.xlabel("Year") 
 +plt.ylabel("Airline Passengers") 
 +plt.plot(data) 
 +plt.show() 
 +</code> 
 + 
 +{{:general:matplotlib_airpassengers.png?400|}} 
 + 
 +**Pros**: There's [[https://github.com/Manishms18/Air-Passengers-Time-Series-Analysis/blob/master/Air_Passenger_with_explanations.ipynb|so much more you can do]]. 
 +**Cons**: Heavyweight.
data-analysis.txt · Last modified: 2024/05/06 22:22 by dblume