Baseline commands
# Never execute first. Start with identity + metadata.
sha256sum sample
file sample
stat sample
exiftool sample
strings -n 6 sample | less
strings -el sample | less # UTF-16LE / wide strings
binwalk sample
ent sample # quick entropy signal
# Fast grep for obvious indicators
strings sample | grep -iE "http|powershell|cmd\.exe|rundll32|regsvr32|FromBase64String|URLDownloadToFile|flag"
strings -el sample | grep -iE "powershell|http|\.exe|\.dll|Enable-Macros"
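For samples that need to be triaged from a script rather than a shell, the two `strings` passes above can be approximated in Python. This is a minimal sketch, not a replacement for `strings`: the function name `extract_strings` and the default minimum length of 6 are assumptions chosen to mirror `strings -n 6`, and only printable ASCII (plain or as UTF-16LE) is considered.

```python
import re

def extract_strings(data: bytes, min_len: int = 6):
    """Return (ascii_strings, wide_strings) found in a byte blob."""
    # Runs of printable ASCII, similar to `strings -n <min_len>`
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Printable ASCII followed by a NUL byte, similar to `strings -el`
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_hits = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_hits = [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return ascii_hits, wide_hits
```

Feeding both lists to the same keyword filter avoids missing an indicator that only exists in wide form.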
What you are trying to answer
# Questions to answer before deeper analysis
- What is the real file type?
- Is it a container? (zip / OOXML / PDF with embedded file)
- Is there scripting? (VBA / JS / PowerShell)
- Is there obfuscation? (base64 / hex / XOR / compression)
- Is there an execution trigger?
  - Office: AutoOpen / Document_Open / Workbook_Open
  - PDF: /OpenAction / /AA / JavaScript / Launch
- Are there remote references?
  - URL / UNC path / template injection / remote object
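The first two questions (real file type, container or not) can be answered from magic bytes alone. The sketch below is illustrative: the names `sniff` and `classify_zip` and the returned labels are made up for this example, and the magic list covers only the formats discussed in this sheet.

```python
import io
import zipfile

# Leading bytes for the formats this sheet deals with
MAGICS = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole-cfbf"),  # legacy Office
    (b"PK\x03\x04", "zip"),                             # OOXML or plain zip
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"MZ", "pe"),
]

def classify_zip(data: bytes) -> str:
    """Distinguish OOXML packages (with or without macros) from plain zips."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return "zip (unreadable)"
    if "[Content_Types].xml" in names:
        if any(n.endswith("vbaProject.bin") for n in names):
            return "ooxml+macros"
        return "ooxml"
    return "zip"

def sniff(data: bytes) -> str:
    """Guess the real file type from content, ignoring the extension."""
    for magic, label in MAGICS:
        if data.startswith(magic):
            return classify_zip(data) if label == "zip" else label
    # PDF readers tolerate some junk before the header
    if b"%PDF-" in data[:1024]:
        return "pdf (header not at offset 0)"
    return "unknown"
```

A mismatch between this result and the file extension is itself worth noting in the triage log.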
Red flags
High entropy → packed / encrypted / compressed blob
Wide strings only → often PowerShell / Windows-centric sample
Office file with macros → go straight to oletools
PDF with /JS /Launch → suspicious interaction or exploit path
External template / remote URL → staging or lure document
Many object streams → PDF object hiding / obfuscation
Base64 in wide strings → often EncodedCommand or staged script
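The "high entropy" flag refers to Shannon entropy over byte values, on a scale of 0 to 8 bits per byte. A minimal sketch of that calculation follows; the function name is an assumption, and any threshold you apply to the result (values close to 8 are commonly read as compressed or encrypted) is a judgment call rather than a rule.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of -p * log2(p) over the observed byte values
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```

Running it over fixed-size windows instead of the whole file helps locate a packed blob inside an otherwise low-entropy document.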
Identify the Office family
# Modern Office = OOXML zip container
.docx / .xlsx / .pptx → ZIP-based XML package
.docm / .xlsm / .pptm → ZIP-based + macros allowed
# Legacy Office = OLE compound document
.doc / .xls / .ppt → OLE / CFBF
# Quick identification
file invoice.docm
oleid invoice.docm
zipinfo invoice.docm # OOXML internals if zip-based
7z l invoice.docm # list contents safely
OOXML internals to inspect
# Unpack to inspect XML relationships and embeds
mkdir oo && 7z x sample.docm -ooo # -o<dir> with no space: extract into ./oo
find oo -maxdepth 3 -type f | sort
# Interesting files / paths
[Content_Types].xml
_rels/.rels
word/document.xml
word/_rels/document.xml.rels
word/vbaProject.bin # macros
word/embeddings/ # embedded OLE / package
word/media/ # images / possible lure
customXml/ # staging / hidden data sometimes
# Search relationships / URLs / template injection
grep -RniE "http|https|mhtml|file://|\\\\|template|oleObject|external" oo/
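The grep above is noisy because almost every OOXML part contains schema URLs. A more targeted check is to read the `.rels` parts and keep only relationships marked `TargetMode="External"`, which is where template injection and remote objects show up. The sketch below assumes the standard package relationships namespace; the function name is hypothetical, and the stdlib XML parser is not hardened against hostile XML, so run it only inside the analysis VM.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

def external_relationships(path_or_file):
    """List (rels part, relationship type, target) for external targets."""
    hits = []
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    # Keep only the last path segment of the type URI
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    hits.append((name, rel_type, rel.get("Target")))
    return hits
```

An `attachedTemplate` relationship pointing at an http(s) or UNC target is the classic remote template pattern; plain `hyperlink` entries are usually benign.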
Legacy OLE docs
oleid sample.doc
oledump.py sample.doc
oledump.py -i sample.doc # stream indicators
olemeta sample.doc
# Common suspicious streams
Macros / VBA
ObjectPool
Ole10Native # dropped file / embedded package
\x01CompObj / \x05SummaryInformation
# Extract embedded data / stream
oledump.py -s 8 -d sample.doc > stream.bin
Oletools workflow
oleid sample.docm # macro / encrypted / external refs
olevba sample.docm # dump VBA, compressed streams
olevba --decode sample.docm # decode VBA strings
mraptor sample.docm # MacroRaptor: flags auto-exec + write / execute behaviour
msodde sample.doc # DDE abuse detection
# Search for execution primitives
olevba sample.docm | grep -iE "AutoOpen|Document_Open|Workbook_Open|Shell|CreateObject|WScript\.Shell|PowerShell|cmd\.exe|URLDownloadToFile|XMLHTTP|ADODB\.Stream"
What to look for in macros
# Typical malicious flow
1. Auto* trigger
2. String building / obfuscation
3. Decode blob (base64 / hex / split chars)
4. Write file (ADODB.Stream)
5. Execute via Shell / WMI / rundll32 / regsvr32 / PowerShell
# Suspicious keywords
CreateObject
GetObject
Shell
WScript.Shell
MSXML2.XMLHTTP
WinHttp.WinHttpRequest
ADODB.Stream
Environ
Chr / Asc / StrReverse / Replace / Split
FromBase64String / Base64Decode
# Template injection / remote object clues
attachedTemplate
mhtml:
http://
https://
\\server\share\template.dotm
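Step 2 of the typical flow (string building) often amounts to `Chr()` calls glued together with `&`. The sketch below folds that pattern back into readable literals so that URLs and commands become greppable. It is a rough text-level helper, not a VBA parser: the function name is hypothetical, it only handles numeric `Chr`/`ChrW` arguments, and it can mis-fold a literal that itself contains `" & "`.

```python
import re

# Chr(104), ChrW(104), Chr$(104), Chr(&H68)
CHR_RE = re.compile(r'\bChrW?\$?\(\s*(&H[0-9A-Fa-f]+|\d+)\s*\)', re.IGNORECASE)
# The seam between two adjacent string literals: "abc" & "def"
CONCAT_RE = re.compile(r'"\s*[&+]\s*"')

def _chr_to_literal(match):
    token = match.group(1)
    code = int(token[2:], 16) if token.upper().startswith("&H") else int(token)
    ch = chr(code)
    # VBA escapes a double quote inside a literal by doubling it
    return '"' + ('""' if ch == '"' else ch) + '"'

def fold_vba_strings(source: str) -> str:
    """Replace Chr() calls with literals, then merge concatenated literals."""
    folded = CHR_RE.sub(_chr_to_literal, source)
    return CONCAT_RE.sub("", folded)
```

Run it on `olevba` output before grepping for the suspicious keywords listed above.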
RTF / lure document notes
# RTF can carry embedded OLE / exploit markers
file sample.rtf
rtfobj sample.rtf
strings sample.rtf | grep -iE "objdata|objclass|equation|ole2link|powershell|cmd"
# Save all embedded objects, then carve whatever was extracted
rtfobj -s all sample.rtf
foremost -i extracted.bin -o out/
# Phishing-ish signs
- fake invoice / shipping theme
- images urging "Enable Content"
- tiny / hidden OLE object
- one-stage downloader macro
Fast PDF triage
pdfid.py sample.pdf
pdf-parser.py -a sample.pdf
peepdf -f sample.pdf
pdfinfo sample.pdf
qpdf --check sample.pdf
# pdfid flags that deserve attention
/JS /JavaScript
/OpenAction
/AA
/Launch
/EmbeddedFile
/ObjStm
/XFA
/RichMedia
/AcroForm
Object inspection with pdf-parser
# Overview + suspicious objects
pdf-parser.py -a sample.pdf
pdf-parser.py --search javascript sample.pdf
pdf-parser.py --search OpenAction sample.pdf
pdf-parser.py --search EmbeddedFile sample.pdf
# Inspect one object and decode streams
pdf-parser.py -o 12 sample.pdf
pdf-parser.py -o 12 -f sample.pdf # apply filters
pdf-parser.py -o 12 -w sample.pdf # dump raw stream
pdf-parser.py -o 12 -d obj12.bin sample.pdf # dump stream to file (add -f to decode filters first)
What specifically to inspect
# Common malicious / suspicious features
/OpenAction # auto action on open
/AA # additional actions
/JS /JavaScript # embedded JS
/Launch # command or file launch
/EmbeddedFile # dropped content / lure
/ObjStm # object streams hide objects
/XFA # interactive forms, often abused
/URI # external URL
/SubmitForm # exfil / callback possibility
# Quick grep after dumping
strings sample.pdf | grep -iE "/JS|/OpenAction|http|cmd|powershell|launch|uri"
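A plain grep misses names written with PDF hex escapes (for example `/J#53` is the same name as `/JS`). The sketch below counts the keywords from this section after normalizing such escapes. It is illustrative only: the function name is made up, and it sees only uncompressed bytes, so anything packed inside `/ObjStm` stays invisible until the file is unpacked.

```python
import re

KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
            "/EmbeddedFile", "/ObjStm", "/XFA", "/URI", "/SubmitForm"]

# A PDF name: "/" followed by regular characters or #xx hex escapes
NAME_RE = re.compile(rb"/(?:[^\x00\t\n\x0c\r ()<>\[\]{}/%#]|#[0-9A-Fa-f]{2})+")

def _unescape(name: bytes) -> bytes:
    """Turn #xx escapes back into the bytes they stand for."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), name)

def count_pdf_names(data: bytes) -> dict:
    """Count occurrences of the watched names, escape-aware."""
    counts = dict.fromkeys(KEYWORDS, 0)
    for match in NAME_RE.finditer(data):
        name = _unescape(match.group()).decode("latin-1")
        if name in counts:
            counts[name] += 1
    return counts
```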
Embedded files & attachments
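# qpdf and mutool can help rebuild / inspect
qpdf --qdf --object-streams=disable sample.pdf unpacked.pdf # expand object streams into readable form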
mutool show sample.pdf trailer
mutool show sample.pdf 12
mutool extract sample.pdf # extracts images / fonts, not attachments (use pdfdetach for those)
pdfdetach -list sample.pdf
mkdir -p extracted && pdfdetach -saveall -o extracted/ sample.pdf # save attachments into extracted/
# Then triage extracted payloads
file extracted/*
sha256sum extracted/*
oledump.py suspicious.bin # if extracted object is OLE
JavaScript in PDF
# Dump JS-ish streams and beautify
pdf-parser.py --search /JS sample.pdf
pdf-parser.py -o 23 -f -d js23.bin sample.pdf
js-beautify js23.bin > js23.pretty.js
# Things to search for
app.launchURL
this.submitForm
exportDataObject
util.printf
unescape
fromCharCode
replace / split / reverse
heap spraying patterns
# PDFs also hide data in encoded streams
FlateDecode
ASCIIHexDecode
ASCII85Decode
RunLengthDecode
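When a stream has been dumped raw (without `-f`), the four filters listed above can be undone by hand. The sketch below is a simplified take on each decoder using only the standard library; the function names and the `apply_filters` helper are assumptions for illustration, and predictor parameters (`/DecodeParms`) are not handled.

```python
import base64
import binascii
import re
import zlib

def ascii_hex_decode(data: bytes) -> bytes:
    # Hex digits with optional whitespace, terminated by ">"
    digits = re.sub(rb"\s+", b"", data.split(b">", 1)[0])
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is padded with 0
    return binascii.unhexlify(digits)

def ascii85_decode(data: bytes) -> bytes:
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    end = body.find(b"~>")  # end-of-data marker
    if end != -1:
        body = body[:end]
    return base64.a85decode(body)

def run_length_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        n = data[i]
        i += 1
        if n == 128:  # end of data
            break
        if n < 128:   # copy the next n+1 bytes literally
            out += data[i:i + n + 1]
            i += n + 1
        else:         # repeat the next byte 257-n times
            out += data[i:i + 1] * (257 - n)
            i += 1
    return bytes(out)

DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "RunLengthDecode": run_length_decode,
}

def apply_filters(data: bytes, filters) -> bytes:
    """Apply filters in the order they appear in the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data
```

Chained filters are common in malicious PDFs, so pass the full `/Filter` array in order rather than decoding one layer at a time by hand.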
Practical PDF workflow
pdfid.py sample.pdf
pdf-parser.py -a sample.pdf
pdf-parser.py --search OpenAction sample.pdf
pdf-parser.py --search JS sample.pdf
pdfdetach -saveall sample.pdf
mutool extract sample.pdf # images / fonts only; attachments come from pdfdetach
strings sample.pdf | grep -iE "http|powershell|cmd|OpenAction|Launch"
# If JS or embedded file appears, pivot to that artifact
Find PowerShell hidden in files
strings sample | grep -i powershell
strings -el sample | grep -i powershell
grep -aoiE -e "-enc|-encodedcommand|FromBase64String|IEX|Invoke-Expression" sample # -e stops the leading "-" being parsed as an option
# Wide / OOXML / macro outputs often contain UTF-16LE text
strings -el sample | grep -iE "FromBase64String|New-Object|DownloadString|WebClient|Invoke-Expression"
Decode common EncodedCommand
# PowerShell -enc is usually UTF-16LE bytes then base64
echo 'BASE64_HERE' | base64 -d | iconv -f UTF-16LE -t UTF-8
# Save from strings output and decode
tr -d '\r\n' < blob.b64 | base64 -d | iconv -f UTF-16LE -t UTF-8
# If not UTF-16LE, try raw UTF-8
echo 'BASE64_HERE' | base64 -d
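The same decode can be done in Python when the blob comes out of a script or has lost its padding. This is a minimal sketch: the function name is hypothetical, and the UTF-16LE check is a heuristic (NUL bytes at odd offsets) rather than real encoding detection.

```python
import base64

def decode_encoded_command(blob: str) -> str:
    """Decode a PowerShell -EncodedCommand style base64 blob to text."""
    cleaned = "".join(blob.split())          # drop whitespace / line breaks
    cleaned += "=" * (-len(cleaned) % 4)     # tolerate stripped padding
    raw = base64.b64decode(cleaned)
    # ASCII text stored as UTF-16LE has a NUL at every odd offset
    looks_wide = len(raw) >= 2 and raw[1::2].count(0) > len(raw) // 4
    encoding = "utf-16-le" if looks_wide else "utf-8"
    return raw.decode(encoding, errors="replace")
```

If the output is still unreadable, the blob is probably compressed or encrypted rather than plain text, and the deobfuscation steps in the next section apply.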
Typical script deobfuscation steps
# Common stages
- Join split strings
- Replace [char] / Chr() sequences
- Decode base64 / hex
- Reverse strings
- Gzip / Deflate decompress
- Find final command or URL
# Useful patterns to grep
FromBase64String
GZipStream
DeflateStream
IEX
DownloadString
Invoke-WebRequest
Start-BitsTransfer
Reflection.Assembly
# If script is one line, reformat before reasoning
sed 's/;/;\n/g' script.ps1 | less
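The "decode base64, then Gzip / Deflate decompress" stages can be peeled in one go. The sketch below tries gzip, zlib-wrapped and raw deflate in that order; the function names are made up for this example, and a raw-deflate "success" on arbitrary bytes should be checked by eye since that format has no header to validate.

```python
import base64
import gzip
import zlib

def try_decompress(raw: bytes):
    """Return (format_label, data); label is None if nothing decompressed."""
    attempts = (
        ("gzip", gzip.decompress),                                   # GZipStream
        ("zlib", zlib.decompress),                                   # zlib wrapper
        ("raw-deflate", lambda b: zlib.decompress(b, -zlib.MAX_WBITS)),  # DeflateStream
    )
    for label, func in attempts:
        try:
            return label, func(raw)
        except (OSError, EOFError, zlib.error):
            continue
    return None, raw

def peel(blob: str):
    """Base64-decode a blob, then try to decompress the result."""
    raw = base64.b64decode("".join(blob.split()))
    return try_decompress(raw)
```

Repeat the call on the output while it keeps producing another base64 or compressed layer.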
Archives / containers
7z l sample
7z x sample
unzip -l sample.zip
tar -tf sample.tar
cabextract sample.cab
Carving / embedded content
binwalk -e sample
foremost -i sample -o carve/
bulk_extractor -o bulk/ sample
Encodings
base64 -d
xxd -r -p
python3 -c "import zlib,sys;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" # zlib-wrapped input; pass wbits=-15 for raw deflate
Useful greps
strings sample | grep -E 'https?://|\\\\[A-Za-z0-9._$-]+\\|[0-9]{1,3}(\.[0-9]{1,3}){3}'
strings sample | grep -iE "user-agent|cookie|api|token|cmd|rundll32|schtasks|reg add|mshta|wscript|cscript"
strings -el sample | grep -iE "http|powershell|download|template|payload|dll"
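The first grep (URLs, UNC paths, IPv4 addresses) translates directly into a small extractor that returns deduplicated indicators. The sketch below is an approximation: the function name and the dictionary keys are assumptions, and the patterns are deliberately loose, so expect some false positives such as version strings that look like IPs.

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# \\host\share\... : two backslashes, a host name, then a path
UNC_RE = re.compile(r"\\\\[A-Za-z0-9._$-]+\\[^\s\"'<>|]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_iocs(text: str) -> dict:
    """Pull URLs, UNC paths and plausible IPv4 addresses out of text."""
    ips = {
        ip for ip in IPV4_RE.findall(text)
        if all(int(part) <= 255 for part in ip.split("."))  # drop 999.x.x.x
    }
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "unc": sorted(set(UNC_RE.findall(text))),
        "ipv4": sorted(ips),
    }
```

Feed it the joined output of both the ASCII and wide `strings` passes so indicators stored as UTF-16LE are not skipped.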
YARA / signatures
yara rules.yar sample
yara -r rules.yar sample_dir/ # -r recurses the target directory; rules must be a file, not a directory
# Good pivot targets for YARA later
URLs / domains
macro keywords
PDF JavaScript API names
PowerShell decode primitives
known mutex / paths / strings
Section-wise interpretation
# High confidence suspicious combinations
Office macro + AutoOpen + URLDownloadToFile
PDF /OpenAction + /JS + /EmbeddedFile
Wide strings + -enc + FromBase64String
OOXML relationship to remote template
RTF with embedded OLE + powershell strings
# Low confidence alone
High entropy only
Long base64 only
External URL only
One suspicious keyword only
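The combinations above can be encoded as set checks so that a triage script reports confidence consistently. This is only a sketch of the idea: the indicator labels and the function name are hypothetical, and the combinations simply restate the lists in this section.

```python
# Each set is one "high confidence" combination from the list above.
# The labels are illustrative names, not the output of any tool.
HIGH_CONFIDENCE_COMBOS = [
    {"office_macro", "auto_exec", "url_download"},
    {"pdf_openaction", "pdf_js", "pdf_embedded_file"},
    {"wide_strings", "encoded_command", "from_base64"},
    {"ooxml_remote_template"},
    {"rtf_embedded_ole", "powershell_string"},
]

def triage_confidence(indicators: set) -> str:
    """Return 'high' if any full combination is present, else 'low' or 'none'."""
    if any(combo <= indicators for combo in HIGH_CONFIDENCE_COMBOS):
        return "high"
    # Any isolated indicator (entropy, base64, a URL, one keyword) stays low
    return "low" if indicators else "none"
```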
Minimal safe dynamic approach
# Use a disposable VM snapshot, never a production system
- snapshot first
- disable shared folders
- isolate network or use controlled gateway
- record hashes before / after
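"Record hashes before / after" can be automated by hashing a directory tree twice and diffing the results. The sketch below uses hypothetical function names and reads each file fully into memory, which is fine for a small lab directory but not for large disk images.

```python
import hashlib
from pathlib import Path

def hash_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

def diff_manifests(before: dict, after: dict) -> dict:
    """Report files that were added, removed or modified between two runs."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
    }
```

Take the first manifest right after the snapshot and the second after the sample has run; the "added" list is a quick way to spot dropped files.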
# Linux payloads
strace -f -o strace.log ./sample
ltrace -o ltrace.log ./sample
# Document-focused dynamic checks belong in a Windows lab, not REMnux
- open suspicious Office/PDF only in disposable VM
- capture process tree, file writes, net connections
REMnux-friendly pivot tools
inetsim # fake services / sinkhole
tcpdump -i any -w run.pcap
fakenet-ng # if available in your lab workflow
python3 -m http.server 8000 # simple callback sink only
# Dynamic analysis goals
- confirm URL / C2 path
- confirm dropped file names
- confirm exact PowerShell line
- confirm persistence attempt